Standard Generalized Markup Language
From Wikipedia, the free encyclopedia
| Property | Value |
|---|---|
| Internet media type | application/sgml, text/sgml |
| Uniform Type Identifier | public.xml |
| Developed by | ISO |
| Type of format | metalanguage |
| Extended from | GML |
| Extended to | HTML, XML |
| Standard(s) | ISO 8879 |
The Standard Generalized Markup Language (ISO 8879:1986 SGML) is an ISO standard metalanguage in which one can define markup languages for documents. SGML is a descendant of IBM's Generalized Markup Language (GML), developed in the 1960s by Charles Goldfarb, Edward Mosher and Raymond Lorie (whose surname initials were used by Goldfarb to make up the term GML[1]).
SGML provides an abstract syntax that can be implemented in many different concrete syntaxes. For instance, although it is the norm to use angle brackets as tag delimiters in an SGML document (per the reference concrete syntax defined in the standard), it is possible to use other characters instead, provided that a suitable concrete syntax is defined in the document's SGML declaration.[2] For example, an SGML interpreter could be programmed to parse GML markup. In GML, tags are bounded by a colon on the left and a full stop on the right, and an e prefix denotes an end tag:
:xmp.Hello, world:exmp.
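The following Python sketch is a toy illustration of the point that the same element structure can be carried by different delimiters: it rewrites GML-style tags into the angle-bracket reference syntax. It is not an SGML or GML parser. The function name gml_to_angle is hypothetical, attributes are not handled, and the code assumes that a tag named e followed by the currently open element's name closes that element, while every other tag opens a new one.

```python
import re

# Simplified GML tag: a colon, a tag name, and a full stop.
GML_TAG = re.compile(r":([A-Za-z][A-Za-z0-9]*)\.")


def gml_to_angle(text: str) -> str:
    """Rewrite GML-style tags as angle-bracket tags (toy example).

    Assumption: a tag named "e" + <name of the open element> is an end tag;
    any other tag is a start tag. Attributes are not supported.
    """
    out, open_stack, pos = [], [], 0
    for m in GML_TAG.finditer(text):
        out.append(text[pos:m.start()])  # character data between tags
        name = m.group(1).lower()
        if open_stack and name == "e" + open_stack[-1]:
            out.append(f"</{open_stack.pop()}>")
        else:
            open_stack.append(name)
            out.append(f"<{name}>")
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


print(gml_to_angle(":xmp.Hello, world:exmp."))  # <xmp>Hello, world</xmp>
```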
Original uses
SGML was originally designed to enable the sharing of machine-readable documents in large projects in government, law and industry; such documents have to remain readable for several decades, a long time in information technology. It has also been used extensively in the printing and publishing industries, but its complexity has prevented its widespread application for small-scale general-purpose use.
SGML was primarily intended for text and database publishing. One of its first major applications was the second edition of the Oxford English Dictionary (OED), which was and is wholly marked up using an SGML-like markup.[citation needed]
Syntax
SGML allows most aspects of a markup language's syntax to be customized. Different character sets, different delimiter sets and different keywords can be specified in the SGML Declaration. The result of the various SGML declaration settings is called the concrete syntax of the document.
The default concrete syntax appears similar to this example:
<QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS> </QUOTE>
HTML uses this SGML default concrete syntax.
An SGML document has three parts: the SGML Declaration; the Prolog, which contains a DOCTYPE declaration that identifies the DTD; and the instance itself. An SGML document may also be split into many different physical parts, called entities. The markup language itself (its element types, their permitted content and their attributes) is specified by a Document Type Definition, or DTD.
According to the reference syntax, letter case is not distinguished in tag names, so the three tags <quote>, <QUOTE>, and <quOtE> are equivalent (a concrete syntax may change this rule through the NAMECASE parameter in the NAMING section of the SGML declaration).
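A minimal sketch of how a tool might recognize tags in the reference concrete syntax while ignoring letter case. It assumes the reference syntax's name characters (letters, digits, hyphen and period), skips attribute specifications without parsing them, and folds names to upper case, which is what the reference syntax's NAMECASE GENERAL YES setting does; tag_names is a hypothetical helper, not part of any library.

```python
import re

# Reference concrete syntax: "<" opens a start tag, "</" opens an end tag,
# ">" closes either. Attribute specifications are skipped, not parsed.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def tag_names(doc: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs, folding names to upper case as the
    reference syntax does (NAMECASE GENERAL YES)."""
    return [
        ("end" if m.group(1) else "start", m.group(2).upper())
        for m in TAG.finditer(doc)
    ]


print(tag_names("<quote></QUOTE><quOtE>"))
# [('start', 'QUOTE'), ('end', 'QUOTE'), ('start', 'QUOTE')]
```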
Markup Minimization
SGML provides many features for reducing the number of characters required to mark up a document. Some of these features must be enabled in the SGML declaration, and SGML software does not need to support all of them. These features also allow SGML applications to tolerate many kinds of innocent markup omissions; however, SGML systems are usually highly intolerant of invalid element structures. XML turns this on its head: it is strict about syntax but does not require a DTD in order to be parsed.
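XML's strictness can be observed with any conforming XML parser. This sketch uses Python's standard xml.etree.ElementTree module, a non-validating parser, to check whether some markup is well-formed: fully tagged markup with quoted attribute values parses without any DTD, while markup that relies on SGML minimization or on case-insensitive names is rejected. The helper name is_well_formed_xml is illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(doc: str) -> bool:
    """True if Python's XML parser accepts doc; no DTD is needed or used."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


samples = [
    '<QUOTE TYPE="example">typically <ITALICS>this</ITALICS></QUOTE>',
    '<QUOTE TYPE=example>typically <ITALICS>this</ITALICS></QUOTE>',  # unquoted value
    '<QUOTE>typically <ITALICS>this</></QUOTE>',  # SGML empty end tag
    '<quote>typically</QUOTE>',  # XML names are case-sensitive
]
for doc in samples:
    print(is_well_formed_xml(doc), doc)
```

Only the first sample is accepted; each of the others makes the parser raise ParseError, which the helper reports as False.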
Information in the DTD specifies whether an element's start-tag or end-tag may be omitted, and SGML provides rules for inferring the omitted tags. This is the OMITTAG feature.
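SGML infers omitted tags from each element's content model, which is more than a short example can capture. The sketch below assumes a much simpler, hypothetical rule: the element types listed in END_OMISSIBLE (declared with "- O", meaning the end tag may be omitted) cannot directly contain themselves, so a new start tag of the same type, the end tag of an enclosing element, or the end of the document implies their end tag. The function name imply_end_tags is invented for this example.

```python
import re

# Hypothetical DTD facts: element types whose end tag may be omitted.
# Simplifying assumption: none of them may directly contain itself.
END_OMISSIBLE = {"LI", "P"}

# Start or end tags without attributes (a simplification).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def imply_end_tags(doc: str) -> str:
    """Insert the end tags implied by the simplified OMITTAG rule."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(doc):
        out.append(doc[pos:m.start()])
        is_end, name = m.group(1) == "/", m.group(2).upper()
        if is_end:
            # Close omissible elements still open inside the one being ended.
            while stack and stack[-1] != name and stack[-1] in END_OMISSIBLE:
                out.append(f"</{stack.pop()}>")
            if stack and stack[-1] == name:
                stack.pop()
            out.append(f"</{name}>")
        else:
            # A new LI (or P) implies the end of the previous one.
            if stack and stack[-1] == name and name in END_OMISSIBLE:
                out.append(f"</{stack.pop()}>")
            stack.append(name)
            out.append(f"<{name}>")
        pos = m.end()
    out.append(doc[pos:])
    # At the end of the document, close omissible elements left open.
    while stack and stack[-1] in END_OMISSIBLE:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(imply_end_tags("<UL><LI>one<LI>two</UL>"))
# <UL><LI>one</LI><LI>two</LI></UL>
```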
Tags may be replaced by short delimiter strings to make markup terser. This is the SHORTREF feature. This style of markup is now associated with wiki markup: for example, two equals signs at the start of a line act as a heading start-tag, and two equals signs after the heading text act as the heading end-tag.
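In SGML, a short reference map associates delimiter strings with entities whose replacement text is markup, and a USEMAP declaration selects which map is active inside each element. The hypothetical sketch below imitates that for the wiki-style heading: one map is active outside headings and another inside an H2 element, so the same "==" string becomes a start-tag or an end-tag depending on context. For brevity the tag text is stored directly instead of in entities, and the start-of-line condition is ignored.

```python
# Hypothetical short reference maps, keyed by the currently open element
# (None = outside any heading). Each maps a delimiter string to markup.
SHORTREF_MAPS = {
    None: {"==": "<H2>"},
    "H2": {"==": "</H2>"},
}


def expand_shortrefs(text: str) -> str:
    """Replace short reference delimiters using the map for the open element."""
    out, open_element, i = [], None, 0
    while i < len(text):
        for delim, markup in SHORTREF_MAPS.get(open_element, {}).items():
            if text.startswith(delim, i):
                out.append(markup)
                # The generated tag either opens an element or closes it.
                open_element = None if markup.startswith("</") else markup[1:-1]
                i += len(delim)
                break
        else:
            out.append(text[i])  # ordinary character
            i += 1
    return "".join(out)


print(expand_shortrefs("==Syntax==\nText follows."))
```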
Whether a tag must be paired (like the <QUOTE></QUOTE> pair shown earlier) or occurs singly (like an HTML <HR>) is defined in the DTD for the markup language being defined (as long as the OMITTAG feature is enabled). The XML counterpart of such a single tag is the empty-element tag <hr/>, which is equivalent to the SGML NET-enabling start-tag introduced in TC2 (International Standard ISO 8879:1986, Technical Corrigendum 2, Nov. 1999).
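As a small illustration, the hypothetical serializer below consults a set of element types assumed to be declared EMPTY in a DTD: for those it writes only a start tag in the SGML reference syntax and an empty-element tag in XML, and for everything else it writes a start-tag/end-tag pair. EMPTY_ELEMENTS and serialize are names invented for this example.

```python
# Hypothetical DTD fragment: element types declared EMPTY, which have a
# start tag but no content and no end tag.
EMPTY_ELEMENTS = {"HR", "BR"}


def serialize(name: str, content: str = "", xml: bool = False) -> str:
    """Write one element in the SGML reference syntax or in XML."""
    if name.upper() in EMPTY_ELEMENTS:
        # SGML: the start tag alone; XML: an empty-element tag.
        return f"<{name}/>" if xml else f"<{name}>"
    return f"<{name}>{content}</{name}>"


print(serialize("HR"), serialize("hr", xml=True), serialize("QUOTE", "text"))
# <HR> <hr/> <QUOTE>text</QUOTE>
```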
SGML markup languages whose concrete syntax enables the SHORTTAG VALUE feature do not require attribute values containing only name characters (in the reference concrete syntax: letters, digits, hyphens and periods) to be surrounded by quotation marks, either " (LIT) or ' (LITA), so the <QUOTE TYPE="example"> markup shown earlier could be written:
<QUOTE TYPE=example> typically something like <ITALICS>this</ITALICS> </QUOTE>
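A hedged sketch of accepting both forms of attribute specification. It assumes the reference concrete syntax, in which name characters are letters, digits, hyphens and periods, and treats attribute names case-insensitively by folding them to upper case; parse_attributes and format_attribute are hypothetical helpers, and the regular expression ignores other SHORTTAG features, such as specifying only an attribute's value.

```python
import re

# Assumption: reference concrete syntax, where name characters are
# letters, digits, hyphen and period.
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+")

# NAME = "literal", NAME = 'literal', or NAME = unquoted name token.
ATTR = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([A-Za-z0-9.-]+))"""
)


def parse_attributes(spec: str) -> dict[str, str]:
    """Parse attribute specifications, accepting quoted or unquoted values."""
    attrs = {}
    for m in ATTR.finditer(spec):
        # Exactly one of the three value groups matched.
        value = next(g for g in m.groups()[1:] if g is not None)
        attrs[m.group(1).upper()] = value  # names are case-insensitive
    return attrs


def format_attribute(name: str, value: str) -> str:
    """Leave the value unquoted only when it consists of name characters."""
    if NAME_CHARS.fullmatch(value):
        return f"{name}={value}"
    return f'{name}="{value}"'


print(parse_attributes('TYPE="example"') == parse_attributes("TYPE=example"))  # True
print(format_attribute("TYPE", "example"), format_attribute("TYPE", "two words"))
# TYPE=example TYPE="two words"
```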
One feature of SGML markup languages is the NET (null end tag) construction: <ITALICS/this/ is structurally equivalent to <ITALICS>this</ITALICS>. Another is "presumptuous empty tagging": the empty end tag </> in <ITALICS>this</> takes its element name from the most recently opened element that is still open, which in this example is <ITALICS>; in other words, it closes the current element. The expression is thus another, more concise, equivalent of <ITALICS>this</ITALICS>. A third is the "text on the same line" feature, which allows an element to be ended by a line-end (especially useful for headings and the like).
Additionally, the SHORTTAG NETENABL IMMEDNET feature allows the tags of an element with empty content to be shortened:
<QUOTE></QUOTE>
can be written as
<QUOTE//
Here the first "/" stands for the NET-enabling start-tag close (NESTC) and the second one stands for the NET. (Note: the SGML declaration for XML defines NESTC as "/" and NET as ">", so in XML this construct looks like <QUOTE/>.)
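The short forms above can be expanded back into ordinary start-tag/end-tag pairs by tracking which open elements are NET-enabled. The Python sketch below does this for the reference delimiters under simplifying assumptions: start tags carry no attributes, every end tag is assumed to match the current element, and a "/" inside an element that is not NET-enabled is treated as ordinary text. expand_short_tags is a hypothetical name.

```python
import re

# Reference delimiters only. Alternatives, in order:
#   start tag "<NAME>" or NET-enabling start tag "<NAME/"
#   end tag "</NAME>" or empty end tag "</>"
#   a bare "/", which is a null end tag (NET) inside a NET-enabled element
TOKEN = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)([>/])|</([A-Za-z][A-Za-z0-9.-]*)?>|/")


def expand_short_tags(doc: str) -> str:
    """Rewrite NETs and empty end tags as full end tags (simplified)."""
    out, stack, pos = [], [], 0  # stack entries: (element name, NET-enabled?)
    for m in TOKEN.finditer(doc):
        out.append(doc[pos:m.start()])
        pos = m.end()
        if m.group(1):  # start tag, possibly NET-enabling
            stack.append((m.group(1), m.group(2) == "/"))
            out.append(f"<{m.group(1)}>")
        elif m.group(0) == "/":
            if stack and stack[-1][1]:
                out.append(f"</{stack.pop()[0]}>")  # NET closes the element
            else:
                out.append("/")  # ordinary character data
        else:  # "</NAME>" or the empty end tag "</>"
            name = m.group(3) or (stack[-1][0] if stack else "")
            if stack:
                stack.pop()  # assumes the end tag matches the open element
            out.append(f"</{name}>")
    out.append(doc[pos:])
    return "".join(out)


for doc in ["<ITALICS/this/", "<ITALICS>this</>", "<QUOTE//"]:
    print(doc, "->", expand_short_tags(doc))
```

The first two inputs both expand to <ITALICS>this</ITALICS>, and <QUOTE// expands to <QUOTE></QUOTE>.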
SGML is an ISO standard: "ISO 8879:1986 Information processing—Text and office systems—Standard Generalized Markup Language (SGML)" which was accepted in October 1986. It has had two minor updates (Technical Corrigenda): in 1996 to add extended naming rules to allow markup in arbitrary languages and scripts, and in 1998 to support XML and WWW better.
Derivatives
XML
XML is a subset of SGML, designed so that an XML parser is much easier to implement than a full SGML parser. As a consequence of this ease of implementation, XML rather than SGML is nowadays widely used for deriving document specifications. Another contributing factor is that few SGML-aware programs existed when XML was created. The number of XML applications today is large. XML also offers lightweight internationalization, since its document character set is Unicode. XML is used for general-purpose applications, such as the Semantic Web, XHTML, SVG, RSS, Atom, XML-RPC and SOAP.
Applications
Languages defined using SGML are known as "applications".
HTML
The design of HTML was originally inspired by SGML tagging, but since no clear guidelines for expansion were offered, many HTML documents are not proper SGML. HTML was later reformulated (at version 2.0) to be an application of SGML, but only compliant documents can be considered proper SGML, and for a large number of HTML documents validation was never pursued. The charter for the World Wide Web Consortium HTML Working Group, revived in 2007, goes as far as to say, "the Group will not assume that an SGML parser is used for 'classic HTML'".[3]
Although its syntax closely resembles that of SGML, HTML 5 has abandoned any attempt to be an SGML application and explicitly defines its own "html" serialization; it also defines an alternative XML-based XHTML 5 serialization.[4]
DocBook
Another markup language originally created as an application of SGML is DocBook, designed for authoring technical documentation. DocBook is now also available as an XML application.
Other
There are also a number of languages that are related in part to SGML and XML, but, because they cannot be parsed or validated or otherwise processed using standard SGML and XML tools, cannot be considered applications of SGML or XML. One example is the Z Format, a language designed for typesetting and documentation.
See also
- S-Expression
- DSSSL - a Scheme-based processing language similar to XSL
- LaTeX
- List of general purpose markup languages
- Markup language
- SGML entity
- HyTime
References
- Charles F. Goldfarb (1996). "The Roots of SGML - A Personal Recollection". http://www.sgmlsource.com/history/roots.htm. Retrieved on 2007-07-07.
- SGML Declarations
- "HTML Working Group Charter". http://www.w3.org/2007/03/HTML-WG-charter. Retrieved on 2007-04-19.
- "HTML 5, one vocabulary, two serializations". http://www.w3.org/QA/2008/01/html5-is-html-and-xml.html. Retrieved on 2009-02-25.
External links
- Overview of SGML Resources at W3C's website.
- Introduction and Examples of Software Documentation in SGML
- A gentle introduction to SGML
- SGML Syntax Summary by Charles Goldfarb
- SGML document introducing you to SGML; Some reasons why SGML is important
- The SGML Declaration, in SGML and HTML Explained, Martin Bryan (1997)
- SGML Declarations Wayne Wohler, IBM Corporation, 1994.


