Talk:Markup language

From Wikipedia, the free encyclopedia
WikiProject Computer science (Rated C-class, High-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
Markup language is a former featured article. Please see the links under Article milestones below for its original nomination page (for older articles, check the nomination archive) and why it was removed.
Article milestones
Date Process Result
January 19, 2004 Refreshing brilliant prose Kept
March 27, 2007 Featured article review Demoted
Current status: Former featured article
WikiProject Spoken Wikipedia
This article is within the scope of WikiProject Spoken Wikipedia, a collaborative effort to improve the coverage of articles that are spoken on Wikipedia.
Version 0.5 (Rated B-class)
This article has been selected for Version 0.5 and subsequent release versions of Wikipedia.
Quality: B-Class
Importance: not yet rated

Centuries Old

The intro paragraph says 'Markup languages have been in use for centuries.' This makes little sense when the history section notes that the idea was first publicly presented in 1967 and was not used until the 1970s.
~David Craft (talk) 14:31, 11 March 2009 (UTC)

Thanks for catching that. It was inserted by this incompetent edit by User:Quota on 19 February 2008. I'm countermanding it now. I've also warned him/her to not edit topics on subjects in which he/she is incompetent. --Coolcaesar (talk) 15:51, 11 March 2009 (UTC)

(Aside .. for some reason the edit does not show up in the history, but does show up in the referenced link. Thanks for including that.)
My edit was:
A markup language is a set of annotations to text that describe how it is to be structured, laid out, or formatted. Markup languages have been in use for centuries, and in recent years have also been used in computer typesetting and word-processing systems.
and I stand by that as being correct. Markup languages have been used for almost as long as there has been printing: copyeditors and binders marked up drafts long before there were computers (the term is used in Glaister's 1960 glossary, where it clearly refers to long-standing practice, and the OED cites 1973 A. Davis, Graphics iii. 74: "The type mark-up by which the designer conveys his instructions to the compositor is one of the essential tools of the trade", also referring to long-standing practice and nothing to do with computers).
The error here is in the history section. Please correct (I've made a start) accordingly! quota (talk) 17:05, 11 March 2009 (UTC)
Um, you clearly don't know what a markup language is, let alone markup. A markup language consists of special character inline syntax that embeds a data structure into a linear string of text. The name is borrowed from the concept of markup (in the sense of marking up hard copy) because both are related to facilitating document workflow, but markup languages for computers are clearly distinct from markup traditionally performed by editors. Although both share the underlying purpose of communicating structure and/or formatting and reducing errors, the workflow for using markup languages is fundamentally different than the workflow for using traditional markup for editing. Markup languages are inserted inline (into the text string itself) rather than above or below or in the margins as with traditional markup.
The point I'm making here is that you should never get into a debate when you don't even understand the concepts involved. I learned that the hard way when I was 12; I tried to debate broadband communications policy with strangers when I really didn't understand broadband very well. --Coolcaesar (talk) 07:31, 13 March 2009 (UTC)
Coolcaesar, it is you who are out of your depth. Check out the meaning of the word 'language'. What you are describing is just one form of markup language (as in computing, for example). The markup language used by editors (long before computers were used for this task) comprised various symbols in or alongside the text being marked up. That is a language, just as English (spoken or written) was a language before computers came along. Whether the markup is inline or not is irrelevant: if you look at any SGML or XML markup, formatted for reading, you will notice that most of it is between (above or below) lines containing text. I will see if I can clarify the introductory sentence so it is more acceptable. Please don't blindly revert carefully worded changes which have been made for good reason. Thanks. quota (talk) 13:22, 13 March 2009 (UTC)
Well, it's clear you have no training in historiography of science and technology, which I studied as an undergraduate with what was then the most highly ranked history department in the United States (it's now No. 2 behind Harvard). You're retroactively imposing the contemporary concept of markup language upon a different but related concept, markup. If you tried to do that in a history course in any of the top history departments in the U.S. (or the UK for that matter), you would earn a C at best, or simply fail the course. Writing Whig history is generally considered to be the mark of an amateur or incompetent. There is no need for that kind of original research on Wikipedia.
For one thing, the phrase markup language would have sounded strange to anyone prior to 1970 (because before GML came out with IBM DCF, the only person talking widely about this concept was Bill Tunnicliffe, who spoke of "generic coding"). The term in use was "markup," as in, "I'm going to hand off this ad to our markup man for markup."
Second, it's clear that you don't understand how text strings work or how they are parsed by a computer program, probably because you're not a programmer. Regardless of how a text string is formatted, it is still a continuous stream of data as far as the computer is concerned; formatting characters like CR/LF are merely for humans' convenience. Markup language tags are permanently merged into the string they control. In contrast, markup notations on documents drafted by hand on paper are superimposed upon the text, temporarily, and don't merge into it.
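To illustrate the point about a continuous stream: in the minimal Python sketch below (the sample string and the `EventCollector` class are made up for illustration), the tags and the CR/LF line break are all just characters in one `str`, and the structure only reappears when a parser, here the standard library's `html.parser.HTMLParser`, reads that stream.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record the start tags, end tags and text runs found in one linear string."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


# One continuous string: the tags sit inline, in the same stream as the text.
# The CR/LF between the two paragraphs is just two more characters in it.
doc = "<p>Markup is <b>inline</b>.</p>\r\n<p>Second paragraph.</p>"

collector = EventCollector()
collector.feed(doc)
collector.close()

for event in collector.events:
    print(event)
```

The printed events (start p, text, start b, and so on) are the structure the parser recovers; nothing about it exists outside the single string.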
You might want to read Charles Goldfarb's authoritative book on the subject sometime (I have). --Coolcaesar (talk) 19:12, 15 March 2009 (UTC)
First off, there needs to be a clear definition or no one can speak with certainty on how many years the process has existed. The introduction fails badly by using highly technical terms which the layperson is ill-prepared to understand.
We learn from Webopedia that "Markup languages are designed for the processing, definition and presentation of text. The language specifies code for formatting, both the layout and style, within a text file. The code used to specify the formatting are called tags. HTML is an example of a widely known and used markup language."
If that is correct, then use wording something like that. If it is not correct, then use wording something like that.
Besides the overall bad definition, the article is flawed in that it fails to state whether it refers only to text formatting by computer languages, or also includes manual formatting using human languages. If it is a general formatting methodology, then any spoken human explanation of the proper procedures would be a form of markup language. This article needs a rewrite that gets to the point in a much clearer manner. - KitchM (talk) 06:01, 31 August 2009 (UTC)

Why dumb down this subject by omitting its history?

It's circular reasoning to invoke definitions of computer-related markup languages to "prove" that other text markup systems don't qualify. Handwritten markup may not be a computer language, but that doesn't mean it isn't language. Excluding it from the article is historically inaccurate. William Tunnicliffe and the IBM GenCode project did not invent the idea of a separate system for annotating the structure of a text and giving instructions on how to display it. Editors, typesetters, and proofreaders had been doing exactly that for a century or more. Early computers and computerized typesetting systems couldn't deal with the complexities of that system, so they had to have their own markup languages written that adapted it to their use. They've grown considerably since then, which I'm sure is very laudable of them, but that subsequent development doesn't remove handwritten markup from their ancestry, and ought not exclude it from the category in general.

Other reasons to include handwritten markup: (1.) It's real. (2.) It's still in robust use. (3.) Users who need to know about it are going to try to look it up. (4.) Currently, Wikipedia is leaving terms like "stet" and "to come" in limbo. It notes that "stet" should be written near the words that are to be preserved unchanged, and that they should also have a row of dots drawn under them, but it omits the context in which this occurs.

In short, excluding it is an unnecessary oversimplification that makes this entry less useful and informative than it could be. —Preceding unsigned comment added by (talkcontribs) 13 April 2010

I would like to see some reliable sources saying that the markup in manuscripts was a "markup language", and explaining its relationship to computer markup languages. As far as I know, the only formal definitions of markup languages are those of computer languages, and the term "markup language" didn't really exist before it was invented for computers. Please prove me wrong with some sources. --Enric Naval (talk) 14:00, 27 April 2010 (UTC)
I strongly concur with Enric Naval. The anonymous user at clearly does not understand handwritten markup or markup languages. Personally, I have extensive experience with both, as an attorney (handwritten markup is taught in American law schools) and as a former Web designer. This is a simple example of a name ("markup") being transferred and borrowed ("markup language") because of a vague similarity. It is silly to read anything more into that. --Coolcaesar (talk) 18:41, 27 April 2010 (UTC)
We need an article on handwritten markup. --Enric Naval (talk) 14:06, 28 April 2010 (UTC)
Certainly typesetters' markup is a "markup language" - indent, subscript, superscript, italics, change font, change size or spacing, etc. Peter Flass (talk) 15:25, 26 January 2012 (UTC)

POD missing

I noticed that Plain Old Documentation is missing. Since I'm not a native English speaker, I hope someone else could add it. —Preceding unsigned comment added by (talk) 12:40, 25 November 2008 (UTC)

older entries

The markup language we use for editing articles should be mentioned, but I don't know a name for it. --Error 23:55, 15 Feb 2004 (UTC)

"wikitext" --Maian 10:04, 9 October 2005 (UTC)

Markup language should discuss the difference between markup languages and programming languages. (Heh, I don't want to tackle that one alone, any takers?). I suggest the following comparison, though it might not be good enough:

Main Similarities:

  • List of variable names and contents. (i.e. <myvariable>hello world</myvariable> vs. myvariable = "hello world")
  • Input can be interpreted and perform a function. (i.e. vs. PRINT) It does it differently, in a markup language like HTML, 'variable names' are interpreted as functions by the browser, and in interpreted programming they're just functions that you type in, but the effect is the same.
  • there are "two views" -- the "source" view and the end-user's view (is there a better term for this? the "rendered view" ? the "runtime view" ?)
  • the "source view" of both a program and a marked-up document looks like plain text.

Main Differences:

  • Cannot modify variables after they have been input.
  • No discrete mathematics.
  • No flow control:
    • No recursion.
    • No loops
    • No subroutines
  • No OO

That's all I can think of so far--Ben 09:30, 18 Jun 2004 (UTC) slightly modified by --DavidCary 20:23, 6 January 2006 (UTC)
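A small Python sketch of the first similarity and of the "no flow control" difference, assuming an XML reading of the `<myvariable>` example (the `handlers` table is hypothetical, for illustration only): parsing the element yields the same name/value pair as the assignment, but nothing in the markup executes; whatever "function" a tag performs is supplied by the program that processes it, much as a browser does for HTML.

```python
import xml.etree.ElementTree as ET

# The markup form from the comparison above: the element name acts like a
# variable name and the element content like its value.
element = ET.fromstring("<myvariable>hello world</myvariable>")
as_markup = {element.tag: element.text}

# The programming-language form: an ordinary assignment.
myvariable = "hello world"
as_program = {"myvariable": myvariable}

print(as_markup == as_program)  # True

# The difference: the markup is only data. A separate processor decides what
# each tag means; this hypothetical one treats the tag name as an instruction.
handlers = {"myvariable": lambda text: text.upper()}
print(handlers[element.tag](element.text))  # HELLO WORLD
```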

presentational vs. procedural

I'm not clear on the difference between presentational and procedural markup. For example, the article says that B is procedural, but I've never heard it classified that way - I consider it presentational. What makes PRE presentational and B not? --Maian 09:42, 9 October 2005 (UTC)

presentational vs. procedural vs ...

Although "presentational" is not as common as "descriptive" and "procedural", it nevertheless has been defined and used in several ways over the years. Among them:

(i) rendering features such as "several newlines and spaces, thus accomplishing leading space and centering" (the definition text in this Wikipedia article);
(ii) the procedural markup that specifies such features
(iii) the descriptive markup that asserts such features exist in a particular rendition being described.
(And I think we can go after another sense, general declarative, as well, but I won't try that here).

It is true that today the use of "presentational markup" in sense (ii) is often heard, and so Maian's query is not so surprising. This article, like the Coombs, Renear, DeRose (CACM 1987) paper it cites, defines and then consistently uses "presentational markup" in the first sense (renditional features). There is just one slip I think, where it says: "The "i" instruction is an example of presentational markup. It specifies the exact appearance...". In the sense of "presentational markup" being used the HTML elements "i" and "b" are not presentational markup -- they are both procedural markup specifying presentational markup. There might at first glance appear to be another slip. Maian asks " What makes PRE presentational and B not?" Maian is I believe alluding to the article's statement that "...HTML also includes the PRE element, which encloses areas of presentational markup to be laid out exactly as typed." But a close reading confirms that the article is not asserting that PRE itself is presentational, only that it encloses markup that is presentational.

Given that "presentational markup" is now often used in the second sense as well as the first, what is best in this article? I prefer, strongly, the use in the current article; that is: sense (i), the renditional features themselves. For one thing we need that concept, and we have an established name for it, and published defended definitions, and literary warrant elsewhere in the literature, so let's stick with it. And the more common and very well-established term "procedural markup" works just fine for most of the uses to which "presentational markup" in the (ii) sense would be put.

There are some issues here though. When we wish to contrast "i" with "div" it is most natural to say that "div" is descriptive and "i" is procedural, as we have been taught for 25 years; with those terms thought of as contrasting terms partitioning the space of possibilities at least with respect to symbol based markup systems. I am sure that no one wants to replace the word "procedural" with "presentational" in that taxonomy. That would definitely be a bad idea. First because "descriptive/procedural" is still the most common terminology. And second because "procedural" is the classic and original term, promoted by Goldfarb (since who knows when), and an extended discussion of the distinction, expressed in those terms, is included as Annex A.1 of ISO 8879, SGML.
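A toy Python sketch of the classic descriptive/procedural contrast described above (not of the finer three-way distinction drawn in this comment); the tag names and the `STYLE` and `PROCEDURAL` tables are hypothetical. A procedural tag fixes the rendition itself, while a descriptive tag is mapped to a rendition by a separate table, so changing the table restyles every element of that type without touching the marked-up text.

```python
# Hypothetical style table: a descriptive tag says what the text *is*;
# the rendition is chosen separately, here by this table.
STYLE = {"title": "italic", "term": "bold"}

# Hypothetical procedural tags: the tag names the rendition directly.
PROCEDURAL = {"i": "italic", "b": "bold"}


def rendition(tag: str) -> str:
    """Return the presentational result for a tag under this toy scheme."""
    if tag in PROCEDURAL:
        return PROCEDURAL[tag]  # fixed by the markup itself
    return STYLE[tag]           # looked up, so it can be changed centrally


print(rendition("i"), rendition("title"))  # italic italic

# Restyle every "title" at once; text marked with "i" is unaffected.
STYLE["title"] = "bold"
print(rendition("i"), rendition("title"))  # italic bold
```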

Ok, here's the problem. What about procedural markup that does not specify rendition, but rather some other sort of processing (indexing, change notification, etc)? And what about descriptive (declarative) markup that describes rendition -- as when a physical bibliographer describes a title page? If the uneasiness underneath Maian's observation is that some but not all procedural markup is presentational, and how that should be accommodated terminologically ... that remains a problem in markup taxonomy. And the reason there has been pressure in favor of using "presentational" for "procedural" is that although not all procedural markup is presentational(ii), most of what we see around us in electronic publishing is.

Full disclosure: although I just stumbled upon this article and so far haven't had a hand in editing it, I am in fact a coauthor of the Coombs, Renear, DeRose CACM (1987) article mentioned above, where sense (i) of "presentational" is defined, so of course I am partisan. In addition, I've written on the issue raised above: "The Descriptive/Procedural Distinction is Flawed," Markup Languages: Theory and Practice, 2:4 (2000). [I actually think that some of the (mis)use of "presentational", i.e. use of the term in the (ii) sense, started in the late 80s, when people hearing about our distinction second hand thought they could guess what we meant by it: classic folk etymology, or folk semantics.]

I'm new to wikipedia so let me know if this talk contribution is inappropriate in any way. -- Allen Renear 20:08, 28 January 2006 (UTC)

markup used in email and Usenet

Is there a name for the "formatting" common in plain-text Usenet posts and email? My understanding is that most lightweight markup languages are designed to look similar to it. There seem to be many articles that briefly mention it: ( Underline, Italic type, Word attachments, ...)

Is there a single article dedicated to it? (Or should it go into a section of an article -- perhaps in lightweight markup language?) --DavidCary 20:23, 6 January 2006 (UTC)

Proposal to merge Markup (computing) to this article.

I suggest merging Markup (computing) to this article as it appears to be a small subset. JonHarder 14:52, 27 January 2006 (UTC)

Edits re: proofreading symbols and "fonts."

I have added some text to the discussion of "markup men" to give proofreaders' marks as a clarifying example of handwritten markup. Also, I deleted the reference to markup including indication of "fonts" during the historical period of "markup men." "Font" has only become a synonym for typeface during the computerized typesetting era. Previously it was a unit of measure used by type foundries selling movable type. A "font" was a "useful" quantity of characters and sorts, and movable type was purchased by the "font." The precise amount of type varied by foundry and within a foundry's product line (e.g., a 10-point text font would include far more pieces than a 72-point display typeface). See Webster's Third and my longer discussion of this issue on the discussion page for Typeface.


Why does semantic markup redirect to this article ...

... whereas descriptive markup redirects to nowhere? I just discovered that while editing Semantic Web Services. There have been long discussions about whether markup languages, and XML in particular, deal only with syntax or also add semantics to the content. The section in the article about this issue is quite short, and it treats descriptive markup and semantic markup as synonyms. I'm not sure this reflects consensus or is neutral. See e.g. here, and there. See also Semantic publishing. universimmedia 11:03, 6 November 2006 (UTC)

I concur, I was looking for semantic markup and was truly confused about this redirection. The article does not make sense when looking for semantic markup (in the sense of the semantic web). I would expect to find how syntactical and semantic markup differs, whether RDF/OWL is markup or can be used for markup, what other options, microformats, etc. I was searching especially how Wikipedia intends to introduce semantics in the future (Wikitext does not help here). I cannot write this article, but would greatly appreciate an article on this. Vigilius (talk) 22:50, 14 April 2008 (UTC)


I notice that PostScript is included as a markup language. I have never heard it called that, and I wonder in what way it meets the definition of markup. There is no underlying stream of text intermixed with graphic operators. It is just a programming language with graphics primitives; in what sense is it more of a markup language than, say, a C program using the X-Windows library? Is there any source (citation) for this classification? Notinasnaid 19:29, 3 March 2007 (UTC)

In the absence of any comments, I've removed it. Notinasnaid 11:39, 12 March 2007 (UTC)
I concur with the deletion. PostScript is a PAGE description language! —The preceding unsigned comment was added by Coolcaesar (talkcontribs) 19:11, 12 March 2007 (UTC).

no definition

In the primary paragraph, the definition of a markup language is not stated. The article tells the reader what it does, but does not state what it is. Just another guy trying to be a Chemical Engineer, Nanobiotechnologist, and Mathematician 08:16, 20 May 2007 (UTC)

citation for "markup men"

Can anyone provide a citation for "markup men"? A quick check of the web does not reveal any other authoritative sources. 22:00, 14 September 2007 (UTC)


The markup language used by RUNOFF or its descendants (runoff (program), roff, nroff, troff, groff, etc.) probably belongs in the history section somewhere. -- Peter Kaminski (talk) 22:01, 7 January 2008 (UTC)

Job description

In 1989, physicist Sir Tim Berners-Lee wrote a memo

Although TBL had a degree in Physics, he was working as a software consultant (at CERN) at the time, and not as one of the CERN physicists. (talk) 11:18, 10 December 2013 (UTC)

Corrected error about proofreaders

I made some edits to the History section to correct an error: it had proofreaders marking up manuscripts for someone else to set the type. I was there. Proofreaders did not mark up manuscripts. They marked up proofs (crude prints of type already set), which they compared to manuscripts while proofreading. Markup of manuscripts was primarily done by "markup men", as now described in the section, with some markup being applied to manuscripts by others, such as editors and advertising agency graphic designers. Proofreaders only got in on the act after the type was set and proofed for them. Marbux (talk) 07:11, 24 January 2008 (UTC)

But where does 'markup men' come from? I cannot find a reference to that, nor do I recall hearing the term when I was working on hot lead machines. quota (talk) 12:47, 24 January 2008 (UTC)
See Chiarella v. United States, 445 U.S. 222 (1980).
Also, if you search Google Books for "markup man," several references come up.
  • Allan Woods, Modern Newspaper Production (New York: Harper & Row, 1963), 85.
  • Stewart Harral, Profitable Public Relations for Newspapers (Ann Arbor: J.W. Edwards, 1957), 76.
Hope this helps. I am adding these to the article.--Coolcaesar (talk) 02:49, 3 February 2008 (UTC)
OK, I'm convinced :-). Thanks. Odd this wasn't in some of the more 'classic' references like Glaister. quota (talk) 17:00, 4 February 2008 (UTC)
The From the Notebooks of H.J.H & D.H.A. on Composition manual that I received when I started work at Kingsport Press in 1969 refers to the Copy Marker as responsible for the markup of a manuscript for composition (typesetting). The copy marks identified typographic elements such as heads, paragraphs, indented extract, etc. The first markup languages I encountered for typesetting were the RCA GSD PAGE-1 and IBM Composition 360 composition languages and clones and work-alikes like Pica-Ultra, ABCS and P900 (the last a composition program written by Raymond MacDavid and me for Kingsport Press), all appearing before GML in 1973. The Seybold Reports, which tracked the tech news in the early history of computerised composition systems, or the writings of Frank Romano on the subject ought to document some of these early markup languages. Naaman Brown (talk) 20:44, 5 August 2009 (UTC)
Frank J. Romano, photocomposition and you, Graphic Arts Marketing Associates (GAMA) Publications, 1974, page 57-60, has a chapter on markup in typesetting, circa early 1970s.
For what it is worth, at the Press we had Specification Writers who would look at a manuscript (later, at a word processing file), identify the distinct typesetting elements, and write a Specs Form (a file associated with the job) naming tags for each element and describing them. They often had a proofreading background, and were familiar with the markup symbols referred to by user quota above. Formatters familiar with the typesetting language would translate the Specs Form into a Page Rule file associating the element tag or synonym with the actual composition commands. (We also experimented with a program to translate the standardised Specs Form into the Page Rule file.) For example, the Specs Form would name Y4 as the tag for start of paragraph, justified, no more than 2 consecutive hyphenated lines, 20 pica width, 12 point indent, 12 point Times Roman, body lead 13 points. The Page Rule file would define the synonym for Y4 as (SY,Y4) (NL;PS,12;SW,12;TF,68;BL,13;JU,0,240;HY,2;HN,12)(SE) and the other element tags; a sample page would be marked up showing a (Y4) at start of paragraph by the Specs person or the Formatter, either of whom may at one time have been a Proofreader, altho' not functioning as proofreader when marking up text as Specs Person or Formatter. We actually did not have any designated Markup Men, who were more often women anyway. From my experience in computerized typesetting from 1969 on, there is how it is taught at university, and how it is done in the wild, with every shop taking its cue from Frank Sinatra, doing it their way. My last project has been to reduce a file with 1,701 different RTF codes (no doubt from an application written by graduates) to a file with a total of 12 unique tags for distinct type elements. Naaman Brown (talk) 16:15, 19 February 2010 (UTC)
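A rough Python sketch of the Page Rule step as described above, under the assumption that element tags appear in the marked-up text as `(TAG)` and are simply replaced by the command string defined for them. `PAGE_RULES` and `expand_tags` are hypothetical names; only the Y4 command string is taken from the comment, and it is passed through uninterpreted. (For what it's worth, 20 picas at 12 points per pica is 240 points, which appears to match the `JU,0,240` entry.)

```python
import re

# Hypothetical Page Rule table: element tag -> composition commands.
# The Y4 entry is copied from the example above; the command codes belong to
# the Press's composition language and are not interpreted here.
PAGE_RULES = {
    "Y4": "(NL;PS,12;SW,12;TF,68;BL,13;JU,0,240;HY,2;HN,12)",
}

# A tag as it appears on the marked-up page, e.g. "(Y4)".
TAG = re.compile(r"\(([A-Z][A-Z0-9]*)\)")


def expand_tags(marked_up_text: str, rules: dict) -> str:
    """Replace each (TAG) in the text with its composition commands.

    Unknown tags raise KeyError, so a missing Specs entry is noticed instead
    of being passed silently to the typesetter. re.sub does not rescan the
    inserted command strings, so they are never mistaken for tags.
    """
    return TAG.sub(lambda m: rules[m.group(1)], marked_up_text)


sample = "(Y4)First paragraph of text. (Y4)Second paragraph."
print(expand_tags(sample, PAGE_RULES))
```

Keeping the tag-to-command table separate from the text is the same division of labour as the Specs Form and Page Rule file: the marked-up copy only names elements, and changing a spec means editing one table entry.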

Example Required

This really needs a good example at the start, a sort of before and after, showing the language instructions and the visible result. Myles325a (talk) 04:27, 16 May 2008 (UTC)


What a clever recipe

gosh, what an amazingly clever recipe is being featured at the top of the article

Distinction between a markup language and a "container language"?

This article refers several times to "container languages" and implies that if something is a container language, it cannot also be a markup language. But Wikipedia has no article on "container language" and I don't know what one is, so the references in this article aren't informative to me. Is "container language" a widely used term in computer science? DSatz (talk) 14:25, 14 July 2008 (UTC)

I also am confused by the term "container language". Is it a synonym of digital container format ? --DavidCary (talk) 17:02, 30 July 2013 (UTC)

Thousands of markup languages - how to integrate them here?

There are many many markup languages out there. A google search on 'Markup languages' in results in about 6200 pages! The markup languages category only covers 184 of them.

I think there are several issues here:

  • The development of numerous markup languages (mostly based on XML) for a wide variety of applications should be indicated in the body of this page.
  • The "Types of Markup languages" bar is incomplete. There are so many markup languages that careful thought and organisation would be necessary for such a bar to be effective here. In my limited Wikipedia experience, I'm not sure how best to do that.
  • The 'markup languages' category and sub-categories could do with an overhaul to better reflect the current state of the game.

That sounds like a huge amount of work! Drevicko (talk) 00:15, 15 October 2008 (UTC)

There is a list of markup languages page that links to several other 'list of ... markup languages' pages. This at least is a hierarchy, but perhaps not an ideal way to set it up. Drevicko (talk) 00:18, 15 October 2008 (UTC)

Not an artificial language

I removed the claim in the first paragraph that markup languages are "artificial languages." I think that's not only quite misleading, it's not substantiated by the article on artificial language to which it referred. While I was there, I also tried to make the first two paragraphs somewhat clearer for a non-technical reader. dweinberger (talk) 21:45, 22 February 2009 (UTC)

Content from the HTML Wiki

Hi, I just removed this content from the HTML Wiki for reasons explained at the first link. It provides some sources but no citations. --Jesdisciple (talk) 07:48, 17 June 2010 (UTC)

LaTeX seems not to be a direct descendant of SGML

The description of SGML says that SGML is a direct ancestor of HTML and LaTeX, but it seems not... I think LaTeX is a typo for XML. What do you think? -- (talk) 01:05, 14 July 2016 (UTC)

This seems to have been adjusted, but is still not right. It says Scribe influenced GML, but this goes against the info in the Scribe page that recounts the famous story of Reid and Goldfarb presenting the same idea in different papers in the same conference. (The difference is that Goldfarb wanted a radical separation between semantics and style.) Rick Jelliffe (talk) 09:23, 20 September 2016 (UTC)

Lack of "tag" page

In this page, tags play an important role as a general concept, but the link on tag leads to the HTML page, as if only HTML had real tags! I realise that there is no page about tags. If the general opinion is that tag is not worth a page, then a page such as this one (markup language) should fill that role by having a section explaining tags in general. HTML and other pages should then refer, about tags, to the tag page or the tag section in this page, not the other way round. --Dominique Meeùs (talk) 09:50, 29 November 2016 (UTC)