WikiProject Internet (Rated B-class, High-importance)
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
WikiProject Computing (Rated B-class, High-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.

Broadcast Markup Language and BeerXML

The recently created article on BeerXML is up for deletion by one user who is very motivated to get rid of it after their speedy delete marker was removed. As they are now losing the argument at its delete discussion page, they have adopted the tactic of trawling Wikipedia for other specialist XML-based definitions and placing delete markers on them as well. The long-standing article Broadcast Markup Language has now been tagged by them in an effort to shore up their attempts to get BeerXML deleted.

If this behaviour is not checked several articles in the various XML categories could come under threat. Please help defend these articles with good arguments. Help will be much appreciated by all the contributors who now find their good work under threat. Devils In Skirts! (talk) 12:34, 15 February 2014 (UTC)

Help Oh12345well4321 (talk) 08:28, 23 March 2019 (UTC)

Music Markup Language

Please consider adding this external link: Music Markup Language. -- Wavelength (talk) 18:28, 14 April 2010 (UTC)

(1) There's already an article on it (Music Markup Language) and internal links are preferred to external links (2) It is already mentioned in the categories and list articles linked in the 'See also' section (3) Why would this particular language merit mentioning? There are tons of XML-based languages, it's not feasible to list every single one on the page; only a very few examples are given, solely in the last paragraph of the introduction, and those ones are wildly popular; there is no indication Music Markup Language is anywhere near as widely used. --Cybercobra (talk) 18:44, 14 April 2010 (UTC)
Thank you for your reply. I did not notice the article on it. I accept your reasoning for not listing it as an external link.
-- Wavelength (talk) 16:47, 15 April 2010 (UTC)

I would like to see more links to tutorial resources

Hello, members of this community. I find this article very informative and professional. Well-known tutorials such as w3schools were used in writing it. But I would also like to see tutorials here that are less well known but very useful for beginners too. What do you think about it? Thank you in advance. —Preceding unsigned comment added by Malinari (talkcontribs) 16:35, 21 April 2010 (UTC)

Wikipedia is not a tutorial: see What Wikipedia is not. That policy statement doesn't directly address the level at which an article should be written, for example whether an article on XML should be written for the general public or for professional programmers. But in my view, this article is pitched at about the right level. There are plenty of other sources if you want a more gentle introduction or (conversely) something addressing the formal computer science audience. Mhkay (talk) 13:26, 22 April 2010 (UTC)

The above commentary notwithstanding, this entry is incredibly obtuse, only slightly more readable than a reference book. I come here every few months looking for some useful information about what XML is and what it is made up of, something to fill the gaps in my own knowledge, having forgotten my last abortive attempts to understand it here. Instead I read abstruse, cryptic commentary, assumptive descriptions, and ambiguous terminology that presumes the reader already knows so much about the topic that reading the entry would be unnecessary. I am a programmer from the old school, and an EE, so I have had my soirees with technical manuals, but this prose is so dense and so loose in defining its terms that I find my mind wandering away from it, not mulling over the information in it. It doesn't have to be a tutorial to be understandable.

text/xml deprecated?

Not sure why it says text/xml is deprecated.

Just skimming over the RFC, I can't see that stated explicitly [1]

Jjjjjjjjjj (talk) 19:20, 10 June 2010 (UTC)

I have updated the citation to the IETF memo that deprecates text/xml and explains why. Mhkay (talk) 22:57, 11 June 2010 (UTC)

The RFC says "If an XML document -- that is, the unprocessed, source XML document -- is readable by casual users, text/xml is preferable to application/xml." I think characterising this as deprecation is inaccurate. Perhaps the description should be "application/xml (preferred for most technical use), text/xml (preferred when readable by casual users)" with a link to the RFC. What do people think? Paul Foxworthy (talk) 03:52, 13 June 2010 (UTC)

The cited Murata/Kohn/Lilley memo clearly labels text/xml as deprecated. This memo is much more recent than RFC 3023. The problems with text/xml largely emerged after 3023 was published. Mhkay (talk) 22:50, 14 June 2010 (UTC)
Thanks. But the citation is to RFC 3023. If the MKL memo is the source that confirms that text/xml is deprecated, then the citation is misleading. Well, it misled me at least :-). The memo I can find [2] is a draft and supposedly expired in March. If it's now an RFC, where is it? I propose a second citation be added to the deprecation referring to the draft. If and when the draft becomes an RFC to replace 3023, there should be just one citation that refers to that replacement. Does that make sense to everyone? Paul Foxworthy (talk) 15:47, 21 June 2010 (UTC)
I can't see where you have problems. (You say "But the citation is to RFC 3023". But there are multiple citations.) The article says that RFC 3023 standardizes text/xml and application/xml, which is true, and it also says that text/xml is in the process of being deprecated, which is also true, and both statements are linked to relevant citations. I've no idea what the current state of that process is, but the fact that the memo has timed out doesn't mean the process has been abandoned, unless you can find evidence to the contrary. Mhkay (talk) 21:39, 21 June 2010 (UTC)
I was talking about the citation in the infobox, sorry I didn't make that clear. I am not too fussed about the status of the memo, all I want is the best citation for the fact that text/xml has been deprecated. Paul Foxworthy (talk) 06:02, 1 July 2010 (UTC)
I've added a citation in the infobox. Paul Foxworthy (talk) 04:52, 6 July 2010 (UTC)
Now it still looks as if it were deprecated, but in reality it isn't. It was, as you say yourself, deprecated in a draft which expired. Is this really notable? Anyway, it should be made clear that it isn't deprecated and may never be deprecated, although there may be reasons against its use. —h.e.r— (talk) 08:54, 2 August 2010 (UTC)
Claiming deprecation of text/xml in this article is misleading. The Murata/Kohn/Lilley memo merely refers to potential issues with charset encoding values, not with the media type per se (charset encodings can clash when the charset indicates ISO-8859-1 when the XML document uses UTF-8 for example.) The fact is that text/xml is still widely in use as a media type and has not been deprecated. Robert van Engelen (talk) 06:12, 2 August 2018 (UTC)
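
To make the charset clash mentioned above concrete, here is a minimal Python sketch. The sample document and the variable names are assumptions made up for illustration; nothing here is taken from RFC 3023 or the Murata/Kohn/Lilley memo. It only shows what happens when UTF-8 bytes are decoded as ISO-8859-1.

```python
# Hypothetical sample: an XML document that declares UTF-8 and contains a non-ASCII character.
document = '<?xml version="1.0" encoding="UTF-8"?><name>café</name>'
utf8_bytes = document.encode("utf-8")

# One receiver honours the encoding declared inside the document,
# another trusts a transport-level charset of ISO-8859-1.
as_declared = utf8_bytes.decode("utf-8")
as_transport = utf8_bytes.decode("iso-8859-1")

print(as_declared == document)   # True
print("café" in as_transport)    # False
print("cafÃ©" in as_transport)   # True
```

The bytes are identical in both cases; only the charset label changes how they are read, which is the kind of mismatch the comment above describes.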

XML Abuse

XML, which was developed for text markup, is being used as a general serialization container for any data structure.

Should we add a section about XML Abuse?

Or maybe just a reference at the header should be added?

What do you think?

I would like to add a reference to the header since this is an important problem. —Preceding unsigned comment added by (talk) 07:03, 20 June 2010 (UTC)

I think you would find it very hard to get consensus on any statement (let alone one short and pithy enough to go in the article lead) about when XML is and is not appropriate. Certainly, the opinions on the page you cite are far too debatable to go here. Let's keep this article factual and concise. It should tell people what XML is and does, not try to precis all the debates about its whys and wherefores. This is an encyclopedia. Mhkay (talk) 21:44, 21 June 2010 (UTC)
I would like to see a mention of XML abuse, as this is a widespread practice: What is the worst abuse of XML that you have seen? —Preceding unsigned comment added by (talk) 18:41, 12 December 2010 (UTC)
XML abuse is a serious, real-world problem and as such it should be addressed by the Wikipedia article. Things have cooled off now that the current buzz is about anything with the word "cloud" written over it, but it was quite terrible not many years ago. XML probably was, and still is, the most widely misunderstood and heavily buzzworded technology of this century so far, and acts as a selling point of anything that uses it, regardless of purpose and schema. People (especially pointy-haired bosses) think anything which has "XML" in the box will automagically talk to anything else with the same label. 08:55, 13 January 2011 (UTC)

A very interesting insight on two potential examples of XML abuse can be quoted from Håkon Wium Lie, Opera's CTO (e.g. here): he describes OOXML and ODF as essentially "memory dumps with angle brackets". 08:55, 13 January 2011 (UTC)

08:55, 13 January 2011 (UTC)  —Preceding unsigned comment added by (talk)  

Thank you very much for keeping the Criticism section and making clear that XML should not be used to represent structured data, but narrative documents. — Preceding unsigned comment added by (talk) 08:09, 13 October 2011 (UTC)

XML (Extensible Markup Language) is a set of rules for encoding documents in machine-readable form.

Would it not be more appropriate to say that xml is for encoding in human-readable form?

What is machine-readable form supposed to mean?

UndercoverAgents (talk) 18:52, 7 July 2010 (UTC)

Its sole purpose is to interpret human-readable content/context and to turn it into machine-readable form through a medium/interface. XML can be read/interpreted and parsed from within compiled and ascii, which suggests both application and text would be valid (personal opinion). Daemondevel (talk) 00:57, 4 August 2010 (UTC)

Spelled-out title

The W3C defines XML as follows:

Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents ... and so on

The important part here is that the article should first use the fully spelled out name and then the abbreviation. This is not only in accordance with the standard, but also follows general rules of good writing in English. Kbrose (talk) 03:15, 4 August 2010 (UTC)

This might be true if there were consensus that "XML" is an abbreviation of the three-word form, but many of us just don't believe that. Tim Bray (talk) 05:39, 4 August 2010 (UTC)

"Extensible Markup Language" first?

We're going to need to sort out what it should say at the top of the article:

Current candidates:

Extensible Markup Language (XML) and XML (Extensible Markup Language)

The first is supported by the English convention that a full name is listed first, and wording from the W3C spec: "The Extensible Markup Language (XML) is..." The second by the fact that the title of the article is (appropriately) XML and since the three-letter version is used rather than the three-word version in approximately 100% of spoken and written discourse.

Also note that XML is *not* an abbreviation or an acronym, it is just another name for the same thing.

My vote would be that the primary name should be the same as the title of the article and should reflect common usage. But it's not a matter of life or death. What do others think? Tim Bray (talk) 03:19, 4 August 2010 (UTC)

Of course it's an abbreviation, even the standards documents specifically say so, as quoted above. The title of the article should also be the full name, ideally. The reason that it isn't, is that too many writers here are suffering from Acronymitis. Almost all articles of computer networking protocols use the full protocol name as title, even for the most common of protocols, such as IP. Kbrose (talk) 03:27, 4 August 2010 (UTC)
There's no need for the title and the first name mentioned in the lede to match, particularly for acronyms where the full name is less common: NATO, Laser; see WP:SINGULAR on acronyms. --Cybercobra (talk) 03:56, 4 August 2010 (UTC)
The form in the NATO entry looks better than either alternative to me. "Extensible Markup Language or XML" correctly reflects that one is not an abbreviation of the other. I'd put the more common form first, but that's hard to get too excited about. Tim Bray (talk) 05:42, 4 August 2010 (UTC)
Empirically, the first letter of "Extensible" is not "X". It is generally agreed upon that writing "eXtensible Markup Language" is an error, which is another symptom of the fact that "XML" and "Extensible Markup Language" are two names for the same thing, one immensely more popular and widely used than the other. When I drafted the first sentence of the XML specification, I was insufficiently percipient to have predicted which would catch on. Tim Bray (talk) 05:38, 4 August 2010 (UTC)
XML is definitely an acronym, as evidenced by the fact that "ML" is "Markup Language" and "X" is generally considered an acceptable character abbreviation to represent extensible, at least in part because extensible and xtensible share the exact same phonetics. For other examples see XP (eXtreme Programming, eXperience Point), XSL (eXtensible Stylesheet Language), XBML (eXtended Business Modeling Language, eXtensible Battle Management Language), XMP (eXtensible Metadata Platform), and so on. The oXygen editor's product name is a play on the "X" acronym use, so with many counter examples, I would argue that it's not generally agreed that eXtensible is incorrect. It may not be a well-formed acronym, but it does have the most important semantic mapping characteristic of an acronym and in the one case that it doesn't take the first letter mapping, it uses an acceptable replacement. That's the first point, so if I replace XML with DNA, does the second point hold up?
"The second by the fact that the title of the article is (appropriately) DNA and since the three-letter version is used rather than the three-word version in approximately 100% of spoken and written discourse."
In this case, it's obvious that the typical rules of English apply, even though most people probably don't even know what DNA stands for any more. I can definitely see (and agree with) the logic of mapping from the commonly seen and heard acronym back to the expanded form when the acronym serves as a mental key, but that's inconsistent with currently correct English usage. It's essentially guaranteed that acronyms are always going to be more popular and more widely used than their expansions because that is their very purpose, so your argument would apply to all acronyms. MaxxD (talk) 07:10, 4 August 2010 (UTC)
(Okay, I'll argue with myself...) XML is technically an initialism, not an acronym or abbreviation because it is not a pronounceable word, but the point that it does represent the initials of the expanded form (notwithstanding the ex/x issue) is reasonable. However, The Chicago Manual of Style (CMS) states, "Occasionally, too, it makes sense to use the acronym first and put the full name in parentheses, if the acronym in question is so familiar to your expected audience that it almost goes without explication." [3] and XML has certainly achieved this distinction, so writing it as "XML (Extensible Markup Language)" is not only perfectly okay per the CMS, but almost certainly preferred. MaxxD (talk) 09:36, 4 August 2010 (UTC)

Details of valid characters

The section "details of valid characters" is getting absurdly detailed, especially as it appears so close to the start of the article. It's simply not interesting to the average reader who comes here wanting an overview of what XML is - the kind of people who want this level of detail are much more likely to go to the specs than to come here. I think the usual Wikipedia solution is to move the material out to a separate article, and I propose doing that. Mhkay (talk) 11:06, 13 August 2010 (UTC) (Now done.)

By definition?

Under "key terminology" it is stated: "By definition, an XML document is a string of characters.". By what definition, pray? That's not what the definition of "document" in the XML 1.0 rec says. It might be nice if it did, but it doesn't. Instead it mumbles about "textual objects", thus leaving (deliberately?) ambiguous the question of whether a document is a sequence of characters or a sequence of octets. Mhkay (talk) 23:23, 25 August 2010 (UTC)

This description is only useful to people who already know what XML is useful for


it would be helpful if someone re-wrote this to explain why XML exists, as this would justify the entry. —Preceding unsigned comment added by (talk) 14:49, 8 September 2010 (UTC)

"&" and "<" in XML entity values

  • The article itself states they "may never appear in content."
  • The matching reference's summary states they are allowed (just not recommended).
  • The actual reference (i.e. the specs) states they "MUST NOT appear in their literal form, except when..." (certain cases like when inside CDATA).

So should the first two be fixed to reflect the latter? Can someone offer correct fixes then? - (talk) 08:52, 16 November 2010 (UTC)
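
For anyone weighing the three wordings above, a quick check with Python's standard-library parser may help. This is only a sketch: the `parses` helper and the sample strings are assumptions made for illustration, and it reports what one particular parser accepts, not what the specification says.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the standard-library parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample content with "&" and "<" in three forms.
literal = "<note>Fish & chips, 1 < 2</note>"
escaped = "<note>Fish &amp; chips, 1 &lt; 2</note>"
cdata = "<note><![CDATA[Fish & chips, 1 < 2]]></note>"

print(parses(literal))            # False
print(parses(escaped))            # True
print(parses(cdata))              # True
print(ET.fromstring(cdata).text)  # Fish & chips, 1 < 2
```

This matches the third bullet: the literal characters are rejected in ordinary content but accepted inside a CDATA section.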

Example shown in Icon, not technically invalid but a poor example

Looking at the example, it shows questions and answers being thrown straight into the <quiz> bracket. Surely each pair of Q&A would need to be wrapped in a tag <round> or <question_set>? Otherwise the program using this would have to read through the whole thing serially for any of it to make sense.

This is more a practical issue and not a technical one.-- (talk) 14:12, 3 March 2011 (UTC)

Using XML for question-and-answer quizzes seems to be a common student exercise set by unimaginative teachers, and as the problem never occurs in real life I guess you'd better find out what those teachers consider the right answer to be. Or at any rate, find out what requirements they are assessing the solution against. Mhkay (talk) 15:15, 3 March 2011 (UTC)

Still a poor example, and barely an example as it is shown as a small piece of graphics. There should definitely be a real example in the text. And in that example it should be explained which one is the root element. My issue is that I believe I heard that the root element is in fact an implicit element above the topmost element. This article does not even explain what a root element is, just that there is only one. (talk) 12:15, 13 January 2012 (UTC)
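
As a possible starting point for a real text example, here is a small Python sketch. The quiz markup and the element names (`item`, `question`, `answer`) are hypothetical, not taken from the image in the article. It groups each question with its answer, as suggested at the top of this thread, and shows that in Python's ElementTree the object returned for the document is the topmost element.

```python
import xml.etree.ElementTree as ET

# Hypothetical quiz markup; element names are illustrative only.
quiz_xml = """
<quiz>
  <item>
    <question>What does XML stand for?</question>
    <answer>Extensible Markup Language</answer>
  </item>
  <item>
    <question>Who publishes the XML specification?</question>
    <answer>W3C</answer>
  </item>
</quiz>
"""

root = ET.fromstring(quiz_xml)
print(root.tag)  # quiz

# Because each pair is wrapped in its own element, a program can
# pick out question/answer pairs without reading the file serially.
pairs = [(item.findtext("question"), item.findtext("answer"))
         for item in root.findall("item")]
print(len(pairs))  # 2
```

Other data models may treat the root differently, so this only illustrates one library's view.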

History needs a bit of cleanup

Fixed a couple of things, but the section could (and should) be much better written. A very short to-do list, in decreasing order of importance:

  • the link to Kimber's blog is totally out of place;
  • more supporting citations are needed;
  • more historical sources should be found and linked to.

Andy Monakov (talk) 11:32, 15 September 2011 (UTC)

On the number-of-weeks issue, I can't get Jon's count to work in my head. I seem to remember that we were working in at least part of August, and when I pop up a 1996 calendar I have trouble getting the week count down to his number. However, it is absolutely the case that the first wave of work was in the August-November timeframe, so I thought it best just to say that rather than arguing over the number of weeks. On the section in general, I agree it's rambling and messy, that may have been partially a consequence of too many of the people who were involved wanting their opinion/contribution included. I think it would be a good idea for someone else to be bold and clean it up. Tim Bray (talk) 17:35, 20 September 2011 (UTC)


Is it worth including a link to Jsonix? --Gak (talk) 12:22, 21 September 2011 (UTC)

A bit of Web searching reveals no uptake, and also is offline. So, no. Tim Bray (talk) 23:04, 30 September 2011 (UTC)

Large commented out section under Well-formedness and error-handling

The section in question can be found at the end of this section:

Are there plans to use that? If not it should be removed, although it does seem to contain some valid information. — Preceding unsigned comment added by Nick Garvey (talkcontribs) 04:03, 2 November 2011 (UTC)

Hidden commented out sections like this are a menace. I've moved it here from the article:

Tree representation of an XML Document

The nesting of elements leads directly to a tree representation for an XML document. The root element becomes the root of a tree. Because every element is composed of a sequence of other elements and character data, it is easy to determine the children of each element. Just take each item in the sequence and create a new child node. Here is an example of a structured XML document:

 <recipe name="bread" prep_time="5 mins" cook_time="3 hours">
   <title>Basic bread</title>
   <ingredient amount="8" unit="dL">Flour</ingredient>
   <ingredient amount="10" unit="grams">Yeast</ingredient>
   <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
   <ingredient amount="1" unit="teaspoon">Salt</ingredient>
   <step>Mix all ingredients together.</step>
   <step>Knead thoroughly.</step>
   <step>Cover with a cloth, and leave for one hour in warm room.</step>
   <step>Knead again.</step>
   <step>Place in a bread baking tin.</step>
   <step>Cover with a cloth, and leave for one hour in warm room.</step>
   <step>Bake in the oven at 180(degrees)C for 30 minutes.</step>
 </recipe>
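
The tree representation described above can be sketched in Python. This is an illustration only: it uses a shortened copy of the recipe, and the `outline` helper is a hypothetical name, not part of any library. Each element becomes a node and each nested element becomes a child node.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe, closed so that it parses on its own.
recipe_xml = """<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <step>Mix all ingredients together.</step>
  <step>Knead thoroughly.</step>
</recipe>"""

def outline(element, depth=0):
    """Return one indented line per element, depth-first, mirroring the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(recipe_xml)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The first line printed is the root element, and every other line is indented under its parent.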
<!-- Not well-formed fragment -->
<title>Book on Logic<author>Aristotle</title></author>

One way of writing the same information in a way which could be incorporated into a well-formed XML document is as follows:

<!-- Well-formed XML fragment -->
<title>Book on Logic</title> <author>Aristotle</author>
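
The difference between the two fragments can be checked mechanically. The sketch below is an assumption-laden illustration: `is_well_formed` is a hypothetical helper, and the `<wrapper>` element is added only so that a fragment with two sibling elements can be parsed as a single document.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Parse the fragment inside a hypothetical <wrapper> root and report success."""
    try:
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
        return True
    except ET.ParseError:
        return False

overlapping = "<title>Book on Logic<author>Aristotle</title></author>"
separate = "<title>Book on Logic</title> <author>Aristotle</author>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(separate))     # True
```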

In XML, the proper way of nesting code is through parallel data and character data


	Hello, my name is<first-name>John</first-name>
	<last-name> Doe</last-name>from the
	<country>United States</country>

This shows that the “paragraph” consists of a sequence of five items. The “first-name”, “last-name”, and “country” elements contain character data, and the other two items are plain character data.

Entity references

An entity in XML is a named body of data, usually text. Entities are often used to represent single characters that cannot easily be entered on the keyboard; they are also used to represent pieces of standard ("boilerplate") text that occur in many documents, especially if there is a need to allow such text to be changed in one place only.

Special characters can be represented either using entity references, or by means of numeric character references. An example of a numeric character reference is "&#x20AC;", which refers to the Euro symbol by means of its Unicode codepoint in hexadecimal.

An entity reference is a placeholder that represents that entity. It consists of the entity's name preceded by an ampersand ("&") and followed by a semicolon (";"). XML has five predeclared entities:

  • &amp; (& or "ampersand")
  • &lt; (< or "less than")
  • &gt; (> or "greater than")
  • &apos; (' or "apostrophe")
  • &quot; (" or "quotation mark")
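
The five predeclared entities listed above can be exercised with Python's `xml.sax.saxutils` module. This is a sketch with made-up sample strings. By default `escape()` only handles `&`, `<` and `>`, so the two quote characters are passed in explicitly through a dictionary.

```python
from xml.sax.saxutils import escape, unescape

raw = "AT&T says 1 < 2 > 0"
escaped = escape(raw)
print(escaped)  # AT&amp;T says 1 &lt; 2 &gt; 0

# The quote characters are not escaped by default, so they are supplied here.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped_quotes = escape('He said "it\'s"', quotes)
print(escaped_quotes)  # He said &quot;it&apos;s&quot;

# unescape() reverses the default replacements.
print(unescape(escaped) == raw)  # True
```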

Here is an example using a predeclared XML entity to represent the ampersand in the name "AT&T":

 AT&amp;T


Additional entities (beyond the predefined ones) can be declared in the document's Document Type Definition (DTD). A basic example of doing so in a minimal internal DTD follows. Declared entities can describe single characters or pieces of text, and can reference each other.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
    <!ENTITY copy "&#xA9;">
    <!ENTITY copyright-notice "Copyright &copy; 2009, XYZ Enterprises">
]>
<example>&copyright-notice;</example>

When viewed in a suitable browser, the XML document above appears as:

Copyright © 2009, XYZ Enterprises
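
The expansion can also be observed without a browser. The following Python sketch assumes a complete document in which the root element `example` (the name given in the DOCTYPE) contains a reference to `copyright-notice`. It relies on the standard-library parser expanding entities declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Complete document: the internal DTD declares two entities, one of which
# refers to the other, and the root element uses the outer one.
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
    <!ENTITY copy "&#xA9;">
    <!ENTITY copyright-notice "Copyright &copy; 2009, XYZ Enterprises">
]>
<example>&copyright-notice;</example>"""

root = ET.fromstring(doc)
print(root.text)
```

Entity expansion is also the mechanism behind some well-known denial-of-service inputs, so a parser used on untrusted documents may need it restricted.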

Numeric character references

Numeric character references look like entity references, but instead of a name, they contain the "#" character followed by a number. The number (in decimal or "x"-prefixed hexadecimal) represents a Unicode code point. Unlike entity references, they are neither predeclared nor do they need to be declared in the document's DTD. They have typically been used to represent characters that are not easily encodable, such as an Arabic character in a document produced on a European computer. The ampersand in the "AT&T" example could also be escaped like this (decimal 38 and hexadecimal 26 both represent the Unicode code point for the "&" character):

 AT&#38;T
 AT&#x26;T


Similarly, in the previous example, notice that "&#xA9;" is used to generate the “©” symbol.
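
A short Python sketch can confirm that the decimal and hexadecimal forms resolve to the same character. The element names `name` and `price` are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Decimal 38 and hexadecimal 26 are the same code point, the ampersand.
decimal_form = ET.fromstring("<name>AT&#38;T</name>").text
hex_form = ET.fromstring("<name>AT&#x26;T</name>").text
print(decimal_form, hex_form)    # AT&T AT&T

# The Euro sign example from the text, given by its hexadecimal code point.
euro = ET.fromstring("<price>&#x20AC;</price>").text
print(euro == chr(0x20AC))       # True
print(ord("&"), hex(ord("&")))   # 38 0x26
```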

See also numeric character references.

Well-formed documents

In XML, a well-formed document must conform to the following rules, among others:

  • Non-empty elements are delimited by both a start-tag and an end-tag.
  • Empty elements may be marked with an empty-element (self-closing) tag, such as <IAmEmpty />. This is equivalent to <IAmEmpty></IAmEmpty>.
  • All attribute values are quoted with either single (') or double (") quotes. Single quotes close a single quote and double quotes close a double quote.[1][2]
  • To include a double quote inside an attribute value that is double quoted, or a single quote inside an attribute value that is single quoted, escape the inner quote mark using entity references.
  • Tags may be nested but must not overlap. Each non-root element must be completely contained in another element.
  • The document complies with its declared character encoding. The encoding may be declared or implied externally, such as in "Content-Type" headers when a document is transported via HTTP, or internally, using explicit markup at the very beginning of the document. When no such declaration exists, a Unicode encoding is assumed, as defined by a Unicode Byte Order Mark before the document's first character. If the mark does not exist, UTF-8 encoding is assumed.

Element names are case-sensitive. For example, the following is a well-formed matching pair:

<Step> ... </Step>

whereas these are not

<Step> ... </step>
<STEP> ... </step>
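
The case-sensitivity rule is easy to verify with a parser. In this Python sketch the three candidate strings mirror the pairs above, and the `results` dictionary is just a convenient way to collect the outcome.

```python
import xml.etree.ElementTree as ET

candidates = ["<Step>...</Step>", "<Step>...</step>", "<STEP>...</step>"]

# Record, for each candidate, whether the parser accepts it as well-formed.
results = {}
for text in candidates:
    try:
        ET.fromstring(text)
        results[text] = True
    except ET.ParseError:
        results[text] = False

for text, ok in results.items():
    print(text, "well-formed" if ok else "mismatched tags")
```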

By carefully choosing the names of the XML elements one may convey the meaning of the data in the markup. This increases human readability while retaining the rigor needed for software parsing.

Choosing meaningful names implies the semantics of elements and attributes to a human reader without reference to external documentation. However, this can lead to verbosity, which complicates authoring and increases file size.

Automatic verification

It is relatively simple to verify that a document is well-formed or validated XML, because the rules of well-formedness and validation of XML are designed for portability of tools. The idea is that any tool designed to work with XML files will be able to work with XML files written in any XML language (or XML application). Here are some examples of ways to verify XML documents:

  • load it into an XML-capable browser, such as Firefox or Internet Explorer
  • use a tool like xmlwf (usually bundled with expat)
  • parse the document, for instance in Ruby:
 irb> require "rexml/document"
 irb> include REXML
 irb> doc = Document.new(File.new("test.xml")).root
--Cybercobra (talk) 05:12, 2 November 2011 (UTC)

Character entity references for escaping

I attempted to link the #Escaping section with the Character entity reference article. I thought the link was relevant because it seems that article also lists the same five objects, and could potentially expand on the topic. If the problem with my change was just an issue with terminology or semantics, perhaps I can avoid this by directly naming the article, for example “There are five predefined entities (see Character entity reference)”. Otherwise, I’d love to know why the two topics shouldn’t be related when they seem so similar. Vadmium (talk, contribs) 12:34, 5 February 2012 (UTC).

Inline links are preferable - so I linked the "predefined entities" to the list article. I would be in favour of a link to Character entity reference in there as well (as a see also or piped link), as I'm not sure how Tim's more strict syntactic stance fits with article topics. I've linked to them in the (already linked) Valid characters in XML as a compromise for now. I'm more than happy to step away and leave for Tim. Is there a better title for List of XML and HTML character entity references ? (discuss here) Widefox (talk) 10:09, 13 June 2012 (UTC)

Linking “predefined entities” is fine by me. I would like to see other opinions, especially Tim’s, because so far I don’t understand the reasons behind his edit summaries. Vadmium (talk, contribs) 13:00, 13 June 2012 (UTC).

Merger proposal: valid XML document

The concept of a valid document is better described in the XML article itself. The phrases "valid XML" and "valid XML document" should redirect to that section in the XML article. Paul Foxworthy (talk) 04:30, 3 July 2009 (UTC)

After three years of no discussion, with a very short article, the little content involved has been merged to this article. WTF? (talk) 03:18, 12 July 2012 (UTC)

Not sure what is, found anyway

Not from INRIA anyhoo. — Preceding unsigned comment added by (talk) 01:22, 27 September 2012 (UTC)


  1. ^ "XML Attributes". W3Schools.
  2. ^ "Attributes (XML Standards)". Microsoft.


I came across this page because I wanted to know what an XML document was, and what it's used for, after I came across one on my computer. The article confused me. Although it appears to contain a considerable amount of information for those with an advanced level of understanding about these things, it started off expecting that the reader already had that knowledge, so I gave up and will probably go elsewhere.

If the information beginners might need is already in the article, I would suggest moving it around so it appears early on. If the information is in another article, it might be best to make this known at the top of the page.

It might just be me having this problem because I can't imagine many people want to know what an XML document is, but it's worth bearing in mind.

Brainshower (talk) 16:19, 25 October 2012 (UTC)

Please respond to this by looking at the article and improving it. This is just one of too many Wikipedia articles (especially ones dealing with computers) that leave outsiders frustrated.

Kdammers (talk) 11:56, 15 November 2012 (UTC)

    • Added XML log section as this is what is found on most systems. Thank you for the feedback. Telecine Guy (talk) 20:22, 14 December 2016 (UTC)


"Extensible Markup Language (XML) is a markup language created to.... As of 2009, hundreds of XML-based languages have been developed." So, is XML ONE language that inspired others that are based on it, or is it a family of languages? The text says the former, but it comes close to saying the latter. I think a clarification is needed. Kdammers (talk) 11:59, 15 November 2012 (UTC)

The airplane was created by the Wright brothers to... As of 2009 hundreds of airplanes have been developed.
The PC (Personal Computer) was created by John Blankenbaker in 1971. He called it the Kenbak-1. As of 2009 thousands of PCs have been developed. --Guy Macon (talk) 14:59, 15 November 2012 (UTC)

AJAX, Syntax Examples

I searched the article for any mention of AJAX and didn't see any. Maybe I'm mistaken? I believe it's important to shed some light on the fact that XML is one of the key ingredients of the highly relevant and ever more popular AJAX framework. While there is a whole article dedicated to AJAX itself, it would be a disservice to XML not to give it some limelight, given AJAX's popularity. I would also recommend adding AJAX to the list of recommendations at the bottom of the page. Secondly, the syntax examples given are great, but I feel it would be beneficial to show some larger examples with a wider context. For example, the article could display a decent-sized chunk of XML code in its full glory, and I would say that even showing an example XSD alongside it would be very relevant to XML as well. Great article though overall! Very good work. Cheers! Danielbullis (talk) 04:13, 2 March 2013 (UTC)

Here's a good example of what I meant by providing XML examples. Scroll down to approximately half-way down the page or so, or use the table of contents to select the "XML" link. XML example in JSON article. Cheers! Danielbullis (talk) 05:46, 2 March 2013 (UTC)


This article lacks a description of XML's ability to define tables (e.g. with definable cell walls as used in Microsoft Word etc. and predefined table types as used in Z-notation). It would be nice to see examples of XML used to define tables. FreeFlow99 (talk) 10:40, 22 January 2014 (UTC)

XML (as scoped by this article) doesn't have the ability to define tables. XML, together with an application schema, might be able to – but that belongs in an article about that schema, not about the syntax and data model of XML overall. At most, this article could use detail about such an application level as an example of what XML can be used for and so why it's worth bothering with it at all. Such examples should be broad and lightweight though. Also note that the importance of XML to table-schema doesn't necessarily indicate that table-schema has an equal commutative relationship of importance to XML. Andy Dingley (talk) 11:46, 22 January 2014 (UTC)
Thanks. In that case I would suggest the creation of a small section entitled "Tables" (so that it is easily found by people interested in that functionality). That section could state that XML itself does not support tables, but that some markup languages based on XML do, and list some examples with links to their respective pages. FreeFlow99 (talk) 12:47, 22 January 2014 (UTC)
Totally inappropriate suggestion. What's so special about tables? They are just one example of the many things that can be modelled using XML. It would be like mentioning Timbuktu in an article about helicopters, just because helicopters can be used to travel to Timbuktu. Mhkay (talk) 00:05, 25 January 2014 (UTC)


Strangely enough, I did not find any examples on this page.

    • Added, Thank you for the feedback. Telecine Guy (talk) 20:32, 14 December 2016 (UTC)

Unsourced claims of extensibility and semantics.

These [4] [5] are a problem. Not only are they unsourced, but XML can't do these things. Yes, they can be done, and they can be done in XML - but they can be done in ASCII too, and I don't see such claims being added to that article. XML does not support these features. It is wrong for an encyclopedic article to add unsourced claims like this, that imply that it does. Andy Dingley (talk) 15:47, 24 August 2017 (UTC)

  1. [6] " through use of tags that can be created and defined by users. Much like natural language is extensible (that is, can grow) when speakers create new words and agree on what they mean, XML is a markup language that can grow when users create new elements and agree on what they mean. "
  2. [7] "XML allows markup with tags that can be created and defined by users. Much like natural language is extensible (that is, can grow) when speakers create new words and agree on what they mean, XML is a markup language that can grow when users create new elements and agree on what they mean. This makes XML able to capture intent in a way much broader than a nonextensible markup language such as HTML. For example, XML can mark up machine-readably that apples and bananas are types of fruit, which is semantically deeper than the purpose of HTML. However, HTML is useful for display of content; often HTML is used to display XML content after transformation with XSL."
Andy Dingley (talk) 15:48, 24 August 2017 (UTC)

Problems with this:
  • "use of tags that can be created and defined by users."
How are "users" (and who are "users"?) able to do this? This is an ancient misconception of XML, one that I thought was dead and buried by 2005 or so - 2000 if you'd been paying attention. XML differs from SGML primarily in that it is parseable without a DTD or schema, but the misconception is that trivial parsing then magically allowed the resultant infoset to be processed further, in ignorance of the schema. This bold claim did not work.
  • "Much like natural language is extensible (that is, can grow)" adjoining "XML is a markup language that can grow when users create new elements and agree on what they mean. "
That's to conflate a folksonomy and the use of XML, presumably by creating new elements similarly on the fly. Ain't gonna happen.
  • "This makes XML able to capture intent in a way much broader than a nonextensible markup language such as HTML. "
There is no indication that XML can do any such thing. In both cases, a schema needs to be agreed beforehand, so that programmers can begin work on writing processors for this infoset. If either is extensible, they're both extended in the same way: by using a (pre-defined and pre-agreed) metadata schema (there are several for HTML, such as Microformats), pushing the problem off into a metaformat.
  • "XML can mark up machine-readably that apples and bananas are types of fruit,"
Wow, class-based taxonomic inferencing just appeared out of nowhere. Not in XML it doesn't. Maybe in RDF, but even then you start to need OWL to get anywhere.
These are serious problems in this added text, and these misunderstandings were demolished (at vast cost) about 15 years ago. WP should not be reintroducing them today. Andy Dingley (talk) 16:01, 24 August 2017 (UTC)

Hi @Andy Dingley:. I too am happy to discuss. You're (good-faith-ly) quite off base about "XML does not support these features". It's the very point of its extensibility and it's why companies or anyone else writes their own DTDs and XML schemas instead of pulling a ready-made one off the rack. Besides the voluminous discussion of technicalities and syntax (which is already abundant and is quite useful as far as it goes), the article also needs to simply explain to readers why extensible markup is a thing—why it is needed in addition to nonextensible markup such as HTML. There is no reason not to begin the Applications section with this explanation. Think about why companies write their own DTDs and XML schemas. Why didn't W3C just create all the XML elements that can exist in a predefined dictionary, like they did for HTML? It is to capture meaning through markup. For example, a company needs to define a new child element for an existing parent element. This is why XML databases are possible to make. Quercus solaris (talk) 15:57, 24 August 2017 (UTC)

  • "Think about why companies write their own DTDs and XML schemas."
Because they don't know any better. And if they do it today, it's because they've not been paying attention for years.
If you want your project to work, don't invent a schema. A new schema isn't some magic way to communicate, it's a babel to prevent anyone else understanding what you're saying. Successful use of XML isn't based on inventing new schemas (unless you have some industry-dominant position in your narrow industry), it's about using existing schemas. If I had a dollar for every project that crashed and burned from that mistake, I'd have most of my consultancy income for 2000-2005. If you really need to do this, don't expect XML to do it for you, you need something better.
  • "This is why XML databases are possible to make"
What's an "XML database"? A marketing buzzword? (most popular) An opaque bucket to store opaque XML in, rather than opaque text? (how WP defines it) Or a database that allows querying on the basis of the XML infoset? - doesn't work. That's why SPARQL is based on RDF rather than XML.
All of this extensibility is good stuff, but XML doesn't do it. That's why we had to invent other things to do it with. Andy Dingley (talk) 16:14, 24 August 2017 (UTC)

No, no—you're way overthinking the technicality of what I'm saying, and being way too superficially dismissive ("not paying attention since 2000" etc). I'm not talking about folksonomy at all here. Not about "on the fly" at all—rather, about "at all" versus "on the fly". "Users" in the XML context are the people writing the content or the people marking up the content on behalf of them (such as when a nurse writes an article about nursing and then a journal staff marks it up with XML). The journal staff create elements in an XML schema. Then, using that schema, they all agree on what the elements are, how they interact with other elements, and what they represent semantically. For example, a book author decides that his book will have two types of appendix. The book publisher then creates elements such as appendix1 and appendix2, defines them in a DTD or schema, and uses them in XSL. The point is that HTML does not offer a way to define appendix1 and appendix2; it is not for that purpose. XML is. Quercus solaris (talk) 16:09, 24 August 2017 (UTC)
"you're way overthinking the technicality of what I'm saying"
It's what I do. It's what I'm paid to do. It's why I don't edit computing articles on WP. I spend my working day being told "you're overthinking the technicalities" by idiot managers who then have their projects fail, because they've under-thought their XML-based wunderkind. And if you pay me enough, I'll even smile at the bankruptcy party. I ain't doing it for free.
"being way too superficially dismissive"
Because I'm racing against an Undo button. If you want chapter and verse, read my conference papers and patents. I'm not one of the Dans, I don't work for Google, but I do have enough lumps in this field to know what I'm talking about.
" "Users" in the XML context are the people writing the content "
XML doesn't define a "user" layer, but let's go with them at that level. They need to create an incremental authoritative taxonomy, or even a folksonomy (which is a harder problem). "The journal staff create elements in an XML schema." Not in processable XML they don't. Because if they do, the software that needs to process this has never seen that element before and it has no idea how to process it. If you want to build a system that does, you need something smarter than XML. If you want to do it in a way that can communicate outside a single project (the babel schema problem) then you have to do it with RDF + RDF Schema (needs more than RDF alone, there's not much else in the same space, it doesn't yet need OWL, Schema is sufficient). Andy Dingley (talk) 16:29, 24 August 2017 (UTC)
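
A minimal sketch of the disagreement above, for readers following along. It assumes Python's standard xml.etree.ElementTree as the parser. The HANDLERS table and the process function are hypothetical names invented for illustration, not part of any real tool, and the appendix1/appendix2 elements are borrowed from the book example earlier in this thread. The document parses with no DTD or schema, but a processor written before appendix2 existed has no behaviour defined for it.

```python
import xml.etree.ElementTree as ET

# Well-formed XML using the appendix1/appendix2 elements from the thread.
# No DTD or schema is supplied anywhere.
DOC = (
    "<book>"
    "<chapter>Main text</chapter>"
    "<appendix1>Tables</appendix1>"
    "<appendix2>Maps</appendix2>"
    "</book>"
)

# Hypothetical processor: it only knows the elements its programmers
# agreed on beforehand. appendix2 was "created by a user" afterwards.
HANDLERS = {
    "chapter": lambda element: f"CHAPTER: {element.text}",
    "appendix1": lambda element: f"APPENDIX: {element.text}",
}


def process(xml_text):
    """Render known elements and collect the names of unknown ones."""
    root = ET.fromstring(xml_text)  # parsing succeeds without a schema
    rendered, unknown = [], []
    for child in root:
        handler = HANDLERS.get(child.tag)
        if handler is None:
            # The parser accepted the element, but nothing says what to do with it.
            unknown.append(child.tag)
        else:
            rendered.append(handler(child))
    return rendered, unknown


rendered, unknown = process(DOC)
print(rendered)
print(unknown)
```

Whether the unknown element is skipped, rejected, or handled by a fallback is a decision made in the processing software, outside the XML itself. That is the part the two editors above disagree about how to describe.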

What you said at one point is exactly what I am talking about ("In both cases, a schema needs to be agreed beforehand, so that programmers can begin work on writing processors for this infoset. If either is extensible, they're both extended in the same way: by using a (pre-defined and pre-agreed) metadata schema"). You think I'm flying past that and trying to tack on other things. I'm not. The point is that extensible markup languages were invented to make that possible. That's what I'm talking about. This article in its current form does not explain or illustrate that for a layperson. A layperson has no conception of that until it is explained somehow or somewhere. Maybe I should be looking to lead them to it somewhere else like the articles Markup language or Standard Generalized Markup Language. It seems that you have been an IT expert for so many years that you have lost sight of how one would explain to laypersons why SGML or XML exist—why element extensibility exists. Quercus solaris (talk) 16:23, 24 August 2017 (UTC)

"were invented to make that possible" - yes they were. Sadly inventing them wasn't enough to make that possible. Optimism doesn't make the code work. XML doesn't work for doing this. Andy Dingley (talk) 16:32, 24 August 2017 (UTC)
@Andy Dingley: To show you that I am not making this up out of thin air, please take a look at W3schools XML 101 info at Look at the "food" and "calories" elements. This is what I am talking about. That website explains and shows to beginners what the purpose is. This Wikipedia article does not. Quercus solaris (talk) 16:30, 24 August 2017 (UTC)
  • W3Schools?
WTF is the point in even having this discussion? Andy Dingley (talk) 16:32, 24 August 2017 (UTC)
The point is this: Wikipedia explains to people what things are and why they exist in addition to how they work. I aim to improve this instance of the lack of that somehow, if not at this article then linking to discussions elsewhere at Wikipedia. That much is not a subjective matter of your being a greater IT expert (no doubt you are) who is overruling my edits. It objectively needs to be done somehow, and I will figure out how to do it. This aspect does not require my getting anyone's permission to edit or add. Quercus solaris (talk) 16:37, 24 August 2017 (UTC)
I think you should also take a deep breath long enough to see things through others' eyes. You seem incredibly touchy at the level of technical expertise and rapid dismissal, but it leaves your approach devoid of any pedagogical value. Rather than simply looking at the example that I provided, only to see what I am talking about, you apparently dismissed it without a glance. I am not saying that I am teaching you anything. I am just talking about explaining a subject to lay readers. Consider what Wikipedia is and why it exists: there is pedagogy involved. Don't be quite so quick to assume that I'm an idiot. I guess what I'm getting at is that in your rapid thrust to display your expertise and dismiss idiots, you're actually showing yourself to others to be too wounded and angry to see or admit the valid kernel that someone else is trying to get at, and work collaboratively from there. The net result is that you're not proving yourself immeasurably superior like you think you are. For any others reading this page, see as a clear example of what I am talking about, proving that Andy has mostly missed my point in his race to prove me an idiot. Quercus solaris (talk) 16:45, 24 August 2017 (UTC)
Whether justifiably or not, w3schools has a very poor reputation among many in the XML community and by citing it you reduce rather than enhance your credibility. Sorry: just saying so that you know. Equally, the fact that you take Andy's criticism so personally does not help your case. More substantively, there has been nearly 20 years of discussion on the xml-dev mailing list about the extent to which one can attribute semantics to XML, and the general consensus has always been that XML is no more than a syntactic framework for transmitting messages, which can only convey information from sender to recipient if the recipient has some knowledge of the meaning of the tags acquired through a separate channel. If you receive a message saying <book price="2.65">The Grapes of Wrath</book> you can make guesses about its possible meaning but you cannot make any reliable inferences. For example, if you inferred that the sender of the message was willing to pay you EUR 2.65 for your copy of the book, I suspect you would be wrong, but there is nothing in the message to say you are wrong. Mhkay (talk) 19:46, 19 September 2017 (UTC)
"Mhkay"? We're not worthy! Andy Dingley (talk) 20:37, 19 September 2017 (UTC)
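
The book example above can be sketched in code. This is a hedged illustration only, assuming Python's standard xml.etree.ElementTree. The function names parse_message, read_as_sale_offer and read_as_purchase_offer are hypothetical, standing in for two recipients who learned different conventions through a separate channel. The parser returns the same names and strings to both, and nothing in the message says which reading is right.

```python
import xml.etree.ElementTree as ET

# The message quoted in the comment above.
MESSAGE = '<book price="2.65">The Grapes of Wrath</book>'


def parse_message(xml_text):
    """Return only what the syntax provides: a tag name, attributes, and text."""
    element = ET.fromstring(xml_text)
    return {
        "tag": element.tag,
        "attributes": dict(element.attrib),
        "text": element.text,
    }


# Two hypothetical recipients, each applying its own out-of-band convention.
def read_as_sale_offer(parsed):
    return f"Sender will sell '{parsed['text']}' for {parsed['attributes']['price']}"


def read_as_purchase_offer(parsed):
    return f"Sender will pay {parsed['attributes']['price']} for your copy of '{parsed['text']}'"


parsed = parse_message(MESSAGE)
print(parsed)
print(read_as_sale_offer(parsed))
print(read_as_purchase_offer(parsed))
```

Note that the price arrives as the string "2.65" with no currency or unit. Treating it as a number, let alone as euros, is already an interpretation supplied by the recipient.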

Another unsourced claim

'Some other specifications conceived as part of the "XML Core" have failed to find wide adoption, including XInclude, XLink, and XPointer.' This sentence may be true (although DocBook documents use XInclude all the time, e.g. chapters in a book). But if it is true (and I want to emphasize that I don't know whether it's true), it needs to be sourced.

The trouble with this mechanistic approach to "sources" and "citations" is that a good encyclopedia article actually summarises knowledge aggregated from a very wide range of reading; a claim like this is not based on one well-researched statistical study that scientifically analyses the level of adoption of different technologies, it is based on the experience of someone who reads a lot, goes to a lot of conferences, and is well connected in the industry. I know there are those in the Wikipedia community who would deprive us of the benefit of such acquired wisdom, but personally, I find it invaluable. Mhkay (talk) 19:59, 6 November 2018 (UTC)

Likewise the statement "exchanging highly structured data between applications, which was not its primary design goal". I'm not sure where this claim comes from. It's true that "highly structured" isn't mentioned in the W3C's "origin and goals" section in the spec (, but the spec does talk about "data objects"--but one could just as well argue from this omission that XML wasn't intended for "non-highly structured" data, either. Anyway, if exchanging highly structured data between apps was not on the founding fathers' mind, a citation would be in order.

This slide just about sums it up: ("Q: Why the W3C XML Activity? A: Structured Document Interchange"). But I think citing one slide to justify this claim would be facile. You have to read the entire conference proceedings of conferences held around 1997/98 to get a view of the general climate of opinion at the time, not just one slide from one speaker. Mhkay (talk) 19:48, 6 November 2018 (UTC)

(BTW, I'd put this under the previous heading about unsourced claims, but I'm afraid it would just get lost there.) Mcswell (talk) 17:20, 5 November 2018 (UTC)

Where should xml-model go?

Is ISO/IEC 19757 xml-model worthy of its own article? Or should it be folded in here? If so, where? Ross Lamont (talk) 07:48, 15 September 2017 (UTC)

A small doubt related

In this article, in the infobox named file format, for the attribute mime, please verify whether the first </code> needs to be removed or not. Adithyak1997 (talk) 08:00, 25 January 2019 (UTC)

Thanks for noticing, fixed --hulmem (talk) 18:44, 25 January 2019 (UTC)

Please use the correct inception year in the description box

See --Krauss (talk) 11:23, 21 September 2019 (UTC)