Talk:XML
This is the talk page for discussing improvements to the XML article.
Archives: 1, 2, 3, 4
WikiProject Internet (Rated B-class, High-importance)
WikiProject Computing (Rated B-class, High-importance)
A Wikipedia contributor, TimBray (talk · contribs), may be personally or professionally connected to the subject of the article. This user's editing has included contributions to this article. Relevant guidelines covering this situation include Wikipedia:Conflict of interest, Wikipedia:Autobiography and Wikipedia:Neutral point of view.
Music Markup Language
Please consider adding this external link: Music Markup Language. -- Wavelength (talk) 18:28, 14 April 2010 (UTC)
- (1) There's already an article on it (Music Markup Language) and internal links are preferred to external links (2) It is already mentioned in the categories and list articles linked in the 'See also' section (3) Why would this particular language merit mentioning? There are tons of XML-based languages, it's not feasible to list every single one on the page; only a very few examples are given, solely in the last paragraph of the introduction, and those ones are wildly popular; there is no indication Music Markup Language is anywhere near as widely used. --Cybercobra (talk) 18:44, 14 April 2010 (UTC)
- Thank you for your reply. I did not notice the article on it. I accept your reasoning for not listing it as an external link.
- -- Wavelength (talk) 16:47, 15 April 2010 (UTC)
I would like to see more links to tutorial resources
Hello, members of this community. I consider this article very informative and professional. Great tutorials, such as w3schools, were used in writing this article. But I would also like to see tutorials here that are less well known but still very useful for beginners, such as http://phpforms.net/tutorial/tutorial.html. What do you think about it? Thank you in advance. —Preceding unsigned comment added by Malinari (talk • contribs) 16:35, 21 April 2010 (UTC)
- Wikipedia is not a tutorial: see What Wikipedia is not. That policy statement doesn't directly address the level at which an article should be written, for example whether an article on XML should be written for the general public or for professional programmers. But in my view, this article is pitched at about the right level. There are plenty of other sources if you want a more gentle introduction or (conversely) something addressing the formal computer science audience. Mhkay (talk) 13:26, 22 April 2010 (UTC)
The above commentary notwithstanding, this entry is incredibly obtuse, only slightly more readable than a reference book. I come here every few months looking for some useful information about what XML is and the details of what it is made up of, forgetting my last abortive attempts to understand it here. I want something that can elucidate my own lacking knowledge of it. Instead I read abstruse, cryptic commentary, assumptive descriptions, and ambiguous terminology. This text presumes the reader already knows so much about the topic that reading the entry would seem unnecessary. I am a programmer from the old school, and an EE, so I have had my soirees with technical manuals. But this prose is so dense, and so vague about the terms it uses, that I find my mind wandering away from it rather than mulling over the information in it. It doesn't have to be a tutorial to be understandable.
text/xml deprecated?
Not sure why it says text/xml is deprecated.
Just skimming over the RFC, I can't see that stated explicitly [1]
Jjjjjjjjjj (talk) 19:20, 10 June 2010 (UTC)
- I have updated the citation to the IETF memo that deprecates text/xml and explains why. Mhkay (talk) 22:57, 11 June 2010 (UTC)
The RFC says "If an XML document -- that is, the unprocessed, source XML document -- is readable by casual users, text/xml is preferable to application/xml." I think characterising this as deprecation is inaccurate. Perhaps the description should be "application/xml (preferred for most technical use), text/xml (preferred when readable by casual users)" with a link to the RFC. What do people think? Paul Foxworthy (talk) 03:52, 13 June 2010 (UTC)
- The cited Murata/Kohn/Lilley memo clearly labels text/xml as deprecated. This memo is much more recent than RFC 3023. The problems with text/xml largely emerged after 3023 was published. Mhkay (talk) 22:50, 14 June 2010 (UTC)
- Thanks. But the citation is to RFC 3023. If the MKL memo is the source that confirms that text/xml is deprecated, then the citation is misleading. Well, it misled me at least :-). The memo I can find [2] is a draft and supposedly expired in March. If it's now an RFC, where is it? I propose adding a second citation to the deprecation, referring to the draft. If and when the draft becomes an RFC that replaces 3023, there should be just one citation, referring to that replacement. Does that make sense to everyone? Paul Foxworthy (talk) 15:47, 21 June 2010 (UTC)
- I can't see where you have problems. (You say "But the citation is to RFC 3023". But there are multiple citations.) The article says that RFC 3023 standardizes text/xml and application/xml, which is true, and it also says that text/xml is in the process of being deprecated, which is also true, and both statements are linked to relevant citations. I've no idea what the current state of that process is, but the fact that the memo has timed out doesn't mean the process has been abandoned, unless you can find evidence to the contrary. Mhkay (talk) 21:39, 21 June 2010 (UTC)
- I was talking about the citation in the infobox; sorry I didn't make that clear. I am not too fussed about the status of the memo. All I want is the best citation for the fact that text/xml has been deprecated. Paul Foxworthy (talk) 06:02, 1 July 2010 (UTC)
- I've added a citation in the infobox. Paul Foxworthy (talk) 04:52, 6 July 2010 (UTC)
- Now it still looks as if it were deprecated, but in reality it isn't. It was, as you say yourself, deprecated in a draft which expired. Is this really notable? Anyway, it should be made clear that it isn't deprecated and may never be, although there may be reasons against its use. —h.e.r—79.236.22.145 (talk) 08:54, 2 August 2010 (UTC)
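One commonly cited practical problem with text/xml can be sketched in code. Under RFC 3023, a text/xml entity sent without a charset parameter must be treated as US-ASCII, even if the document itself declares UTF-8. For application/xml, the decision falls back to the document's own encoding rules. The function below is a simplified model written for this discussion. The name `effective_charset` is hypothetical, and the sketch deliberately ignores byte-order marks and UTF-16 detection.

```python
import re


def effective_charset(content_type: str, body: bytes) -> str:
    """Simplified RFC 3023 charset resolution for an XML MIME entity (sketch)."""
    # Split e.g. 'text/xml; charset="utf-8"' into media type and parameters.
    media_type, *raw_params = [p.strip() for p in content_type.split(";")]
    params = {}
    for p in raw_params:
        if "=" in p:
            key, value = p.split("=", 1)
            params[key.strip().lower()] = value.strip().strip('"').lower()

    # 1. An explicit charset parameter wins, even over the encoding declaration.
    if "charset" in params:
        return params["charset"]

    # 2. text/xml without a charset parameter defaults to US-ASCII.
    if media_type.lower() == "text/xml":
        return "us-ascii"

    # 3. application/xml: use the document's own encoding declaration.
    #    (Simplified: BOM sniffing and UTF-16 detection are omitted.)
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)', body)
    return m.group(1).decode("ascii").lower() if m else "utf-8"


doc = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
print(effective_charset("text/xml", doc))         # us-ascii
print(effective_charset("application/xml", doc))  # utf-8
```

The same UTF-8 document is labelled US-ASCII when served as text/xml without a charset. Under this simplified model, that mismatch is the main reason to prefer application/xml.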
XML Abuse
XML, which was developed for text markup, is being used as a general serialization container for any data structure.
Should we add a section about XML Abuse?
Or maybe just a reference at the header should be added?
What do you think?
I would like to add a reference to the header since this is an important problem. —Preceding unsigned comment added by 87.217.111.16 (talk) 07:03, 20 June 2010 (UTC)
- I think you would find it very hard to get consensus on any statement (let alone one short and pithy enough to go in the article lead) about when XML is and is not appropriate. Certainly, the opinions on the page you cite are far too debatable to go here. Let's keep this article factual and concise. It should tell people what XML is and does, not try to précis all the debates about its whys and wherefores. This is an encyclopedia. Mhkay (talk) 21:44, 21 June 2010 (UTC)
- I would like to see a mention of XML abuse, as this is a widespread practice: What is the worst abuse of XML that you have seen? —Preceding unsigned comment added by 79.144.221.71 (talk) 18:41, 12 December 2010 (UTC)
- XML abuse is a serious, real-world problem and as such it should be addressed by the Wikipedia article. Things have cooled off now that the current buzz is about anything with the word "cloud" written over it, but it was quite terrible not many years ago. XML probably was, and still is, the most widely misunderstood and heavily buzzworded technology of this century so far. It acts as a selling point for anything that uses it, regardless of purpose and schema. People (especially pointy-haired bosses) think anything which has "XML" on the box will automagically talk to anything else with the same label.
A very interesting insight on two potential examples of XML abuse can be quoted from Håkon Wium Lie, Opera's CTO (e.g. here): he describes OOXML and ODF as essentially "memory dumps with angle brackets".
—Preceding unsigned comment added by 217.125.117.197 (talk) 08:55, 13 January 2011 (UTC)
Thank you very much for keeping the Criticism section and for making clear that XML should be used to represent narrative documents, not structured data. — Preceding unsigned comment added by 193.127.207.152 (talk) 08:09, 13 October 2011 (UTC)
XML (Extensible Markup Language) is a set of rules for encoding documents in machine-readable form.
Would it not be more appropriate to say that xml is for encoding in human-readable form?
What is machine-readable form supposed to mean?
UndercoverAgents (talk) 18:52, 7 July 2010 (UTC)
- Its sole purpose is to interpret human-readable content/context and turn it into a machine-readable form, which then passes through a medium/interface. XML can be read/interpreted and parsed from within both compiled and ASCII form, which suggests both application and text would be valid (personal opinion). Daemondevel (talk) 00:57, 4 August 2010 (UTC)
Spelled-out title
The W3C defines XML as follows:
Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents ... and so on
The important part here is that the article should first use the fully spelled out name and then the abbreviation. This is not only in accordance with the standard, but also follows general rules of good writing in English. Kbrose (talk) 03:15, 4 August 2010 (UTC)
- This might be true if there were consensus that "XML" is an abbreviation of the three-word form, but many of us just don't believe that. Tim Bray (talk) 05:39, 4 August 2010 (UTC)
"Extensible Markup Language" first?
We're going to need to sort out what it should say at the top of the article:
Current candidates:
Extensible Markup Language (XML) and XML (Extensible Markup Language)
The first is supported by the English convention that a full name is listed first, and wording from the W3C spec: "The Extensible Markup Language (XML) is..." The second by the fact that the title of the article is (appropriately) XML and since the three-letter version is used rather than the three-word version in approximately 100% of spoken and written discourse.
Also note that XML is *not* an abbreviation or an acronym, it is just another name for the same thing.
My vote would be that the primary name should be the same as the title of the article and should reflect common usage. But it's not a matter of life or death. What do others think? Tim Bray (talk) 03:19, 4 August 2010 (UTC)
- Of course it's an abbreviation, even the standards documents specifically say so, as quoted above. The title of the article should also be the full name, ideally. The reason that it isn't, is that too many writers here are suffering from Acronymitis. Almost all articles of computer networking protocols use the full protocol name as title, even for the most common of protocols, such as IP. Kbrose (talk) 03:27, 4 August 2010 (UTC)
- There's no need for the title and the first name mentioned in the lede to match, particularly for acronyms where the full name is less common: NATO, Laser; see WP:SINGULAR on acronyms. --Cybercobra (talk) 03:56, 4 August 2010 (UTC)
- Empirically, the first letter of "Extensible" is not "X". It is generally agreed upon that writing "eXtensible Markup Language" is an error, which is another symptom of the fact that "XML" and "Extensible Markup Language" are two names for the same thing, one immensely more popular and widely used than the other. When I drafted the first sentence of the XML specification, I was insufficiently percipient to have predicted which would catch on. Tim Bray (talk) 05:38, 4 August 2010 (UTC)
- XML is definitely an acronym, as evidenced by the fact that "ML" is "Markup Language" and "X" is generally considered an acceptable character abbreviation to represent extensible, at least in part because extensible and xtensible share the exact same phonetics. For other examples see XP (eXtreme Programming, eXperience Point), XSL (eXtensible Stylesheet Language), XBML (eXtended Business Modeling Language, eXtensible Battle Management Language), XMP (eXtensible Metadata Platform), and so on. The oXygen editor's product name is a play on the "X" acronym use, so with many counter examples, I would argue that it's not generally agreed that eXtensible is incorrect. It may not be a well-formed acronym, but it does have the most important semantic mapping characteristic of an acronym and in the one case that it doesn't take the first letter mapping, it uses an acceptable replacement. That's the first point, so if I replace XML with DNA, does the second point hold up?
- "The second by the fact that the title of the article is (appropriately) DNA and since the three-letter version is used rather than the three-word version in approximately 100% of spoken and written discourse."
- In this case, it's obvious that the typical rules of English apply, even though most people probably don't even know what DNA stands for any more. I can definitely see (and agree with) the logic of mapping from the commonly seen and heard acronym back to the expanded form when the acronym serves as a mental key, but that's inconsistent with currently correct English usage. It's essentially guaranteed that acronyms are always going to be more popular and more widely used than their expansions, because that is their very purpose, so your argument would apply to all acronyms. MaxxD (talk) 07:10, 4 August 2010 (UTC)
- (Okay, I'll argue with myself...) XML is technically an initialism, not an acronym or abbreviation because it is not a pronounceable word, but the point that it does represent the initials of the expanded form (notwithstanding the ex/x issue) is reasonable. However, The Chicago Manual of Style (CMS) states, "Occasionally, too, it makes sense to use the acronym first and put the full name in parentheses, if the acronym in question is so familiar to your expected audience that it almost goes without explication." [3] and XML has certainly achieved this distinction, so writing it as "XML (Extensible Markup Language)" is not only perfectly okay per the CMS, but almost certainly preferred. MaxxD (talk) 09:36, 4 August 2010 (UTC)
Details of valid characters
The section "details of valid characters" is getting absurdly detailed, especially as it appears so close to the start of the article. It's simply not interesting to the average reader who comes here wanting an overview of what XML is - the kind of people who want this level of detail are much more likely to go to the specs than to come here. I think the usual Wikipedia solution is to move the material out to a separate article, and I propose doing that. Mhkay (talk) 11:06, 13 August 2010 (UTC) (Now done.)
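For readers who want the gist without the full detail, the XML 1.0 rules on allowed characters come down to a single grammar production, `Char`. The sketch below implements that production directly. The helper names `is_xml10_char` and `invalid_chars` are hypothetical, and the sketch covers XML 1.0 only (XML 1.1 differs).

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 `Char` production."""
    # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_chars(text: str) -> list[tuple[int, str]]:
    """List (index, "U+XXXX") for each character that XML 1.0 forbids."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if not is_xml10_char(ord(c))]


print(invalid_chars("tab\tand newline\n are fine"))   # []
print(invalid_chars("NUL\x00 and U+FFFE\ufffe are not"))  # [(3, 'U+0000'), (15, 'U+FFFE')]
```

Most C0 control characters, the surrogate block and U+FFFE/U+FFFF are excluded. That is nearly everything an overview reader needs to know.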
By definition?
Under "key terminology" it is stated: "By definition, an XML document is a string of characters.". By what definition, pray? That's not what the definition of "document" in the XML 1.0 rec says. It might be nice if it did, but it doesn't. Instead it mumbles about "textual objects", thus leaving (deliberately?) ambiguous the question of whether a document is a sequence of characters or a sequence of octets. Mhkay (talk) 23:23, 25 August 2010 (UTC)
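The character-versus-octet ambiguity shows up in ordinary parser APIs. Python's standard ElementTree, used here only as an illustration, accepts the same document either as bytes, which it decodes using the encoding declaration, or as an already-decoded str. This sketch shows the two views a tool may take. It does not show what the specification defines.

```python
import xml.etree.ElementTree as ET

# The same one-element document in two forms.
# As octets: ISO-8859-1 bytes, with an encoding declaration saying so.
as_octets = '<?xml version="1.0" encoding="ISO-8859-1"?><p>café</p>'.encode("iso-8859-1")
# As characters: already-decoded text, so no declaration is needed.
as_chars = "<p>café</p>"

from_octets = ET.fromstring(as_octets)  # the parser decodes the bytes itself
from_chars = ET.fromstring(as_chars)    # decoding happened before parsing

print(from_octets.text == from_chars.text == "café")  # True
```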
This description is only useful to people who already know what XML is useful for
Hi!
It would be helpful if someone rewrote this to explain why XML exists, as this would justify the entry. —Preceding unsigned comment added by 86.9.13.234 (talk) 14:49, 8 September 2010 (UTC)
"&" and "<" in XML entity values
- The article itself states they "may never appear in content."
- The matching reference's summary states they are allowed (just not recommended).
- The actual reference (i.e. the specs) states they "MUST NOT appear in their literal form, except when..." (certain cases like when inside CDATA).
So should the first two be fixed to reflect the latter? Can someone offer correct fixes then? -109.66.203.215 (talk) 08:52, 16 November 2010 (UTC)
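The rule in the specification can be checked against a strict parser. Outside markup delimiters, comments, processing instructions and CDATA sections, a literal "&" or "<" makes a document not well-formed. The sketch below uses Python's standard-library ElementTree and `xml.sax.saxutils.escape`. The helper name `is_well_formed` and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(doc: str) -> bool:
    """Return True if `doc` parses as a well-formed XML document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


raw = "Fish & Chips < 5 EUR"

# 1. Literal "&" and "<" in element content: a well-formedness error.
literal_doc = f"<menu>{raw}</menu>"

# 2. Escaped with the predefined entities &amp; and &lt;: well-formed, and the
#    parser turns the references back into the original characters.
escaped_doc = f"<menu>{escape(raw)}</menu>"

# 3. Inside a CDATA section the characters may appear literally.
cdata_doc = f"<menu><![CDATA[{raw}]]></menu>"

print(is_well_formed(literal_doc))             # False
print(escape(raw))                             # Fish &amp; Chips &lt; 5 EUR
print(ET.fromstring(escaped_doc).text == raw)  # True
print(ET.fromstring(cdata_doc).text == raw)    # True
```

In other words, "may never appear" is too strong, and "allowed but not recommended" is wrong. They must be escaped except in the listed contexts.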
Example shown in Icon, not technically invalid but a poor example
Looking at the example, it shows questions and answers being thrown straight into the <quiz> bracket. Surely each pair of Q&A would need to be wrapped in a tag <round> or <question_set>? Otherwise the program using this would have to read through the whole thing serially for any of it to make sense.
This is more a practical issue and not a technical one.--92.14.116.17 (talk) 14:12, 3 March 2011 (UTC)
- Using XML for question-and-answer quizzes seems to be a common student exercise set by unimaginative teachers, and as the problem never occurs in real life I guess you'd better find out what those teachers consider the right answer to be. Or at any rate, find out what requirements they are assessing the solution against. Mhkay (talk) 15:15, 3 March 2011 (UTC)
Still a poor example, and barely an example as it is shown as a small piece of graphics. There should definitely be a real example in the text. And in that example it should be explained which one is the root element. My issue is that I believe I heard that the root element is in fact an implicit element above the topmost element. This article does not even explain what a root element is, just that there is only one. 193.140.194.148 (talk) 12:15, 13 January 2012 (UTC)
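The sketch below covers both points raised here, using a hypothetical quiz vocabulary. The `<round>`, `<question>` and `<answer>` element names and the questions are invented. First, grouping each question/answer pair lets a program handle one pair at a time. Second, the "implicit" node above the topmost element is the document node (called the root node in XPath 1.0), which is not an element. The root element, also called the document element, is the single top-level element, here `<quiz>`. Python's standard ElementTree and minidom are used only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical quiz markup: each question/answer pair is grouped in a <round>.
QUIZ = """<quiz>
  <round>
    <question>What is the capital of France?</question>
    <answer>Paris</answer>
  </round>
  <round>
    <question>How many sides does a hexagon have?</question>
    <answer>6</answer>
  </round>
</quiz>"""

# ElementTree hands back the root (document) element directly.
root = ET.fromstring(QUIZ)
for rnd in root.findall("round"):
    print(rnd.findtext("question"), "->", rnd.findtext("answer"))

# The DOM also has a Document node *above* the root element.
# It is not an element and never appears as a tag in the markup.
doc = minidom.parseString(QUIZ)
print(doc.nodeType == doc.DOCUMENT_NODE)       # True
print(doc.documentElement.tagName)             # quiz
print(doc.documentElement.parentNode is doc)   # True
```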
History needs a bit of cleanup
Fixed a couple of things, but the section could (and should) be much better written. A very short to-do list, in decreasing order of importance:
- the link to Kimber's blog is totally out of place;
- more supporting citations are needed;
- more historical sources should be found and linked to.
Andy Monakov (talk) 11:32, 15 September 2011 (UTC)
- On the number-of-weeks issue, I can't get Jon's count to work in my head. I seem to remember that we were working in at least part of August, and when I pop up a 1996 calendar I have trouble getting the week count down to his number. However, it is absolutely the case that the first wave of work was in the August-November timeframe, so I thought it best just to say that rather than arguing over the number of weeks. On the section in general, I agree it's rambling and messy, that may have been partially a consequence of too many of the people who were involved wanting their opinion/contribution included. I think it would be a good idea for someone else to be bold and clean it up. Tim Bray (talk) 17:35, 20 September 2011 (UTC)
Jsonix
Is it worth including a link to Jsonix? --Gak (talk) 12:22, 21 September 2011 (UTC)
- A bit of Web searching reveals no uptake, and also confluence.highsource.org is offline. So, no. Tim Bray (talk) 23:04, 30 September 2011 (UTC)
Large commented out section under Well-formedness and error-handling
The section in question can be found at the end of this section: http://en.wikipedia.org/w/index.php?title=XML&action=edit&section=8
Are there plans to use that? If not it should be removed, although it does seem to contain some valid information. — Preceding unsigned comment added by Nick Garvey (talk • contribs) 04:03, 2 November 2011 (UTC)
- Hidden commented out sections like this are a menace. I've moved it here from the article:
Extended content:
Tree representation of an XML Document
The nesting of elements leads directly to a tree representation for an XML document. The root element becomes the root of a tree. Because every element is composed of a sequence of other elements and character data, it is easy to determine the children of each element: just take each item in the sequence and create a new child node. Here is an example of a structured XML document:
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Bake in the oven at 180(degrees)C for 30 minutes.</step>
  </instructions>
</recipe>
<!-- Not well-formed fragment -->
<title>Book on Logic<author>Aristotle</title></author>
One way of writing the same information in a way which could be incorporated into a well-formed XML document is as follows:
<!-- Well-formed XML fragment -->
<title>Book on Logic</title>
<author>Aristotle</author>
In XML, the proper way of nesting code is through parallel data and character data. For example:
<paragraph> Hello, my name is<first-name>John</first-name> <last-name> Doe</last-name>from the <country>United States</country> </paragraph>
This shows that the “paragraph” consists of a sequence of five items. The “first-name”, “last-name”, and “country” elements consist of character data, and the other two items are just character data.
Entity references
An entity in XML is a named body of data, usually text.
Entities are often used to represent single characters that cannot easily be entered on the keyboard; they are also used to represent pieces of standard ("boilerplate") text that occur in many documents, especially if there is a need to allow such text to be changed in one place only. Special characters can be represented either using entity references, or by means of numeric character references. An example of a numeric character reference is " An entity reference is a placeholder that represents that entity. It consists of the entity's name preceded by an ampersand ("&") and followed by a semicolon (";").
Here is an example using a predeclared XML entity to represent the ampersand in the name "AT&T":
<company_name>AT&amp;T</company_name>
Additional entities (beyond the predefined ones) can be declared in the document's Document Type Definition (DTD). A basic example of doing so in a minimal internal DTD follows. Declared entities can describe single characters or pieces of text, and can reference each other.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
<!ENTITY copy "&#xA9;">
<!ENTITY copyright-notice "Copyright &copy; 2009, XYZ Enterprises">
]>
<example>
&copyright-notice;
</example>
When viewed in a suitable browser, the XML document above appears as:
Copyright © 2009, XYZ Enterprises
Numeric character references
Numeric character references look like entity references, but instead of a name, they contain the "#" character followed by a number: the Unicode code point of the character, in decimal or, with an "x" prefix, in hexadecimal. The ampersand in "AT&T" could also be written this way:
<company_name>AT&#38;T</company_name>
<company_name>AT&#x26;T</company_name>
Similarly, in the previous example, notice that "&#xA9;" is used to generate the “©” symbol. See also numeric character references.
Well-formed documents
In XML, a well-formed document must conform to the following rules, among others:
Element names are case-sensitive. For example, the following is a well-formed matching pair:
whereas these are not
By carefully choosing the names of the XML elements, one may convey the meaning of the data in the markup. This increases human readability while retaining the rigor needed for software parsing. Choosing meaningful names implies the semantics of elements and attributes to a human reader without reference to external documentation. However, this can lead to verbosity, which complicates authoring and increases file size.
Automatic verification
It is relatively simple to verify that a document is well-formed or validated XML, because the rules of well-formedness and validation of XML are designed for portability of tools. The idea is that any tool designed to work with XML files will be able to work with XML files written in any XML language (or XML application). Here are some examples of ways to verify XML documents:
irb> require "rexml/document"
irb> include REXML
irb> doc = Document.new(File.new("test.xml")).root
- --Cybercobra (talk) 05:12, 2 November 2011 (UTC)
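Several claims in the moved text can be checked mechanically. These are: overlapping elements are rejected, element names are case-sensitive, nesting yields a tree, and internal DTD entities (including one that references another) are expanded. The sketch below uses Python's standard-library ElementTree as a stand-in for the Ruby/REXML snippet, which is a substitution, not the original tool. The `<book>` wrapper, the shortened recipe and the `is_well_formed`/`show` helpers are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if `doc` parses as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Overlapping vs properly nested elements (a hypothetical <book> wrapper
# supplies the single root element every document needs).
bad = "<book><title>Book on Logic<author>Aristotle</title></author></book>"
good = "<book><title>Book on Logic</title><author>Aristotle</author></book>"

# Element names are case-sensitive: <step> cannot be closed by </Step>.
same_case = "<step>Knead again.</step>"
mixed_case = "<step>Knead again.</Step>"

# Nesting gives a tree: print a shortened recipe, one node per line.
recipe = ET.fromstring(
    '<recipe name="bread"><title>Basic bread</title>'
    '<ingredient amount="1" unit="teaspoon">Salt</ingredient>'
    "<instructions><step>Mix all ingredients together.</step></instructions></recipe>"
)


def show(elem: ET.Element, depth: int = 0) -> None:
    print("  " * depth + elem.tag, dict(elem.attrib) or "")
    for child in elem:
        show(child, depth + 1)


# Entities declared in an internal DTD subset are expanded by the parser,
# including an entity whose value refers to another entity.
entity_doc = """<?xml version="1.0"?>
<!DOCTYPE example [
<!ENTITY copy "&#xA9;">
<!ENTITY copyright-notice "Copyright &copy; 2009, XYZ Enterprises">
]>
<example>&copyright-notice;</example>"""

print(is_well_formed(bad), is_well_formed(good))              # False True
print(is_well_formed(same_case), is_well_formed(mixed_case))  # True False
show(recipe)
print(ET.fromstring(entity_doc).text)  # Copyright © 2009, XYZ Enterprises
```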
Character entity references for escaping
I attempted to link the #Escaping section with the Character entity reference article. I thought the link was relevant because that article also lists the same five objects and could potentially expand on the topic. If the problem with my change was just an issue of terminology or semantics, perhaps I can avoid it by naming the article directly, for example “There are five predefined entities (see Character entity reference)”. Otherwise, I’d love to know why the two topics shouldn’t be related when they seem so similar. Vadmium (talk, contribs) 12:34, 5 February 2012 (UTC).
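One concrete difference bears on whether the link fits. In XML, only the five entities `lt`, `gt`, `amp`, `apos` and `quot` are predefined. HTML-style names such as `&nbsp;` are undefined unless a DTD declares them, while numeric character references always work. A minimal sketch with Python's standard ElementTree (used only as an example parser; `parses` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if `doc` is well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The five predefined entities need no declaration.
predefined = "<a>&lt;&gt;&amp;&apos;&quot;</a>"
print(ET.fromstring(predefined).text)  # <>&'"

# An HTML entity such as &nbsp; is undefined in plain XML: not well-formed.
print(parses("<a>&nbsp;</a>"))  # False

# A numeric character reference for the same character (U+00A0) is fine.
print(ET.fromstring("<a>&#160;</a>").text == "\u00a0")  # True
```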