Talk:Document type definition

WikiProject Computing (Rated C-class)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

WikiProject Internet (Rated C-class)
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has not yet received a rating on the project's importance scale.


Extremely unreadable article


Extremely unreadable article. This sentence in the opening paragraph is too long and full of jargon, with terms that are not wikified: "A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of SGML or XML documents, in terms of constraints on the structure of those documents." Little Professor (talk) 18:41, 17 December 2007 (UTC)


Why the capital letters, instead of "document type definition" in lower case?



It would be nice if the example came with an explanation. At the moment, it's not terribly informative to the lay reader. For example, what does PCDATA mean? What is it in the example that makes the name MANDATORY but the other data items optional? -- Tarquin 15:50, 23 Dec 2003 (UTC)

DTD & Namespaces

I don't understand how a DTD and an XML namespace are different, especially in the context of declaring your DTD and XML namespace when writing an XHTML page.

A DOCTYPE is NOT a DTD

I'm surprised there was a mistake like this on Wikipedia, which one would assume to be full of web geeks. I got rid of that little mistake in this article and there's now an article for the Document Type Declaration. It's only a stub but at least we're not parading around a lie anymore.--holizz 22:40, 21 Nov 2004 (UTC)

In XML, isn't a DTD the predecessor of an XML Schema?

I thought that, in XML, first there was DTD, and then along came XML Schema which was a more powerful way to specify what a valid XML document is. What is the relationship between an XML DTD and an XML Schema? I believe that I read that you should study XML DTDs first, but that they were superseded by the more powerful XML Schemas.

I must not understand because when I read the main article it seems to say that an XML DTD is an XML Schema.

Can someone clarify this for me?

Kaydell (talk) 13:35, 23 July 2008 (UTC)

It did not say it was an XML Schema, i.e. a W3C XML Schema Definition Language, it said it was an XML schema (lower case s), i.e. a schema language for or in or of XML, and linked to that page. I am reverting that contrast back. Rick Jelliffe (talk) 11:16, 29 August 2009 (UTC)

Sometimes people use the term "schema" to generically refer to all syntaxes for specifying the structure of XML documents, so this includes DTDs, XSD schemas, RELAX NG schemas, etc. Sometimes people use "schema" to refer to any XML-based way to specify XML document structure, so this refers to everything but DTDs. Unfortunately, most people use the term "schema" to refer to W3C XSD Schemas, because that's the most popular XML-based way to specify document structure. XSD schemas are more powerful in many ways than DTDs, but they are so much more complex that the added power often isn't worth it, implementations aren't consistent, and the W3C spec for it has parts that are just impossible to understand.

Bobdc (talk) 16:22, 12 August 2008 (UTC)

Stunned by Unstructured Calendar Date

An important feature of XML is that structured data can have a structured representation. The examples feature an xx/xx/xxxx style date. It has … just text (PCDATA), neither tags nor attributes. The slashes do not indicate a positional convention. IOW, the calendar date is using an ad hoc idiosyncratic syntax to indicate structure, in an XML document. And ambiguous syntax at that! Even with xxxx being obviously the year, the two other number positions remain ambiguous.

But this is a perfect use case for a DTD, since structural representation is precisely what a DTD can rectify, in this case using grammatical structuring of calendar dates. Since a day is a single item, I'd suggest an element type declaration for day, together with three attributes for the components of its calendar date. GeorgBauhaus (talk) 14:09, 11 August 2010 (UTC)

Good practice XML with DTDs always avoided structured date times.
The example has a calendar date, though, not a date time. (Is there a reference to Good Practice? My memories of good practice start in the days of SGML and seem to give different advice.)
The problem is two-fold: DTDs are a really poor way of defining data models (they don't, they define a document format) and so defining date fields in this way is needlessly verbose, yet adds no useful processability.
Objections:
  1. Processability requires a unique interpretation. The reader can process neither NOTATION (injecting foreign grammar in DTD grammar) nor structured elements. Therefore, there is no hope to process xx/xx/xxxx dates reliably, in to-be-honest practice. (The practice, using various ad hoc calendar date notations in heterogeneous, multi-author, multi-program XML documents, is real, costly, a risk, and very time consuming; the consequences are part of what probably makes me sound so upset.)
  2. Again, the example is not using a formalized ISO date time, it is using some ad hoc notation. It does this without NOTATION. There is no indication of a date writing convention. Hence, no format either. (This may be good enough for internal processing. Should the article be about XML for internal data processing?)
  3. Verbosity: Valid use of a structured calendar date needs the same amount of text:
<birthdate y="1977" m="2" d="4"/>
is no longer than
<birthdate>04/02/1977</birthdate>
Ambiguity is removed, same verbosity by character count.
If this is a MOM document example, there might be justification because xx/xx/xxxx represents an “internally known date format”. But if this isn't clearly the case, then to me, it means that remaining arguments in favor of non-ISO ad hoc date formats involve subjective aesthetic preference, producers' laziness, or maybe group pressured preference or mimicking widespread (mis)use? So for a Wikipedia article, I still think that some kind of explicit addition or change is needed to showcase the features of structured text.
Thanks for taking the time GeorgBauhaus (talk) 11:50, 12 August 2010 (UTC)
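The ambiguity and the character-count claim in the objections above can be checked with a short Python sketch. It is only an illustration using the standard library; the variable names are made up here and the two strptime formats are simply the two obvious candidate readings, not something the article specifies.

```python
from datetime import date, datetime
import xml.etree.ElementTree as ET

text_form = "<birthdate>04/02/1977</birthdate>"
attr_form = '<birthdate y="1977" m="2" d="4"/>'

# The slash-separated string parses under both conventions, giving two different days.
raw = ET.fromstring(text_form).text
day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # read as 4 February 1977
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # read as 2 April 1977

# The attribute form names each component, so only one reading is possible.
elem = ET.fromstring(attr_form)
structured = date(int(elem.get("y")), int(elem.get("m")), int(elem.get("d")))

print(day_first, month_first, structured)
print(len(text_form), len(attr_form))
```

Both serialisations come out at the same length, while only the attribute form yields a single date without outside knowledge of the writing convention.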
XML isn't SGML. In particular, the people using XML 10 years ago weren't SGML people. Previous SGML and DTD practices just didn't happen in XML: the early adopters didn't have an SGML background to know of them, and by the time the later adopters came along, DTDs had been supplanted by XML Schema.
More importantly, XML differs from SGML in that it doesn't (by specific design) depend on a DTD for parsing. Without a need for one, and with a general incomprehension of DTD syntax, XML developers didn't use one - so the 8601 string format (which was possibly already familiar) was favoured. As to verbosity, most half-decent XML processing doesn't work with the document, it works with the parsed DOM (as an API from the parser) and (rightly or not) most coders will see a string and possibly a regex results list as "less verbose" than a set of attributes.
Either way, early years XML did use multi-attribute dateTime representations, but this was back in the day when people thought 3-character all-uppercase element names were the way to design things. Both approaches soon disappeared, cemented by XML Schema's strong backing for 8601-like approaches. Andy Dingley (talk) 23:03, 12 August 2010 (UTC)
Secondly there were strong pre-existing formats for representing date times as text strings: ISO 8601 (W3C-backed web stuff) and RFC 822 (SMTP email headers). Unfortunately there were two of them, both with substantial legacies. 8601 is somewhat favoured, but both are still encountered (and to be honest, usually only partially implemented).
When XML Schema began to supplant DTDs, it did recommend formats for dateTime (which could then be referred to as a built-in primitive XML datatype), although the implementation is still basically a text string with 8601 formatting.
http://www.w3.org/TR/xmlschema-2/#dateTime
Andy Dingley (talk) 14:33, 11 August 2010 (UTC)
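For readers comparing the two legacy text formats mentioned above, here is a minimal Python sketch that writes one instant both ways and parses each with the standard library. The timestamp is invented for illustration, and the four-digit year in the email-style string follows the later revisions of the email header format rather than the original two-digit RFC 822 form.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# The same (invented) instant in the two notations.
iso_8601 = "1977-02-04T10:30:00+00:00"
rfc_822 = "Fri, 04 Feb 1977 10:30:00 +0000"

# Each format needs its own parser; neither string is accepted by the other one.
from_iso = datetime.fromisoformat(iso_8601)
from_rfc = parsedate_to_datetime(rfc_822)

print(from_iso == from_rfc)
```

Either way the element content is still a plain string as far as a DTD is concerned; only the consuming code gives it a date meaning.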

Comments

There's no mention of DTD --comments-- in this article though many of the examples use them. Is http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.3.1 part of the formal DTD definition or is that something that's only in the HTML standard? --Marc Kupper|talk 01:50, 21 December 2010 (UTC)
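As far as I understand it, comments delimited by pairs of "--" inside a declaration are an SGML feature, while XML DTDs only accept the standalone <!-- ... --> form between declarations. The following Python sketch probes that difference with the standard library's non-validating XML parser; the note element and the helper name parses are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# XML style: the comment is a separate item in the DTD internal subset.
xml_style = """<!DOCTYPE note [
  <!-- a note holds character data only -->
  <!ELEMENT note (#PCDATA)>
]>
<note>hello</note>"""

# SGML style: the comment sits inside the declaration, between pairs of "--".
sgml_style = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA) -- a note holds character data only -->
]>
<note>hello</note>"""


def parses(doc):
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses(xml_style), parses(sgml_style))
```

This only shows what an XML parser accepts; an SGML parser reading the HTML 4 DTD is a different tool with different rules.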

Suggestions

It may be merged into document type declaration. Sky6t (talk) 04:56, 7 February 2012 (UTC)

  • oppose Whilst not necessarily being against the idea of merging, there is a big problem with these two terms in that they're almost never understood or appreciated to be different things. However we choose to arrange these, it's vital that this distinction is made obvious, and then explained, in the perception of the readers. I see this as being easier to achieve with two separate articles. Andy Dingley (talk) 10:27, 7 February 2012 (UTC)

New section on security

I created a new section on security. It used to be possible to open XML files in MS Office, but Office 2010 refuses to open XML files with DTDs. I'm not an expert on this, so hopefully someone will expand this section, with an explanation of the error message "DTD prohibited". I suspect this is the main reason why Office users will find their way to this article. Margin1522 (talk) 09:23, 21 October 2013 (UTC)
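One commonly cited reason for parsers to refuse DTDs is that entity declarations let a small document expand into much larger content. The Python sketch below shows the mechanism on a harmless scale using the standard library; the entity names are invented, and this is not a claim about what Office actually checks before reporting "DTD prohibited".

```python
import xml.etree.ElementTree as ET

# Entity b refers to entity a three times; deeper nesting grows the output quickly.
doc = """<!DOCTYPE r [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
]>
<r>&b;</r>"""

# The parser expands the entity reference while building the tree.
expanded = ET.fromstring(doc).text

print(expanded)
```

With more levels of nesting the expanded text grows exponentially, which is why some applications reject any document that carries a DTD.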

Entity declarations accuracy

I don't think the examples given in the Entity declarations section are accurate at all. They seem to me to be obviously incorrect and invalid XML. — RockMFR 04:16, 4 February 2014 (UTC)

Specifics? Also we're not interested in either "validity" or "XML" here, it's a question of well-formedness and SGML, both of which are subtly different. Andy Dingley (talk) 12:04, 4 February 2014 (UTC)
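To make the well-formedness versus validity distinction concrete for XML (SGML differs again), here is a small Python sketch. It relies on the standard library parser being non-validating: it reads the DTD but does not enforce the content models. The person and name elements and the helper is_well_formed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed but not valid: the DTD requires a <name> child that is missing.
invalid_but_well_formed = """<!DOCTYPE person [
  <!ELEMENT person (name)>
  <!ELEMENT name (#PCDATA)>
]>
<person/>"""

# Not well-formed: the <name> start tag is never closed.
not_well_formed = "<person><name>Ann</person>"


def is_well_formed(doc):
    """Return True if a non-validating parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(invalid_but_well_formed), is_well_formed(not_well_formed))
```

A validating parser would be needed to reject the first document; the second one is rejected by any conforming XML parser.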