XML Schema (W3C): Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 08:15, 13 November 2008

XML Schema (W3C)
Filename extension: .xsd
Internet media type: application/xml, text/xml
Developed by: World Wide Web Consortium
Type of format: Schema language
Extended from: XML
Standard: 1.0, Part 1 Structures (Recommendation); 1.0, Part 2 Datatypes (Recommendation); 1.1, Part 1 Structures (Draft); 1.1, Part 2 Datatypes (Draft)

XML Schema, published as a W3C recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C. Like all XML schema languages, XML Schema can be used to express a schema: a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XML Schema was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software, but the schema language's dependence on specific data types has provoked criticism.

Because of confusion between XML Schema as a specific W3C specification, and the use of the same term to describe schema languages in general, some parts of the user community started to refer to this language as WXS, while others have referred to it as XSD (the filename extension for schema documents is typically ".xsd", and "xsd" is also often used as a conventional namespace prefix). In the draft of the next version, 1.1, the W3C has chosen to adopt XSD as the preferred name.

History

In its appendix of references, XML Schema acknowledges the influence of DTD and other early XML schema efforts such as DDML, SOX, XML-Data, and XDR. It appears to have picked pieces from each of these proposals but is also a compromise among them. Of those languages, XDR and SOX continued to be used and supported for a while after XML Schema was published. A number of Microsoft products supported XDR until the release of MSXML 6.0 (which dropped XDR in favor of XML Schema) in December 2006. Commerce One, Inc. supported its SOX schema language until declaring bankruptcy in late 2004.

XSD was the first W3C-recommended XML schema language to provide a namespace-aware and datatype-aware alternative to XML's native Document Type Definitions (DTDs).

Schemas and Schema Documents

Technically, a schema is an abstract collection of metadata, consisting of a set of schema components: chiefly element and attribute declarations and complex and simple type definitions. These components are usually created by processing a collection of schema documents, which contain the source language definitions of these components. In popular usage, however, a schema document is often referred to as a schema.

Schema documents are organized by namespace: all the named schema components belong to a target namespace, and the target namespace is a property of the schema document as a whole. A schema document may include other schema documents for the same namespace, and may import schema documents for a different namespace.
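As a minimal sketch of these two composition mechanisms, the schema document below includes a second document for its own target namespace and imports one for a different namespace. The namespace URIs, file names, and component names (http://example.com/orders, order-types.xsd, customers.xsd, LineItemType, Customer) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ord="http://example.com/orders"
           xmlns:cust="http://example.com/customers"
           targetNamespace="http://example.com/orders"
           elementFormDefault="qualified">

  <!-- include: another schema document for the SAME target namespace -->
  <xs:include schemaLocation="order-types.xsd" />

  <!-- import: components that belong to a DIFFERENT namespace -->
  <xs:import namespace="http://example.com/customers"
             schemaLocation="customers.xsd" />

  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- cust:Customer is assumed to be declared in customers.xsd -->
        <xs:element ref="cust:Customer" />
        <!-- ord:LineItemType is assumed to be defined in order-types.xsd -->
        <xs:element name="LineItem" type="ord:LineItemType" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Components brought in by xs:include are referenced through the including document's own namespace prefix, while imported components need a prefix bound to their own namespace.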

When an instance document is validated against a schema (a process known as assessment), the schema to be used for validation can either be supplied as a parameter to the validation engine, or it can be referenced directly from the instance document using two special attributes, xsi:schemaLocation and xsi:noNamespaceSchemaLocation. (The latter mechanism requires the client invoking validation to trust the document sufficiently to know that it is being validated against the correct schema.)
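The two attributes differ in what they carry. xsi:schemaLocation holds whitespace-separated pairs of a namespace URI and a location hint, whereas xsi:noNamespaceSchemaLocation holds a single location hint for a schema without a target namespace. The fragment below sketches the first form; the namespace URI and the file name orders.xsd are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<ord:Order xmlns:ord="http://example.com/orders"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://example.com/orders orders.xsd">
  <!-- element content omitted from this sketch -->
</ord:Order>
```

A validator is free to treat these values as hints only, which is why a schema supplied directly to the validation engine is the safer choice when the instance document is not trusted.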

XML Schema documents usually have the filename extension ".xsd". No dedicated Internet media type has yet been registered for XSD, so "application/xml" or "text/xml" should be used, as per RFC 3023.

Data types

Unlike DTDs, an XML Schema allows the content of an element or attribute to be validated against a data type. For example, an attribute might be constrained to be a valid date, or a decimal number.

XSD provides a set of 19 primitive data types (boolean, string, decimal, double, float, anyURI, QName, hexBinary, base64Binary, duration, date, time, dateTime, gYear, gYearMonth, gMonth, gMonthDay, gDay, and NOTATION). It allows new data types to be constructed from these primitives by three mechanisms: restriction (reducing the set of permitted values), list (allowing a sequence of values), and union (allowing a choice of values from several types). 25 derived types are defined within the specification itself, and further derived types can be defined by users in their own schemas.
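The three derivation mechanisms can be sketched as follows. The type names (Percentage, PercentageList, PercentageOrUnknown) are invented for this illustration; xs:integer, used as the base type, is itself one of the built-in derived types.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- restriction: integers limited to the range 0 to 100 -->
  <xs:simpleType name="Percentage">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0" />
      <xs:maxInclusive value="100" />
    </xs:restriction>
  </xs:simpleType>

  <!-- list: whitespace-separated Percentage values, e.g. "10 25 100" -->
  <xs:simpleType name="PercentageList">
    <xs:list itemType="Percentage" />
  </xs:simpleType>

  <!-- union: either a Percentage or the literal string "unknown" -->
  <xs:simpleType name="PercentageOrUnknown">
    <xs:union memberTypes="Percentage">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="unknown" />
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
</xs:schema>
```

With these definitions, an element of type PercentageOrUnknown would accept "42" or "unknown" but reject "150".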

Post-Schema-Validation Infoset

After XML Schema-based validation, it is possible to express an XML document's structure and content in terms of the data model that was implicit during validation. The XML Schema data model includes:

  • the vocabulary (element and attribute names)
  • the content model (relationships and structure)
  • the data types.

This collection of information is called the Post-Schema-Validation Infoset (PSVI). The PSVI gives a valid XML document its "type" and facilitates treating the document as an object, using object-oriented programming (OOP) paradigms.

This particular OOP approach to XML data access was primarily advocated by Microsoft, a major contributor to the development of XML Schema. Converting an XML document to a datatype-aware object can be beneficial in some parts of computer software design, but critics contend that it also undermines openness, a key feature of XML, and that it is biased toward compatibility with the datatypes native to Microsoft's favored programming languages.[1]

In addition, the limitations inherent to (and caused by) XML Schema datatypes, the restrictive coupling of those datatypes with the rest of XML Schema, and dependencies on those datatypes in other W3C specifications are points of contention among a number of XML software developers.[2]


Example

An example of a very simple XML Schema Definition describing a UK address:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string" />
        <xs:element name="House" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />
        <xs:element minOccurs="0" name="County" type="xs:string" />
        <xs:element name="PostCode" type="xs:string" />
        <xs:element name="Country">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="FR" />
              <xs:enumeration value="DE" />
              <xs:enumeration value="ES" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="US" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

A number of development tools can be used to create a graphical representation of a schema. Many of them create diagrams similar to the one shown below:

[Figure: a graphical representation of the schema code above]

An example of an XML document that conforms to this schema:

<?xml version="1.0" encoding="utf-8"?>
<Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SimpleAddress.xsd">
  <Recipient>Mr. Walter C. Brown</Recipient>
  <House>49</House>
  <Street>Featherstone Street</Street>
  <Town>LONDON</Town>
  <PostCode>EC1Y 8SY</PostCode>
  <Country>UK</Country>
</Address>


Secondary uses for XML Schemas

The primary reason for defining an XML schema is to formally describe an XML document. However, the resulting schema has a number of other uses that go beyond simple validation.

Documentation generation

The schema can be used to generate human-readable documentation. This is especially useful where the authors have made use of the annotation elements. No formal standard exists for documentation generation, but a number of tools are available that will produce high-quality, readable HTML and printed material.
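For illustration, an annotated element declaration might look like the following sketch. The element name is borrowed from the address example above, and the documentation text is invented; a documentation generator would typically copy such text into its output.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PostCode" type="xs:string">
    <xs:annotation>
      <!-- human-readable description, ignored during validation -->
      <xs:documentation xml:lang="en">
        The postal code of the delivery address, for example "EC1Y 8SY".
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
```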

Code Generation

The schema can also be used to generate code, a technique referred to as XML data binding. The generated code allows XML documents to be treated as objects within the programming environment.

Criticism

Although XML Schema is undoubtedly successful in that it has been widely adopted and largely achieves what it set out to achieve, it has been the subject of a great deal of severe criticism, perhaps more so than any other W3C Recommendation.

A good summary of the criticisms is provided by James Clark[3] (who is admittedly promoting his own alternative, RELAX NG):

  • A schema written using XSD is difficult to read and understand
  • There are many surprises in the language, for example that restriction of elements works differently from restriction of attributes
  • The W3C Recommendation itself is extremely difficult to read
  • XSD lacks any formal mathematical specification
  • XSD provides no facilities to state that the value or presence of one attribute is dependent on the values or presence of other attributes (so-called co-occurrence constraints)
  • XSD offers very weak support for unordered content
  • The set of datatypes on offer is highly arbitrary
  • There is no way for an XSD schema to indicate which elements are permitted at the top level of a document
  • The use of xsi:schemaLocation, an attribute that appears within an instance to identify the schema to be used for validation, causes security and interoperability problems
  • It is not a good idea that a schema should perform two functions at once: validation, and augmentation of the instance with type information and default values
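The point about unordered content can be made concrete with a small sketch. In XSD 1.0 the xs:all group lets child elements appear in any order, but each child may occur at most once and the group cannot be nested inside a sequence or choice. The element names below are hypothetical, and the example is an illustration rather than one taken from Clark's summary.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Contact">
    <xs:complexType>
      <!-- children may appear in any order, each at most once -->
      <xs:all>
        <xs:element name="Name" type="xs:string" />
        <xs:element name="Email" type="xs:string" minOccurs="0" />
        <xs:element name="Phone" type="xs:string" minOccurs="0" />
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A content model such as "any number of Phone elements mixed freely with one Name" cannot be expressed with xs:all under these rules.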

See also

References

External links

W3C XML Schema Specification

Books

  • Definitive XML Schema, Priscilla Walmsley, Prentice-Hall, 2001, ISBN 0130655678
  • XML Schema, Eric van der Vlist, O'Reilly, 2001, ISBN 0596002521
  • The XML Schema Companion, Neil Bradley, Addison-Wesley, 2003, ISBN 0321136179
  • Professional XML Schemas, Jon Duckett et al., Wrox Press, 2001, ISBN 1861005474
  • XML Schemas, Lucinda Dykes et al., Sybex, ISBN 0782140459

Tutorials

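XML Schema Editors

  • XmlSpy (http://www.altova.com/products_xmlspy.html)
  • Oxygen XML Editor (http://www.oxygenxml.com/xml_schema_editor.html)
  • Liquid XML Studio (http://www.liquid-technologies.com/Product_XmlStudio.aspx), a free graphical XSD editor
  • Eclipse XSD Model (http://www.eclipse.org/xsd), an open source Java implementation of the XML Schema model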