XML database

From Wikipedia, the free encyclopedia

An XML database is a data persistence software system that allows data to be stored in XML format. These data can then be queried, exported and serialized into the desired format. XML databases are usually associated with document-oriented databases.

Two major classes of XML database exist:[1]

  1. XML-enabled: these may either map XML to traditional database structures (such as a relational database[2]), accepting XML as input and rendering XML as output, or more recently support native XML types within the traditional database. This term implies that the database processes the XML itself (as opposed to relying on middleware).
  2. Native XML (NXD): the internal model of such databases depends on XML and uses XML documents as the fundamental unit of storage, which are, however, not necessarily stored in the form of text files.

Rationale for XML in databases

O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa".[3] It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format. In content-based applications, the ability of a native XML database to query document content directly also minimizes the need to extract or enter metadata to support searching and navigation. In a native XML environment, the entire content store effectively becomes metadata: query languages such as XPath and XQuery can address content, attributes and relationships within the XML (for example, find the string "XABr" within a <para> element whose attribute 123 has the value "P" or "Q", but only within parent Y and with siblings F or G). While this level of search capability is possible with external metadata, reproducing the content tree in metadata requires more complex and difficult processing.
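
The kind of structural query described above can be sketched in Python with the standard library's xml.etree.ElementTree module. This is only an illustration, not how any particular XML database evaluates queries. The sample document is hypothetical, and the attribute is named "code" here because an XML attribute name cannot begin with a digit. ElementTree implements only a subset of XPath, so part of the filtering is done in Python; the comment shows how a full XPath 1.0 engine could express the same test.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<doc>
  <Y>
    <F/>
    <para code="P">see XABr for details</para>
    <para code="Z">XABr again</para>
  </Y>
  <X>
    <para code="Q">XABr outside parent Y</para>
  </X>
</doc>
"""

# Equivalent full XPath 1.0 expression (not supported by ElementTree):
#   //Y[F or G]/para[(@code="P" or @code="Q") and contains(., "XABr")]


def find_matches(root):
    """Return the text of <para> elements that satisfy every structural test."""
    hits = []
    for parent in root.iter("Y"):
        # Sibling test: the parent must also contain an <F> or a <G> child.
        if parent.find("F") is None and parent.find("G") is None:
            continue
        for para in parent.findall("para"):
            # Attribute test and string test on the element's own text.
            if para.get("code") in {"P", "Q"} and "XABr" in (para.text or ""):
                hits.append(para.text)
    return hits


matches = find_matches(ET.fromstring(DOC))
print(matches)
```

Only the first <para> is returned: the second has the wrong attribute value, and the third sits under a different parent.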

XML-enabled databases

XML-enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:

  1. XML is stored in a CLOB (character large object)
  2. XML is "shredded" into a series of tables based on a schema[4]
  3. XML is stored in a native XML type as defined by the ISO[5]
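
The difference between the first two approaches can be sketched with Python's built-in sqlite3 module. This is a simplified illustration under stated assumptions: the table names, column names and sample document are hypothetical, and SQLite's TEXT column merely stands in for a CLOB. SQLite has no native XML type, so the third approach is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML_DOC = "<journal><name>Open Data</name><licence>CreativeCommons</licence></journal>"

conn = sqlite3.connect(":memory:")

# Approach 1: keep the whole document as text (TEXT stands in for a CLOB).
conn.execute("CREATE TABLE journals_clob (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO journals_clob (id, doc) VALUES (?, ?)", (1, XML_DOC))

# Approach 2: shred the document into relational columns.
conn.execute(
    "CREATE TABLE journals_shredded (id INTEGER PRIMARY KEY, name TEXT, licence TEXT)"
)
root = ET.fromstring(XML_DOC)
conn.execute(
    "INSERT INTO journals_shredded (id, name, licence) VALUES (?, ?, ?)",
    (1, root.findtext("name"), root.findtext("licence")),
)

# The stored text must be parsed by the application before it can be filtered.
clob_names = []
for (doc,) in conn.execute("SELECT doc FROM journals_clob"):
    parsed = ET.fromstring(doc)
    if parsed.findtext("licence") == "CreativeCommons":
        clob_names.append(parsed.findtext("name"))

# Shredded data can be filtered with plain SQL.
shredded_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM journals_shredded WHERE licence = ?", ("CreativeCommons",)
    )
]
print(clob_names, shredded_names)
```

Both queries return the same journal name, but only the shredded form lets the database engine itself evaluate the filter.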

Relational database management systems (RDBMSs) that support the ISO XML type include:

  1. IBM DB2 (pureXML[6])
  2. Microsoft SQL Server[7]
  3. Oracle Database[8]
  4. PostgreSQL[9]

An XML-enabled database is typically best suited to cases where most of the data are non-XML. For datasets in which most of the data are XML, a native XML database is better suited.

Example of an XML type query in IBM DB2 SQL

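-- Return each journal's id and volume, plus the name extracted from the XML column
SELECT
   id, vol, xmlquery('$j/name', passing journal AS "j") AS name
FROM
   journals
-- Keep only rows whose XML value has a licence equal to "CreativeCommons"
WHERE
   xmlexists('$j[licence="CreativeCommons"]', passing journal AS "j")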

Native XML databases

The term "native XML database" (NXD) can lead to confusion. Many NXDs do not function as standalone databases at all, and do not store documents in their native (text) form.

The formal definition from the XML:DB initiative (which appears to have been inactive since 2003[10]) states that a native XML database:

  • Defines a (logical) model for an XML document — as opposed to the data in that document — and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models include the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.
  • Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
  • Need not have any particular underlying physical storage model. For example, NXDs can use relational, hierarchical, or object-oriented database structures, or use a proprietary storage format (such as indexed, compressed files).

Additionally, many XML databases provide a logical model for grouping documents, called "collections". Databases can set up and manage many collections at one time. In some implementations, collections can be nested in a hierarchy, much like the directory structure of an operating system.
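
A hierarchy of collections can be pictured as a tree in which each node holds documents and child collections. The following Python sketch is purely illustrative: the Collection class, its methods and the slash-separated path syntax are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Collection:
    """A named group of documents that may contain child collections."""

    name: str
    documents: Dict[str, str] = field(default_factory=dict)  # document name -> XML text
    children: Dict[str, "Collection"] = field(default_factory=dict)

    def child(self, name: str) -> "Collection":
        """Return the named child collection, creating it if needed."""
        if name not in self.children:
            self.children[name] = Collection(name)
        return self.children[name]

    def resolve(self, path: str) -> "Collection":
        """Walk a slash-separated path such as 'journals/2011'."""
        node = self
        for part in filter(None, path.split("/")):
            node = node.children[part]
        return node


root = Collection("db")
root.child("journals").child("2011").documents["j1.xml"] = (
    "<journal><name>Open Data</name></journal>"
)
found = root.resolve("journals/2011").documents["j1.xml"]
print(found)
```

Resolving a path walks the tree one level at a time, in the same way that a file system resolves a directory path.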

All XML databases now support at least one query syntax. At a minimum, nearly all of them support XPath for performing queries against documents or collections of documents. XPath provides a simple path syntax that allows users to identify nodes that match a particular set of criteria.

In addition to XPath, many XML databases support XSLT as a method of transforming documents or query results retrieved from the database. XSLT is a declarative language written using an XML grammar. A stylesheet defines template rules that use XPath expressions to match and select nodes, and these rules transform documents (in part or in whole) into other formats, including plain text, XML, or HTML.

Many XML databases also support XQuery for querying. XQuery includes XPath as a node-selection method, but extends it with transformational capabilities. Its core expression is known as "FLWOR" (pronounced "flower") after the clauses a query may include: 'for', 'let', 'where', 'order by' and 'return'. Traditional RDBMS vendors, which previously shipped SQL-only engines, now ship hybrid SQL and XQuery engines. These engines can query XML data alongside relational data in the same query expression, which helps in combining the two.
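
The role of each FLWOR clause can be illustrated by imitating a small query in Python. The sketch below is an analogy rather than an XQuery implementation: the catalogue document is hypothetical, the XQuery expression appears only as a comment, and the standard library's xml.etree.ElementTree module stands in for a query engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue used only for illustration.
CATALOGUE = """
<books>
  <book><title>XQuery Basics</title><price>45.00</price></book>
  <book><title>Advanced XPath</title><price>25.50</price></book>
  <book><title>Databases in Depth</title><price>60.00</price></book>
</books>
"""

# XQuery FLWOR expression being imitated:
#   for $b in /books/book
#   let $p := xs:decimal($b/price)
#   where $p > 30
#   order by $b/title
#   return $b/title/text()

root = ET.fromstring(CATALOGUE)
selected = []
for b in root.findall("book"):        # for: iterate over each <book>
    p = float(b.findtext("price"))    # let: bind the price to a variable
    if p > 30:                        # where: keep only matching books
        selected.append(b.findtext("title"))
titles = sorted(selected)             # order by: sort by title
print(titles)                         # return: emit the titles
```

The XQuery form states the same steps declaratively in a single expression, leaving the evaluation strategy to the engine.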

Most XML databases support a common vendor-neutral API called the XQuery API for Java (XQJ). XQJ was developed under the Java Community Process (JCP) as a standard interface to an XML/XQuery data source, enabling a Java developer to submit queries conforming to the World Wide Web Consortium (W3C) XQuery 1.0 specification and to process the results of such queries. In short, the XQJ API is to XML databases and XQuery what the JDBC API is to relational databases and SQL.

Language features

Name              License         Native Language  XQuery 3.0  XQuery Update    XQuery Full Text  EXPath Extensions  EXQuery Extensions  XSLT 2.0
BaseX             BSD License     Java             Yes         Yes              Yes               Yes                Yes                 Yes
eXist             LGPL License    Java             Partial     Proprietary[11]  Proprietary       Yes                Yes                 Yes
MarkLogic Server  Commercial      C++              Partial     Proprietary      Proprietary       No                 No                  Yes
Sedna             Apache License  C++              No          Yes              Yes               No                 No                  No

Supported APIs

Name              XQJ      XML:DB   RESTful  RESTXQ   WebDAV
BaseX             Yes      Yes      Yes      Yes      Yes
eXist             Yes      Yes      Yes      Yes      Yes
MarkLogic Server  Yes      No       Yes      Yes      Yes
Sedna             Yes      Yes      No       No       Yes

References

  1. Bourret, Ronald (20 June 2010). "XML Database Products". Retrieved 16 December 2011.
  2. Atay, Mustafa; Lu, Shiyong (2009). "Storing and Querying XML: An Efficient Approach Using Relational Databases". VDM Verlag. ISBN 3-639-11581-3.
  3. O'Connell, S. (2005). "Advanced Databases Course Notes". Southampton: University of Southampton. 9.2.
  4. "Creating XMLType Tables and Columns Based on XML Schema".
  5. ISO/IEC 9075-14:2011.
  6. "IBM DB2 pureXML overview -- DB2 as an XML database".
  7. "Using XML in SQL Server".
  8. "XMLType Operations".
  9. "PostgreSQL - Data Types - XML Type".
  10. "Frequently Asked Questions About XML:DB". The XML:DB Initiative. Sourceforge. 2003. Retrieved 16 December 2011.
  11. "XQuery Update Extension".

External links