XML database

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 83.149.199.196 (talk) at 14:26, 19 October 2016 (Native XML databases). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. The data can be queried, transformed, exported and returned to a calling system. XML databases are a type of document-oriented database, which is in turn a category of NoSQL database.

Rationale for XML in databases

There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include:[1] [2]

  • An enterprise may have a lot of XML in an existing standard format
  • Data may need to be exposed or ingested as XML, so using another format such as relational forces double-modeling of the data
  • XML is very well suited to sparse data, deeply nested data and mixed content (such as text with embedded markup tags)
  • XML is human readable whereas relational tables require expertise to access
  • Metadata is often available as XML
  • Semantic web data is available as RDF/XML

Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa".[3][needs update] Storing the data directly in XML format may therefore prove easier and more efficient in terms of conversion costs. In content-based applications, a native XML database also minimizes the need to extract or enter metadata to support searching and navigation.

XML-enabled databases

XML-enabled databases typically offer one or more of the following approaches to storing XML within a traditional relational structure:

  1. XML is stored in a CLOB (character large object)
  2. XML is "shredded" into a series of tables based on a schema[4]
  3. XML is stored in a native XML type as defined by ISO/IEC 9075-14[5]
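
The first two approaches can be illustrated with a minimal Python sketch. It is not taken from any of the products discussed here: SQLite stands in for the relational store, and the sample document, table names and column names are all hypothetical. SQLite has no native XML type, so the third approach is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
JOURNAL_XML = """<journal id="7">
  <name>Open Data Review</name>
  <licence>CreativeCommons</licence>
  <article><title>Sparse Trees</title></article>
  <article><title>Mixed Content</title></article>
</journal>"""


def store_as_clob(conn, doc_id, xml_text):
    """Approach 1: keep the whole document as one opaque text value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS journal_clob (id INTEGER PRIMARY KEY, doc TEXT)")
    conn.execute(
        "INSERT INTO journal_clob (id, doc) VALUES (?, ?)", (doc_id, xml_text))


def shred(conn, xml_text):
    """Approach 2: map elements onto ordinary relational tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS journal "
        "(id INTEGER PRIMARY KEY, name TEXT, licence TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article "
        "(journal_id INTEGER, position INTEGER, title TEXT)")
    root = ET.fromstring(xml_text)
    journal_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO journal VALUES (?, ?, ?)",
        (journal_id, root.findtext("name"), root.findtext("licence")))
    # Repeated child elements become rows; position preserves document order.
    for position, article in enumerate(root.findall("article"), start=1):
        conn.execute(
            "INSERT INTO article VALUES (?, ?, ?)",
            (journal_id, position, article.findtext("title")))
    return journal_id


conn = sqlite3.connect(":memory:")
store_as_clob(conn, 7, JOURNAL_XML)
shredded_id = shred(conn, JOURNAL_XML)

# The CLOB comes back unchanged; the shredded form is queried with plain SQL.
clob_rows = conn.execute("SELECT doc FROM journal_clob WHERE id = 7").fetchall()
titles = [row[0] for row in conn.execute(
    "SELECT title FROM article WHERE journal_id = ? ORDER BY position",
    (shredded_id,))]
```

In this sketch the CLOB keeps the document byte for byte but is opaque to SQL, whereas the shredded form is queryable with ordinary SQL but depends on a mapping that has to be designed for each schema.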

Relational database management systems (RDBMSs) that support the ISO XML type include:

  1. IBM DB2 (pureXML[6])
  2. Microsoft SQL Server[7]
  3. Oracle Database[8]
  4. PostgreSQL[9] [10]

An XML-enabled database is typically best suited to cases where most of the data is non-XML. For datasets where most of the data is XML, a native XML database is better suited.

Example of an XML type query in IBM DB2 SQL

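-- Return id, volume and name for journals whose XML carries a CreativeCommons licence.
-- In SQL/XML the PASSING clause follows the XQuery string without a comma.
select
   id, vol, xmlquery('$j/name' passing journal as "j") as name
from
   journals
where
   xmlexists('$j[licence="CreativeCommons"]' passing journal as "j")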
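
As a rough analogue only, the filter-then-project shape of this query can be sketched in Python over in-memory rows. The sample rows are hypothetical, and the sketch assumes each document has a root element whose direct children are name and licence; it does not reproduce DB2's exact XQuery semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows of a "journals" table: (id, vol, journal XML text).
ROWS = [
    (1, 12, "<journal><name>Open Data Review</name>"
            "<licence>CreativeCommons</licence></journal>"),
    (2, 3, "<journal><name>Closed Letters</name>"
           "<licence>Proprietary</licence></journal>"),
    (3, 8, "<journal><name>Untitled Notes</name></journal>"),
]


def query_journals(rows, licence="CreativeCommons"):
    """Filter rows by a licence child, then project the name child(ren)."""
    results = []
    for row_id, vol, xml_text in rows:
        doc = ET.fromstring(xml_text)
        # Counterpart of xmlexists: keep the row only if a licence child matches.
        if not any(el.text == licence for el in doc.findall("licence")):
            continue
        # Counterpart of xmlquery: return the matching name values.
        names = [el.text for el in doc.findall("name")]
        results.append((row_id, vol, names))
    return results


matches = query_journals(ROWS)
```

The difference from the SQL version is that here every document is parsed on every query, whereas a database with a native XML type can evaluate the predicate against stored, indexed structures.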

Native XML databases

These databases are typically better when much of the data is in XML or other non-relational formats.[citation needed]

These databases use XML as an interface for specifying documents as tree-structured data that may contain unstructured text, but on disk the data is stored as "optimized binary files". This makes query and retrieval faster. For MarkLogic it also allows XML and JSON to co-exist in one binary format.[11]

Key features of native XML databases include:

  • They have an XML document as at least one fundamental unit of (logical) storage, just as a relational database has a row in a table as a fundamental unit of (logical) storage.
  • They need not have any particular underlying physical storage model. For example, native XML databases (NXDs) can use optimized, proprietary storage formats. This is a key aspect of XML databases. Managing XML as large strings is inefficient because of the extra markup in XML. Compressing and indexing XML gives the illusion of directly accessing, querying and transforming XML while gaining the performance advantages of working with optimized binary tree structures.[12]
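
The indexing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the storage format of any product: actual formats are proprietary, and the path-to-values dictionary, the function name build_path_index and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each element path (e.g. /journal/article/title) to its text values."""
    index = defaultdict(list)

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        text = (element.text or "").strip()
        if text:  # skip elements that hold only whitespace or child elements
            index[path].append(text)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(index)


# Hypothetical sample document.
DOC = """<journal>
  <name>Open Data Review</name>
  <article><title>Sparse Trees</title></article>
  <article><title>Mixed Content</title></article>
</journal>"""

index = build_path_index(DOC)
# A path lookup is now a dictionary access rather than a scan of the markup.
titles = index["/journal/article/title"]
```

The document is parsed once when it is stored, and later lookups use the derived structure instead of the original string.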

The standards for XML querying per W3C recommendation are XQuery 1.0 and XQuery 3.0.[citation needed] XQuery includes XPath as a sub-language and XML itself is a valid sub-syntax of XQuery.
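
XPath-style selection can be tried with Python's standard library, as in the sketch below. The xml.etree.ElementTree module implements only a limited subset of XPath and does not implement XQuery, so this shows path expressions only; the sample catalogue is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
CATALOGUE = ET.fromstring("""<journals>
  <journal vol="12"><name>Open Data Review</name><licence>CreativeCommons</licence></journal>
  <journal vol="3"><name>Closed Letters</name><licence>Proprietary</licence></journal>
  <journal vol="8"><name>Free Notes</name><licence>CreativeCommons</licence></journal>
</journals>""")

# Child-text predicate: journals whose licence child equals the given text,
# followed by a step down to the name element.
open_names = [
    el.text
    for el in CATALOGUE.findall("./journal[licence='CreativeCommons']/name")
]

# Attribute predicate: journals that carry a vol attribute.
volumes = [j.get("vol") for j in CATALOGUE.findall("./journal[@vol]")]
```

In XQuery, path expressions like these would normally be embedded in a larger query that can also iterate, join and construct new XML.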

In addition to XPath, XML databases support XSLT as a method of transforming documents or query-results retrieved from the database. XSLT provides a declarative language written using an XML grammar. It aims to define a set of XPath filters that can transform documents (in part or in whole) into other formats including plain text, XML, or HTML.

In the bigger picture, XML persistence is only one format within the larger, faster-moving NoSQL movement. Many databases support XML alongside other formats, even when XML is stored internally in an optimized, high-performance format and is a first-class citizen within the database.

Language features

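Name             | License      | Native Language | XQuery 3.0 | XQuery Update | XQuery Full Text | EXPath Extensions | EXQuery Extensions | XSLT 2.0
-----------------|--------------|-----------------|------------|---------------|------------------|-------------------|--------------------|---------
BaseX            | BSD License  | Java            | Yes        | Yes           | Yes              | Yes               | Yes                | Yes
eXist            | LGPL License | Java            | Partial    | Proprietary   | Proprietary      | No                | Yes                | Yes
MarkLogic Server | Commercial   | C++             | Partial    | Proprietary   | Proprietary      | No                | No                 | Yes
Qizx             | Commercial   | Java            | Yes        | Yes           | Yes              | No                | No                 | Yes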

Supported APIs

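Name             | XQJ | XML:DB | RESTful | RESTXQ | WebDAV
-----------------|-----|--------|---------|--------|-------
BaseX            | Yes | Yes    | Yes     | Yes    | Yes
eXist            | Yes | Yes    | Yes     | Yes    | Yes
MarkLogic Server | Yes | No     | Yes     | Yes    | Yes
Qizx             | No  | No     | Yes     | No     | No
Sedna            | Yes | Yes    | No      | No     | No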

References

  1. ^ Nicola, Matthias (28 September 2010). "5 Reasons for Storing XML in a Database". Native XML Database. Retrieved 17 March 2015.
  2. ^ Feldman, Damon (11 April 2013). Moving from Relational Modeling to XML and MarkLogic Data Models. MarkLogic World. Retrieved 17 March 2015.
  3. ^ O'Connell, Steve (2005). "Section 9.2". Advanced Databases Course Notes (Syllabus). Southampton, England: University of Southampton.
  4. ^ "XML Schema Storage and Query: Basic". Oracle XML DB Developer's Guide, 10g Release 2. Oracle Corporation. August 2005. Retrieved 17 March 2015. Section: Creating XMLType Tables and Columns Based on XML Schema.
  5. ^ "ISO/IEC 9075-14:2011: Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications (SQL/XML)". International Organization for Standardization. 2011. Retrieved 17 March 2015.
  6. ^ "pureXML overview -- DB2 as an XML database". IBM Knowledge Center. IBM. Retrieved 17 March 2015.
  7. ^ "Using XML in SQL Server". Microsoft Developer Network. Microsoft Corporation. Retrieved 17 March 2015.
  8. ^ "XMLType Operations". Oracle XML DB Developer's Guide, 10g Release 2. Oracle Corporation. August 2005. Retrieved 17 March 2015.
  9. ^ "8.13. XML Type". PostgreSQL 9.0.19 Documentation. Retrieved 17 March 2015.
  10. ^ PostgreSQL - Data Types - XML Type
  11. ^ Siegel, Erik; Retter, Adam (December 2014). "4. Architecture". eXist. O'Reilly & Associates. ISBN 978-1-4493-3710-0. Retrieved 18 March 2015.
  12. ^ Kellogg, Dave (11 April 2010). "Yes, Virginia, MarkLogic is a NoSQL System". Kellblog. Retrieved 18 March 2015.