Document-oriented database

From Wikipedia, the free encyclopedia

A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of so-called NoSQL databases and the popularity of the term "document-oriented database" (or "document store") has grown[1] with the use of the term NoSQL itself. In contrast to relational databases and their notions of "Relations" (or "Tables"), these systems are designed around an abstract notion of a "Document".

Documents

The central concept of a document-oriented database is the notion of a Document. While each document-oriented database implementation differs on the details of this definition, in general, they all assume documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
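To illustrate that the encoding is separate from the document's content, the following Python sketch encodes one flat document both as JSON and as XML, using only the standard library. The field-per-element XML layout and the root tag name `document` are assumptions chosen for illustration; actual XML and JSON document stores define their own mappings.

```python
import json
import xml.etree.ElementTree as ET

# One logical document, held in memory as a Python dict.
document = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}


def to_json(doc: dict) -> str:
    """Encode a flat document as a JSON string."""
    return json.dumps(doc)


def to_xml(doc: dict, root_tag: str = "document") -> str:
    """Encode a flat document as XML, one child element per field.

    Illustrative assumption: field names are valid XML element names
    and values can be represented as text.
    """
    root = ET.Element(root_tag)
    for field, value in doc.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(document)
xml_text = to_xml(document)
print(json_text)
print(xml_text)
```

Both strings carry the same three fields; only the encoding differs, and either can be decoded back into the same dictionary.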

Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor do they all need to have the same sections, slots, parts, or keys. For example, the following is a document:

  {
    FirstName: "Bob",
    Address: "5 Oak St.",
    Hobby: "sailing"
  }

A second document might be:

  {
    FirstName: "Jonathan",
    Address: "15 Wanamassa Point Road",
    Children: [
      {Name: "Michael", Age: 10},
      {Name: "Jennifer", Age: 8},
      {Name: "Samantha", Age: 5},
      {Name: "Elena", Age: 2}
    ]
  }

These two documents share some structural elements with one another, but each also has unique elements. In a relational database, every record contains the same fields, and any field a record does not use is left empty; in contrast, neither document (record) in the example above has any empty 'fields'. This approach allows new information to be added to some records without requiring every other record in the database to share the same structure.
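The two example documents can be modelled as plain Python dictionaries to make the point concrete. This is a minimal sketch, not any particular database's data model; the `Email` field added at the end is hypothetical and shows that extending one document leaves the other untouched.

```python
from typing import Any

# The two example documents, each holding only the fields it actually has.
bob: dict[str, Any] = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
jonathan: dict[str, Any] = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# Stored side by side with no table-wide column list, so neither
# document needs an empty placeholder for the other's fields.
documents = [bob, jonathan]
for doc in documents:
    print(sorted(doc))

# Structural elements shared by both documents, and those unique to one.
shared = set(bob) & set(jonathan)
unique = set(bob) ^ set(jonathan)

# Adding a (hypothetical) field to one document does not alter the other.
bob["Email"] = "bob@example.com"
```

A relational table holding both records would instead need `Hobby` and child-related columns (or a separate table) for every row.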

Keys

Documents are addressed in the database via a unique key that represents that document. This key is often a simple string, a URI, or a path. The key can be used to retrieve the document from the database. Typically, the database retains an index on the key to speed up document retrieval.
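Key-based addressing can be sketched with an in-memory dictionary, which acts as a hash index on the key so that retrieval does not scan every document. The class name `DocumentStore` and the path-style keys such as `/people/bob` are hypothetical and chosen only for illustration.

```python
from typing import Any, Optional


class DocumentStore:
    """Hypothetical in-memory document store addressed by unique string keys.

    The internal dict serves as the key index: a lookup by key is a hash
    lookup rather than a scan over all stored documents.
    """

    def __init__(self) -> None:
        self._by_key: dict[str, dict[str, Any]] = {}

    def put(self, key: str, document: dict[str, Any]) -> None:
        # Keys are unique, so storing under an existing key replaces the document.
        self._by_key[key] = document

    def get(self, key: str) -> Optional[dict[str, Any]]:
        # Returns None when no document is stored under the key.
        return self._by_key.get(key)


store = DocumentStore()
store.put("/people/bob", {"FirstName": "Bob", "Hobby": "sailing"})
store.put("/people/jonathan", {"FirstName": "Jonathan"})
print(store.get("/people/bob"))
```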

Retrieval

Another defining characteristic of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that can be used to retrieve a document, the database offers an API or query language that allows the user to retrieve documents based on their content. For example, a user may want to retrieve all documents in which a certain field is set to a certain value. The query APIs and query language features available, as well as the expected performance of queries, vary significantly from one implementation to the next.
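A content-based query of the kind described above ("all documents with a certain field set to a certain value") can be sketched in two ways: a full scan, and a lookup through a secondary index built on that field. The function names `find` and `build_index` are hypothetical; the contrast is only meant to suggest why query performance depends on how an implementation indexes content.

```python
from collections import defaultdict
from typing import Any, Iterable


def find(documents: Iterable[dict[str, Any]], field: str, value: Any) -> list[dict[str, Any]]:
    """Full scan: return every document whose `field` equals `value`.

    Documents lacking the field are skipped, since documents need not
    share a schema.
    """
    return [doc for doc in documents if field in doc and doc[field] == value]


def build_index(documents: list[dict[str, Any]], field: str) -> dict[Any, list[int]]:
    """Secondary index: map each value of `field` to document positions.

    Assumes the field's values are hashable.
    """
    index: dict[Any, list[int]] = defaultdict(list)
    for position, doc in enumerate(documents):
        if field in doc:
            index[doc[field]].append(position)
    return dict(index)


people = [
    {"FirstName": "Bob", "Hobby": "sailing"},
    {"FirstName": "Alice", "Hobby": "chess"},  # hypothetical document
    {"FirstName": "Jonathan"},                 # no Hobby field at all
]

sailors = find(people, "Hobby", "sailing")

# The same query answered through the index instead of a scan.
hobby_index = build_index(people, "Hobby")
indexed = [people[i] for i in hobby_index.get("sailing", [])]
```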

Organization

Implementations offer a variety of ways of organizing documents, including notions of

  • Collections
  • Tags
  • Non-visible metadata
  • Directory hierarchies
  • Buckets
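Two of the organizing notions listed above, collections and tags, can be sketched together: a collection partitions documents by name, while a tag can cut across collections. The `OrganizedStore` class, its method names, and the sample collections are hypothetical and do not reproduce any particular implementation's API.

```python
from collections import defaultdict
from typing import Any


class OrganizedStore:
    """Hypothetical store grouping documents into named collections and
    attaching free-form tags that may span several collections."""

    def __init__(self) -> None:
        # collection name -> {key -> document}
        self.collections: dict[str, dict[str, dict[str, Any]]] = defaultdict(dict)
        # tag -> set of (collection, key) pairs
        self.tags: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def put(self, collection: str, key: str, document: dict[str, Any],
            tags: tuple[str, ...] = ()) -> None:
        self.collections[collection][key] = document
        # In this sketch tags are only ever added, never removed.
        for tag in tags:
            self.tags[tag].add((collection, key))

    def in_collection(self, collection: str) -> list[dict[str, Any]]:
        # .get avoids creating an empty collection as a side effect.
        return list(self.collections.get(collection, {}).values())

    def tagged(self, tag: str) -> list[dict[str, Any]]:
        # Sorted by (collection, key) so results come back in a stable order.
        return [self.collections[c][k] for c, k in sorted(self.tags.get(tag, set()))]


store = OrganizedStore()
store.put("people", "bob", {"FirstName": "Bob"}, tags=("customer",))
store.put("people", "jonathan", {"FirstName": "Jonathan"})
store.put("orders", "o1", {"Item": "sail"}, tags=("customer",))
```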

Implementations

| Name | Publisher | License | Language | Notes | RESTful API |
|---|---|---|---|---|---|
| ArangoDB | triAGENS | Apache License 2.0 | C, C++ & JavaScript | A distributed, multi-model, high-performance document store and graph database. | Yes [2] |
| BaseX | BaseX Team | BSD License | Java, XQuery | Support for XML, JSON and binary formats; client/server-based architecture; concurrent structural and full-text searches and updates; REST APIs. | Yes |
| Cassandra | Apache Software Foundation | Apache License | Java | JSON over HTTP | Yes |
| Cloudant | Cloudant, Inc. | Proprietary | Erlang, Java, Scala, and C | Distributed database service based on BigCouch, the company's open-source fork of the Apache-backed CouchDB project. | Yes |
| Clusterpoint | Clusterpoint Ltd. | Free community license / Commercial[3] | C++ | Schema-free, document-oriented database management system platform with server-based data storage, full-text search engine functionality, information ranking for search relevance, and clustering. | Yes |
| Couchbase Server | Couchbase, Inc. | Apache License | Erlang and C | Distributed NoSQL document database. | Yes [4] |
| CouchDB | Apache Software Foundation | Apache License | Erlang | JSON over REST/HTTP with Multi-Version Concurrency Control and limited ACID properties. Uses map and reduce for views and queries.[5] | Yes [6] |
| eXist | eXist [1] | GPL | XQuery, Java | XML over REST/HTTP, WebDAV, Lucene full-text search, validation, versioning, clustering, triggers, URL rewriting, collections, ACLs, XQuery Update | Yes [7] |
| FleetDB | FleetDB | MIT License | Clojure | A JSON-based, schema-free database optimized for agile development. | (unknown) |
| Jackrabbit | Apache Software Foundation | Apache License | Java | | (unknown) |
| Informix | IBM | Proprietary | Various (compatible with MongoDB API) | RDBMS with JSON, replication, sharding and ACID compliance | (unknown) |
| Inquire | Infodata Systems, Inc. | Proprietary | unknown | In the mid-1980s this was the dominant document-oriented commercial database and was widely successful. The company appears to have gone out of business in 2005. | (unknown) |
| Lotus Notes | IBM | Proprietary | LotusScript, Java, Lotus @Formula | | (unknown) |
| MarkLogic | MarkLogic Corporation | Free Developer license or Commercial | REST, Java, XQuery, XSLT, C++ | Distributed document-oriented database with Multi-Version Concurrency Control, integrated full-text search and ACID-compliant transaction semantics | Yes |
| MongoDB | MongoDB, Inc. | GNU AGPL v3.0[8] | C++ | Document database with replication and sharding | Optional [9] |
| MUMPS Database[10] | | Proprietary and Affero GPL[11] | MUMPS | Commonly used in health applications. | (unknown) |
| OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP | Yes |
| RavenDB | Hibernating Rhinos LTD | Proprietary and modified Affero GPL[12] | C#, JavaScript | | Yes |
| Redis | | BSD License | ANSI C | Key-value store supporting lists and sets with binary-safe protocol | (unknown) |
| RethinkDB | | GNU AGPL for the DBMS, Apache 2 License for the client drivers | C++ | | (unknown) |
| Rocket U2 | Rocket Software | Proprietary | | UniData, UniVerse | Yes (Beta) |
| Sqrrl Enterprise | sqrrl | Proprietary | Java | Distributed, real-time database featuring cell-level security and massive scalability. | Yes |
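The CouchDB entry in the table notes that it uses map and reduce for views and queries. The following Python sketch shows only the general idea: a map function emits key/value rows from each document, and a reduce function collapses all rows that share a key. CouchDB's real views are defined differently (typically as JavaScript functions stored in design documents), and the names `run_view` and `children_by_parent` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator

Row = tuple[Any, Any]  # one emitted (key, value) pair


def run_view(
    documents: Iterable[dict[str, Any]],
    map_fn: Callable[[dict[str, Any]], Iterator[Row]],
    reduce_fn: Callable[[list[Any]], Any],
) -> dict[Any, Any]:
    """Apply map_fn to every document, group emitted values by key,
    then reduce each group to a single value."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


def children_by_parent(doc: dict[str, Any]) -> Iterator[Row]:
    # Emit one row per child; documents without children emit nothing.
    for child in doc.get("Children", []):
        yield doc["FirstName"], child["Age"]


docs = [
    {"FirstName": "Bob", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Children": [
        {"Name": "Michael", "Age": 10}, {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5}, {"Name": "Elena", "Age": 2},
    ]},
]

child_count = run_view(docs, children_by_parent, len)  # reduce by counting rows
age_total = run_view(docs, children_by_parent, sum)    # reduce by summing ages
```

Because the map step emits nothing for Bob's document, documents of different shapes can share one view without any special handling.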


XML database implementations

Most XML databases are document-oriented databases.

See also

References

Further reading