Document-oriented database
A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself.
Documents
The central concept of a document-oriented database is the notion of a Document. While each document-oriented database implementation differs on the details of this definition, in general, they all assume documents encapsulate and encode data (or information) in some standard format(s) (or encoding(s)). Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor must they all have the same sections, slots, parts, or keys. For example, here is a document:
- FirstName:"Bob", Address:"5 Oak St.", Hobby:"sailing".
Another document could be:
- FirstName:"Jonathan", Address:"15 Wanamassa Point Road", Children:[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].
The two documents share some fields and differ in others. Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, there are no empty 'fields' in either document (record) in this case. This approach allows new information to be added to a document without requiring any explicit statement about the pieces of information that are left out.
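As a minimal sketch, the two example documents can be written as Python dictionaries and compared field by field. The variable names are illustrative only, and JSON is used simply because it is one of the encodings listed above; nothing here reflects a particular database product.

```python
import json

# The two example documents from the text, written as plain dictionaries.
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# Fields present in both documents, and fields unique to each one.
shared_fields = set(bob) & set(jonathan)
only_bob = set(bob) - set(jonathan)
only_jonathan = set(jonathan) - set(bob)

# Each document is encoded on its own; no placeholder is stored for a
# field that the other document happens to have.
encoded_bob = json.dumps(bob)
encoded_jonathan = json.dumps(jonathan)
```

Neither encoded document carries an empty slot for the other's fields: the encoded form of Bob's document has no "Children" entry at all, rather than an empty one.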
Keys, Retrieval, and Organization
Keys
Documents are addressed in the database via a unique key that represents that document. Often, this key is a simple string. In some cases, this string is a URI or path. Regardless, you can use this key to retrieve the document from the database. Typically, the database retains an index on the key such that document retrieval is fast.
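The idea of addressing documents by key can be sketched with a toy in-memory store in which a dictionary plays the role of the key index. The class name `DocumentStore`, its methods, and the path-style key `/people/bob` are hypothetical names chosen for illustration, not the API of any real product.

```python
import uuid


class DocumentStore:
    """Toy in-memory store: a dict acts as the index from key to document."""

    def __init__(self):
        self._by_key = {}

    def put(self, document, key=None):
        # Use the caller's key (for example a path) or generate a unique string.
        if key is None:
            key = uuid.uuid4().hex
        self._by_key[key] = document
        return key

    def get(self, key):
        # Dictionary lookup stands in for the database's key index.
        return self._by_key[key]


store = DocumentStore()
path_key = store.put(
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    key="/people/bob",
)
generated_key = store.put(
    {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road"}
)
bob = store.get("/people/bob")
```

A real database would persist both the documents and the index; the dictionary here only illustrates why lookup by key is typically fast.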
Retrieval
One of the other defining characteristics of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that you can use to retrieve a document, the database will offer an API or query language that will allow you to retrieve documents based on their contents. For example, you may want a query that gets you all the documents with a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to the next.
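A content-based query of the kind described above ("all the documents with a certain field set to a certain value") might look like the following sketch. The function `find` and the third document (Alice) are made up for illustration. The query is a plain linear scan, whereas real implementations usually rely on indexes or precomputed views, which is one reason query performance differs so much between products.

```python
def find(documents, **criteria):
    """Return every document whose fields equal all of the given criteria."""
    return [
        doc
        for doc in documents.values()
        if all(
            field in doc and doc[field] == value
            for field, value in criteria.items()
        )
    ]


# Keys map to documents; the third document is invented for the example.
documents = {
    "bob": {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    "jonathan": {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road"},
    "alice": {"FirstName": "Alice", "Hobby": "sailing"},
}

sailors = find(documents, Hobby="sailing")
chess_players = find(documents, Hobby="chess")
```

Documents that lack the queried field, such as Jonathan's, are simply skipped rather than treated as having an empty value.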
Organization
Implementations offer a variety of ways of organizing documents, including notions of:
- Collections
- Tags
- Non-visible Metadata
- Directory hierarchies
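Some of these organizing notions can be sketched on top of a simple in-memory store. The example below groups documents into named collections, attaches tags, and keeps metadata apart from the document body so that it is not visible in the document itself. The class `OrganizedStore` and all of its method names are hypothetical, and directory hierarchies are not modeled.

```python
from collections import defaultdict


class OrganizedStore:
    """Toy store with collections, tags, and out-of-band metadata."""

    def __init__(self):
        self._collections = defaultdict(dict)  # collection -> {key: document}
        self._tags = defaultdict(set)          # tag -> {(collection, key)}
        self._metadata = {}                    # (collection, key) -> metadata

    def put(self, collection, key, document, tags=(), metadata=None):
        self._collections[collection][key] = document
        for tag in tags:
            self._tags[tag].add((collection, key))
        # Metadata is stored beside the document, not inside it.
        self._metadata[(collection, key)] = dict(metadata or {})

    def in_collection(self, collection):
        return dict(self._collections.get(collection, {}))

    def with_tag(self, tag):
        refs = sorted(self._tags.get(tag, ()))
        return [self._collections[c][k] for c, k in refs]

    def metadata(self, collection, key):
        return self._metadata[(collection, key)]


store = OrganizedStore()
store.put("people", "bob", {"FirstName": "Bob", "Hobby": "sailing"},
          tags=["sailor"], metadata={"source": "import"})
store.put("people", "jonathan", {"FirstName": "Jonathan"}, tags=["parent"])
people = store.in_collection("people")
sailors = store.with_tag("sailor")
```

Here a tag can span collections, while a collection acts as a single container for each document; actual products combine or omit these notions in different ways.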
Implementations
| Name | Publisher | License | Language | Notes | RESTful API |
|---|---|---|---|---|---|
| Lotus Notes | IBM | Proprietary | LotusScript, Java, Lotus @Formula | | (unknown) |
| askSam | askSam Systems | Proprietary | | | (unknown) |
| Apstrata | Apstrata | Proprietary | | | (unknown) |
| BaseX | BaseX Team | BSD License | Java, XQuery | | Yes |
| Datawasp | Significant Data Systems | Proprietary | | | (unknown) |
| Clusterpoint | Clusterpoint Ltd. | Free community license / Commercial[1] | C++ | Scalable, high-performance, schema-free, document-oriented database management system platform with server-based data storage, fast full-text search engine functionality, information ranking for search relevance, and clustering. | Yes |
| CRX | Day Software | Proprietary | | | (unknown) |
| MUMPS Database[2] | | Proprietary and GNU Affero GPL[3] | MUMPS | Commonly used in health applications. | (unknown) |
| UniVerse | Rocket Software | Proprietary | | | Yes (Beta) |
| UniData | Rocket Software | Proprietary | | | Yes (Beta) |
| Jackrabbit | Apache Software Foundation | Apache License | Java | | (unknown) |
| CouchDB | Couchbase, Apache Software Foundation | Apache License | Erlang | JSON over REST/HTTP with Multi-Version Concurrency Control and limited ACID properties. Uses map and reduce for views and queries.[4] | Yes (there is only RESTful API)[5] |
| FleetDB | FleetDB | MIT License | Clojure | A JSON-based schema-free database optimized for agile development. | (unknown) |
| MarkLogic | MarkLogic Corporation | Free Express license or Commercial | Implemented in C++, with external interfaces in Java and .NET. Stored procedures are written in XQuery and/or XSLT | Fast, scalable, distributed, enterprise-grade document-oriented database with Multi-Version Concurrency Control, integrated full-text search, and ACID-compliant transaction semantics | Yes, via Corona, which supports JSON, XML, text, and binary encoded documents |
| MongoDB | 10gen, Inc | GNU AGPL v3.0[6] | C, C++, Erlang, Haskell, Java, JavaScript, .NET (C#, F#, PowerShell, etc.), Perl, PHP, Python, Ruby, Scala | Fast, document-oriented database optimized for highly transient data. | Optional using external tools[7] |
| GemFire Enterprise [1] | VMware | Commercial | Java, .NET, C++ | Memory-oriented, fast, key-value database with indexing and querying support. | Yes |
| OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP | Yes |
| RavenDB | RavenDB | Commercial or GNU AGPL v3.0 | .NET | A .NET LINQ-enabled document database focused on providing a high-performance, transactional, schema-less, flexible, and scalable NoSQL data store for the .NET and Windows platforms. | Yes |
| Redis | | BSD License | ANSI C | Key-value store supporting lists and sets with fast, simple and binary-safe protocol. | (unknown) |
| StrokeDB | [2] | MIT License | | Alpha software. | (unknown) |
| Terrastore | | Apache License | Java | JSON/HTTP | (unknown) |
| ThruDB | | BSD License | C++, Java | Built on top of Apache Thrift framework that provides indexing and document storage services for building and scaling websites. Alternate implementation is being developed in Java. Alpha software. | (unknown) |
| Persevere | Persevere | BSD License | | A JSON database and JavaScript Application Server. Provides RESTful JSON interface for Create, read, update, and delete access to data. Also supports JSONQuery/JSONPath querying. | Yes |
| DBSlayer | DBSlayer | Apache License | C | Database abstraction layer (over MySQL) used by The New York Times. JSON over HTTP. | (unknown) |
| Eloquera DB | Eloquera | Proprietary | .NET | High performance. Based on Dynamic objects. Supports LINQ, SQL queries. | (unknown) |
XML database implementations
Most XML databases are document-oriented databases.
See also
- Internet Message Access Protocol (IMAP)
- Database theory
- In-memory database
- NoSQL
- Object database
- Online database
- Real time database
- Relational database
- Data hierarchy
References
Further reading
- Assaf Arkin. (2007, September 20). Read Consistency: Dumb Databases, Smart Services. Labnotes:Don’t let the bubble go to your head!
External links