MongoDB
| Developer(s) | 10gen |
|---|---|
| Initial release | 2009 |
| Stable release | 2.4.4 / 4 June 2013 |
| Development status | Active |
| Written in | C++ |
| Operating system | Cross-platform |
| Available in | English |
| Type | Document-oriented database |
| License | GNU AGPL v3.0 (drivers: Apache license) |
| Website | www.mongodb.org |
MongoDB (from "humongous") is an open source document-oriented database system developed and supported by 10gen. It is part of the NoSQL family of database systems. Instead of storing data in tables as is done in a "classical" relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.
10gen began development of MongoDB in October 2007. The database is used by MTV Networks,[1] Craigslist,[2] Foursquare[3] and UIDAI Aadhaar.[4] According to the DB-Engines ranking, MongoDB is the most popular NoSQL database management system.[5]
Binaries are available for Windows, Linux, OS X, and Solaris.
History
Development of MongoDB began at 10gen in 2007, when the company was building a platform as a service similar to Windows Azure or Google App Engine.[6] In 2009, MongoDB was open sourced as a stand-alone product[7] under the AGPL license.
MongoDB has been considered production ready since version 1.4, released in March 2010.[8]
Version 2.4.0 was released in March 2013; the latest stable version is 2.4.4, released on 4 June 2013.
Licensing and support
MongoDB is available for free under the GNU Affero General Public License.[7] The language drivers are available under an Apache License. In addition, 10gen offers commercial licenses for MongoDB.[9]
Main features
The following is a brief summary of some of the main features:
- Ad hoc queries
- MongoDB supports search by field, range queries, and regular expression searches. Queries can return specific fields of documents and can also include user-defined JavaScript functions.
- Indexing
- Any field in a MongoDB document can be indexed (indices in MongoDB are conceptually similar to those in RDBMSes). Secondary indices are also available.
- Replication
- MongoDB supports master-slave replication. A master can perform reads and writes. A slave copies data from the master and can only be used for reads or backup (not writes). The slaves have the ability to select a new master if the current one goes down.
- Load balancing
- MongoDB scales horizontally using sharding.[10] The developer chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more slaves.)
- MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure. Automatic configuration is easy to deploy and new machines can be added to a running database.
- File storage
- MongoDB can be used as a file system, taking advantage of its load balancing and data replication features to store files across multiple machines.
- This function, called GridFS,[11] is included with the MongoDB drivers and is readily available from the supported development languages (see "Language support" for a list). MongoDB exposes functions for manipulating files and their content to developers. GridFS is used, for example, in plugins for NGINX[12] and lighttpd.[13]
- In a multi-machine MongoDB system, files can be distributed and copied multiple times between machines transparently, effectively creating a load-balanced and fault-tolerant system.
- Aggregation
- MapReduce can be used for batch processing of data and aggregation operations. The aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY clause is used.
- Server-side JavaScript execution
- JavaScript can be used in queries and in aggregation functions (such as MapReduce), which are sent directly to the database to be executed.
- Capped collections
- MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion order and, once the specified size has been reached, behaves like a circular queue.
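The circular-queue behaviour of a capped collection can be sketched in a few lines of Python. This is only a toy model, not MongoDB's implementation: the class name `CappedCollection` and its methods are hypothetical, and the cap is expressed here as a document count for simplicity, whereas the text above speaks of a specified size.

```python
from collections import deque


class CappedCollection:
    """Toy model of a fixed-size collection that keeps insertion order."""

    def __init__(self, max_documents):
        # deque(maxlen=...) silently discards the oldest item once it is
        # full, which mimics the circular-queue behaviour described above.
        self._docs = deque(maxlen=max_documents)

    def insert(self, document):
        self._docs.append(document)

    def find(self):
        # Documents come back in insertion order, oldest first.
        return list(self._docs)


log = CappedCollection(max_documents=3)
for n in range(1, 6):
    log.insert({"event": n})

print(log.find())  # [{'event': 3}, {'event': 4}, {'event': 5}]
```

After five inserts into a collection capped at three documents, only the three most recent remain, still in the order they were inserted.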
For further information on the features listed above, see the MongoDB Developer Manual.
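The range-based distribution described under "Load balancing" can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the split points, the shard names and the `last_name` shard key are all hypothetical, and a real cluster manages and rebalances its ranges automatically.

```python
import bisect

# Hypothetical split points on the shard key; they divide the key space
# into four contiguous ranges.
SPLIT_POINTS = ["g", "n", "t"]
# Hypothetical shard names, one per range, in key order.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(document, shard_key):
    """Return the shard whose range contains the document's shard key value."""
    value = document[shard_key]
    # bisect_right counts the split points that are <= value, which is
    # exactly the index of the range the value falls into.
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, value)]


people = [
    {"last_name": "dumont"},
    {"last_name": "pellerin"},
    {"last_name": "gannon"},
]
placement = {p["last_name"]: shard_for(p, "last_name") for p in people}
print(placement)
```

Because the ranges are contiguous, documents with nearby shard key values land on the same shard, which is why the choice of shard key determines how evenly the data is spread.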
Use cases and production deployments
MongoDB is well suited for the following cases:[14]
- Archiving and event logging
- Document and Content Management Systems. As a document-oriented (JSON) database, MongoDB's flexible schemas are a good fit for this.
- E-commerce. Several sites are using MongoDB as the core of their ecommerce infrastructure (often in combination with an RDBMS for the final order processing and accounting).
- Gaming. High performance small read/writes are a good fit for MongoDB; also for certain games geospatial indexes can be helpful.
- High volume problems. Problems where a traditional DBMS might be too expensive for the data in question. In many such cases developers would traditionally have written custom code against a filesystem instead, using flat files or other methods.
- Mobile. Specifically, the server-side infrastructure of mobile systems. Geospatial indexes are key here.
- Operational data store of a web site. MongoDB is very good at real-time inserts, updates, and queries. It also provides scalability and replication, which are necessary for the real-time data stores of large web sites. Specific web use case examples:
- content management
- comment storage, management, voting
- user registration, profile, session data
- Projects using iterative/agile development methodologies. Mongo's BSON data format makes it very easy to store and retrieve data in a document-style / "schemaless" format. Addition of new properties to existing objects is easy and does not generally require blocking "ALTER TABLE" style operations.
- Real-time stats/analytics
Enterprises that use MongoDB
Many enterprises have production deployments of MongoDB. Examples include SAP AG, MTV, and SourceForge.[15]
Data manipulation: collections and documents
MongoDB stores structured data as JSON-like documents in a format called BSON, using dynamic schemas rather than predefined ones. In MongoDB, an element of data is called a document, and documents are stored in collections. One collection may have any number of documents.
The arrangement of data in a MongoDB instance differs from that of a traditional relational database management system (RDBMS). In an RDBMS, the data can be seen as organized in "tables", each of which consists of "records" (or "rows"), each of which consists of "fields". One of the essential characteristics of an RDBMS is that, within each table, every record has the same fields (with, usually, differing values) in the same order. Because of this strict parallelism in the organization of the data, all the parallel instances of a field taken together are called a "column".
Considering a MongoDB instance, we could say that collections are like tables, and documents are like records. But there is a big difference: any document in a collection can have completely different fields from the other documents. The only schema requirement MongoDB places on documents (aside from size limits) is that they must contain an '_id' field with a unique, non-array value.
A typical table in a relational database, accessible by SQL, could be represented on the page like this:
| Last Name | First Name | Date of Birth |
|---|---|---|
| DUMONT | Jean | 01-22-1963 |
| PELLERIN | Franck | 09-19-1983 |
| GANNON | Dustin | 11-12-1982 |
- Every record in an SQL-accessible table has the same fields, in the same order.
On the other hand, a typical MongoDB collection would look like this:
    {
        "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
        "Last Name": "DUMONT",
        "First Name": "Jean",
        "Date of Birth": "01-22-1963"
    },
    {
        "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
        "Last Name": "PELLERIN",
        "First Name": "Franck",
        "Date of Birth": "09-19-1983",
        "Address": "1 chemin des Loges",
        "City": "VERSAILLES"
    }
- Each document in a MongoDB collection can have different fields from the other documents. (Note: the "_id" field is obligatory and is created automatically by MongoDB if the user does not supply one; a unique index on it identifies the document. Its value need not be the default ObjectId type shown here; the user may specify any non-array value for _id as long as the value is unique. We can think of the "_id" value as the document’s primary key. Every document requires this value.[16])
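As a rough illustration of that single schema rule, the following Python sketch models a collection as a list of dictionaries and checks only what the text requires: every document has an "_id", the value is not an array, and no two documents share it. The function `validate_collection` is hypothetical, the ObjectId values are represented as plain strings, and the check assumes hashable "_id" values; it is not how MongoDB enforces the rule internally.

```python
def validate_collection(documents):
    """Check that every document has a unique, non-array '_id'."""
    seen = set()
    for doc in documents:
        if "_id" not in doc:
            raise ValueError("document has no _id field")
        doc_id = doc["_id"]
        if isinstance(doc_id, (list, tuple)):
            raise ValueError("_id must not be an array")
        if doc_id in seen:
            raise ValueError("duplicate _id: %r" % (doc_id,))
        seen.add(doc_id)
    return True


people = [
    {"_id": "4efa8d2b7d284dad101e4bc9", "Last Name": "DUMONT",
     "First Name": "Jean", "Date of Birth": "01-22-1963"},
    {"_id": "4efa8d2b7d284dad101e4bc7", "Last Name": "PELLERIN",
     "First Name": "Franck", "Date of Birth": "09-19-1983",
     "Address": "1 chemin des Loges", "City": "VERSAILLES"},
]

print(validate_collection(people))
# Fields present in the second document but absent from the first.
extra_fields = set(people[1]) - set(people[0])
print(sorted(extra_fields))
```

The two documents pass the check even though the second one carries fields the first lacks, which is the point of the comparison with the relational table.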
In a document, new fields can be added and existing ones removed, modified or renamed at any moment. There is no predefined schema. The structure of a document is very simple: it follows the JSON format and consists of a series of key-value pairs, so that a document is the equivalent of what various programming languages call "associative arrays", "maps", "dictionaries", "hash tables" or "hashes". The key of each pair is the name of the field, and the value is the field's content. The key and value are separated by ":", as shown.
A value can be a number; a string; true or false; binary data such as an image; an array of values (each of which can be of different type); or an entire subordinate document:
    {
        "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
        "Last Name": "PELLERIN",
        "First Name": "Franck",
        "Date of Birth": "09-19-1983",
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567", "verified": false }
        ],
        "Address": {
            "Street": "1 chemin des Loges",
            "City": "VERSAILLES"
        },
        "Months at Present Address": 37
    }
Here we can see that the field "Address" contains a subordinate document, which possesses two fields of its own, "Street" and "City".
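To show how such nested values are reached from application code, here is a small Python sketch that parses a trimmed copy of the document above with the standard `json` module. The "_id" field is left out because `ObjectId(...)` is shell notation rather than valid JSON, and `get_path` is a hypothetical helper that imitates dotted field paths such as "Address.City"; it is an illustration only, not a driver API.

```python
import json

raw = """
{
  "Last Name": "PELLERIN",
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567", "verified": false}
  ],
  "Address": {"Street": "1 chemin des Loges", "City": "VERSAILLES"},
  "Months at Present Address": 37
}
"""


def get_path(document, dotted_path):
    """Follow a dotted path such as 'Address.City' into nested documents."""
    value = document
    for part in dotted_path.split("."):
        value = value[part]
    return value


person = json.loads(raw)

# A subordinate document is just a nested dictionary.
city = get_path(person, "Address.City")
# An array value is a list whose items may themselves be documents.
fax_numbers = [p["number"] for p in person["phoneNumber"] if p["type"] == "fax"]

print(city)
print(fax_numbers)
```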
Language support
MongoDB has official drivers for a variety of popular programming languages and development environments.[17] Web programming language Opa also has built-in support for MongoDB, which is tightly integrated in the language and offers a type-safety layer on top of MongoDB.[18] There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.[17]
HTTP/REST interfaces
There are REST and HTTP interfaces that allow the manipulation of MongoDB entries via HTTP GET, POST, PUT, and DELETE calls.
An overview of the available HTTP/REST interfaces can be found on the MongoDB website.
Management and graphical front-ends
MongoDB tools
In a MongoDB installation the following commands are available:
- mongo
- MongoDB offers an interactive shell called mongo,[19] which lets developers view, insert, remove, and update data in their databases, as well as get replication information, set up sharding, shut down servers, execute JavaScript, and more.
- Administrative information can also be accessed through a web interface,[20] a simple webpage that serves information about the current server status. By default, this interface listens on a port 1000 above the database port (28017 for the default database port of 27017).
- mongostat
- mongostat[21] is a command-line tool that displays a summary list of status statistics for a currently running MongoDB instance: how many inserts, updates, removes, queries, and commands were performed, as well as what percentage of the time the database was locked and how much memory it is using. This tool is similar to the UNIX/Linux vmstat utility.
- mongotop
- mongotop[22] is a command-line tool providing a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on the per-collection level. By default, mongotop returns values every second. This tool is similar to the UNIX/Linux top utility.
- mongosniff
- mongosniff[23] is a command-line tool providing a low-level tracing/sniffing view into database activity by monitoring (or "sniffing") network traffic going to and from MongoDB. mongosniff requires the Libpcap network library and is only available for Unix-like systems. A cross-platform alternative is the open source Wireshark packet analyzer which has full support for the MongoDB wire protocol.
- mongoimport, mongoexport
- mongoimport[24] is a command-line utility to import content from a JSON, CSV, or TSV export created by mongoexport[25] or potentially other third-party data exports. Usage information can be found in the MongoDB Manual's section on Importing and Exporting MongoDB Data.
- mongodump, mongorestore
- mongodump[26] is a command-line utility for creating a binary export of the contents of a Mongo database; mongorestore[27] can be used to reload a database dump. Data backup strategies and considerations are detailed in the MongoDB Manual's section on Backup and Restoration Strategies.
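As a hedged sketch of how the export and backup utilities above might be driven from a script, the Python below only builds the command lines and runs one if the tool is actually installed. The database name `test`, the collection `people` and the file paths are hypothetical; the options shown (`--db`, `--collection`, `--out`, `--file`) should be checked against the manual for the installed version before use.

```python
import shutil
import subprocess


def dump_command(database, out_dir):
    """Binary export of one database into out_dir."""
    return ["mongodump", "--db", database, "--out", out_dir]


def restore_command(dump_dir):
    """Reload a dump previously written by mongodump."""
    return ["mongorestore", dump_dir]


def export_command(database, collection, out_file):
    """Export of one collection to a file (JSON unless told otherwise)."""
    return ["mongoexport", "--db", database,
            "--collection", collection, "--out", out_file]


def import_command(database, collection, in_file):
    """Import of a file produced by mongoexport."""
    return ["mongoimport", "--db", database,
            "--collection", collection, "--file", in_file]


def run_if_installed(argv):
    """Run the command only when its executable is on the PATH."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=True)


# Print the commands rather than running them, so the sketch has no
# side effects on a machine without MongoDB.
for argv in (dump_command("test", "backup"),
             export_command("test", "people", "people.json")):
    print(" ".join(argv))
```

Keeping the command construction separate from execution makes it easy to log or review what would be run before pointing the script at a live database.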
Monitoring plugins
There are MongoDB monitoring plugins available for the following network tools:
- mongo-munin: Plugin for Munin
- mongodb-ganglia: Plugin for ganglia
- MongoDB Cacti Graphs: Plugin for cacti
- MongoDB Slow Queries: Plugin for Scout
More monitoring and diagnostic tools for MongoDB are listed on MongoDB Admin Zone: Monitoring and Diagnostics.
Cloud-based monitoring services
- MongoDB Monitoring Service (MMS) is a free, cloud-based monitoring and alerting solution for MongoDB deployments offered by 10gen, the company that develops MongoDB.
- Server Density is a cloud-based tool for provisioning and monitoring infrastructure. It includes a custom dashboard for MongoDB, MongoDB-specific alerts, a replication failover timeline, and mobile apps for iPhone, iPad and Android.
Web and desktop application GUIs
Several GUIs have been created by MongoDB's developer community to help visualize their data. Some popular ones are:
Open source tools
- RockMongo: PHP-based MongoDB administration GUI tool
- phpMoAdmin: another PHP GUI that runs entirely from a single 95kb self-configuring file
- UMongo: a desktop application for all platforms.
- Mongo3: a Ruby-based interface.
- Meclipse: Eclipse plugin for interacting with MongoDB
- MongoHub: a freeware native Mac OS X application for managing MongoDB. A version for other operating systems is built on Titanium Desktop.
- mViewer: A simple web-based Administration and Management Tool for MongoDB written in Java.
- MongoDBPumper: a commercial high-performance data transfer solution to provide export and import functionality between Oracle and MongoDB databases.
More client tools for MongoDB are listed in the MongoDB Administrator Manual.
Business intelligence tools and solutions
- Jaspersoft BI: Java based Report Designer and Report Server that supports MongoDB
- Pentaho: MongoDB connectors for Pentaho Kettle and Pentaho BI
- RJMetrics: A hosted Business Intelligence Solution that supports MongoDB.
- eCommerce Analytics: eCommerce Analytics Software that supports MongoDB data analysis.
- Nucleon BI Studio: MS Windows based business intelligence software that supports MongoDB as well as relational databases.
Criticism
As with many NoSQL technologies, MongoDB is criticized for its lack of compliance with the ACID paradigm, and more specifically for weak durability guarantees.
Write durability is a recurring topic among critics of MongoDB and has been debated at length.[28] Over successive versions, MongoDB has been revised to improve durability in each of the main areas discussed below.
Journaling
One of the main durability concerns before version 1.8 of MongoDB was the lack of a journal; journaling was introduced in the stable 1.8 release.[29]
Without a journal, single-server durability was compromised: an unplanned halt could leave MongoDB's data files in an inconsistent state. From version 1.8 onward, the journal allows MongoDB's data files to be recovered to a consistent state after an unplanned halt of the system.
That said, MongoDB does not have multi-document transactions, so the journal cannot guarantee that a multi-document update can be rolled back to a previous consistent state.
MongoDB's use of an fsync[30] queue to update the on-disk data files also remains controversial. The main point is that, to keep the journal consistent and up to date, every write must be acknowledged by the journal; otherwise a server could receive a write but lose it in an unplanned shutdown before it reaches the journal.
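The failure mode described in the last paragraph can be made concrete with a toy Python model. It is a conceptual sketch only and says nothing about how MongoDB's storage layer is actually built: the class `JournaledStore`, its attributes and the `journaled` flag are all hypothetical names chosen for illustration.

```python
class JournaledStore:
    """Toy model: memory is lost on a crash, journal and data files are not."""

    def __init__(self):
        self.memory = {}      # in-memory view, lost on a crash
        self.pending = []     # writes received but not yet journaled, lost on a crash
        self.journal = []     # journaled writes, survive a crash
        self.data_files = {}  # on-disk data, updated lazily

    def write(self, key, value, journaled=False):
        self.memory[key] = value
        self.pending.append((key, value))
        if journaled:
            # The caller waits until the write has reached the journal.
            self.commit_journal()

    def commit_journal(self):
        self.journal.extend(self.pending)
        self.pending = []

    def crash_and_recover(self):
        # Everything that only lived in memory disappears.
        self.memory = {}
        self.pending = []
        # Recovery replays the journal into the data files.
        for key, value in self.journal:
            self.data_files[key] = value
        self.journal = []
        self.memory = dict(self.data_files)


store = JournaledStore()
store.write("a", 1, journaled=True)  # acknowledged by the journal
store.write("b", 2)                  # received, but never journaled
store.crash_and_recover()
print(store.memory)  # {'a': 1}
```

The write to "b" was received by the server yet is gone after recovery, which is the loss the critics point to when a write is not acknowledged by the journal.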
Write Concern
The write concern determines how much assurance a driver has that a write it sent to the server is durable.
The default write concern in the drivers was "fire and forget" until November 2012.[31] All drivers were then updated[32] to use a new default of acknowledged writes.
This default still leaves the durability of a write in question, because it does not establish whether the document has been written to disk. Instead, the driver waits for a mongod to return an acknowledgement by polling with getLastError().
The notion of polling with getLastError() has itself been criticized.[33]
Within a replica set, the write concern can affect how durable a write is. The new default acknowledges the write on only one member (the primary), which lowers the durability of any write sent to a replica set. Although the write is eventually replicated to the other members, this takes some time and the application is not told when it happens. Using a different write concern, known as replica acknowledged writes, can solve this problem.
One of the main criticisms[33] is that a write is still not durable at this level of write concern (replica acknowledged), even with the journal option set.
Aside from the fact that, by default, MongoDB uses an fsync queue to update the data files asynchronously, critics note that the journaled option, which exists to ensure a disk write irrespective of the fsync queue, only ensures a write to the journal on the primary of a replica set. The other members acknowledge the write only as a plain acknowledged write. The argument is that, for the write to be durable, every node that receives the command should write to its journal before an acknowledgement is returned.
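The levels discussed in this section can be summarized as the options a driver might attach to the getLastError command. The Python sketch below is illustrative only: the level names and the function `get_last_error_command` are hypothetical labels for this article's terms, the default of two replicas and the five-second timeout are arbitrary, and the `w`, `j` and `wtimeout` options should be confirmed against the manual for the server version in use.

```python
def get_last_error_command(level, replicas=2, timeout_ms=5000):
    """Build the getLastError command document for a named durability level.

    Returns None for 'fire_and_forget', where no acknowledgement is requested.
    """
    if level == "fire_and_forget":
        return None
    command = {"getlasterror": 1}
    if level == "acknowledged":
        # Acknowledged by one member (the primary), not necessarily on disk.
        command["w"] = 1
    elif level == "journaled":
        # Additionally wait for the journal on the primary.
        command["w"] = 1
        command["j"] = True
    elif level == "replica_acknowledged":
        # Wait until the given number of members have the write.
        command["w"] = replicas
        command["wtimeout"] = timeout_ms
    else:
        raise ValueError("unknown level: %r" % (level,))
    return command


for level in ("fire_and_forget", "acknowledged", "journaled", "replica_acknowledged"):
    print(level, get_last_error_command(level))
```

Laid out this way, the criticism above is easy to locate: none of these combinations asks every receiving member to journal the write before the acknowledgement is returned.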
Further reading
Over the years, many individuals have written posts about various aspects of MongoDB, including its shortcomings. Some of the main ones include:
- Does everyone hate MongoDB? - a summary of and comments on recent criticism
- A Year with MongoDB
- Things I Wish I Knew About MongoDB a Year Ago
- A Year of MongoDB
- Armin Ronacher - A Year with MongoDB
- Broken by Design: MongoDB Fault Tolerance
See also
- Apache's Erlang-based CouchDB (open source)
- Apache's Java-based HBase (open source)
- Basho Riak (open source, Apache License 2.0)
- NoSQL, i.e., Structured storage
References
- [1] MongoDB Powering MTV's Web Properties
- [2] MongoDB live at craigslist
- [3] MongoDB at foursquare - Presentation at MongoNYC
- [4] Aadhaar: A Testimony to Success of FOSS in India! - LINUX For You
- [5] DB-Engines Ranking
- [6] MongoDB daddy: My baby beats Google BigTable
- [7] The MongoDB NoSQL Database Blog, The AGPL
- [8] The MongoDB NoSQL Database Blog, MongoDB 1.4 Ready for Production
- [9] MongoDB Support by 10gen
- [10] Article "Sharding" on MongoDB Administrator's Manual
- [11] GridFS article on MongoDB Developer's Manual
- [12] NGINX plugin for MongoDB source code
- [13] lighttpd plugin for MongoDB source code
- [14] "Use Cases" article at MongoDB's web page
- [15] "Production Deployments" article on MongoDB web
- [16] "MongoDB tutorial video". Learn-with-video-tutorials. Retrieved 2013-04-07.
- [17] http://www.mongodb.org/display/DOCS/Drivers
- [18] https://github.com/MLstate/opalang/wiki/The-database
- [19] mongo - The Interactive Shell
- [20] HTTP Console
- [21] mongostat Manual
- [22] mongotop Manual
- [23] mongosniff Manual
- [24] mongoimport Manual
- [25] mongoexport Manual
- [26] mongodump Manual
- [27] mongorestore Manual
- [28] ycombinator - A Year of MongoDB
- [29] The MongoDB NoSQL Database Blog, MongoDB 1.8 Released
- [30] MongoDB Glossary - fsync
- [31] Introducing MongoClient
- [32] Default Write Concern Change
- [33] Broken by Design: MongoDB Fault Tolerance
Bibliography
- Banker, Kyle (March 28, 2011), MongoDB in Action (1st ed.), Manning, p. 375, ISBN 978-1-935182-87-0
- Chodorow, Kristina; Dirolf, Michael (September 23, 2010), MongoDB: The Definitive Guide (1st ed.), O'Reilly Media, p. 216, ISBN 978-1-4493-8156-1
- Pirtle, Mitch (March 3, 2011), MongoDB for Web Development (1st ed.), Addison-Wesley Professional, p. 360, ISBN 978-0-321-70533-4
- Hawkins, Tim; Plugge, Eelco; Membrey, Peter (September 26, 2010), The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing (1st ed.), Apress, p. 350, ISBN 978-1-4302-3051-9
External links
- Official website
- MongoDB on Facebook
- MongoDB on Twitter
- MongoDB with ZanPHP Spanish Documentation
- mongoDB User Group on LinkedIn
- MongoDB news and articles on myNoSQL
- Eric Lai. (2009, July 1). No to SQL? Anti-database movement gains steam
- Videos about MongoDB on MrBool.com
- MongoDB articles on NoSQLDatabases.com
- June 2009 San Francisco NOSQL Meetup Page
- Designing for the Cloud at MIT Technology Review
- EuroPython Conference Presentation
- Non-relational data persistence in Java using MongoDB - Software Engineer at MongoDB on YouTube
- Interview with Mike Dirolf on The Changelog about MongoDB background and design decisions
- MongoMvc - A MongoDB Demo App with ASP.NET MVC
- FAQs about MongoDB
- Is MongoDB a good alternative to RDBMs databases?
- SQL to Mongo Mapping Chart
- The Little MongoDB Book
- NoSQL Solution: Evaluation and Comparison: MongoDB vs Redis, Tokyo Cabinet, and Berkeley DB
- MongoDB tutorial video