MongoDB

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 31.221.91.66 (talk) at 10:20, 29 July 2014 (→‎Language support: - rm irrelevant mention of some language). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

MongoDB
Developer(s): MongoDB Inc.
Initial release: 2009
Stable release: 2.6.3 / 20 June 2014
Written in: C++
Operating system: Cross-platform
Available in: English
Type: Document-oriented database
License: GNU AGPL v3.0 (drivers: Apache license)
Website: www.mongodb.org

MongoDB (from "humongous") is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.

MongoDB was first developed by the software company 10gen (now MongoDB Inc.) in October 2007 as a component of a planned platform as a service product. The company shifted to an open source development model in 2009, with 10gen offering commercial support and other services.[1] Since then, MongoDB has been adopted as backend software by a number of major websites and services, including Craigslist, eBay, Foursquare, SourceForge, Viacom, and the New York Times. MongoDB is the most popular NoSQL database system.[2]

History

Development of MongoDB began in 2007, when the company (then named 10gen) was building a platform as a service similar to Windows Azure or Google App Engine.[3] In 2009, MongoDB was open sourced as a stand-alone product[4] with an AGPL license.

From version 1.4 (March 2010) onward, MongoDB has been considered production ready.[5] The latest stable version, 2.6, was released on April 8, 2014.

Licensing and support

MongoDB is available for free under the GNU Affero General Public License.[4] The language drivers are available under an Apache License. In addition, MongoDB Inc. offers commercial licenses for MongoDB.[1]

Main features

Some of the main features include:[6]

Ad hoc queries
MongoDB supports search by field, range queries, and regular expression searches. Queries can return specific fields of documents and can also include user-defined JavaScript functions.
Indexing
Any field in a MongoDB document can be indexed (indices in MongoDB are conceptually similar to those in RDBMSes). Secondary indices are also available.
Replication
MongoDB provides high availability with replica sets.[7] A replica set consists of two or more copies of the data. Each replica set member may act in the role of primary or secondary replica at any time. The primary replica performs all writes and reads by default. Secondary replicas maintain a copy of the data on the primary using built-in replication. When a primary replica fails, the replica set automatically conducts an election process to determine which secondary should become the primary. Secondaries can also perform read operations, but the data is eventually consistent by default.
Load balancing
MongoDB scales horizontally using sharding.[8] The user chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more slaves.)
MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure. Automatic configuration is easy to deploy, and new machines can be added to a running database.
File storage
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.
This function, called GridFS,[9] is included with the MongoDB drivers and is readily available for the supported development languages (see "Language support"). MongoDB exposes functions for file manipulation and content to developers. GridFS is used, for example, in plugins for NGINX[10] and lighttpd.[11] Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document.[12]
In a multi-machine MongoDB system, files can be distributed and copied multiple times between machines transparently, thus effectively creating a load-balanced and fault-tolerant system.
Aggregation
MapReduce can be used for batch processing of data and aggregation operations. The aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY clause is used.
Server-side JavaScript execution
JavaScript can be used in queries and aggregation functions (such as MapReduce), and can be sent directly to the database to be executed.
Capped collections
MongoDB supports fixed-size collections called capped collections. This type of collection maintains insertion order and, once the specified size has been reached, behaves like a circular queue.
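
As an illustration of the range-based distribution described under "Load balancing" above, the following minimal Python sketch routes documents to shards by comparing a shard key against sorted range boundaries. The names SPLIT_POINTS, SHARDS, and shard_for are hypothetical and are not part of MongoDB; a real cluster manages its ranges and their placement itself.

```python
from bisect import bisect_right

# Hypothetical split points on the shard key (here, a user name).
# Keys below "h" map to shard0, keys from "h" up to (not including) "p"
# map to shard1, and all remaining keys map to shard2.
SPLIT_POINTS = ["h", "p"]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(shard_key):
    """Return the name of the shard whose key range contains shard_key."""
    return SHARDS[bisect_right(SPLIT_POINTS, shard_key)]


documents = [{"user": "alice"}, {"user": "henry"}, {"user": "zoe"}]
placement = {doc["user"]: shard_for(doc["user"]) for doc in documents}
print(placement)
```

Because the ranges are contiguous, a range query on the shard key only needs to contact the shards whose ranges overlap the query, which is one reason the choice of shard key matters.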

Criticisms

Prior to November 2012, MongoDB's default consistency model ("write concern") acknowledged writes as soon as they had entered the client's outgoing queue,[13] meaning that the default setup was brittle against client crashes.

MongoDB uses a readers-writer lock that allows concurrent read access to a database but exclusive write access to a single write operation. Before version 2.2, this lock was implemented on a per-mongod basis. Since version 2.2, the lock has been implemented at the database level.[14] One approach to increase concurrency is to use sharding.[15] In some situations, reads and writes will yield their locks. If MongoDB predicts a page is unlikely to be in memory, operations will yield their lock while the pages load. The use of lock yielding expanded greatly in 2.2.[16]
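
The general idea of a readers-writer lock can be sketched in a few lines of Python. This is a simplified illustration of the technique, not MongoDB's implementation: the class ReadersWriterLock and its method names are hypothetical, and the sketch does nothing to stop a steady stream of readers from starving a writer.

```python
import threading


class ReadersWriterLock:
    """Admit any number of concurrent readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of readers currently holding the lock
        self._writing = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting writer, if any

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake waiting readers and writers


db_lock = ReadersWriterLock()
db_lock.acquire_read()
db_lock.acquire_read()   # a second reader is admitted immediately
db_lock.release_read()
db_lock.release_read()
db_lock.acquire_write()  # granted only once no reader remains
db_lock.release_write()
```

Keeping one such lock per database rather than one per server process corresponds to the change in granularity described above: writes to one database no longer block operations on another.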

Another criticism related to the limitations of MongoDB when used on 32-bit systems.[17] In some cases, this was due to inherent memory limitations.[18] MongoDB recommends 64-bit systems and that users provide sufficient RAM for their working set. Some users encounter issues when their working set exceeds available RAM and the system encounters page faults. MongoHQ, a provider of managed MongoDB infrastructure, recommends a scaling checklist for large systems.[19]

Additionally, MongoDB does not support collation-based sorting and is limited to byte-wise comparison via memcmp,[20] which will not provide correct ordering for many non-English languages[21] when used with a Unicode encoding.
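
The effect of byte-wise comparison can be reproduced in Python by sorting strings on their UTF-8 encoding, which orders them in the same way as an unsigned byte-by-byte comparison. The sample names below are made up for illustration.

```python
names = ["zebra", "Zoe", "éclair", "apple"]

# Sort by the raw UTF-8 bytes of each string, analogous to a memcmp-style
# comparison: uppercase ASCII sorts before lowercase ASCII, and any
# non-ASCII letter sorts after every ASCII letter.
bytewise = sorted(names, key=lambda s: s.encode("utf-8"))
print(bytewise)  # ['Zoe', 'apple', 'zebra', 'éclair']
```

A dictionary-style collation would be expected to place "éclair" between "apple" and "zebra", which is the kind of ordering that byte-wise comparison cannot provide.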

Language support

MongoDB has official drivers for a variety of popular programming languages and development environments.[22] There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.[22]

Management and graphical front-ends

MongoDB tools

In a MongoDB installation the following commands are available:

mongo
MongoDB offers an interactive shell called mongo,[23] which lets developers view, insert, remove, and update data in their databases, as well as get replication information, set up sharding, shut down servers, execute JavaScript, and more.
Administrative information can also be accessed through a web interface,[24] a simple webpage that serves information about the current server status. By default, this interface is served on a port 1000 above the database port (28017).
mongostat
mongostat[25] is a command-line tool that displays a summary list of status statistics for a currently running MongoDB instance: how many inserts, updates, removes, queries, and commands were performed, as well as what percentage of the time the database was locked and how much memory it is using. This tool is similar to the UNIX/Linux vmstat utility.
mongotop
mongotop[26] is a command-line tool providing a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on the per-collection level. By default, mongotop returns values every second. This tool is similar to the UNIX/Linux top utility.
mongosniff
mongosniff[27] is a command-line tool providing a low-level tracing/sniffing view into database activity by monitoring (or "sniffing") network traffic going to and from MongoDB. mongosniff requires the Libpcap network library and is only available for Unix-like systems. A cross-platform alternative is the open source Wireshark packet analyzer, which has full support for the MongoDB wire protocol.
mongoimport, mongoexport
mongoimport[28] is a command-line utility to import content from a JSON, CSV, or TSV export created by mongoexport[29] or potentially other third-party data exports.
mongodump, mongorestore
mongodump[30] is a command-line utility for creating a binary export of the contents of a Mongo database; mongorestore[31] can be used to reload a database dump.

Popularity

According to db-engines.com, as of April 2014 MongoDB was the fifth most popular database management system overall and the most popular NoSQL database management system.[2]

Comparison to similar technologies

Redis

The main strength of Redis is its performance, which results from storing all of its contents in memory. As a consequence, Redis is best used for rapidly changing data with a foreseeable database size; it may not be the best solution for data that must be kept for a long time.

Apache Cassandra

Cassandra is known for storing huge datasets in a distributed manner while keeping a friendly interface. It allows the creation of secondary indexes, querying by key or key range, and writing triggers in Java (the language in which its core is written), which eases the transition for someone coming from a SQL database. Cassandra also relies on an asynchronous masterless replication system, which means there is no single point of failure.

CouchDB

In contrast to MongoDB, CouchDB guarantees consistency in the database, as every write operation is immediately passed down to the disk. Moreover, by using Multi-Version Concurrency Control (MVCC), it avoids locking the database during writes. CouchDB emphasizes ease of use (hence the name and logo) by allowing the creation of views and providing built-in authentication. Replication is based on a master-master scheme.

Other solutions

Although the popularity of NoSQL databases is only now starting to grow, the competition is fierce. There are many other solutions, such as Riak, Accumulo, and Neo4j, which are (arguably) less popular. Each of the alternatives has its strengths and weaknesses, and finding the right one is no easy task.

Production deployments

Some of the prominent users of MongoDB include:

  • Large-scale deployments of MongoDB are tracked by MongoDB Inc.[32]
  • MetLife uses MongoDB for "The Wall", a customer service application providing a "360-degree view" of MetLife customers.[33]
  • Craigslist stores more than 2 billion documents in MongoDB.[34]
  • SAP uses MongoDB in the SAP PaaS.[35]
  • Forbes stores article and company data in MongoDB.[36]
  • The New York Times uses MongoDB in its form-building application for photo submissions.[37]
  • SourceForge uses MongoDB for its back-end storage pages.[38]
  • Codecademy uses MongoDB as the datastore for its online learning system.[39]
  • Shutterfly uses MongoDB for its photo platform. As of 2013, the photo platform stores 18 billion photos uploaded by Shutterfly's 7 million users.[40][41]
  • The Guardian uses MongoDB for its identity system.[42]
  • CERN uses MongoDB as the primary back-end for the Data Aggregation System for the Large Hadron Collider.[43]
  • Foursquare deploys MongoDB on Amazon AWS to store venues and user check-ins into venues.[44]
  • eBay uses MongoDB in the search suggestion and the internal Cloud Manager State Hub.[45]

References

  1. ^ a b "10gen embraces what it created, becomes MongoDB Inc". Gigaom. Retrieved 27 August 2013.
  2. ^ a b "Popularity ranking of 216 database management systems". db-engines.com. Solid IT. Retrieved 26 April 2014.
  3. ^ MongoDB daddy: My baby beats Google BigTable
  4. ^ a b The MongoDB NoSQL Database Blog, The AGPL
  5. ^ The MongoDB NoSQL Database Blog, MongoDB 1.4 Ready for Production
  6. ^ MongoDB Developer Manual
  7. ^ [1]
  8. ^ [2]
  9. ^ GridFS article on MongoDB Developer's Manual
  10. ^ NGINX plugin for MongoDB source code
  11. ^ lighttpd plugin for MongoDB source code
  12. ^ Expertstown - MongoDB overview
  13. ^ "Default Write Concern Change". MongoDB Release Notes. Retrieved April 17, 2014.
  14. ^ FAQ Concurrency - How Granular Are Locks
  15. ^ FAQ Concurrency - How Does Sharding Affect Concurrency
  16. ^ FAQ Concurrency - Do Operations Ever Yield the Lock
  17. ^ 32-bit Limitations
  18. ^ Does Everybody Hate MongoDB
  19. ^ Optimizing Your MongoDB Dataset
  20. ^ "memcmp". cppreference.com. 31 May 2013. Retrieved 26 April 2014.
  21. ^ MongoDB Jira ticket 1920
  22. ^ a b "MongoDB Drivers and Client Libraries". Mongodb.org. Retrieved 2013-07-08.
  23. ^ mongo - The Interactive Shell
  24. ^ HTTP Console
  25. ^ mongostat Manual
  26. ^ mongotop Manual
  27. ^ mongosniff Manual
  28. ^ mongoimport Manual
  29. ^ mongoexport Manual
  30. ^ mongodump Manual
  31. ^ mongorestore Manual
  32. ^ http://www.mongodb.com/scale
  33. ^ http://www.informationweek.com/software/information-management/metlife-uses-nosql-for-customer-service/240154741
  34. ^ Lessons Learned from Migrating 2+ Billion Documents at Craigslist
  35. ^ The Quest to Understand the Use of MongoDB in the SAP PaaS
  36. ^ Supporting Distributed Global Workforce of Contributors with MongoDB
  37. ^ NYT + MongoDB in Production
  38. ^ Scaling SourceForge with MongoDB
  39. ^ How Codecademy is Using MongoDB
  40. ^ Real World NoSQL: MongoDB at Shutterfly
  41. ^ Here's How We Think Of Shutterfly's Stock Value
  42. ^ MongoDB at The Guardian
  43. ^ Holy Large Hadron Collider, Batman!
  44. ^ Experiences Deploying MongoDB on AWS
  45. ^ MongoDB at eBay