Cosmos DB

Azure Cosmos DB
Developer(s) Microsoft
Initial release 2010
Available in English
Type Multi-model database
Website cosmosdb.com

Azure Cosmos DB is Microsoft’s proprietary globally-distributed, multi-model database service "for managing data at planet-scale" launched in May 2017.[1] It builds upon and extends the earlier Azure DocumentDB, which was released in 2014.[2] It is schema-less and generally classified as a NoSQL database.

Dynamically tunable

With the current recommended option of "partitioned collection" type, Cosmos DB is dynamically tunable along three dimensions:

  1. Throughput. Developers reserve throughput of the service according to the application's varying load. Behind the scenes, Cosmos DB will scale up resources (memory, processor, partitions, replicas, etc.) to achieve the requested throughput while keeping latency at the 99th percentile under 10 ms for reads and under 15 ms for writes. Throughput is specified in request units (RUs) per second. The number of RUs consumed by a particular operation varies with a number of factors, but fetching a single 1 KB document by id costs roughly 1 RU. Delete, update, and insert operations consume roughly 5 RUs, assuming 1 KB documents. Large queries and stored procedure executions can consume hundreds or thousands of RUs, depending on the complexity of the operations needed.[3]
  2. Space. Similarly, developers can specify how much storage they will need. Both space and throughput directly affect how much the user is charged, but either can be tuned up dynamically to handle peak load and down to save costs when the load is lighter.
  3. Consistency. Cosmos DB provides five consistency levels: strong, bounded-staleness, session, consistent prefix, and eventual. The further to the left in this list, the stronger the consistency but the higher the RU cost, which effectively lowers the available throughput for the same RU setting. Session consistency is the default.[4] Even at a lower consistency level, any arbitrary set of operations can be executed in an ACID-compliant transaction by performing those operations from within a stored procedure. You can also change the consistency level for each request using the x-ms-consistency-level request header or the equivalent option in your SDK.
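
The rough per-operation costs quoted above can be turned into a back-of-the-envelope throughput estimate. The following Python sketch is only an illustration: the function name and the sample workload are hypothetical, and real RU charges depend on document size, indexing, and consistency level.

```python
# Rough per-operation costs taken from the text above (1 KB documents).
READ_RU = 1    # fetching a single 1 KB document by id
WRITE_RU = 5   # insert, update, or delete of a 1 KB document


def estimate_rus_per_second(reads_per_sec, writes_per_sec):
    """Return an approximate RU/s figure for a simple read/write workload."""
    return reads_per_sec * READ_RU + writes_per_sec * WRITE_RU


# Hypothetical workload: 500 reads by id and 100 writes per second.
# 500 * 1 + 100 * 5 = 1000 RU/s
print(estimate_rus_per_second(500, 100))
```

An estimate like this gives only a starting point for the reserved throughput; queries and stored procedures would need to be measured separately.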

Partitioning

Cosmos DB added automatic partitioning capability in 2016 with the introduction of partitioned collections. Behind the scenes, the collection will span multiple physical partitions with documents distributed by a caller-supplied partition key. Cosmos DB automatically decides how many partitions to spread your data across depending upon the size and throughput needs. When Cosmos DB decides to add (or remove) partitions, your data remains available while it is rebalanced across the new (or remaining) partitions.

Before partitioned collections were available, it was common to write your own code to partition your data, and some of the Cosmos DB SDKs explicitly supported several different partitioning schemes. That mode is still available but is now recommended only when your needs will not exceed the capacity of one collection or when the built-in partitioning capability does not otherwise meet your needs.
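
To illustrate what hand-written partitioning code of this kind might look like, here is a minimal Python sketch of a hash-based resolver that maps a partition key to one of several collections. It is not the algorithm Cosmos DB or its SDKs use; the function name, collection names, and keys are all hypothetical.

```python
import hashlib


def resolve_partition(partition_key, partition_count):
    """Map a partition key to a partition index using a stable hash."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical collections and partition keys, for illustration only.
collections = ["orders-0", "orders-1", "orders-2", "orders-3"]
for key in ["customer-17", "customer-42", "customer-17"]:
    index = resolve_partition(key, len(collections))
    print(key, "->", collections[index])
```

The same key always resolves to the same collection, but changing the number of collections remaps most keys, which is one reason built-in rebalancing is attractive.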

Automatic indexing

By default, every field in each document is automatically indexed, which generally provides good performance without tuning to specific query patterns. These defaults can be modified by setting an indexing policy, which can vary per field.
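
As a sketch of what a non-default indexing policy might look like, the Python snippet below builds a policy that indexes everything except one subtree and serializes it to JSON. The property names reflect the indexing policy format as commonly documented and should be treated as an assumption here; the excluded path /rawPayload/* is hypothetical.

```python
import json

# Index every path by default, but skip a large field that is never queried.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/rawPayload/*"}],  # hypothetical field
}

print(json.dumps(indexing_policy, indent=2))
```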

JavaScript

A JavaScript engine is embedded in Cosmos DB, which enables additional functionality:

  • Stored procedures. Functions that bundle an arbitrarily complex set of operations and logic into an ACID-compliant transaction. They are isolated from changes made while the stored procedure is executing, and either all write operations succeed or they all fail, leaving the database in a consistent state. Stored procedures are executed in a single partition, so the caller must provide a partition key when calling into a partitioned collection. Stored procedures can be used to make up for the lack of certain functionality. For instance, the lack of aggregation capability is made up for by the implementation of an OLAP cube as a stored procedure in the open-source documentdb-lumenize[5] project.
  • Triggers. Functions that are executed before or after specific operations (for example, on a document insertion) and that can either alter the operation or cancel it. Triggers execute only when explicitly requested on an operation and are therefore not guaranteed to run.
  • User-defined functions (UDFs). Functions that can be called from, and augment, the SQL query language, making up for limited SQL support.

Supported environments

In the following environments, all features (except Direct Mode which is currently only supported for .NET) are explicitly supported with dedicated SDKs:

Additionally, Cosmos DB can be accessed with the following:

  • REST API. All features except Direct Mode are supported. You can call this REST API from any language or platform. In fact, the Node.js, Java, and Python SDKs are essentially thin wrappers calling this REST API.
  • MongoDB driver-level protocol support. Most features are implemented with two notable exceptions: 1) the low-level (undocumented?) API that allows applications like Meteor to install themselves as a replica and receive all changes as an event stream, and 2) aggregations.

Querying mechanisms

Several mechanisms for querying are provided:

  1. SQL-like query language with adjustments to match JSON data types. However, no joins or group-by are allowed.
  2. LINQ language integrated queries.
  3. JavaScript language integrated queries. This is only available from the server-side SDK exposed to stored procedures, triggers, and user-defined functions. It is modeled after the Underscore.js API.
  4. MongoDB query language (JSON) via the MongoDB driver-level protocol support.
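
As an illustration of the first mechanism, the Python sketch below builds a parameterized SQL-like query in the JSON shape (a "query" string plus a "parameters" list) that the REST API is understood to accept. The helper name, the document fields, and the parameter names are hypothetical, and no request is actually sent.

```python
import json


def build_query_spec(query, **params):
    """Build a parameterized query body from a query string and named values."""
    return {
        "query": query,
        "parameters": [
            {"name": "@" + name, "value": value}
            for name, value in params.items()
        ],
    }


# Hypothetical query over documents that carry "type" and "total" fields.
spec = build_query_spec(
    "SELECT * FROM c WHERE c.type = @docType AND c.total > @minTotal",
    docType="order",
    minTotal=100,
)
print(json.dumps(spec, indent=2))
```

Passing values as parameters rather than concatenating them into the query string avoids quoting mistakes in the query text.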

Other features

Additionally, Cosmos DB has support for:

  • Global distribution.[7] Global distribution was added in 2016. This feature lets you scale your Cosmos DB instance across different regions around the world and define what type of consistency you expect between the regions, from strong to eventual. It is also possible to configure automatic and transparent failover for a given region.
  • Blob storage via a behind-the-scenes integration with Azure Blob Storage. If an Azure Blob Storage instance doesn’t exist, one is automatically provisioned when the first write to blob storage is issued.
  • GeoJSON support for storing and querying geographical information.
  • 99.99% availability SLA for all single-region database accounts, and 99.999% read availability for all multi-region database accounts.
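
To put the availability figures in perspective, the following Python sketch converts an availability percentage into the downtime it permits per year. This is plain arithmetic rather than anything specific to Cosmos DB; the function name is hypothetical, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Return the yearly downtime, in minutes, permitted by an availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes per year")
```

Under these assumptions, 99.99% corresponds to roughly 53 minutes of permitted unavailability per year, and 99.999% (which applies to reads only) to roughly 5 minutes.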

Reception

Gartner Research positioned Microsoft as a leader in its 2016 Magic Quadrant for Operational Database Management Systems[8] and explicitly called out the unique capabilities of Cosmos DB in its write-up.

Real-world use cases

These Microsoft services utilize Cosmos DB[9]:

  • Office
  • Skype
  • Active Directory
  • Xbox
  • MSN

If you're looking to use Cosmos DB to build a more globally resilient application or system, you can combine it with other Azure services such as Azure App Service and Azure Traffic Manager.[10]

Limitations, criticism and cautions

  • Limited backup/restore features. Whilst automated backups are taken, they are limited in duration (only the last two backups are retained, covering an 8-hour period). Backups can only be restored by raising a support ticket and awaiting the assistance of the Microsoft Support Team. Furthermore, whilst the backup facility does protect against accidental deletion of databases and whole collections, it offers very little protection against document-level corruption, because there is no "point-in-time" restore option. These limitations mean that Cosmos DB may not satisfy the long-term data retention policies and requirements of many organisations.
  • Triggers must be explicitly specified for each operation on which you wish to use them, which renders them ineffective as a mechanism for maintaining business logic consistency unless you can be certain that all the correct triggers are specified for every operation.
  • .NET LINQ language integrated queries are not fully supported. More LINQ support has been added over time, but developers are often confused when the LINQ code that they use on other systems fails to work as expected on Cosmos DB, as evidenced by the large number of Stack Overflow questions containing both tags.[11]
  • Transactions are not currently supported at the API level; for example, Cosmos DB does not participate in the .NET TransactionScope pattern. Transactions are currently supported only from within JavaScript stored procedures.
  • The lack of a fully functioning local version. However, a local emulator running under MS Windows for developer desktop use was added in the fall of 2016.
  • SQL is very limited, offering no joins and only limited aggregation capability. Aggregations are limited to the COUNT, SUM, MIN, MAX, and AVG functions, with no support for GROUP BY or other aggregation functionality found in other database systems. However, stored procedures can be used to implement in-the-database aggregation capability.
  • "Collection" means something different in Cosmos DB. It is simply a bucket of documents. There is a tendency to equate collections with tables, where each collection would hold only a single type of document, which is not recommended with Cosmos DB. Rather, developers are encouraged to distinguish document types with a "type" field or by adding an "isTypeA = true" field to all documents of TypeA, "isTypeB = true" to all documents of TypeB, etc. This is especially confusing to developers who are coming from MongoDB, which has a "collection" entity that is intended to be used in a very different way.
  • The lack of query plan visibility (e.g. "EXPLAIN" keyword in SQL).
  • Support only for pure JSON data types. Most notably, Cosmos DB lacks support for date-time data, requiring that you store this data using the available data types. For instance, it can be stored as an ISO-8601 string or an epoch integer. MongoDB, the database to which Cosmos DB is most often compared, extended JSON in its BSON binary serialization specification to cover date-time data as well as traditional number types, regular expressions, and Undefined. However, many argue that Cosmos DB's choice of pure JSON is actually an advantage, as it is a better fit for JSON-based REST APIs and the JavaScript engine built into the database.
  • Many developers have asked that Microsoft support real pagination through a Skip/Take mechanism. Skip/Take was first requested in late August 2014. To date, Microsoft keeps saying they're working on it.[12]
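
Two of the cautions above, distinguishing document types within one collection and storing date-time values in pure JSON, can be sketched together in Python. The helper name and the field names ("type", "createdAt", "createdAtEpoch") are hypothetical; the documents are plain dictionaries and nothing is written to a database.

```python
import json
from datetime import datetime, timezone


def make_document(doc_id, doc_type, created_at, **fields):
    """Build a JSON-serializable document with a type discriminator."""
    doc = {
        "id": doc_id,
        "type": doc_type,                               # distinguishes document kinds
        "createdAt": created_at.isoformat(),            # ISO-8601 string
        "createdAtEpoch": int(created_at.timestamp()),  # epoch seconds
    }
    doc.update(fields)
    return doc


created = datetime(2018, 3, 6, 12, 0, 0, tzinfo=timezone.utc)
docs = [
    make_document("1", "customer", created, name="Ada"),
    make_document("2", "order", created, total=42),
]

# Both kinds of document share one collection; filter on the type field.
orders = [d for d in docs if d["type"] == "order"]
print(json.dumps(orders, indent=2))
```

ISO-8601 strings in a single time zone sort chronologically as text, while the epoch integer is convenient for range comparisons; a real schema would likely store only one of the two.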

References

  1. ^ "Azure Cosmos DB". Microsoft Azure. Microsoft. Retrieved 9 July 2017. 
  2. ^ CrawCour, Ryan (21 August 2014). "Introducing Azure DocumentDB – Microsoft's fully managed NoSQL document database service". Retrieved 9 July 2017. 
  3. ^ syamkmsft. "How to manage an Azure Cosmos DB account". docs.microsoft.com. Retrieved 2017-08-22. 
  4. ^ syamkmsft. "Tunable data consistency levels in Azure Cosmos DB". docs.microsoft.com. Microsoft. Retrieved 2017-08-22. 
  5. ^ Maccherone, Larry. "Announcing documentdb-lumenize". blog.lumenize.com. Retrieved 2016-12-11. 
  6. ^ "Using Azure DocumentDB and ASP.NET Core for extreme NoSQL performance". auth0.com. 
  7. ^ kiratp. "How to distribute data globally with Azure Cosmos DB". docs.microsoft.com. Retrieved 2017-08-22. 
  8. ^ "Magic Quadrant for Operational Database Management Systems". www.gartner.com. Retrieved 2016-12-11. 
  9. ^ http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
  10. ^ Pietschmann, Chris. "Building Globally Resilient Apps with Azure App Service and Cosmos DB". BuildAzure.com. Opsgility. Retrieved 30 January 2018. 
  11. ^ "Newest 'azure-documentdb' Questions". stackoverflow.com. Retrieved 2016-12-07. 
  12. ^ "[DocumentDB] Allow Paging (skip/take)". feedback.azure.com. Retrieved 2018-03-06.