Voldemort (distributed data store)
Stable release: 1.3.0 (March 19, 2013)
License: Apache License 2.0
Voldemort is still under development. It is neither an object database nor a relational database. It does not attempt to support arbitrary relations or the ACID properties; instead, it is a large, distributed, fault-tolerant, persistent hash table. A 2012 study comparing systems for storing application performance management (APM) monitoring data reported that Voldemort, Cassandra, and HBase scaled linearly in most cases, with Voldemort showing the lowest latency and Cassandra the highest throughput.
In the parlance of Eric Brewer’s CAP theorem, Voldemort is an AP type system: it favors availability and partition tolerance over strict consistency.
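The "distributed hash table" description can be made concrete with a small sketch. The code below is illustrative only: it assumes a consistent-hash ring as the placement scheme, and the names `HashRing`, `preference_list`, and `points_per_node` are hypothetical, not taken from Voldemort's code base.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical; each node owns several points)."""

    def __init__(self, nodes, points_per_node=16):
        # Place several points per node on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._node_count = len(set(nodes))

    def preference_list(self, key, replicas=2):
        """Return the distinct nodes that should hold `key`, walking clockwise."""
        replicas = min(replicas, self._node_count)
        start = bisect.bisect_right(self._positions, _hash(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


# Usage: find which two nodes would store a given key.
ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.preference_list("user:42", replicas=2))
```

With this scheme, adding a node inserts new points on the ring, so a key's primary owner either stays the same or moves to the new node; the rest of the data is left where it is.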
- It combines in-memory caching with the storage system, so a separate caching tier is not required (the storage system itself is fast enough)
- The storage layer is completely mockable. This makes development and unit testing easy, since both can be done against a throw-away in-memory storage system without a real cluster or real storage system
- Reads and writes scale horizontally
- Simple API: Data replication and placement are controlled through the API, which accommodates a wide range of application-specific strategies
- Transparent data partitioning: The cluster can be expanded without rebalancing all data
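The mockable storage layer mentioned in the list above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `StorageEngine`, `InMemoryStorageEngine`, and `UserProfiles` are hypothetical names, not Voldemort's actual interfaces.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical minimal storage interface (not Voldemort's actual API)."""

    @abstractmethod
    def get(self, key):
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def put(self, key, value):
        """Store `value` under `key`, replacing any previous value."""

    @abstractmethod
    def delete(self, key):
        """Remove `key`; return True if it was present."""


class InMemoryStorageEngine(StorageEngine):
    """Throw-away engine backed by a dict, suitable for unit tests."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        existed = key in self._data
        self._data.pop(key, None)
        return existed


class UserProfiles:
    """Example application code that depends only on the interface."""

    def __init__(self, store: StorageEngine):
        self._store = store

    def rename(self, user_id, new_name):
        profile = self._store.get(user_id) or {}
        updated = {**profile, "name": new_name}
        self._store.put(user_id, updated)
        return updated


# Usage: exercise application logic without a cluster or a disk.
profiles = UserProfiles(InMemoryStorageEngine())
profiles.rename("u1", "Ada")
```

Because the application code only sees the interface, a test can pass in the in-memory engine while production code passes in a real one.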
The Voldemort distributed data store has the following properties:
- Data placement: Pluggable data placement strategies support scenarios such as distribution across geographically distant data centers.
- Data replication: The data is automatically replicated over multiple servers.
- Data partitioning: The data is automatically partitioned so that each server contains only a subset of the total data
- Good single-node performance: 10,000–20,000 operations per second are achievable, depending on the machines, the network, the disk system, and the data replication factor
- Node independence: Each node is independent of other nodes with no central point of failure or coordination
- Pluggable serialization: This allows rich keys and values, including lists and tuples with named fields, as well as integration with common serialization frameworks such as Avro, Java Serialization, Protocol Buffers, and Thrift
- Transparent failures: Server failures are handled transparently, so users do not see them
- Versioning: The data items are versioned to maximize data integrity in case of failure without compromising availability of the system
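The versioning property can be illustrated with vector clocks, a common technique for detecting conflicting writes in systems that stay available during failures. The text above does not specify the mechanism, so this is a sketch under that assumption; the function names `increment`, `compare`, and `merge` are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Classify clock `a` relative to `b`.

    Returns 'equal', 'before', 'after', or 'concurrent'.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge(a, b):
    """Element-wise maximum, usable once a conflict has been resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Usage: two replicas accept writes independently after a shared first write.
v1 = increment({}, "node-a")   # first write, handled by node-a
v2 = increment(v1, "node-a")   # later write via node-a
v3 = increment(v1, "node-b")   # parallel write via node-b
print(compare(v1, v2))
print(compare(v2, v3))
```

Here `v2` and `v3` both descend from `v1` but neither descends from the other, so the comparison reports them as concurrent. Both versions can then be kept and reconciled later rather than one being silently overwritten.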
- "Voldemort is a distributed key-value storage system". Project Voldemort - A distributed database. http://project-voldemort.com/. Retrieved 2011-04-05.
- "Comparison to relational databases". Project Voldemort - A distributed database. http://project-voldemort.com/. Retrieved 2011-04-05.
- Rabl, Tilmann; Sadoghi, Mohammad; Jacobsen, Hans-Arno; Gómez-Villamor, Sergio; Muntés-Mulero, Victor; Mankovskii, Serge (August 2012). "Solving Big Data Challenges for Enterprise Application Performance Management" (pdf). Proceedings of the VLDB Endowment 5 (12): 1724–1735.
- Serving Large-scale Batch Computed Data with Project Voldemort