Voldemort (distributed data store)

From Wikipedia, the free encyclopedia
Jump to: navigation, search
Project Voldemort
Developer(s) LinkedIn
Initial release 2009
Stable release 1.6.0 / January 31, 2014 (2014-01-31)
Development status Active
Written in Java
Operating system Cross-platform
Available in English
Type key-value store
License Apache License 2
Website project-voldemort.com

Voldemort is a distributed data store that is designed as a key-value store used by LinkedIn for high-scalability storage.[1] It is named after the fictional Harry Potter villain Lord Voldemort.

Voldemort is still under development. It is neither an object database, nor a relational database. It does not try to satisfy arbitrary relations and the ACID properties, but rather is a big, distributed, fault-tolerant, persistent hash table.[2] A 2012 study comparing systems for storing APM monitoring data reported that Voldemort, Cassandra, and HBase offered linear scalability in most cases, with Voldemort having the lowest latency and Cassandra having the highest throughput.[3]

In the parlance of Eric Brewer’s CAP theorem, Voldemort is an AP type system.

Advantages[edit]

Voldemort offers a number of advantages over other databases:[2] [4]

  • It combines in-memory caching with the storage system so that a separate caching tier is not required (instead the storage system itself is just fast)
  • It is possible to emulate the storage layer, as it is completely mockable. This makes the development and the unit testing easy, as it can be done against a throw-away in-memory storage system without the need for a real cluster or real storage system
  • Reads and writes scale horizontally
  • Simple API: The API decides data replication and placement and accommodates a wide range of application-specific strategies
  • Transparent data partitioning: This allows for cluster expansion without rebalancing all data

Properties[edit]

The Voldemort distributed data store has following properties:[1]

  • Data placement: Support for pluggable data placement strategies exists to support things like distribution across data centers that are far apart.
  • Data replication: The data is automatically replicated over a large number of servers.
  • Data partitioning: The data is automatically partitioned so that the server contains only a subset of the total data
  • Good single node performance: 10–20k operations per second can occur depending on the machines, the network, the disk system, and the data replication factor
  • Node independence: Each node is independent of other nodes with no central point of failure or coordination
  • Pluggable serialization: This allows rich keys and values including lists and tuples with named fields, as well as the integration with common serialisation frameworks. Examples for these frameworks are Avro, Java Serialization, Protocol Buffers, and Thrift
  • Transparent failures: Server failures are handled transparently so that the user doesn't see such problems
  • Versioning: The data items are versioned to maximize data integrity in case of failure without compromising availability of the system

See also[edit]

References[edit]

  1. ^ a b "Voldemort is a distributed key-value storage system". http://project-voldemort.com/: Project Voldemort - A distributed database. Retrieved 2011-04-05. 
  2. ^ a b "Comparison to relational databases". http://project-voldemort.com/: Project Voldemort - A distributed database. Retrieved 2011-04-05. 
  3. ^ Rabl, Tilmann; Sadoghi, Mohammad; Jacobsen, Hans-Arno; Gómez-Villamor, Sergio; Muntés-Mulero, Victor; Mankovskii, Serge (August 2012). "Solving Big Data Challenges for Enterprise Application Performance Management" (pdf). Proceedings of the VLDB Endowment 5 (12): 1724–1735. 
  4. ^ Serving Large-scale Batch Computed Data with Project Voldemort

External links[edit]