Developer(s): Evgeniy Polyakov and Yandex
Elliptics is an open-source distributed key-value data store. By default, it behaves as a classic distributed hash table (DHT). It requires no dedicated control nodes, so it has no single point of failure.
Elliptics was initially created in 2007 as part of POHMELFS, a cache-coherent distributed file system developed by Russian Linux programmer Evgeniy Polyakov. POHMELFS was announced on January 31, 2008, and merged into the staging area of the Linux kernel source tree in version 2.6.30, released June 9, 2009. The filesystem went practically unused and was removed again in February 2012.
In 2008 Elliptics was split off as an independent project. Polyakov tried different approaches to distributed data storage; some proved unsuitable because of their complexity, and others were too far removed from real-world use. Eventually Elliptics took its current form: a DHT combined with eventually consistent replicas that are updated in parallel, multiple layers ranging from low-level on-disk stores up to SLRU caches, and a dynamic routing protocol. Elliptics is used in Yandex infrastructure as well as in other projects. In 2012 Polyakov announced a new version of POHMELFS based on Elliptics.
Elliptics clients connect directly to all storage servers, which allows them to:
- Execute a lookup in O(1) network requests
- Write to or update multiple replicas in parallel
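The two points above can be illustrated with a minimal sketch. It is not the Elliptics client API: the names `key_id`, `Ring`, and `write_all` are hypothetical, SHA-512 is an illustrative hash choice, and the sketch assumes that each replica group forms its own ring and that a key belongs to the first node at or after its ID. Because the client holds the whole routing table, finding the owner is a local computation followed by a single request per replica.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor


def key_id(key: str) -> int:
    """Map a key to a point on the ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha512(key.encode()).digest(), "big")


class Ring:
    """Toy client-side routing table for one replica group."""

    def __init__(self, nodes):
        # Each node is placed on the ring at the hash of its name.
        self._points = sorted((key_id(name), name) for name in nodes)
        self._ids = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        # Owner = first node at or after the key's ID, wrapping around.
        i = bisect.bisect_left(self._ids, key_id(key)) % len(self._ids)
        return self._points[i][1]


def write_all(groups, key, value, stores):
    """Write to the owning node of every replica group in parallel."""

    def write_one(ring):
        node = ring.lookup(key)
        stores[node][key] = value  # stand-in for one network write
        return node

    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        return list(pool.map(write_one, groups))


groups = [Ring(["a1", "a2", "a3"]), Ring(["b1", "b2"])]
stores = {name: {} for name in ["a1", "a2", "a3", "b1", "b2"]}
written = write_all(groups, "photo.jpg", b"data", stores)
```

After the call, `written` holds one node name per group, and each of those nodes stores the value under the same key.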
There are several APIs for data access:
- Asynchronous future/promise C++ library
- Python binding
- HTTP proxy named Rift, with buckets and ACLs, based on the TheVoid library (which uses boost::asio)
- Community-driven Erlang and Go bindings
Features
- Distributed hash tables, no metadata servers, true horizontal scaling
- Data replication: replicas can be stored in different physical locations, so that an outage in a single location or region does not make the data unavailable
- Range and bulk requests
- Different I/O storage backends and a modular architecture that allows custom low-level storage to be implemented
- Automatic data repartitioning in case of removed or added nodes
- Ring addressing structure, with the ability to implement custom key generation models
- Cluster statistics gathering
- I/O notification support for any object in the network
- HTTP frontend, C/C++ and Python bindings, async, bulk and range operations
- Server-side atomic execution support (write trigger analog)
- Secondary indexes support
- Distributed SLRU cache with TTL
- P2P streaming support (eblob and file backends only; external applications such as the Nginx web server can stream data from eblob object files directly to clients without proxying)
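The SLRU cache with TTL mentioned in the list can be sketched for a single node as follows. This is a generic segmented LRU, not the Elliptics implementation: the class name `SLRUCache`, the segment sizes, and the injectable `clock` parameter are all assumptions made for illustration. New entries start in a probationary segment, a hit promotes an entry to a protected segment, and entries that outlive their TTL are dropped on access.

```python
import time
from collections import OrderedDict


class SLRUCache:
    """Single-node segmented LRU with per-entry TTL (illustrative only)."""

    def __init__(self, probation_size, protected_size, clock=time.monotonic):
        self.probation = OrderedDict()  # key -> (value, expires_at)
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.clock = clock

    def put(self, key, value, ttl):
        # Simplification: rewriting a key sends it back to probation.
        self.protected.pop(key, None)
        self.probation.pop(key, None)
        self.probation[key] = (value, self.clock() + ttl)
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict least recently used

    def get(self, key):
        for segment in (self.protected, self.probation):
            if key in segment:
                value, expires_at = segment.pop(key)
                if self.clock() >= expires_at:
                    return None  # expired entry is dropped
                self._promote(key, value, expires_at)
                return value
        return None

    def _promote(self, key, value, expires_at):
        # A hit moves the entry to the most recently used end of "protected".
        self.protected[key] = (value, expires_at)
        if len(self.protected) > self.protected_size:
            # Overflow demotes the oldest protected entry back to probation.
            old_key, old_entry = self.protected.popitem(last=False)
            self.probation[old_key] = old_entry
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)
```

The point of the two segments is that an entry read only once cannot push out entries that have been read repeatedly, since one-off entries never leave the probationary segment.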
Problems and restrictions
- Eventual consistency. Because Elliptics is fully distributed, in an emergency a server can return a copy of a file that is older than the current one. This is sometimes unacceptable; in such cases it is better to use more reliable ways of requesting data, at the cost of extra time.
- The network between the client and the servers can become a bottleneck, because data is written to several servers simultaneously.
- The API may be inconvenient for high-level requests: Elliptics does not provide SQL-like queries.
- Elliptics does not support high-level transactions, so it cannot guarantee that a group of commands will either be executed in full or not be executed at all.
- Transactions are atomic only within a single group and are locked on the primary key.
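The eventual-consistency trade-off in the first item can be made concrete with a small sketch. It does not use the Elliptics API: the functions `read_any` and `read_latest` are hypothetical, and the sketch assumes each replica keeps a timestamp next to every value. A fast read takes the first copy it finds, which may be stale, while a slower read asks every replica and keeps the newest copy.

```python
def read_any(replicas, key):
    """Fast read: return the first copy found, which may be stale."""
    for replica in replicas:
        if key in replica:
            return replica[key][1]
    return None


def read_latest(replicas, key):
    """Slower read: query every replica and keep the newest copy."""
    copies = [replica[key] for replica in replicas if key in replica]
    if not copies:
        return None
    # Each copy is a (timestamp, value) pair; the largest timestamp wins.
    return max(copies, key=lambda copy: copy[0])[1]


# Replica dicts map key -> (timestamp, value); the first replica missed an update.
stale = {"doc": (1, b"old")}
fresh = {"doc": (2, b"new")}

fast = read_any([stale, fresh], "doc")
safe = read_latest([stale, fresh], "doc")
```

Here `fast` is the outdated copy and `safe` is the current one; the price of the second call is one request per replica instead of one request in total.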
Elliptics and its supporting projects are documented in a wiki, which contains high-level design documents, a tutorial, low-level details, and a knowledge base. Elliptics and related projects are discussed in an open Google group.