|Developer(s)||Evgeniy Polyakov with Yandex support|
|Written in||C++, Python, Go|
Elliptics is an open-source distributed key-value data store. By default it is a classic distributed hash table (DHT) with multiple replicas placed in different groups (distributed hashes). Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge numbers of medium and large files (from 1 KB up to gigabytes in size, thousands to billions of objects).
Elliptics was initially created in 2007 as part of POHMELFS, a cache-coherent distributed file system developed by Linux programmer Evgeniy Polyakov. POHMELFS was announced on January 31, 2008, and merged into the staging area of the Linux kernel source tree in version 2.6.30, released June 9, 2009. The filesystem went practically unused and was removed again in February 2012.
In 2008 Elliptics was separated into an independent project. Polyakov tried different approaches to distributed data storage; some proved unsuitable because of their complexity, and others were too far from real-life needs (BerkeleyDB, LevelDB, Kyoto Cabinet backends for medium and big files, different datacenters in a single DHT ring, non-eventual recovery). Elliptics is an eventually consistent system whose multiple replicas are updated in parallel and may live in physically distributed locations. Elliptics contains multiple layers, from the low-level on-disk store (named Eblob) up to SLRU caches and a dynamic routing protocol.
In 2012, Polyakov announced a new version of POHMELFS based on Elliptics.
Elliptics clients connect directly to all storage servers, which helps to:
- Execute lookup in O(1) network requests (single network request per replica)
- Run write/update commands into multiple replicas in parallel
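The two points above can be illustrated with a minimal sketch. It is not the Elliptics client API: the `Group` class, the `write_all` function, the node names, and the use of SHA-512 as the ring hash are all assumptions made for illustration, and in-memory dictionaries stand in for storage servers. Each group is modelled as its own consistent-hash ring, so the client can compute the owning node locally and needs one request per replica.

```python
import bisect
import hashlib
from concurrent.futures import ThreadPoolExecutor


class Group:
    """One replica group modelled as a consistent-hash ring (hypothetical)."""

    def __init__(self, group_id, node_names):
        self.group_id = group_id
        # In-memory dicts stand in for the storage servers of this group.
        self.nodes = {name: {} for name in node_names}
        # Each node owns the ring segment that ends at its hash point.
        self.ring = sorted((self._hash(name), name) for name in node_names)
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(text):
        # Assumption: SHA-512 of the name or key is used as the ring position.
        return int.from_bytes(hashlib.sha512(text.encode()).digest(), "big")

    def node_for(self, key):
        """Find the owning node locally; no metadata server is consulted."""
        index = bisect.bisect_left(self.points, self._hash(key)) % len(self.ring)
        return self.ring[index][1]

    def write(self, key, value):
        node = self.node_for(key)
        self.nodes[node][key] = value
        return self.group_id, node

    def read(self, key):
        return self.nodes[self.node_for(key)].get(key)


def write_all(groups, key, value):
    """Send the write to every group in parallel, one request per replica."""
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = [pool.submit(group.write, key, value) for group in groups]
        return [future.result() for future in futures]


if __name__ == "__main__":
    groups = [Group(1, ["dc1-a", "dc1-b"]), Group(2, ["dc2-a", "dc2-b", "dc2-c"])]
    print(write_all(groups, "photo.jpg", b"data"))  # (group id, node) per replica
    print(groups[0].read("photo.jpg"))
```

Because every group hashes the same key independently, adding a group adds a replica without changing how the other groups place data.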
There are several APIs for data access:
- Asynchronous future/promise C++ library
- Python binding
- Go binding
- HTTP-proxy named Rift with buckets and ACLs based on TheVoid library (using boost::asio)
- Community-driven Erlang bindings
Features
- Distributed hash tables, no metadata servers, true horizontal scaling
- Data replication – replicas can be stored in different physical locations
- Range and bulk requests
- Different I/O storage backends, API to create own low-level storage backends
- Automatic data repartitioning in case of removed or added nodes
- Eventually consistent recovery
- Consistent hashing addressing model
- Cluster statistics
- HTTP frontend, C/C++, Python, Go bindings
- Server-side script execution support (write trigger analog)
- Distributed SLRU cache with TTL
- P2P streaming support (eblob and file backends only; external applications such as the Nginx web server can stream data from eblob object files directly to clients without proxying)
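To make the "SLRU cache with TTL" item above more concrete, here is a minimal single-process sketch of a segmented LRU cache with per-entry expiry. It is only an illustration of the general technique, not Elliptics' cache implementation: the `SLRUCache` class, its segment sizes, and the injectable `clock` parameter are hypothetical, and the sketch is not distributed. New keys enter a probationary segment, a hit promotes a key to the protected segment, and entries past their TTL are dropped on access.

```python
import time
from collections import OrderedDict


class SLRUCache:
    """Segmented LRU with per-entry TTL (illustrative, single process)."""

    def __init__(self, probation_size, protected_size, clock=time.monotonic):
        # Both sizes are assumed to be at least 1.
        self.probation = OrderedDict()  # keys seen once; evicted first
        self.protected = OrderedDict()  # keys hit at least once after insertion
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.clock = clock

    def put(self, key, value, ttl):
        entry = (value, self.clock() + ttl)
        if key in self.protected:
            # Update in place and mark as most recently used.
            self.protected[key] = entry
            self.protected.move_to_end(key)
            return
        self.probation[key] = entry
        self.probation.move_to_end(key)
        self._trim_probation()

    def get(self, key):
        for segment in (self.protected, self.probation):
            if key in segment:
                value, expires_at = segment.pop(key)
                if expires_at <= self.clock():
                    return None  # expired: the entry stays removed
                self._promote(key, (value, expires_at))
                return value
        return None

    def _promote(self, key, entry):
        self.protected[key] = entry  # inserted at the most recently used end
        if len(self.protected) > self.protected_size:
            # Demote the least recently used protected entry to probation.
            old_key, old_entry = self.protected.popitem(last=False)
            self.probation[old_key] = old_entry
            self._trim_probation()

    def _trim_probation(self):
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict least recently used
```

The two segments keep a burst of one-off reads from pushing frequently used keys out of the cache, which is the usual reason to prefer SLRU over plain LRU.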
Problems and restrictions
- Eventual consistency. Because Elliptics is fully distributed, in an emergency a server can return a copy of a file that is older than the current one. Sometimes this is unacceptable. In such cases it is better to use more reliable ways of requesting data, at the cost of extra time.
- The network between the client and the servers can become a weak point, as data is written to several servers in parallel.
- The API may be inconvenient for high-level requests. Elliptics does not provide its users with SQL-like data requests.
- Elliptics does not support high-level transactions, so it cannot guarantee that a group of commands will either be fully executed or not executed at all.
- Transactions are only atomic within a group and are locked based on the primary key.
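The trade-off described in the eventual consistency item can be sketched as follows. This is an assumption-laden illustration, not Elliptics code: the `Record` type and the `read_first` and `read_newest` functions are hypothetical, replicas are plain dictionaries, and the sketch assumes each stored record carries a timestamp. The fast path returns the first copy it finds and may be stale; the slower path asks every replica and keeps the newest copy.

```python
from dataclasses import dataclass


@dataclass
class Record:
    data: bytes
    timestamp: float  # assumed to be set by the writer on every update


def read_first(replicas, key):
    """Fast path: return the first copy found; it may be out of date."""
    for replica in replicas:
        record = replica.get(key)
        if record is not None:
            return record
    return None


def read_newest(replicas, key):
    """Slower path: query every replica and keep the newest copy."""
    found = [replica[key] for replica in replicas if key in replica]
    return max(found, key=lambda record: record.timestamp, default=None)


if __name__ == "__main__":
    # Replica 1 missed the latest update, for example during an outage.
    replica_1 = {"doc": Record(b"old", timestamp=100.0)}
    replica_2 = {"doc": Record(b"new", timestamp=200.0)}
    replicas = [replica_1, replica_2]
    print(read_first(replicas, "doc").data)
    print(read_newest(replicas, "doc").data)
```

The slower path costs one request per replica and waits for the slowest of them, which is the extra time mentioned in the list above.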
Elliptics and its supporting projects are documented on a community wiki, which contains high-level design documents, a tutorial, low-level details and a knowledge base. Elliptics and related projects are discussed in an open Google group.