Eventual consistency

From Wikipedia, the free encyclopedia

Eventual consistency is one of the consistency models used in the domain of parallel programming, for example in distributed shared memory, distributed transactions, and optimistic replication.[1][2] It means that, given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate through the system, so that all the replicas become consistent. While some authors use that definition (e.g. Vogels), others prefer a stronger definition that requires the guarantee to hold even in the presence of continuing updates, reconfigurations, or failures. In the work of Terry et al.,[2] eventual consistency means that, for a given accepted update and a given replica, eventually either the update reaches the replica or the replica retires from service.
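To illustrate this convergence behavior, the following minimal Python sketch (a hypothetical simulation, not taken from any of the systems cited here) models replicas that gossip pairwise and keep the write carrying the newer timestamp; once updates stop, every copy converges to the same value.

    import random

    # Hypothetical illustration: each replica holds a (timestamp, value) pair
    # and adopts the newer one whenever it gossips with a peer.  Once writes
    # stop, repeated rounds of gossip drive every replica to the same value.
    replicas = [(0, "initial")] * 5

    def write(replicas, index, timestamp, value):
        """Apply an update at a single replica only."""
        replicas[index] = (timestamp, value)

    def gossip_round(replicas):
        """Each replica reconciles with one random peer, keeping the newer write."""
        for i in range(len(replicas)):
            j = random.randrange(len(replicas))
            newer = max(replicas[i], replicas[j])   # tuples compare by timestamp first
            replicas[i] = replicas[j] = newer

    write(replicas, 0, timestamp=1, value="updated")
    while len(set(replicas)) > 1:          # loop until all replicas agree
        gossip_round(replicas)
    print(replicas)                        # all five copies now hold (1, "updated")

The loop terminates because the newest write can never be lost during reconciliation, so the number of replicas holding it only grows over successive gossip rounds.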

In database terminology, this is known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to the database concept of ACID.[3]

Conflict resolution

As the consistency achieved is only eventual, conflicting updates have to be resolved. There are three types of resolution:[4][2]

  • Read repair: The correction is done when a read finds an inconsistency. This slows down the read operation (see the sketch after this list).
  • Write repair: The correction is done during a write operation when an inconsistency is found, slowing down the write operation.
  • Asynchronous repair: The correction is not part of a read or write operation.
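As an illustration of the read-repair path, here is a minimal Python sketch under simplified assumptions (each replica is a plain dictionary mapping a key to a (timestamp, value) pair; this is not the read path of any particular database): the coordinator collects all versions, keeps the one with the highest timestamp, and writes it back to stale replicas during the read.

    # Hypothetical sketch of read repair: the coordinator reads from every
    # replica, picks the version with the highest timestamp, and writes that
    # version back to any stale replica before answering the client.
    def read_with_repair(replicas, key):
        versions = [(replica.get(key, (0, None)), replica) for replica in replicas]
        latest, _ = max(versions, key=lambda pair: pair[0][0])   # newest timestamp wins
        for version, replica in versions:
            if version != latest:
                replica[key] = latest        # repair the stale copy during the read
        return latest[1]                     # return the value of the newest version

    # Usage: three replica dictionaries, one of which missed an update.
    r1 = {"x": (2, "new")}
    r2 = {"x": (1, "old")}
    r3 = {"x": (2, "new")}
    print(read_with_repair([r1, r2, r3], "x"))   # prints "new"; r2 is repaired to (2, "new")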

Concepts used for conflict resolution include client-specified timestamps and vector clocks.[4]
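The following Python sketch shows how a vector-clock comparison distinguishes an ordered update from a genuinely concurrent (conflicting) one; the node names and clock values are made up for illustration and do not come from any cited system.

    # Hypothetical sketch of vector-clock comparison: each update carries a
    # mapping from node id to counter.  If neither clock dominates the other,
    # the updates are concurrent and a real conflict must be resolved.
    def dominates(a, b):
        """True if vector clock a is at least as recent as b for every node."""
        return all(a.get(node, 0) >= counter for node, counter in b.items())

    def compare(a, b):
        a_newer, b_newer = dominates(a, b), dominates(b, a)
        if a_newer and b_newer:
            return "identical"
        if a_newer:
            return "a is newer"
        if b_newer:
            return "b is newer"
        return "concurrent (conflict)"

    print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))   # a is newer
    print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 3}))   # concurrent (conflict)

With client-specified timestamps alone, the second case would be silently resolved by last-write-wins; vector clocks make the conflict visible so the application can reconcile it.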

Types

A number of different consistency levels can be reached. Apache Cassandra, for example, uses three levels (a sketch of the quorum arithmetic behind these levels follows the list):[5]

  • ONE: Data is written to the commit log and memory table of at least one node before the write is acknowledged to the client.
  • QUORUM: Data has to be written to < replication factor >/2 + 1 nodes before responding to the client.
  • ALL: All replica nodes have to acknowledge the write (or serve the read) before the operation completes.
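Below is a minimal Python sketch of the quorum arithmetic implied by these levels, following the rule from the cited description that a read is guaranteed consistent whenever R + W > N (N being the replication factor, W the number of nodes written, R the number of nodes read); the function names are illustrative, not Cassandra API calls.

    # Hypothetical sketch of the quorum rule behind the QUORUM level: with a
    # replication factor n, writing to W nodes and reading from R nodes
    # guarantees overlap (and therefore a consistent read) whenever R + W > n.
    def quorum(n):
        """Number of replicas contacted at QUORUM for replication factor n."""
        return n // 2 + 1

    def read_is_consistent(n, r, w):
        """True when every read quorum intersects every write quorum."""
        return r + w > n

    n = 3                                                 # replication factor of the keyspace
    print(quorum(n))                                      # 2 nodes for both reads and writes
    print(read_is_consistent(n, quorum(n), quorum(n)))    # True: 2 + 2 > 3
    print(read_is_consistent(n, 1, 1))                    # False: level ONE may return stale data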

See also

References

  1. ^ W. Vogels. Eventually Consistent. ACM Queue vol. 6, no. 6, December 2008.
  2. ^ a b D. B. Terry, et al. Managing update conflicts in Bayou, a weakly connected replicated storage system. ACM SOSP, December 1995.
  3. ^ D. Pritchett. BASE: An ACID Alternative. ACM Queue vol. 6, no. 3, July 28, 2008.
  4. ^ a b Olivier Mallassi (2010-06-09). "Let's play with Cassandra… (Part 1/3)". OCTO Talks!. http://blog.octo.com/en/. Retrieved 2011-03-23. "Of course, at a given time, chances are high that each node has its own version of the data. Conflict resolution is made during the read requests (called read-repair) and the current version of Cassandra does not provide a Vector Clock conflict resolution mechanisms (should be available in the version 0.7). Conflict resolution is so based on timestamp (the one set when you insert the row or the column): the higher timestamp win and the node you are reading the data is responsible for that. This is an important point because the timestamp is specified by the client, at the moment the column is inserted. Thus, all Cassandra clients' need to be synchronized..."
  5. ^ Olivier Mallassi (2010-06-09). "Let's play with Cassandra… (Part 1/3)". OCTO Talks!. http://blog.octo.com/en/. Retrieved 2011-03-23. "Cassandra defines different levels of consistency and I will not go into further details but here are a couple of them: - ONE. Cassandra ensures the data is written to at least one node's commit log and memory table before responding to the client. During read, the data will be returned from the first node where it is found. In that case, you must accept stale state because you have no warranty the node you hit to read the data has the last version of the data. - QUORUM. In that case, Cassandra will write the data on < replicationfactor >/2 + 1 nodes before responding to the client (the Replication factor is the number of nodes the data will be replicated and is defined for a Keyspace). For the read, the data will be read on < replicationfactor >/2 + 1 nodes before returning the data. In that case, you are sure to get a consistent data (because N is smaller than R+W where N is the total number of nodes where the data is replicated, R the number of nodes where this data is being read and W the number of nodes the data is being written) - ALL. In that case, Cassandra will write and read the data from all the nodes."