Cuckoo hashing
From Wikipedia, the free encyclopedia
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001.
Contents |
[edit] Theory
The basic idea is to use two hash functions instead of only one. This provides two possible locations in the hash table for each key.
When a new key is inserted, a greedy algorithm is used: The new key is inserted in one of its two possible locations, "kicking out", that is, displacing any key that might already reside in this location. This displaced key is then inserted in its alternative location, again kicking out any key that might reside there, until a vacant position is found, or the procedure enters an infinite loop. In the latter case, the hash table is rebuilt using new hash functions.
Lookup requires inspection of just two locations in the hash table, which takes constant time in the worst case (see Big O notation). This is in contrast to many other hash table algorithms, which may not have a constant worst-case bound on the time to do a lookup.
It can also be shown that insertions succeed in expected constant time , even considering the possibility of having to rebuild the table, as long as the number of keys is kept below half of the capacity of the hash table.
The basic version of cuckoo hashing has load factor limited to 49%.
Generalizations of cuckoo hashing that use more than 2 alternative hash functions can be expected to utilize a larger part of the capacity of the hash table efficiently while sacrificing some lookup and insertion speed. Using just three hash functions increases the load to 91%. Another generalization of cuckoo hashing consists in using more than one key per bucket. Using just 2 keys per bucket permits a load factor above 80%.
(Other algorithms that use multiple hash functions include the Bloom filter). Cuckoo hashing can be used to implement a data structure equivalent to a Bloom filter.
The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches. A study by Zukowski et al. has showed that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors. K. Ross has shown bucketized versions of cuckoo hashing (variants that use buckets that contain more than one key) to be faster than conventional methods also for large hash tables, when space utilization is high. However as of 2007 cuckoo hashing remains largely unknown outside the research community.
[edit] See also
[edit] References
- ^ Cuckoo Hashing, Pagh, Rodler, 2001.
- ^ Architecture-Conscious Hashing, Zukowski et al, 2006.
- ^ Efficient Hash Probes on Modern Processors, Kenneth A. Ross, 2006.
- A cool and practical alternative to traditional hash tables, U. Erlingsson, M. Manasse, F. Mcsherry, 2006.
- Cuckoo Hashing for Undergraduates, 2006, R. Pagh, 2006.
- Cuckoo Hashing, Theory and Practice (Part 1, Part 2 and Part 3), Michael Mitzenmacher, 2007.
- Naor, Moni; Segev, Gil; Wieder, Udi (2008). "History-Independent Cuckoo Hashing". International Colloquium on Automata, Languages and Programming (ICALP). Retrieved on 2008-07-21.

