Perfect hash function
From Wikipedia, the free encyclopedia
A perfect hash function for a set S is a hash function that maps distinct elements in S to distinct integers, with no collisions. A perfect hash function with values in a limited range can be used for efficient lookup operations, by placing keys from S (or other associated values) in a table indexed by the output of the function.
A perfect hash function for a specific set S that can be evaluated in constant time, and with values in a small range, can be found by a randomized algorithm in a number of operations that is proportional to the size of S. The minimal size of the description of a perfect hash function depends on the range of its function values: The smaller the range, the more space is required. Any perfect hash functions suitable for use with a hash table require at least a number of bits that is proportional to the size of S.
Using a perfect hash function is best in situations where there is a large set, S, which is not updated frequently, and many lookups into it. Efficient solutions to performing updates are known as dynamic perfect hashing, but these methods are relatively complicated to implement. A simple alternative to perfect hashing, which also allows dynamic updates, is cuckoo hashing.
A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers—usually [0..n-1] or [1..n]. A more formal way of expressing this is: Let j and k be elements of some set K. F is a minimal perfect hash function iff F(j) =F(k) implies j=k and there exists an integer a such that the range of F is a..a+|K|-1. It has been proved that any minimal perfect hash scheme requires at least 1.44 bits/key. However the smallest currently use around 2.5 bits/key. Some of these space-efficient perfect hash functions have been implemented in cmph library and Sux4J.
A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, …, and for any keys aj and ak, j<k implies F(aj)<F(ak). Order-preserving minimal perfect hash functions require necessarily Ω(n log n) bits to be represented.
A minimal perfect hash function F is monotone if it preserves the lexicographical order of the keys. Monotone minimal perfect hash functions can be represented in very little space. Several implementations of monotone minimal perfect hash functions are available in Sux4J.
[edit] See also
[edit] References
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 11.5: Perfect hashing, pp.245–249.
- Fabiano C. Botelho, Rasmus Pagh and Nivio Ziviani. "Perfect Hashing for Data Management Applications".
- Fabiano C. Botelho and Nivio Ziviani. "External perfect hashing for very large key sets". 16th ACM Conference on Information and Knowledge Management (CIKM07), Lisbon, Portugal, November 2007.
- Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. "Monotone minimal perfect hashing: Searching a sorted table with O(1) accesses". In Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), New York, 2009. ACM Press.
- Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. "Theory and practise of monotone minimal perfect hashing". In Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX). SIAM, 2009.
[edit] External links
- Minimal Perfect Hashing by Bob Jenkins
- gperf is a Free software C and C++ perfect hash generator
- cmph is another Free Software implementing many perfect hashing methods
- Sux4J is another Free Software implementing many perfect hashing methods in Java