2-choice hashing

From Wikipedia, the free encyclopedia
Jump to: navigation, search

2-choice hashing, also known as 2-choice chaining, is a variant of a hash table in which keys are added by hashing with two hash functions. The key is put in the array position with the fewer (colliding) keys. Some collision resolution scheme is needed, unless keys are kept in buckets. The average-case cost of a successful search is O(2 + (m-1)/n), where m is the number of keys and n is the size of the array. The most collisions is \log_2 \ln n + \theta(m/n) with high probability.

How It Works[edit]

2-choice hashing utilizes two hash functions h1(x) and h2(x) which work as hash functions are expected to work (i.e. mapping integers from the universe into a specified range). The two hash functions should be independent and have no correlation to each other. Having two hash functions allows any integer x to have up to two potential locations to be stored based on the values of the respective outputs, h1(x) and h2(x). It is important to note that, although there are two hash functions, there is only one table; both hash functions map to locations on that table.

Implementation[edit]

The most important functions of the hashing implementation in this case are insertion and search.

Insertion: When inserting the values of both hash functions are computed for the to-be-inserted object. The object is then placed in the bucket which contains fewer objects. If the buckets are equal in size, the default location is the h1(x) value.

Search: Effective searches are done by looking in both buckets, that is, the bucket locations which h1(x) and h2(x) mapped to for the desire value.

Performance[edit]

As is true with all hash tables, the performance is based on the largest bucket. Although there are instances where bucket sizes happen to be large based on the values and the hash functions used, this is rare. Having two hash functions and, therefore, two possible locations for any one value, makes the possibility of large buckets even more unlikely to happen.

The expected bucket size while using 2-choice hashing is: θ(log(log(n))). This massive improvement is due to the randomized concept known as The Power of Two Choices.

Note: The idea of multiple hash functions is optimized using 2 hash functions. There is no improvement found if the number of hash functions is increased - that is 3 hash functions does not have better performance than 2.

See also[edit]

Paul E. Black, 2-choice hashing at the NIST Dictionary of Algorithms and Data Structures.