# Linear hashing

Linear hashing is a dynamic hash table algorithm invented by Witold Litwin (1980),[1] and later popularized by Paul Larson. Linear hashing allows the hash table to expand one slot at a time. These frequent single-slot expansions keep collision chains short. The cost of expanding the table is spread across insertions, as opposed to being incurred all at once.[2] Linear hashing is therefore well suited for interactive applications.

## Algorithm Details

First, the hash table is set up with an arbitrary initial number of buckets. The following values must be tracked:

• $N$: The initial number of buckets.
• $L$: The current level, an integer that indicates on a logarithmic scale approximately how much the table has grown: it counts how many times the table has doubled. This is initially $0$.
• $S$: The step pointer, which points to the next bucket to be split. It initially points to the first bucket in the table.

Bucket collisions can be handled in a variety of ways, but it is typical to have space for two items in each bucket and to add a bucket whenever a bucket overflows. Addresses are calculated as follows:

• Apply a hash function to the key and call the result $H$.
• If $H \bmod (N \times 2^L)$ is an address that comes before $S$, the address is $H \bmod (N \times 2^{L+1})$.
• If $H \bmod (N \times 2^L)$ is $S$ or an address that comes after $S$, the address is $H \bmod (N \times 2^L)$.
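The address rule above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `bucket_address` and its parameter names are hypothetical, buckets are assumed to be numbered from zero, and `h` stands for the already-computed hash value $H$.

```python
def bucket_address(h: int, n: int, level: int, step: int) -> int:
    """Return the bucket index for hash value h.

    n     -- initial number of buckets (N)
    level -- current level (L)
    step  -- step pointer (S), assumed to be a zero-based bucket index
    """
    # First try the modulus of the current level: N * 2^L.
    address = h % (n * 2 ** level)
    if address < step:
        # Addresses before S use the next level's modulus: N * 2^(L+1).
        address = h % (n * 2 ** (level + 1))
    return address
```

For example, with $N = 4$, $L = 0$ and $S = 1$, the hash values 8 and 12 both give 0 under the first modulus, which comes before $S$, so they are re-addressed modulo 8 and land in buckets 0 and 4 respectively. The hash value 5 gives 1, which is not before $S$, so it stays in bucket 1.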

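To add a bucket:

• Allocate a new bucket at the end of the table.
• Redistribute the items in the bucket that $S$ points to between that bucket and the new bucket, using the address $H \bmod (N \times 2^{L+1})$.
• If $S$ points to the $N \times 2^L$th bucket in the table (the last bucket of the current round), reset $S$ to the first bucket and increment $L$.
• Otherwise increment $S$.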
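The procedure for adding a bucket might look like the following Python sketch. The names `LinearHashState` and `add_bucket` are hypothetical, buckets are plain lists indexed from zero, and Python's built-in `hash` stands in for the hash function. The sketch also moves the items of bucket $S$ into their new positions, a step the address rule requires, since addresses before $S$ are computed with the larger modulus.

```python
from dataclasses import dataclass, field


@dataclass
class LinearHashState:
    n: int                      # N: initial number of buckets
    level: int = 0              # L: current level
    step: int = 0               # S: zero-based index of the next bucket to split
    buckets: list = field(default_factory=list)

    def __post_init__(self):
        # Start with N empty buckets unless buckets were supplied.
        if not self.buckets:
            self.buckets = [[] for _ in range(self.n)]


def add_bucket(state: LinearHashState) -> None:
    """Grow the table by one bucket by splitting the bucket at S."""
    round_size = state.n * 2 ** state.level      # N * 2^L

    # Allocate a new bucket at the end of the table.
    state.buckets.append([])

    # Redistribute bucket S with the next level's modulus. Each key
    # either stays in bucket S or moves to the newly added bucket.
    old_items = state.buckets[state.step]
    state.buckets[state.step] = []
    for key in old_items:
        state.buckets[hash(key) % (2 * round_size)].append(key)

    # Advance the step pointer, starting a new round when S was
    # pointing at the last bucket of the current round.
    if state.step == round_size - 1:
        state.step = 0
        state.level += 1
    else:
        state.step += 1
```

Starting from two buckets holding the integer keys `[0, 2, 4, 6]` and `[1, 3]`, two calls to `add_bucket` leave four buckets `[0, 4]`, `[1]`, `[2, 6]`, `[3]`, with $S$ reset and $L = 1$.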

The effect of all of this is that the table is split into three sections: the section before $S$, the section from $S$ through the $N \times 2^L$th bucket, and the section after the $N \times 2^L$th bucket. The first and last sections are addressed using $H \bmod (N \times 2^{L+1})$ and the middle section is addressed using $H \bmod (N \times 2^L)$. Each time $S$ passes the $N \times 2^L$th bucket and is reset, the table has doubled in size since the start of that round.
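To make the three sections concrete, the following Python sketch lists which modulus applies to each bucket for a given state. The function name `modulus_per_bucket` is hypothetical. It assumes zero-based bucket indices and that exactly one bucket has been added for each step $S$ has advanced in the current round, so the table holds $N \times 2^L + S$ buckets.

```python
def modulus_per_bucket(n: int, level: int, step: int) -> list:
    """Return, for each bucket index, the modulus used to address it."""
    round_size = n * 2 ** level        # N * 2^L
    table_size = round_size + step     # assumed: one new bucket per step of S
    moduli = []
    for index in range(table_size):
        if index < step or index >= round_size:
            # Before S, or appended during this round: next level's modulus.
            moduli.append(2 * round_size)
        else:
            # From S up to the end of the original round: current modulus.
            moduli.append(round_size)
    return moduli
```

With $N = 2$, $L = 1$ and $S = 1$, the table has five buckets and the function returns `[8, 4, 4, 4, 8]`: one bucket before $S$, three in the middle section, and one appended bucket.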

## Adoption in language systems

Griswold and Townsend [3] discussed the adoption of linear hashing in the Icon language. They examined implementation alternatives for the dynamic array algorithm used in linear hashing and presented performance comparisons using a set of Icon benchmark applications.

## Adoption in database systems

Linear hashing is used in the Berkeley DB (BDB) database system, which in turn is used by many software systems such as OpenLDAP. Berkeley DB uses a C implementation derived from the CACM article[2] and first published on Usenet in 1988 by Esmond Pitt.

## References

1. Litwin, Witold (1980), "Linear hashing: A new tool for file and table addressing" (PDF), Proc. 6th Conference on Very Large Databases: 212–223
2. Larson, Per-Åke (April 1988), "Dynamic Hash Tables", Communications of the ACM 31 (4): 446–457, doi:10.1145/42404.42410
3. Griswold, William G.; Townsend, Gregg M. (April 1993), "The Design and Implementation of Dynamic Hashing for Sets and Tables in Icon", Software: Practice and Experience 23 (4): 351–367