# Linear hashing

Linear hashing is a dynamic hash table algorithm invented by Witold Litwin (1980),[1] and later popularized by Paul Larson. Linear hashing allows the hash table to expand one slot at a time. Frequent single-slot expansion keeps collision chains short. The cost of expansion is spread across insertion operations rather than being incurred all at once.[2] Linear hashing is therefore well suited to interactive applications.

## Algorithm Details

The hash table is first set up with an arbitrary initial number of buckets. The following values must be tracked:

• ${\displaystyle N}$: The initial number of buckets.
• ${\displaystyle L}$: The current level, an integer that indicates on a logarithmic scale approximately how much the table has grown. This is initially ${\displaystyle 0}$.
• ${\displaystyle S}$: The step pointer, which points to a bucket. It initially points to the first bucket in the table.

Bucket collisions can be handled in a variety of ways, but it is typical to have space for two items in each bucket and to add a bucket whenever one overflows. A bucket capacity larger than two can be used once the implementation is debugged. Addresses are calculated in the following way:

• Apply a hash function to the key and call the result ${\displaystyle H}$.
• If ${\displaystyle H{\bmod {(}}N\times 2^{L})}$ is an address that comes before ${\displaystyle S}$, the address is ${\displaystyle H{\bmod {(}}N\times 2^{L+1})}$.
• If ${\displaystyle H{\bmod {(}}N\times 2^{L})}$ is ${\displaystyle S}$ or an address that comes after ${\displaystyle S}$, the address is ${\displaystyle H{\bmod {(}}N\times 2^{L})}$.
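
The address rule above can be sketched in a few lines of Python. The function name `bucket_address` and its parameter names are illustrative assumptions; `split` holds the step pointer ${\displaystyle S}$ as a zero-based bucket index.

```python
def bucket_address(h: int, n: int, level: int, split: int) -> int:
    """Map a hash value h to a bucket index.

    n     -- initial number of buckets (N)
    level -- current level (L)
    split -- step pointer (S), as a zero-based bucket index
    """
    addr = h % (n * 2 ** level)
    if addr < split:
        # This bucket comes before S, so it has already been split:
        # use the doubled address range instead.
        addr = h % (n * 2 ** (level + 1))
    return addr
```

For instance, with four initial buckets, level 0 and the step pointer at bucket 1, hash values 8 and 12 both map to bucket 0 under the smaller modulus, but the doubled range separates them into buckets 0 and 4.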

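To add a bucket to the table, the following steps are performed:

• Allocate a new bucket at the end of the table.
• Redistribute the items in the bucket at ${\displaystyle S}$ between that bucket and the new bucket, using ${\displaystyle H{\bmod {(}}N\times 2^{L+1})}$.
• If ${\displaystyle S}$ points to the ${\displaystyle N\times 2^{L}}$th bucket in the table (the last bucket of the current level), reset ${\displaystyle S}$ to the first bucket and increment ${\displaystyle L}$.
• Otherwise increment ${\displaystyle S}$.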
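
A minimal sketch of one expansion step follows. It assumes, purely to keep the example short, that each bucket is a Python list and that items are stored as their own integer hash values; the name `grow` is hypothetical. The sketch also moves the items of the bucket at the step pointer, since they must be redistributed once the doubled address range applies to that bucket.

```python
def grow(buckets: list[list[int]], n: int, level: int, split: int) -> tuple[int, int]:
    """Add one bucket, split the bucket at `split`, return the new (level, split).

    Assumes len(buckets) == n * 2**level + split on entry, and that items
    are their own hash values (an assumption of this sketch).
    """
    buckets.append([])                      # allocate a new bucket at the end
    wide = n * 2 ** (level + 1)             # doubled address range
    old_items, buckets[split] = buckets[split], []
    for h in old_items:
        buckets[h % wide].append(h)         # stays in `split` or moves to the new bucket
    if split == n * 2 ** level - 1:         # the last bucket of this level was split
        return level + 1, 0                 # reset the pointer, increment the level
    return level, split + 1                 # otherwise advance the pointer
```

Calling `grow` once per initial bucket walks the pointer across the whole level, after which the table holds twice as many buckets and the level has increased by one.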

As a result, the table is divided into three sections: the section before ${\displaystyle S}$, the section from ${\displaystyle S}$ to ${\displaystyle N\times 2^{L}}$, and the section after ${\displaystyle N\times 2^{L}}$. The first and last sections are addressed using ${\displaystyle H{\bmod {(}}N\times 2^{L+1})}$ and the middle section is addressed using ${\displaystyle H{\bmod {(}}N\times 2^{L})}$. Each time ${\displaystyle S}$ reaches ${\displaystyle N\times 2^{L}}$ the table has doubled in size.
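
The three sections can be made concrete with a small helper that reports which modulus governs a given bucket. The function name `modulus_for_bucket` is an illustrative assumption, and bucket indices are zero-based.

```python
def modulus_for_bucket(index: int, n: int, level: int, split: int) -> int:
    """Return the modulus under which items in bucket `index` are stored."""
    base = n * 2 ** level
    if split <= index < base:
        return base        # middle section: not yet split at this level
    return 2 * base        # before S (already split) or a newly added bucket
```

With four initial buckets, level 0 and the step pointer at bucket 2, the table holds six buckets: buckets 0, 1, 4 and 5 are addressed modulo 8, while buckets 2 and 3 are still addressed modulo 4.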

### Points to note

• Full buckets are not necessarily split, so space for temporary overflow buckets is required. In external storage, this could mean a second file.
• Buckets that are split are not necessarily full.
• Every bucket will be split sooner or later, so all overflows will eventually be reclaimed and rehashed.
• The split pointer ${\displaystyle s}$ (the step pointer ${\displaystyle S}$ above) decides which bucket to split.
• ${\displaystyle s}$ is independent of which bucket overflows.
• At level ${\displaystyle i}$, ${\displaystyle s}$ is at least ${\displaystyle 0}$ and less than ${\displaystyle 2^{i}n}$.
• ${\displaystyle s}$ is incremented after each split and, on reaching the end, is reset to ${\displaystyle 0}$.
• Since the bucket at ${\displaystyle s}$ is split and then ${\displaystyle s}$ is incremented, only buckets before ${\displaystyle s}$ use the second, doubled hash space.
• A large, good pseudo-random number ${\displaystyle x}$ (the hash value) is first obtained and then bit-masked with either ${\displaystyle 2^{i}-1}$ or ${\displaystyle 2^{i+1}-1}$. The latter applies only if ${\displaystyle x}$ bit-masked with the former, ${\displaystyle 2^{i}-1}$, is less than ${\displaystyle s}$, so the larger range of hash values applies only to buckets that have already been split.
• For example, with ${\displaystyle i=3}$ the mask is ${\displaystyle 2^{3}-1=7}$, which is 111 in binary, so the masked value is x & 7, where & is the bitwise AND operator.
• What if ${\displaystyle s}$ lands on a bucket that has one or more full overflow buckets? The split will only reduce the overflow bucket count by one, and the remaining overflow buckets have to be rebuilt by determining which of the two new buckets, or their overflow buckets, each overflow entry belongs to.
• ${\displaystyle h_{i}(k)=h(k){\bmod {(}}2^{i}n)}$.
• ${\displaystyle h_{i+1}}$ doubles the range of ${\displaystyle h_{i}}$.
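
The bit-masking described above can be sketched as follows, assuming a single initial bucket so that every modulus is a power of two and masking gives the same result as taking the remainder. The name `masked_address` is hypothetical.

```python
def masked_address(x: int, i: int, s: int) -> int:
    """Bucket address of hash value x at level i with split pointer s.

    Assumes one initial bucket, so every modulus is a power of two.
    """
    b = x & ((1 << i) - 1)              # same as x mod 2**i
    if b < s:
        # Bucket b has already been split: keep one more bit of the hash.
        b = x & ((1 << (i + 1)) - 1)    # same as x mod 2**(i + 1)
    return b
```

With level 3 and the split pointer at 2, the hash value 22 (binary 10110) masks to 6 and stays there, while 25 (binary 11001) first masks to 1, which is below the pointer, and is therefore re-masked with four bits to give 9.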

### Algorithm for inserting ‘k’ and checking overflow condition

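• b = h0(k), where h0 is the hash function for the current level and h1 is the one for the next level
• if b < split-pointer then b = h1(k)
• insert k into bucket b; if the bucket overflows, split the bucket at the split-pointer and advance the pointer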

### Searching in the hash table for ‘k’

• b = h0(k)
• if b < split-pointer then b = h1(k)
• read bucket b and search there
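
The insertion and search steps can be combined into a small in-memory table. This is a toy sketch under several assumptions that are not part of the description above: the class and method names are invented for illustration, Python's built-in `hash` plays the role of the hash function, and overflowing items simply stay in the bucket's list rather than in separate overflow buckets.

```python
class LinearHashTable:
    """Toy linear hash table; overflow items stay in the bucket's list."""

    def __init__(self, n: int = 2, capacity: int = 2):
        self.n = n                # initial number of buckets
        self.level = 0            # current level
        self.split = 0            # split pointer
        self.capacity = capacity  # items a bucket holds before it overflows
        self.buckets = [[] for _ in range(n)]

    def _address(self, key) -> int:
        h = hash(key)
        b = h % (self.n * 2 ** self.level)            # b = h0(k)
        if b < self.split:
            b = h % (self.n * 2 ** (self.level + 1))  # b = h1(k)
        return b

    def insert(self, key) -> None:
        bucket = self.buckets[self._address(key)]
        if key in bucket:
            return
        bucket.append(key)
        if len(bucket) > self.capacity:
            # Overflow: split the bucket at the pointer, which is not
            # necessarily the bucket that overflowed.
            self._split()

    def _split(self) -> None:
        self.buckets.append([])
        old, self.buckets[self.split] = self.buckets[self.split], []
        wide = self.n * 2 ** (self.level + 1)
        for key in old:
            self.buckets[hash(key) % wide].append(key)
        if self.split == self.n * 2 ** self.level - 1:
            self.level, self.split = self.level + 1, 0
        else:
            self.split += 1

    def contains(self, key) -> bool:
        # Read bucket b and search there.
        return key in self.buckets[self._address(key)]
```

A short usage example: after `t = LinearHashTable()` and inserting the integers 0 to 49, `t.contains(k)` holds for each inserted value, and the number of buckets always equals `t.n * 2 ** t.level + t.split`.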