Linear probing is accomplished using two values: a starting value and an interval between successive values in modular arithmetic. The second value, which is the same for all keys and is known as the step size, is repeatedly added to the starting value until a free cell is found or the entire table has been traversed. (To traverse the entire table, the step size should be relatively prime to the array size, which is why the array size is often chosen to be a prime number.)
- newLocation = (startingValue + stepSize) % arraySize
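The following is a minimal Python sketch of this probe rule. It assumes cells numbered 0 to arraySize - 1 and applies the formula repeatedly to the most recent location. The names `probe_sequence` and `covers_table` are hypothetical helpers, not part of any library. The coverage check relies on the fact that stepping by s through n cells visits every cell exactly when gcd(s, n) = 1.

```python
from math import gcd
from typing import List


def probe_sequence(start: int, step: int, size: int) -> List[int]:
    """Cells visited by newLocation = (location + step) % size, beginning at start."""
    seq = []
    loc = start % size
    for _ in range(size):          # at most one pass over the table
        seq.append(loc)
        loc = (loc + step) % size  # advance by the fixed step size
    return seq


def covers_table(step: int, size: int) -> bool:
    # Every cell is reached if and only if step and size are relatively prime.
    return gcd(step, size) == 1


print(probe_sequence(3, 4, 7))  # [3, 0, 4, 1, 5, 2, 6]
print(probe_sequence(0, 4, 8))  # [0, 4, 0, 4, 0, 4, 0, 4]
```

A step size of 4 therefore visits all seven cells of a table of size 7, but only cells 0 and 4 of a table of size 8.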
Given an ordinary hash function H(x), a linear probing function H(x, i) would be:
- H(x, i) = (H(x) + i) mod n
Here H(x) is the starting value, n is the size of the hash table, and i = 0, 1, 2, ... is the probe number, so each successive probe advances by a step size of one.
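As a rough illustration, here is a small open-addressed set of integers that probes with H(x, i) = (H(x) + i) mod n. The class name, the pluggable hash parameter `h`, and the rule of always leaving one cell empty (so that every probe loop terminates and the load factor stays below one) are assumptions of this sketch. For removal it uses backward-shift deletion, one common way to delete under linear probing without tombstone markers; this is a chosen technique, not something the text above prescribes.

```python
from typing import Callable, List, Optional


class LinearProbingSet:
    """Open-addressed set of ints probing with H(x, i) = (H(x) + i) mod n."""

    def __init__(self, n: int, h: Callable[[int], int] = hash) -> None:
        self.n = n                      # table size
        self.h = h                      # ordinary hash function H(x)
        self.slots: List[Optional[int]] = [None] * n
        self.count = 0

    def _probe(self, x: int, i: int) -> int:
        # i-th probe for key x: H(x, i) = (H(x) + i) mod n
        return (self.h(x) + i) % self.n

    def _slot_of(self, x: int) -> int:
        # Walk the probe sequence until we reach x or an empty cell.
        # Terminates because insert() always leaves at least one cell empty.
        i = 0
        while True:
            s = self._probe(x, i)
            if self.slots[s] is None or self.slots[s] == x:
                return s
            i += 1

    def find(self, x: int) -> bool:
        return self.slots[self._slot_of(x)] == x

    def insert(self, x: int) -> bool:
        s = self._slot_of(x)
        if self.slots[s] == x:
            return False                # already present
        if self.count >= self.n - 1:
            raise RuntimeError("table would fill up; keep load factor below 1")
        self.slots[s] = x
        self.count += 1
        return True

    def remove(self, x: int) -> bool:
        i = self._slot_of(x)
        if self.slots[i] != x:
            return False                # not present
        # Backward-shift deletion: refill the hole at i from later cells of
        # the same run so no remaining key loses its path from its home cell.
        j = i
        while True:
            j = (j + 1) % self.n
            y = self.slots[j]
            if y is None:
                break                   # reached the end of the run
            k = self.h(y) % self.n      # home cell of y
            # Move y into the hole only if k is not cyclically within (i, j].
            if (j > i and (k <= i or k > j)) or (j < i and i >= k > j):
                self.slots[i] = y
                i = j
        self.slots[i] = None
        self.count -= 1
        return True
```

With n = 7 and H(x) = x, inserting 0, 7, 14 and 1 fills cells 0 to 3; removing 7 then shifts 14 and 1 back one cell each, so both remain reachable from their home cells.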
Often, the step size is one; that is, the array cells that are probed are consecutive in the hash table. Double hashing is a variant of the same method in which the step size is itself computed by a hash function.
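For contrast, the following sketches a double-hashing probe sequence H(x, i) = (h1(x) + i * h2(x)) mod n. The particular functions h1(x) = x mod n and h2(x) = 1 + (x mod (n - 1)) are illustrative assumptions, as is the function name. With n prime, h2 lies between 1 and n - 1 and is therefore relatively prime to n, so each sequence can reach every cell.

```python
from typing import List


def double_hash_probes(x: int, n: int, count: int) -> List[int]:
    """First `count` cells probed for key x under double hashing (n assumed prime)."""
    h1 = x % n               # hypothetical primary hash: the starting value
    h2 = 1 + x % (n - 1)     # hypothetical secondary hash: the step size, never 0
    return [(h1 + i * h2) % n for i in range(count)]


print(double_hash_probes(3, 11, 4))   # [3, 7, 0, 4]
print(double_hash_probes(14, 11, 4))  # [3, 8, 2, 7]
```

With n = 11, the keys 3 and 14 share the starting cell 3 but then diverge, whereas linear probing with step size one would send both through cells 3, 4, 5, 6.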
This algorithm, which is used in open-addressed hash tables, makes good use of memory caches (when the step size is one) through good locality of reference, but it also suffers from clustering: once a run of occupied cells forms, any key that hashes into the run lengthens it, so where there has been one collision there is an unfortunately high probability of more. The performance of linear probing is also more sensitive to the input distribution than that of double hashing, where the step size is determined by a second hash function applied to the key instead of being fixed as in linear probing.
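The clustering effect can be made concrete with a small exact calculation. The sketch below assumes an idealized hash function that sends a new key to each home cell with equal probability, and computes where linear probing with step size one would place that key. The function name `landing_probabilities` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def landing_probabilities(occupied: List[bool]) -> Dict[int, Fraction]:
    """For a new key whose home cell is uniform over the table, return the
    probability that linear probing (step size 1) places it in each empty cell."""
    n = len(occupied)
    assert not all(occupied), "need at least one empty cell"
    probs: Dict[int, Fraction] = {}
    for home in range(n):
        cell = home
        while occupied[cell]:          # skip over the occupied run
            cell = (cell + 1) % n
        probs[cell] = probs.get(cell, Fraction(0)) + Fraction(1, n)
    return probs


occ = [False] * 10
for c in (2, 3, 4):                    # one run of three occupied cells
    occ[c] = True
p = landing_probabilities(occ)
print(p[5], p[0])                      # 2/5 1/10
```

In this table of 10 cells, cell 5 receives the new key with probability 4/10 (keys homed at 2, 3, 4 or 5 all end up there), while every other empty cell gets 1/10. The empty cell at the end of a run is the likeliest landing spot, so runs tend to keep growing.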
Dictionary operations in constant time
Using linear probing, dictionary operations can be implemented in constant expected time. In other words, insert, remove and find operations can be implemented in O(1) expected time, as long as the load factor of the hash table is a constant strictly less than one. This analysis makes the (unrealistic) assumption that the hash function is completely random, but it can also be extended to 5-independent hash functions. Weaker properties, such as universal hashing, are not strong enough to ensure the constant-time operation of linear probing, but one practical method of hash function generation, tabulation hashing, again leads to guaranteed constant expected time performance despite not being 5-independent.
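Simple tabulation hashing, the practical method mentioned above, can be sketched in a few lines. This version assumes non-negative integer keys of which only the low 32 bits are used, split into four 8-bit characters, and fills its lookup tables from Python's seeded `random.Random`; the class name and `seed` parameter are hypothetical. A table index would then be obtained as, for example, `h(key) % n`.

```python
import random
from typing import List


class SimpleTabulationHash:
    """Simple tabulation hashing for 32-bit keys: split the key into four
    8-bit characters and XOR together one random 32-bit entry per character."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One table of 256 random 32-bit words for each byte position.
        self.tables: List[List[int]] = [
            [rng.getrandbits(32) for _ in range(256)] for _ in range(4)
        ]

    def __call__(self, key: int) -> int:
        h = 0
        for pos in range(4):                    # only the low 32 bits are used
            byte = (key >> (8 * pos)) & 0xFF
            h ^= self.tables[pos][byte]
        return h


h = SimpleTabulationHash(seed=1)
print(h(0x00) ^ h(0x01) ^ h(0x100) ^ h(0x101))  # 0
```

Because h(0x00), h(0x01), h(0x100) and h(0x101) always XOR to zero, whatever the table contents, simple tabulation is not even 4-independent; its constant-expected-time guarantee for linear probing comes from a separate analysis rather than from k-independence.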