# Cuckoo filter

A cuckoo filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set, much as a Bloom filter does. False positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set". Unlike a Bloom filter, a cuckoo filter can also delete existing items. In addition, for applications that store many items and target moderately low false positive rates, cuckoo filters can achieve lower space overhead than space-optimized Bloom filters.

Cuckoo filters were first described in 2014.

## Algorithm description

A cuckoo filter uses a 4-way set-associative hash table based on cuckoo hashing to store the fingerprints of all items. In particular, the two candidate buckets that cuckoo hashing requires for a given item $x$ are calculated by the following two hash functions, a scheme termed partial-key cuckoo hashing:

$h_{1}\left(x\right)={\text{hash}}\left(x\right)$

$h_{2}\left(x\right)=h_{1}\left(x\right)\oplus {\text{hash}}\left({\text{fingerprint}}\left(x\right)\right)$

Applying the above two hash functions to construct a cuckoo hash table enables item relocation based only on fingerprints when retrieving the original item is impossible. As a result, when inserting a new item that requires relocating an existing item $y$, the other possible location $j$ in the table for this item $y$ kicked out from bucket $i$ is calculated by

$j=i\oplus {\text{hash}}\left({\text{y's fingerprint}}\right)$

Based on partial-key cuckoo hashing, the hash table achieves both high utilization (thanks to cuckoo hashing) and compactness (because only fingerprints are stored). Lookup and delete operations are straightforward: at most two buckets, given by $h_{1}(x)$ and $h_{2}(x)$, need to be checked. If a matching fingerprint is found, the lookup or delete operation completes in $O(1)$ time. Further theoretical analysis of cuckoo filters can be found in the literature.
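The operations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `CuckooFilter`, the helper `_hash` (built here on BLAKE2b), the 8-bit default fingerprint, the limit of 500 relocations, and the power-of-two table size are all choices made for this example.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    """Stable 64-bit hash (hypothetical helper; any well-mixed hash would do)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8, max_kicks=500):
        # Assumption: a power-of-two bucket count, so that XOR followed by a
        # mask always gives a valid index and applying it twice returns to i.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: bytes) -> int:
        return _hash(b"fp:" + item) & ((1 << self.fp_bits) - 1)

    def _alt_index(self, i: int, fp: int) -> int:
        # j = i XOR hash(fingerprint): needs only the stored fingerprint.
        return (i ^ _hash(fp.to_bytes(8, "big"))) & self.mask

    def _candidates(self, item: bytes):
        fp = self._fingerprint(item)
        i1 = _hash(item) & self.mask       # h1(x)
        return fp, i1, self._alt_index(i1, fp)  # h2(x)

    def insert(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Table considered full. This sketch drops the last evicted
        # fingerprint; a practical filter would keep it or rebuild the table.
        return False

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)  # removes a single copy
                return True
        return False
```

Buckets are modelled as short Python lists for readability; a space-efficient implementation would pack the fingerprints into a flat bit array instead.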

## Comparison to Bloom filters

A cuckoo filter is similar to a Bloom filter in that both are very fast and compact, and both may return false positives as answers to set-membership queries. They differ in space use, lookup cost, and insertion behavior:

• Space-optimal Bloom filters use $1.44\log _{2}(1/\epsilon )$ bits of space per inserted key, where $\epsilon$ is the false positive rate. A cuckoo filter requires $(\log _{2}(1/\epsilon )+2)/\alpha$ bits per key, where $\alpha$ is the hash table load factor, which can reach $95.5\%$ depending on the cuckoo filter's configuration. Note that the information-theoretic lower bound is $\log _{2}(1/\epsilon )$ bits per item.
• On a positive lookup, a space-optimal Bloom filter requires $\log _{2}(1/\epsilon )$ memory accesses into the bit array, whereas a cuckoo filter requires at most two bucket lookups.
• Insertion into a cuckoo filter slows down once the table reaches a load threshold, at which point expanding the table is recommended. In contrast, a Bloom filter can keep accepting new items before expansion, at the cost of a higher false positive rate.
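The space formulas in the list above can be compared numerically. The short Python sketch below simply evaluates them; the function names are hypothetical, and the default load factor of 0.955 is the figure quoted above rather than a universal constant.

```python
import math


def lower_bound_bits_per_item(eps: float) -> float:
    """Information-theoretic minimum: log2(1/eps) bits per item."""
    return math.log2(1 / eps)


def bloom_bits_per_item(eps: float) -> float:
    """Space-optimal Bloom filter: 1.44 * log2(1/eps) bits per item."""
    return 1.44 * math.log2(1 / eps)


def cuckoo_bits_per_item(eps: float, load_factor: float = 0.955) -> float:
    """Cuckoo filter: (log2(1/eps) + 2) / alpha bits per item."""
    return (math.log2(1 / eps) + 2) / load_factor


if __name__ == "__main__":
    for eps in (0.03, 0.01, 0.001):
        print(
            f"eps={eps}: lower bound {lower_bound_bits_per_item(eps):.2f}, "
            f"Bloom {bloom_bits_per_item(eps):.2f}, "
            f"cuckoo {cuckoo_bits_per_item(eps):.2f} bits/item"
        )
```

Under these two formulas with $\alpha = 0.955$, the cuckoo filter needs fewer bits per item than the Bloom filter once $\epsilon$ falls below roughly 2.5%, which is consistent with the claim that cuckoo filters pay off at moderately low false positive rates.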

## Limitations

• A cuckoo filter can only safely delete items that are known to have been inserted previously.
• Insertion can fail, in which case rehashing is required, as with other cuckoo hash tables. Note that the amortized insertion complexity is still $O(1)$.