Count–min sketch

From Wikipedia, the free encyclopedia
Jump to: navigation, search

The Count–min sketch (or CM sketch) is a probabilistic sub-linear space streaming algorithm which can be used to summarize a data stream in many different ways. The algorithm was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan.[1]

Count–min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets and frequency tables. Spectral Bloom filters with multi-set policy, are conceptually isomorphic to the Count-Min Sketch.[2]



The data structure is parameterized by the constants w and d which determine the time and space needs and the probability of error of the queries. The algorithm needs a two dimensional array, called here count, with w columns and d rows. A series of d hash functions must be randomly drawn from a pairwise independent hash function family, each associated with a row in the array.

For later convenience we assign w = \lceil e/\epsilon \rceil and  d = \lceil \ln{1/\delta} \rceil, where the error in answering a query is within a factor of \epsilon with probability \delta.


When a new value a arrives we update as follows: \forall j : 1 \leq j \leq d,  count[j,h_j(a)] \leftarrow count[j,h_j(a)] + 1 . That is, for each row we take the corresponding hash function, apply it to the newly received value and add one to the column corresponding to the hash value.


The array can then be used to estimate any of several different statistics at any point. If we want to estimate, for instance, the number of times  a_i for a specific value i appeared so far in the stream we would compute \hat a_i=\min_j count[j,h_j(i)] (this assumes all added values are positive). This estimate has the guarantee that \hat a_i \leq a_i + \epsilon |a| with probability 1-\delta.

Small modifications to the data structure can be used to sketch other different stream statistics.

External links[edit]

See also[edit]


  1. ^ Cormode, Graham; S. Muthukrishnan (2004). "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55: 29–38. 
  2. ^ Vallentin, Matthias (11 July 2014). "A Garden Variety of Bloom Filters".