MurmurHash

From Wikipedia, the free encyclopedia
Jump to: navigation, search

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1][2][3] It was created by Austin Appleby in 2008,[4][5] and exists in a number of variants,[6] all of which have been released into the public domain. When compared to other popular hash functions, MurmurHash performed well in a random distribution of regular keys.[7]

Unlike cryptographic hash functions, it is not designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.

Variants[edit]

The current version is MurmurHash3,[8][9] which yields a 32-bit or 128-bit hash value.

The older MurmurHash2[10] yields a 32-bit or 64-bit value. Slower versions of MurmurHash2 are available for big-endian and aligned-only machines. The MurmurHash2A variant adds the Merkle–Damgård construction so that it can be called incrementally. There are two variants which generate 64-bit values; MurmurHash64A, which is optimized for 64-bit processors, and MurmurHash64B, for 32-bit ones. MurmurHash2-160 generates the 160-bit hash, and MurmurHash1 is obsolete.

Implementations[edit]

The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[11] C,[12] C#,[9][13] Perl,[14] Ruby,[15] PHP,[16] Haskell,[17] Scala,[18] Java,[19][20] Erlang,[21] and JavaScript.[22][23]

It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), Perl,[24] nginx (ver 1.0.1),[25] Rubinius,[26] libmemcached (the C driver for Memcached),[27] maatkit,[28] Hadoop,[1] Kyoto Cabinet,[29] RaptorDB,[30] OlegDB,[31] Cassandra[32] and Clojure [33]

Algorithm[edit]

Murmur3_32(key, len, seed)
    // Note: In this version, all integer arithmetic is performed with unsigned 32 bit integers.
    //       In the case of overflow, the result is constrained by the application of modulo 2^{32} arithmetic.
    
    c1 \gets 0xcc9e2d51
    c2 \gets 0x1b873593
    r1 \gets 15
    r2 \gets 13
    m \gets 5
    n \gets 0xe6546b64
 
    hash \gets seed

    for each fourByteChunk of key
        k \gets fourByteChunk

        k \gets k * c1
        k \gets (k << r1) OR (k >> (32-r1))
        k \gets k * c2

        hash \gets hash XOR k
        hash \gets (hash << r2) OR (hash >> (32-r2))
        hash \gets hash * m + n

    with any remainingBytesInKey
        remainingBytes \gets SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.
        
        remainingBytes \gets remainingBytes * c1
        remainingBytes \gets (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes \gets remainingBytes * c2

        hash \gets hash XOR remainingBytes
 
    hash \gets hash XOR len

    hash \gets hash XOR (hash >> 16)
    hash \gets hash * 0x85ebca6b
    hash \gets hash XOR (hash >> 13)
    hash \gets hash * 0xc2b2ae35
    hash \gets hash XOR (hash >> 16)

A sample C implementation follows:

uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed) {
	static const uint32_t c1 = 0xcc9e2d51;
	static const uint32_t c2 = 0x1b873593;
	static const uint32_t r1 = 15;
	static const uint32_t r2 = 13;
	static const uint32_t m = 5;
	static const uint32_t n = 0xe6546b64;
 
	uint32_t hash = seed;
 
	const int nblocks = len / 4;
	const uint32_t *blocks = (const uint32_t *) key;
	int i;
	for (i = 0; i < nblocks; i++) {
		uint32_t k = blocks[i];
		k *= c1;
		k = (k << r1) | (k >> (32 - r1));
		k *= c2;
 
		hash ^= k;
		hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
	}
 
	const uint8_t *tail = (const uint8_t *) (key + nblocks * 4);
	uint32_t k1 = 0;
 
	switch (len & 3) {
	case 3:
		k1 ^= tail[2] << 16;
	case 2:
		k1 ^= tail[1] << 8;
	case 1:
		k1 ^= tail[0];
 
		k1 *= c1;
		k1 = (k1 << r1) | (k1 >> (32 - r1));
		k1 *= c2;
		hash ^= k1;
	}
 
	hash ^= len;
	hash ^= (hash >> 16);
	hash *= 0x85ebca6b;
	hash ^= (hash >> 13);
	hash *= 0xc2b2ae35;
	hash ^= (hash >> 16);
 
	return hash;
}

See also[edit]

References[edit]

  1. ^ a b "Hadoop in Java". Hbase.apache.org. 24 July 2011. Retrieved 13 January 2012. 
  2. ^ Chouza et al.
  3. ^ "Couceiro et al." (PDF) (in Portuguese). Retrieved 13 January 2012. 
  4. ^ "MurmurHash on GooglePages". Murmurhash.googlepages.com. Retrieved 13 January 2012. 
  5. ^ Tanjent (tanjent) wrote,3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012. 
  6. ^ "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012. 
  7. ^ "Which hashing algorithm is best for uniqueness and speed". stackexchange.com. 
  8. ^ "MurmurHash3 on smhasher". 
  9. ^ a b Horvath, Adam (Aug 10, 2012). "MurMurHash3, an ultra fast hash algorithm for C# / .NET". 
  10. ^ "MurmurHash2 on smhasher". 
  11. ^ "pyfasthash in Python". Google. Retrieved 13 January 2012. 
  12. ^ "C implementation in qLibc by Seungyoung Kim". 
  13. ^ Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012. 
  14. ^ "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012. 
  15. ^ Bruce Williams <http://codefluency.com>, for Ruby Central <http://rubycentral.org> (3 May 2009). "Ruby". Rubyforge.org. Retrieved 13 January 2012. 
  16. ^ "Murmurhash3 PHP extension". Murmur.vaizard.org. Retrieved 13 January 2012. 
  17. ^ "Haskell". Hackage.haskell.org. Retrieved 13 January 2012. 
  18. ^ "Scala standard library implementation". 14 December 2012. 
  19. ^ MurmurHash3 in Java, part of Guava
  20. ^ Derek Young in Java, public domain
  21. ^ MurmurHash3 in Erlang
  22. ^ raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012. 
  23. ^ garycourt. "Murmur3.java in Clojure Source on Github". Github.com. Retrieved 13 January 2012. 
  24. ^ "perl5176delta". Retrieved 31 December 2012. 
  25. ^ "nginx". Retrieved 13 January 2012. 
  26. ^ "Rubinius". Retrieved 29 February 2012. 
  27. ^ libmemcached
  28. ^ "maatkit". Google. 24 March 2009. Retrieved 13 January 2012. 
  29. ^ "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012. 
  30. ^ Gholam, Mehdi (13 November 2011). "RaptorDB CodeProject page". Codeproject.com. Retrieved 13 January 2012. 
  31. ^ "OlegDB Documentation". Retrieved 24 January 2013. 
  32. ^ "Partitioners". apache.org. 2013-11-15. Retrieved 2013-12-19. 
  33. ^ "Murmur3.java in Clojure source code on Github". clojure.org. Retrieved 2014-03-11.