Jump to content

MurmurHash

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Hannob (talk | contribs) at 09:43, 12 September 2016 (mention attack by bernstein, boßlet and aumasson and change text accordingly). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

MurmurHash is a non-cryptographic hash function that was used for hash-based lookups.[1][2][3] Due to possible denial of service attacks it should no longer be used for this purpose. It was created by Austin Appleby in 2008[4] and is currently hosted on Github along with its test suite named 'SMHasher'. It also exists in a number of variants,[5] all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.

Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.

Variants

The current version is MurmurHash3,[6][7] which yields a 32-bit or 128-bit hash value.

The older MurmurHash2[8] yields a 32-bit or 64-bit value. Slower versions of MurmurHash2 are available for big-endian and aligned-only machines. The MurmurHash2A variant adds the Merkle–Damgård construction so that it can be called incrementally. There are two variants which generate 64-bit values; MurmurHash64A, which is optimized for 64-bit processors, and MurmurHash64B, for 32-bit ones. MurmurHash2-160 generates the 160-bit hash, and MurmurHash1 is obsolete.

Implementations

The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[9] C,[10] Go,[11] C#,[7][12] Perl,[13] Ruby,[14] Rust,[15] PHP,[16] Common Lisp,[17] Haskell,[18] Clojure,[19] Scala,[20] Java,[21][22] Erlang,[23] and JavaScript,[24][25] together with an online version.[26]

It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1),[27] Rubinius,[28] libmemcached (the C driver for Memcached),[29] npm (nodejs package manager),[30] maatkit,[31] Hadoop,[1] Kyoto Cabinet,[32] RaptorDB,[33] OlegDB,[34] Cassandra,[35] Solr,[36] vowpal wabbit,[37]Elasticsearch,[38] and Guava.[39]

Attacks

MurmurHash was a recommended hash function for hash table implementations. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even randomized implementations of MurmurHash are vulnerable to so-called HashDoS attacks [40]. With the use of differential cryptanalysis they were able to generate inputs that would lead to a hash collission. This can be abused to cause very slow operations of a hash table implementation. The authors of the attack recommend to use SipHash instead.

Algorithm

Murmur3_32(key, len, seed)
    // Note: In this version, all integer arithmetic is performed with unsigned 32 bit integers.
    //       In the case of overflow, the result is constrained by the application of modulo  arithmetic.
    
    c1 ← 0xcc9e2d51
    c2 ← 0x1b873593
    r1 ← 15
    r2 ← 13
    m ← 5
    n ← 0xe6546b64
 
    hash ← seed

    for each fourByteChunk of key
        k ← fourByteChunk

        k ← k × c1
        k ← (k ROL r1)
        k ← k × c2

        hash ← hash XOR k
        hash ← (hash ROL r2)
        hash ← hash × m + n

    with any remainingBytesInKey
        remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.
        
        remainingBytes ← remainingBytes × c1
        remainingBytes ← (remainingBytes ROL r1)
        remainingBytes ← remainingBytes × c2

        hash ← hash XOR remainingBytes
 
    hash ← hash XOR len

    hash ← hash XOR (hash >> 16)
    hash ← hash × 0x85ebca6b
    hash ← hash XOR (hash >> 13)
    hash ← hash × 0xc2b2ae35
    hash ← hash XOR (hash >> 16)
A sample C implementation follows
#define ROT32(x, y) ((x << y) | (x >> (32 - y))) // avoid effort
uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed) {
	static const uint32_t c1 = 0xcc9e2d51;
	static const uint32_t c2 = 0x1b873593;
	static const uint32_t r1 = 15;
	static const uint32_t r2 = 13;
	static const uint32_t m = 5;
	static const uint32_t n = 0xe6546b64;

	uint32_t hash = seed;

	const int nblocks = len / 4;
	const uint32_t *blocks = (const uint32_t *) key;
	int i;
	uint32_t k;
	for (i = 0; i < nblocks; i++) {
		k = blocks[i];
		k *= c1;
		k = ROT32(k, r1);
		k *= c2;

		hash ^= k;
		hash = ROT32(hash, r2) * m + n;
	}

	const uint8_t *tail = (const uint8_t *) (key + nblocks * 4);
	uint32_t k1 = 0;

	switch (len & 3) {
	case 3:
		k1 ^= tail[2] << 16;
	case 2:
		k1 ^= tail[1] << 8;
	case 1:
		k1 ^= tail[0];

		k1 *= c1;
		k1 = ROT32(k1, r1);
		k1 *= c2;
		hash ^= k1;
	}

	hash ^= len;
	hash ^= (hash >> 16);
	hash *= 0x85ebca6b;
	hash ^= (hash >> 13);
	hash *= 0xc2b2ae35;
	hash ^= (hash >> 16);

	return hash;
}

See also

References

  1. ^ a b "Hadoop in Java". Hbase.apache.org. 24 July 2011. Retrieved 13 January 2012.
  2. ^ Chouza et al.
  3. ^ "Couceiro et al" (PDF) (in Portuguese). Retrieved 13 January 2012.
  4. ^ Tanjent (tanjent) wrote,3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012.{{cite web}}: CS1 maint: numeric names: authors list (link)
  5. ^ "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012.
  6. ^ "MurmurHash3 on Github".
  7. ^ a b Horvath, Adam (10 August 2012). "MurMurHash3, an ultra fast hash algorithm for C# / .NET".
  8. ^ "MurmurHash2 on Github".
  9. ^ "pyfasthash in Python". Google. Retrieved 13 January 2012.
  10. ^ "C implementation in qLibc by Seungyoung Kim".
  11. ^ "murmur3 in Go".
  12. ^ Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012.
  13. ^ "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012.
  14. ^ Yuki Kurihara (16 October 2014). "Digest::MurmurHash". GitHub.com. Retrieved 18 March 2015.
  15. ^ "stusmall/murmur3". GitHub. Retrieved 29 November 2015.
  16. ^ "Murmurhash3 PHP extension". Murmur.vaizard.org. Retrieved 13 January 2012.
  17. ^ "tarballs_are_good / murmurhash3". Retrieved 7 February 2015.
  18. ^ "Haskell". Hackage.haskell.org. Retrieved 13 January 2012.
  19. ^ "Murmur3.java in Clojure source code on Github". clojure.org. Retrieved 11 March 2014.
  20. ^ "Scala standard library implementation". 26 September 2014.
  21. ^ MurmurHash3 in Java, part of Guava
  22. ^ "Murmur3A and Murmur3F Java classes on Github". greenrobot. Retrieved 5 November 2014.
  23. ^ "bipthelin/murmerl3". GitHub. Retrieved 21 October 2015.
  24. ^ raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012.
  25. ^ garycourt. "MurmurHash.js on Github". Github.com. Retrieved 13 January 2012.
  26. ^ "Online version of MurmurHash". shorelabs.com. Retrieved 12 August 2014.
  27. ^ "nginx". Retrieved 13 January 2012.
  28. ^ "Rubinius". Retrieved 29 February 2012.
  29. ^ "switch from MD5 to murmur".
  30. ^ "libMemcached". libmemcached.org. Retrieved 21 October 2015.
  31. ^ "maatkit". Google. 24 March 2009. Retrieved 13 January 2012.
  32. ^ "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012.
  33. ^ Gholam, Mehdi (13 November 2011). "RaptorDB CodeProject page". Codeproject.com. Retrieved 13 January 2012.
  34. ^ "OlegDB Documentation". Retrieved 24 January 2013.
  35. ^ "Partitioners". apache.org. 15 November 2013. Retrieved 19 December 2013.
  36. ^ "Solr MurmurHash2 Javadoc".
  37. ^ "hash.cc in vowpalwabbit source code".
  38. ^ "Elasticsearch 2.0 - CRUD and routing changes".
  39. ^ "Guava Hashing.java".
  40. ^ "Breaking Murmur: Hash-flooding DoS Reloaded".