MurmurHash

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1] It was created by Austin Appleby in 2008[2] and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants,[3] all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.

Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.

Variants[edit]

MurmurHash3[edit]

The current version is MurmurHash3,[4][5] which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher--a hash function test suite.

MurmurHash2[edit]

MurmurHash2[6] yields a 32-bit or 64-bit value. It came in multiple variants, including some that allowed incremental hashing and aligned or neutral versions.

  • MurmurHash2 (32-bit, x86) - The original version; contains a flaw that weakens collision in some cases[7].
  • MurmurHash2A (32-bit, x86) - A fixed variant using Merkle–Damgård construction. Slightly slower.
  • CMurmurHash2A (32-bit, x86) - MurmurHash2A but works incrementally.
  • MurmurHashNeutral2 (32-bit, x86) - Slower, but endian and alignment neutral.
  • MurmurHashAligned2 (32-bit, x86) - Slower, but does aligned reads (safer on some platforms).
  • MurmurHash64A (64-bit, x64) - The original 64-bit version. Optimized for 64-bit arithmetic.
  • MurmurHash64B (64-bit, x86) - A 64-bit version optimized for 32-bit platforms. Unfortunately it is not a true 64-bit hash due to insufficient mixing of the stripes[8].

The person who originally found the flaw in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160[9].

MurmurHash1[edit]

The original MurmurHash was created as an attempt to make a faster function than Lookup3[10]. Although successful, it hadn't been tested thoroughly and wasn't capable of providing 64-bit hashes as in Lookup3. It had a rather elegant design, that would be later built upon in MurmurHash2, combining a multiplicative hash (similar to Fowler–Noll–Vo hash function) with a Xorshift.

Implementations[edit]

The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[11] C,[12] Go,[13] C#,[5][14] D,[15],Lua Perl,[16] Ruby,[17] Rust,[18] PHP,[19] Common Lisp,[20] Haskell,[21] Elm,[22] Clojure,[23] Scala,[24] Java,[25][26] Erlang,[27] Swift,[28] and JavaScript,[29][30] together with an online version.[31]

It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1),[32] Rubinius,[33] libmemcached (the C driver for Memcached),[34] npm (nodejs package manager),[35] maatkit,[36] Hadoop,[1] Kyoto Cabinet,[37] RaptorDB,[38] OlegDB,[39] Cassandra,[40] Solr,[41] vowpal wabbit,[42]Elasticsearch,[43], Guava[44], Kafka[45] and RedHat Virtual Data Optimizer (VDO)[[46]]

Vulnerabilities[edit]

Hash functions can be vulnerable to attack if a user can choose input data in such as way to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.[47] With the use of differential cryptanalysis they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend to use their own SipHash instead.

Algorithm[edit]

algorithm Murmur3_32 is
    // Note: In this version, all arithmetic is performed with unsigned 32-bit integers.
    //       In the case of overflow, the result is reduced modulo 232.
    input: key, len, seed
    
    c1 ← 0xcc9e2d51
    c2 ← 0x1b873593
    r1 ← 15
    r2 ← 13
    m ← 5
    n ← 0xe6546b64
 
    hash ← seed

    for each fourByteChunk of key do
        k ← fourByteChunk

        k ← k × c1
        k ← k ROL r1
        k ← k × c2

        hash ← hash XOR k
        hash ← hash ROL r2
        hash ← (hash × m) + n

    with any remainingBytesInKey do
        remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.
        
        remainingBytes ← remainingBytes × c1
        remainingBytes ← remainingBytes ROL r1
        remainingBytes ← remainingBytes × c2

        hash ← hash XOR remainingBytes
 
    hash ← hash XOR len

    hash ← hash XOR (hash >> 16)
    hash ← hash × 0x85ebca6b
    hash ← hash XOR (hash >> 13)
    hash ← hash × 0xc2b2ae35
    hash ← hash XOR (hash >> 16)

A sample C implementation follows (for little-endian CPUs)

static inline uint32_t murmur_32_scramble(uint32_t k) {
    k *= 0xcc9e2d51;
    k = (k << 15) | (k >> 17);
    k *= 0x1b873593;
    return k;
}
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
{
	uint32_t h = seed;
    uint32_t k;
    /* Read in groups of 4. */
    for (size_t i = len >> 2; i; i--) {
        // Here is a source of differing results across endiannesses.
        // A swap here has no effects on hash properties though.
        k = *((uint32_t*)key);
        key += sizeof(uint32_t);
        h ^= murmur_32_scramble(k);
        h = (h << 13) | (h >> 19);
        h = h * 5 + 0xe6546b64;
    }
    /* Read the rest. */
    k = 0;
    for (size_t i = len & 3; i; i--) {
        k <<= 8;
        k |= key[i - 1];
    }
    // A swap is *not* necessary here because the preceeding loop already
    // places the low bytes in the low places according to whatever endianess
    // we use. Swaps only apply when the memory is copied in a chunk.
    h ^= murmur_32_scramble(k);
    /* Finalize. */
	h ^= len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

See also[edit]

References[edit]

  1. ^ a b "Hadoop in Java". Hbase.apache.org. 24 July 2011. Archived from the original on 12 January 2012. Retrieved 13 January 2012.
  2. ^ Tanjent (tanjent) wrote,3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012.
  3. ^ "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012.
  4. ^ "MurmurHash3 on Github".
  5. ^ a b Horvath, Adam (10 August 2012). "MurMurHash3, an ultra fast hash algorithm for C# / .NET".
  6. ^ "MurmurHash2 on Github".
  7. ^ "MurmurHash2Flaw". Retrieved 15 January 2019.
  8. ^ "MurmurHash3 (see note on MurmurHash2_x86_64)". Retrieved 15 January 2019.
  9. ^ "MurmurHash2_160". Retrieved 12 January 2019.
  10. ^ "MurmurHash1". Retrieved 12 January 2019.
  11. ^ "pyfasthash in Python". Google. Retrieved 13 January 2012.
  12. ^ "C implementation in qLibc by Seungyoung Kim".
  13. ^ "murmur3 in Go".
  14. ^ Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012.
  15. ^ "std.digest.murmurhash - D Programming Language". dlang.org. Retrieved 5 November 2016.
  16. ^ "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012.
  17. ^ Yuki Kurihara (16 October 2014). "Digest::MurmurHash". GitHub.com. Retrieved 18 March 2015.
  18. ^ "stusmall/murmur3". GitHub. Retrieved 29 November 2015.
  19. ^ "PHP userland implementation of MurmurHash3". github.com. Retrieved 18 December 2017.
  20. ^ "tarballs_are_good / murmurhash3". Retrieved 7 February 2015.
  21. ^ "Haskell". Hackage.haskell.org. Retrieved 13 January 2012.
  22. ^ "Elm". package.elm-lang.org. Retrieved 12 June 2019.
  23. ^ "Murmur3.java in Clojure source code on Github". clojure.org. Retrieved 11 March 2014.
  24. ^ "Scala standard library implementation". 26 September 2014.
  25. ^ Murmur3, part of Guava
  26. ^ "Murmur3A and Murmur3F Java classes on Github". greenrobot. Retrieved 5 November 2014.
  27. ^ "bipthelin/murmerl3". GitHub. Retrieved 21 October 2015.
  28. ^ Daisuke T (7 February 2019). "MurmurHash-Swift". GitHub.com. Retrieved 10 February 2019.
  29. ^ raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012.
  30. ^ garycourt. "MurmurHash.js on Github". Github.com. Retrieved 13 January 2012.
  31. ^ "Online version of MurmurHash". shorelabs.com. Retrieved 12 August 2014.
  32. ^ "nginx". Retrieved 13 January 2012.
  33. ^ "Rubinius". Retrieved 29 February 2012.
  34. ^ "libMemcached". libmemcached.org. Retrieved 21 October 2015.
  35. ^ "switch from MD5 to murmur".
  36. ^ "maatkit". Google. 24 March 2009. Retrieved 13 January 2012.
  37. ^ "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012.
  38. ^ Gholam, Mehdi (13 November 2011). "RaptorDB CodeProject page". Codeproject.com. Retrieved 13 January 2012.
  39. ^ "OlegDB Documentation". Retrieved 24 January 2013.
  40. ^ "Partitioners". apache.org. 15 November 2013. Retrieved 19 December 2013.
  41. ^ "Solr MurmurHash2 Javadoc".
  42. ^ "hash.cc in vowpalwabbit source code".
  43. ^ "Elasticsearch 2.0 - CRUD and routing changes".
  44. ^ "Guava Hashing.java".
  45. ^ "Kafka DefaultPartitioner.java".
  46. ^ Virtual Data Optimizer source code
  47. ^ "Breaking Murmur: Hash-flooding DoS Reloaded".

External links[edit]