From Wikipedia, the free encyclopedia

MMH-Badger MAC


To guarantee the integrity of a message, one can use either public-key digital signatures or a Message Authentication Code (MAC). A MAC is an authentication technique that uses a secret key to generate a small, fixed-size block of data. The basic setting is as follows. Two parties A and B want to communicate by sending a message m, and they share a secret key k. When A sends a message to B, A calculates the MAC as a function of the message and the key, MAC = C_k(m). The message and the MAC are sent to B. B then recomputes the MAC over the received message using the same secret key k. The received MAC is compared with the newly computed MAC. If they match, and only the sender and the receiver know the secret key, then the message is authentic.
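The exchange described above can be sketched in a few lines of Python. This is only an illustration of the tag-and-verify flow: HMAC-SHA256 from the standard library stands in for C_k (it is neither MMH nor Badger), and the names `make_tag` and `verify` as well as the key and message values are made up for the example.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """A's side: compute MAC = C_k(m); HMAC-SHA256 stands in for C_k."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """B's side: recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"shared secret key k"          # illustrative key known to A and B
message = b"transfer 10 units to B"   # illustrative message m
tag = make_tag(key, message)          # A sends (message, tag) to B

print(verify(key, message, tag))                    # genuine message
print(verify(key, b"transfer 99 units to C", tag))  # altered message
```

The first check succeeds and the second fails, because an attacker who does not know the key cannot produce a matching tag for the altered message.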


Carter and Wegman[1] introduced universal hashing to construct Message Authentication Codes (MACs). Universal hashing is used to build secure message authentication schemes in which the opponent's ability to forge messages is bounded by the collision probability of the hash family. Several works, such as UMAC, CRC32, BOB, Poly1305-AES, and IPSX, deal with the implementation of universal hashing as a tool for achieving fast and secure message authentication; this page focuses on MMH[2] and Badger[3].

Universal hash function families[2][3]

Universal hashing was first introduced by Carter and Wegman in 1979 and was studied further by Sarwate, by Wegman and Carter, and by Stinson[4]. Universal hashing has many important applications in theoretical computer science. In 1981, Wegman and Carter pioneered the application of universal hash functions to the construction of message authentication codes (MACs)[1]. A universal hash family can be defined as a family of mappings from a finite set A of size a to a finite set B of size b[5].

ϵ-almost ∆-universal (ϵ-A∆U)

Definition of ϵ-almost ∆-universal (ϵ-A∆U)

Let (B,+) be an Abelian group. A family H of hash functions that maps from a set A to a set B is said to be ϵ-almost ∆-universal (ϵ-A∆U) w.r.t. (B,+) if, for any distinct elements a, a' \in A and for all \delta \in B:

\Pr_{h \in H}[h(a)-h(a')=\delta] \le \epsilon

H is ∆-universal (∆U) if  \epsilon = \frac {1}{\left\vert B \right\vert}.

ϵ-almost universal family or (ϵ-AU) family

An ϵ-almost universal (ϵ-AU) family is one class of universal hash function family.

Definition of (ϵ-AU) family

Let ϵ be any positive real number. An ϵ-almost universal (ϵ-AU) family H of hash functions mapping from a set A to a set B is a family of functions from A to B such that, for any distinct elements a, a' \in A:

\Pr_{h \in H}[h(a)=h(a')] \le \epsilon

H is universal (U) if \epsilon = \frac {1}{\left\vert B \right\vert}.

The definition above states that the probability of a collision is at most ϵ for any two distinct inputs. The family is called universal in the case \epsilon = \frac {1}{\left\vert B \right\vert}. The smallest possible value of ϵ is \epsilon =\frac {a-b}{b(a-1)}, where a = \left\vert A \right\vert and b = \left\vert B \right\vert.
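As a small worked example of the lower bound quoted above, take the illustrative sizes a = 4 and b = 2 (these values are chosen here only for the arithmetic and do not come from the cited papers):

```latex
\epsilon_{\min} = \frac{a-b}{b(a-1)} = \frac{4-2}{2\,(4-1)} = \frac{1}{3}
\;<\; \frac{1}{2} = \frac{1}{\left\vert B \right\vert}
```

For these sizes the bound lies strictly below 1/|B|, which is why ϵ = 1/|B| is a convention for "universal" rather than the absolute minimum.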

ϵ-almost strongly-universal family or (ϵ-ASU) family

An ϵ-almost strongly universal (ϵ-ASU) family is one class of universal hash function family.

Definition of (ϵ-ASU) family

Let ϵ be any positive real number. An ϵ-almost strongly-universal (ϵ-ASU) family H of hash functions mapping from a set A to a set B is a family of functions from A to B such that, for any distinct elements a, a' \in A and all b, b' \in B:

\Pr_{h \in H}[h(a)=b] = \frac {1}{\left\vert B \right\vert}

and \Pr_{h \in H}[h(a)=b, h(a')=b'] \le \frac {\epsilon}{\left\vert B \right\vert}

H is strongly universal (SU) if \epsilon = \frac {1}{\left\vert B \right\vert}.

The first condition states that the probability that a given input a is mapped to a given output b equals \frac {1}{\left\vert B \right\vert}. The second condition implies that if a is mapped to b, then the conditional probability that a' (a \ne a') is mapped to b' is upper bounded by ϵ.
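The two conditions can be checked exhaustively on a toy family. The sketch below uses the affine family h_{a,b}(m) = a·m + b mod p, a standard textbook example that is not taken from the cited papers, with a small prime p = 5 so that every key can be enumerated.

```python
from fractions import Fraction
from itertools import product

p = 5  # small prime (assumption) so that all p**2 keys can be enumerated

def h(key, m):
    """Affine hash h_{a,b}(m) = a*m + b mod p."""
    a, b = key
    return (a * m + b) % p

keys = list(product(range(p), repeat=2))

# First condition: Pr[h(m) = y] over a uniformly random key.
single = {Fraction(sum(h(k, m) == y for k in keys), len(keys))
          for m in range(p) for y in range(p)}

# Second condition: Pr[h(m) = y and h(m2) = y2] for every m != m2.
joint = {Fraction(sum(h(k, m) == y and h(k, m2) == y2 for k in keys), len(keys))
         for m in range(p) for m2 in range(p) if m != m2
         for y in range(p) for y2 in range(p)}

print(single)  # {Fraction(1, 5)}: always 1/|B|
print(joint)   # {Fraction(1, 25)}: epsilon/|B| with epsilon = 1/|B|
```

Both sets contain a single value, so this toy family meets the definition with ϵ = 1/|B|, i.e. it is strongly universal.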

MMH (Multilinear Modular Hashing)

The name MMH stands for Multilinear-Modular-Hashing. The name is also intended to hint at multimedia applications, for example verifying the integrity of an on-line multimedia title, and at the improved support for integer scalar products in modern microprocessors, which is a crucial factor in the high performance of MMH.

MMH uses single-precision scalar products as its most basic operation. It consists of a (modified) inner product between message and key, modulo a prime p. The construction of MMH works in the finite field F_p for some prime integer p.


MMH* involves a construction of a family of hash functions consisting of all multilinear functions on F_p^k for some positive integer k. The family MMH* of functions from F_p^k to F_p is defined as follows.

MMH^* = \{ g_x : F_p^k \to F_p \mid x \in F_p^k \}

where x, m are vectors, and the functions g_x are defined as follows.

g_x(m) = m \cdot x \mod p = \sum_{i=1}^{k} m_i\,x_i \mod p

In the case of a MAC, m is a message and x is a key, where m = (m_1,\cdots,m_k) and x = (x_1,\cdots,x_k), with x_i, m_i \in F_p.
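A minimal Python sketch of g_x follows. The function name `mmh_star` and the toy values p = 7, k = 3 are chosen here for readability only.

```python
def mmh_star(x, m, p):
    """g_x(m) = sum_i m_i * x_i mod p for equal-length vectors over F_p."""
    if len(x) != len(m):
        raise ValueError("key and message must have the same length k")
    return sum(mi * xi for mi, xi in zip(m, x)) % p

p = 7          # toy prime, chosen only for readability
x = (3, 5, 1)  # key vector, k = 3
m = (2, 6, 4)  # message vector

print(mmh_star(x, m, p))  # (2*3 + 6*5 + 4*1) mod 7 = 40 mod 7 = 5
```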

MMH* follows the same basic rules as any MAC. Suppose Ana and Bob want to communicate in an authenticated way and share a secret key x. Charles listens to the conversation between Ana and Bob and wants to replace Ana's message with his own message to Bob, which should pass as a message from Ana. His message m' and Ana's message m will then differ in at least one component (e.g. m_1 \ne m'_1).

Assume that Charles knows the form of the hash function g_x and Ana's message m. Then the probability that Charles can change the message or send his own message is bounded by the following theorem.

Theorem[2]: The family MMH* is ∆-universal.

Proof: Take a \in F_p, and let m, m' be two different messages. Assume without loss of generality that m_1 \ne m'_1. Then for any choice of x_2, x_3, \ldots, x_k,

\begin{align}
\Pr_{x_1}[g_x (m)-g_x (m')\equiv a \mod p] &= \Pr_{x_1}[(m_1 x_1+m_2 x_2+ \cdots +m_k x_k )-(m'_1 x_1+m'_2 x_2+ \cdots +m'_k x_k )\equiv a \mod p]\\
&= \Pr_{x_1}[(m_1-m'_1)x_1+(m_2-m'_2)x_2+ \cdots +(m_k-m'_k)x_k\equiv a \mod p]\\
&= \Pr_{x_1}[(m_1-m'_1)x_1+\textstyle \sum_{i=2}^k(m_i-m'_i)x_i\equiv a \mod p]\\
&= \Pr_{x_1}[(m_1-m'_1)x_1\equiv a - \textstyle \sum_{i=2}^k(m_i-m'_i)x_i \mod p]\\
&=\frac {1}{p}
\end{align}

The last step holds because m_1-m'_1 \ne 0 is invertible in F_p, so exactly one of the p possible values of x_1 satisfies the congruence.

As a simple analogy for the theorem above, take F_p for a prime p, F_p = \underbrace{\big\{ 0,1,\cdots,p-1 \big\}}_p. If x_1 is drawn uniformly from F_p, then the probability that it equals a fixed element, say 0, is

\Pr_{x_1\in F_p}(x_1=0)= \frac {1}{p}

So, what one actually needs to compute is

\Pr_{(x_1,\cdots,x_k)\in F_p^k} (g_x(m)\equiv g_x(m')\mod p)

\begin{align}
\Pr_{(x_1,\cdots,x_k)\in F_p^k}(g_x(m)\equiv g_x(m')\mod p) &= \sum_{(x_2,\cdots,x_k)\in F_p^{k-1}} \Pr_{(x'_2,\cdots,x'_k)\in F_p^{k-1}}(x_2 = x'_2,\cdots,x_k = x'_k)\cdot \Pr_{x_1\in F_p}(g_x(m)\equiv g_x(m')\mod p)\\
&= \sum_{(x_2,\cdots,x_k)\in F_p^{k-1}} \frac {1}{p^{k-1}} \cdot \frac {1}{p}\\
&=p^{k-1}\cdot \frac {1}{p^{k-1}} \cdot \frac {1}{p}\\
&=\frac {1}{p}
\end{align}
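The result can be confirmed by brute force for small parameters. The sketch below enumerates every key of MMH* for the toy choice p = 5, k = 2 (values assumed here only to keep the enumeration small) and records the probability of each difference δ for each pair of distinct messages.

```python
from fractions import Fraction
from itertools import product

p, k = 5, 2  # toy parameters so that all p**k keys can be enumerated

def g(x, m):
    """g_x(m) = sum_i m_i * x_i mod p."""
    return sum(mi * xi for mi, xi in zip(m, x)) % p

vectors = list(product(range(p), repeat=k))  # key space and message space

probabilities = set()
for m, m2 in product(vectors, repeat=2):
    if m == m2:
        continue
    for delta in range(p):
        hits = sum((g(x, m) - g(x, m2)) % p == delta for x in vectors)
        probabilities.add(Fraction(hits, len(vectors)))

print(probabilities)  # {Fraction(1, 5)}: exactly 1/p in every case
```

Every combination of distinct messages and difference δ occurs with probability exactly 1/p, which is the ∆-universal property.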

From the proof above, \frac{1}{p} bounds the collision probability and therefore the attacker's chance of success per verification query. To reduce this probability, the message can be hashed n times using n independent keys, so that the collision probability becomes \frac{1}{p^n} for any integer value n. In this case the key length, the computational work, and the output length all increase by a factor of n.
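The effect of using n independent keys can be illustrated by exhaustive enumeration. This is a sketch with toy sizes p = 3, k = 2, n = 2 (assumed here so that all key tuples can be listed); the helper name `multi_hash` is made up for the example.

```python
from fractions import Fraction
from itertools import product

p, k, n = 3, 2, 2  # toy sizes: p**(k*n) = 81 key tuples in total

def g(x, m):
    """g_x(m) = sum_i m_i * x_i mod p."""
    return sum(mi * xi for mi, xi in zip(m, x)) % p

def multi_hash(keys, m):
    """Concatenate the outputs of n MMH* instances with independent keys."""
    return tuple(g(x, m) for x in keys)

vectors = list(product(range(p), repeat=k))
key_tuples = list(product(vectors, repeat=n))

# Largest collision probability over all pairs of distinct messages.
worst = max(
    Fraction(sum(multi_hash(ks, m) == multi_hash(ks, m2) for ks in key_tuples),
             len(key_tuples))
    for m, m2 in product(vectors, repeat=2) if m != m2
)

print(worst)  # 1/9, i.e. 1/p**n
```

The output is n words long and the key material is n times larger, matching the cost described above.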


Halevi and Krawczyk[2] construct a variant called MMH^*_{32}. The construction works with 32-bit integers and with the prime p=2^{32}+15. In fact, any prime p satisfying 2^{32}<p<2^{32}+2^{16} can be used. This idea is adopted from a suggestion by Carter and Wegman to use the primes 2^{16}+1 or 2^{31}-1.

MMH^*_{32} is defined as follows.

MMH^*_{32}=\big\{g_x : (\{0,1\}^{32})^k \to F_p \mid x \in (\{0,1\}^{32})^k \big\}

where \{0,1\}^{32} means \{0, 1, \ldots, 2^{32}-1\} (i.e., the 32-bit binary representation).

The functions g_x are defined as follows.

\begin{align}
g_x (m)&\overset{\mathrm{def}}{=} m \cdot x \mod (2^{32}+15)\\
&=\textstyle \sum_{i=1}^k m_i \cdot x_i \mod (2^{32}+15)
\end{align}

where x = (x_1,\cdots,x_k) and m = (m_1,\cdots,m_k).

Following the theorem above, the collision probability is ϵ = 2^{-32}, so the family MMH^*_{32} is ϵ-almost ∆-universal with ϵ = 2^{-32}.
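A direct, unoptimized Python sketch of g_x for MMH^*_{32} is shown below. It relies on Python's arbitrary-precision integers rather than the word-level tricks a fast implementation would use, and the function name and the sample key and message words are illustrative assumptions.

```python
P32 = 2**32 + 15  # the prime used by MMH*_32

def mmh32_star(x, m):
    """Inner product of 32-bit words, reduced modulo 2^32 + 15."""
    if len(x) != len(m):
        raise ValueError("key and message must have the same length k")
    if any(not 0 <= w < 2**32 for w in (*x, *m)):
        raise ValueError("all words must be 32-bit unsigned integers")
    return sum(mi * xi for mi, xi in zip(m, x)) % P32

key = (0x01234567, 0x89ABCDEF, 0xDEADBEEF)  # illustrative key words
msg = (0x00000001, 0x00000002, 0xFFFFFFFF)  # illustrative message words
tag = mmh32_star(key, msg)

print(0 <= tag < P32)
```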

The value of k

The value of k, which determines the length of the message and key vectors, has two effects on the implementation:

  • Since the costly modular reduction is amortized over k multiply-and-add operations, increasing k lowers the reduction cost per message word.
  • Since the key x consists of k 32-bit integers, increasing k results in a longer key.


Below are the timing results for various implementations of MMH[2], reported in 1997 by its designers Halevi and Krawczyk.

  • A 150 MHz PowerPC 604 RISC machine running AIX

    150 MHz PowerPC 604    Message in memory    Message in cache
    64-bit output          390 Mbit/second      417 Mbit/second
    32-bit output          597 Mbit/second      820 Mbit/second

  • A 150 MHz Pentium-Pro machine running Windows NT

    150 MHz Pentium-Pro    Message in memory    Message in cache
    64-bit output          296 Mbit/second      356 Mbit/second
    32-bit output          556 Mbit/second      813 Mbit/second

  • A 200 MHz Pentium-Pro machine running Linux

    200 MHz Pentium-Pro    Message in memory    Message in cache
    64-bit output          380 Mbit/second      500 Mbit/second
    32-bit output          645 Mbit/second      1080 Mbit/second


Badger is a Message Authentication Code (MAC) based on universal hashing, developed by Boesgaard, Christensen, and Zenner[3]. It is constructed by strengthening the ∆-universal hash family MMH using an ϵ-almost strongly universal (ASU) hash function family after the application of ENH, where the value of ϵ is 1/(2^{32}-5)[6]. Since Badger is a MAC based on the universal hash function approach, the conditions needed for its security are the same as those for other universal hash functions such as UMAC.

The Badger MAC processes a message of length up to 2^{64}-1 bits into an authentication tag of length u\cdot32 bits, where 1\le u \le 5. According to the security needs, the user can choose the value of u, which is the number of parallel hash trees in Badger. Larger values of u can be chosen, but they do not further increase the security of the MAC. The algorithm uses a 128-bit key, and the total message length that may be processed under one key is limited to 2^{64}[7].

The key setup has to be run only once for a given key; the resulting internal state of the MAC can be saved and reused for any other message processed later under that key.


Hash families can be combined in order to obtain new hash families. For the ϵ-AU, ϵ-A∆U, and ϵ-ASU families, the latter are contained in the former: an A∆U family is also an AU family, an ASU family is also an A∆U family, and so forth. Conversely, a stronger family can be reduced to a weaker one when this yields a performance gain. A method for reducing ∆-universal hash functions to universal hash functions is described below.


Theorem 1. Let H^\triangle be an ϵ-A∆U hash family from a set A to a set B. Consider a message (m, m_b) \in A \times B. Then the family H consisting of the functions h(m,m_b) = H^\triangle (m) + m_b is ϵ-AU.

Two distinct inputs (m, m_b) and (m', m'_b) collide exactly when H^\triangle (m) - H^\triangle (m') = m'_b - m_b. If m \ne m', then this probability is at most ϵ, since H^\triangle is an ϵ-A∆U family. If m = m' but m_b \ne m'_b, then the probability is trivially 0. The proof of the theorem is given in [1].
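The reduction can be checked exhaustively on a toy instance. In the sketch below, MMH* over F_3 with k = 2 plays the role of the ∆-universal family (an assumption made here for illustration; any A∆U family would do), and the worst collision probability of h(m, m_b) over all pairs of distinct inputs is computed.

```python
from fractions import Fraction
from itertools import product

p, k = 3, 2  # toy parameters; A = F_p^k and B = F_p

def g(x, m):
    """The Delta-universal family MMH* plays the role of H^triangle."""
    return sum(mi * xi for mi, xi in zip(m, x)) % p

def h(x, m, m_b):
    """h(m, m_b) = H^triangle(m) + m_b, with addition in (F_p, +)."""
    return (g(x, m) + m_b) % p

vectors = list(product(range(p), repeat=k))  # keys and message vectors
inputs = list(product(vectors, range(p)))    # all pairs (m, m_b) in A x B

# Largest collision probability over all pairs of distinct inputs.
worst = max(
    Fraction(sum(h(x, *a) == h(x, *b) for x in vectors), len(vectors))
    for a, b in product(inputs, repeat=2) if a != b
)

print(worst)  # 1/3, i.e. epsilon = 1/p
```

Pairs with m = m' never collide and pairs with m ≠ m' collide with probability 1/p, so the maximum equals the ϵ of the underlying family.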

The ENH family is built on NH, a very fast universal hash family used in UMAC:

NH_K (M)= \sum_{i=1}^{l/2} (k_{2i-1} +_w m_{2i-1})\times (k_{2i} +_w m_{2i} )  \mod 2^{2w}

where '+_w' means addition modulo 2^w, and m_i,k_i \in \big\{0,\cdots, 2^w-1\big\}. It is a 2^{-w}-A∆U hash family.
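The NH formula translates directly into Python. The following is a plain sketch using integers and bit masks; the function name `nh` and the sample inputs with w = 8 are chosen here for illustration.

```python
def nh(key, msg, w=32):
    """NH_K(M): sum of (k +_w m) pair products, reduced modulo 2^(2w)."""
    if len(key) != len(msg) or len(msg) % 2:
        raise ValueError("need equally long key and message of even length")
    mask_w = (1 << w) - 1
    mask_2w = (1 << (2 * w)) - 1
    total = 0
    for i in range(0, len(msg), 2):
        left = (key[i] + msg[i]) & mask_w           # k_{2i-1} +_w m_{2i-1}
        right = (key[i + 1] + msg[i + 1]) & mask_w  # k_{2i}   +_w m_{2i}
        total = (total + left * right) & mask_2w
    return total

# (1+10)*(2+20) + (3+30)*(4+40) = 242 + 1452 = 1694, below 2^16
print(nh([1, 2, 3, 4], [10, 20, 30, 40], w=8))  # 1694
```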


Lemma 1. The following version of NH is 2^{-w}-A∆U:

NH_K (M)=(k_1 +_w m_1 )\times(k_2 +_w m_2 )  \mod 2^{2w}

The proof of Lemma 1 is given in [1].

Choosing w=32 and applying Theorem 1, one obtains the 2^{-32}-AU function family ENH, which is the basic building block of the Badger MAC:

ENH_{k_1,k_2} (m_1,m_2,m_3,m_4 )=(m_1 +_{32} k_1)(m_2 +_{32} k_2) +_{64} m_3 +_{64} 2^{32} m_4

where all arguments are 32-bit and the output is 64-bit.
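A minimal Python sketch of ENH with explicit 32-bit and 64-bit wrap-around follows; the function name and the sample arguments are illustrative.

```python
MASK32 = 2**32 - 1
MASK64 = 2**64 - 1

def enh(k1, k2, m1, m2, m3, m4):
    """ENH_{k1,k2}(m1..m4) = (m1 +32 k1)(m2 +32 k2) +64 m3 +64 2^32 * m4."""
    product = ((m1 + k1) & MASK32) * ((m2 + k2) & MASK32)
    return (product + m3 + (m4 << 32)) & MASK64

# (3+1)*(4+2) + 5 + 6*2^32 = 29 + 0x600000000
print(hex(enh(1, 2, 3, 4, 5, 6)))  # 0x60000001d
```

The words m_3 and m_4 together form the 64-bit value that is added to the NH product, which is the m_b term of the reduction from A∆U to AU.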


Badger, which is constructed using a strongly universal hash family, can be described as

\mathcal{H}=H^* \times F[3]

where H* is an \epsilon_{H^*}-AU universal function family that hashes messages of all sizes onto a fixed size, and F is an \epsilon_{F}-ASU function family that guarantees the strong universality of the overall construction. NH and ENH are used to construct H*. The maximum input size of the function family H* is 2^{64}-1 bits and the output size is 128 bits, 64 bits each for the message and the hash. The collision probability of the H* function ranges from 2^{-32} to 2^{-26.14}. To construct the strongly universal function family F, the ∆-universal hash family MMH* is transformed into a strongly universal hash family by adding an additional key.

Two steps on Badger

There are two steps that have to be executed for every message: the processing phase and the finalization phase.

Processing phase[7]

In this phase, the data is hashed onto a 64-bit string. A core function h : \big\{0,1\big\}^{64}\times \big\{0,1\big\}^{128} \to \big\{0,1\big\}^{64} is used in this processing phase; it hashes a 128-bit string m_2 \parallel m_1 under a 64-bit key k into a 64-bit string h( k, m_2, m_1 ) as follows:

 h(k, m_2, m_1 )= (L(m_1 ) +_{32} L(k) )\cdot(U(m_1 ) +_{32} U(k) ) +_{64} m_2

For any n, +_n means addition modulo 2^n. Given a 2n-bit string x, L(x) denotes its least significant n bits and U(x) its most significant n bits.
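The core function can be sketched in Python on 64-bit integers as follows. The helper names `lower`, `upper` and `core_h` and the sample values are assumptions made for the example.

```python
MASK32 = 2**32 - 1
MASK64 = 2**64 - 1

def lower(x):
    """L(x): least significant 32 bits of a 64-bit value."""
    return x & MASK32

def upper(x):
    """U(x): most significant 32 bits of a 64-bit value."""
    return (x >> 32) & MASK32

def core_h(k, m2, m1):
    """h(k, m2, m1) = (L(m1) +32 L(k)) * (U(m1) +32 U(k)) +64 m2."""
    left = (lower(m1) + lower(k)) & MASK32
    right = (upper(m1) + upper(k)) & MASK32
    return (left * right + m2) & MASK64

k = (2 << 32) | 1   # U(k) = 2, L(k) = 1
m1 = (4 << 32) | 3  # U(m1) = 4, L(m1) = 3
print(core_h(k, 5, m1))  # (3+1)*(4+2) + 5 = 29
```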

A message can be processed using this function, where level_key[j][i] is denoted by k_j^i.

Pseudo-code of the processing phase is as follows.

L = |M|
if L = 0:
    M^1 = ⋯ = M^u = 0
    go to finalization
r = L mod 64
if r ≠ 0:
    M = 0^(64-r) ∥ M
for i = 1 to u:
    M^i = M
    v' = max{1, ⌈log_2 L⌉ - 6}
    for j = 1 to v':
        divide M^i into 64-bit blocks, M^i = m_t^i ∥ ⋯ ∥ m_1^i
        if t is even:
            M^i = h(k_j^i, m_t^i, m_(t-1)^i) ∥ ⋯ ∥ h(k_j^i, m_2^i, m_1^i)
        else:
            M^i = m_t^i ∥ h(k_j^i, m_(t-1)^i, m_(t-2)^i) ∥ ⋯ ∥ h(k_j^i, m_2^i, m_1^i)
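The level-by-level hashing of one tree can be sketched in Python as below. This is a simplified illustration, not the reference implementation: the message is assumed to be already split into 64-bit blocks (padding is left out), only a single tree i is handled, and the level keys are made-up constants rather than keys derived by the Badger key setup.

```python
MASK32 = 2**32 - 1
MASK64 = 2**64 - 1

def core_h(k, m2, m1):
    """Core function h from the text, on 64-bit integers."""
    left = ((m1 & MASK32) + (k & MASK32)) & MASK32
    right = ((m1 >> 32) + (k >> 32)) & MASK32
    return (left * right + m2) & MASK64

def level_count(bit_length):
    """v' = max(1, ceil(log2 L) - 6) for a message of L >= 1 bits."""
    return max(1, (bit_length - 1).bit_length() - 6)

def process_tree(blocks, level_keys):
    """Hash 64-bit blocks [m_1, ..., m_t] down one tree, one key per level."""
    for key in level_keys:
        pairs = len(blocks) // 2
        hashed = [core_h(key, blocks[2 * i + 1], blocks[2 * i])
                  for i in range(pairs)]
        if len(blocks) % 2:  # t odd: m_t passes through unhashed
            hashed.append(blocks[-1])
        blocks = hashed
    return blocks

blocks = [1, 2, 3]  # m_1, m_2, m_3: a 192-bit message
level_keys = [0x0123456789ABCDEF, 0x0F1E2D3C4B5A6978]  # hypothetical keys
result = process_tree(blocks, level_keys[:level_count(192)])
print(len(result))  # 1: a single 64-bit block remains
```

Each level roughly halves the number of blocks, so v' levels are enough to reduce the message to one 64-bit block.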

Finalize phase[7]

In this phase, the 64-bit string produced by the processing phase is transformed into the desired MAC tag. The finalization phase uses the Rabbit stream cipher, with both key setup and IV setup, and denotes the finalization key final_key[j][i] by k_j^i.

Pseudo-code of the finalization phase:

RabbitKeySetup(K)
RabbitIVSetup(N)
for i = 1 to u:
    Q^i = 0^7 ∥ L ∥ M^i
    divide Q^i into 27-bit blocks, Q^i = q_5^i ∥ ⋯ ∥ q_1^i
    S^i = (∑_(j=1)^5 (q_j^i K_j^i)) + K_6^i mod p
S = S^u ∥ ⋯ ∥ S^1
S = S ⨁ RabbitNextbit(u∙32)
return S
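The arithmetic of the finalization step for one tree can be sketched as follows. Several things here are assumptions rather than facts from the specification: the modulus p = 2^32 - 5 is inferred from the ϵ value quoted earlier, the finalization key values are made up, and the Rabbit keystream is replaced by an integer passed in by the caller, since Rabbit is not part of the Python standard library.

```python
P = 2**32 - 5  # assumed modulus p, inferred from the epsilon value in the text

def split_27(q):
    """Split an integer Q into blocks [q_1, ..., q_5] of 27 bits each."""
    return [(q >> (27 * j)) & (2**27 - 1) for j in range(5)]

def finalize_tree(q, final_key, p=P):
    """S^i = (sum_{j=1..5} q_j * K_j + K_6) mod p for one tree."""
    blocks = split_27(q)
    return (sum(qj * kj for qj, kj in zip(blocks, final_key[:5]))
            + final_key[5]) % p

def mask(s, keystream):
    """XOR the hash value with keystream bits (Rabbit output in Badger)."""
    return s ^ keystream

q = 1 + (2 << 27)               # q_1 = 1, q_2 = 2, remaining blocks are 0
final_key = [3, 4, 5, 6, 7, 8]  # hypothetical K_1 .. K_6
s = finalize_tree(q, final_key)
print(s)  # 1*3 + 2*4 + 8 = 19
```

The added key K_6 is what turns the ∆-universal inner product into a strongly universal function, and the final XOR with keystream hides the hash value from the attacker.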


In the pseudocode above, K denotes the 128-bit key passed to RabbitKeySetup(K), which initializes Rabbit. M denotes the message to be hashed and |M| its length in bits. q_j^i denotes the j-th 27-bit block of Q^i. For a given 2n-bit string x, L(x) and U(x) denote its least significant n bits and most significant n bits, respectively.


Boesgaard, Christensen and Zenner report the performance of the Badger algorithm measured on a 1.0 GHz Pentium III and on a 1.7 GHz Pentium 4 processor[3]. The speed-optimized versions were programmed in assembly language inlined in C and compiled using the Intel C++ 7.1 compiler.

Badger properties for various restricted message lengths. "Memory req." denotes the amount of memory required to store the internal state, including key material and the inner state of the Rabbit stream cipher. "Setup" denotes the key setup, and "Fin." denotes finalization with IV setup.

Max. message size            Forgery bound   Memory req.   Setup (Pentium III)   Fin. (Pentium III)   Setup (Pentium 4)   Fin. (Pentium 4)
2^{11} bytes (e.g. IPsec)    2^{-57.7}       400 bytes     1133 cycles           409 cycles           1774 cycles         776 cycles
2^{15} bytes (e.g. TLS)      2^{-56.6}       528 bytes     1370 cycles           421 cycles           2100 cycles         778 cycles
2^{32} bytes                 2^{-54.2}       1072 bytes    2376 cycles           421 cycles           3488 cycles         778 cycles
2^{61}-1 bytes               2^{-52.2}       2000 bytes    4093 cycles           433 cycles           5854 cycles         800 cycles



  1. Carter, Larry; Wegman, Mark N. (1981). "New hash functions and their use in authentication and set equality".
  2. Halevi, Shai; Krawczyk, Hugo (1997). "MMH: Software Message Authentication in the Gbit/second rates".
  3. Boesgaard, Martin; Scavenius, Ove; Pedersen, Thomas; Christensen, Thomas; Zenner, Erik (2005). "Badger - A fast and provably secure MAC" (PDF).
  4. Stinson, D. R. (2003). "Universal hashing and authentication code".
  5. Nevelsteen; Preneel, Bart (1999). "Software Performance of Universal Hash Functions" (PDF).
  6. Lucks; Rijmen, Vincent (2005). "Evaluation of Badger" (PDF).
  7. "Badger Message Authentication Code, Algorithm Specification" (PDF). 2005.