= Grain 128a =

The Grain 128a stream cipher was first proposed at the Symmetric Key Encryption Workshop (SKEW) in 2011 as an improvement on its predecessor Grain 128, adding security enhancements and optional message authentication using the Encrypt & MAC approach. One of the important features of the Grain family is that the throughput can be increased at the expense of additional hardware. Grain 128a was designed by Martin Ågren, Martin Hell, Thomas Johansson and Willi Meier.

== Description of the cipher ==

Grain 128a consists of two main parts: the pre-output function and the MAC. The pre-output function has an internal state of 256 bits, held in two 128-bit registers: an NLFSR and an LFSR. The MAC supports variable tag lengths $w$ such that $0<w\leq32$. The cipher uses a 128-bit key.

The cipher supports two modes of operation, with or without authentication, selected by the supplied $IV_0$: if $IV_0=1$, authentication of the message is enabled; if $IV_0=0$, it is disabled.

== Pre-output function ==

The pre-output function consists of two 128-bit registers, the NLFSR ($b$) and the LFSR ($s$), along with two feedback polynomials $f$ and $g$ and a Boolean function $h$.

$f(x)=1+x^{32}+x^{47}+x^{58}+x^{90}+x^{121}+x^{128}$

$g(x)=1+x^{32}+x^{37}+x^{72}+x^{102}+x^{128}+x^{44}x^{60}+x^{61}x^{125}+x^{63}x^{67}+x^{69}x^{101}+x^{80}x^{88}+x^{110}x^{111}+x^{115}x^{117}+x^{46}x^{50}x^{58}+x^{103}x^{104}x^{106}+x^{33}x^{35}x^{36}x^{40}$

$h(x)=b_{i+12}s_{i+8}+s_{i+13}s_{i+20}+b_{i+95}s_{i+42}+s_{i+60}s_{i+79}+b_{i+12}b_{i+95}s_{i+94}$

In addition to the feedback polynomials, the update functions for the NLFSR and the LFSR are:

$b_{i+128}=s_i+b_{i}+b_{i+26}+b_{i+56}+b_{i+91}+b_{i+96}+b_{i+3}b_{i+67}+b_{i+11}b_{i+13}+b_{i+17}b_{i+18}+b_{i+27}b_{i+59}+b_{i+40}b_{i+48}+b_{i+61}b_{i+65}+b_{i+68}b_{i+84}+b_{i+88}b_{i+92}b_{i+93}b_{i+95}+b_{i+22}b_{i+24}b_{i+25}+b_{i+70}b_{i+78}b_{i+82}$

$s_{i+128}=s_i+s_{i+7}+s_{i+38}+s_{i+70}+s_{i+81}+s_{i+96}$
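As a consistency check, the LFSR update function can be read off $f$ term by term. Assuming the usual convention that a term $x^k$ of a degree-128 feedback polynomial corresponds to the tap at index $i+128-k$, the terms map as follows:

$x^{128}\mapsto s_i,\quad x^{121}\mapsto s_{i+7},\quad x^{90}\mapsto s_{i+38},\quad x^{58}\mapsto s_{i+70},\quad x^{47}\mapsto s_{i+81},\quad x^{32}\mapsto s_{i+96}$

The same index mapping relates the terms of $g$ to the NLFSR taps, for example $x^{44}x^{60}\mapsto b_{i+84}b_{i+68}$; the NLFSR update additionally includes the LFSR bit $s_i$.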

The pre-output stream ($y$) is defined as:

$y_i=h(x)+s_{i+93}+b_{i+2}+b_{i+15}+b_{i+36}+b_{i+45}+b_{i+64}+b_{i+73}+b_{i+89}$
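The update functions and the pre-output function above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the list representation (`b[k]` holds $b_{i+k}$ and `s[k]` holds $s_{i+k}$) and all function names are assumptions made for this sketch. The `clock` helper models ordinary clocking only, without the feedback of $y$ used during start-up.

```python
def lfsr_feedback(s):
    """Next LFSR bit s_{i+128}; s[k] holds s_{i+k}."""
    return s[0] ^ s[7] ^ s[38] ^ s[70] ^ s[81] ^ s[96]


def nlfsr_feedback(b, s):
    """Next NLFSR bit b_{i+128}; b[k] holds b_{i+k}."""
    return (s[0] ^ b[0] ^ b[26] ^ b[56] ^ b[91] ^ b[96]
            ^ (b[3] & b[67]) ^ (b[11] & b[13]) ^ (b[17] & b[18])
            ^ (b[27] & b[59]) ^ (b[40] & b[48]) ^ (b[61] & b[65])
            ^ (b[68] & b[84])
            ^ (b[88] & b[92] & b[93] & b[95])
            ^ (b[22] & b[24] & b[25])
            ^ (b[70] & b[78] & b[82]))


def h(b, s):
    """Boolean function h evaluated on the current state."""
    return ((b[12] & s[8]) ^ (s[13] & s[20]) ^ (b[95] & s[42])
            ^ (s[60] & s[79]) ^ (b[12] & b[95] & s[94]))


def pre_output(b, s):
    """Pre-output bit y_i for the current state."""
    return (h(b, s) ^ s[93] ^ b[2] ^ b[15] ^ b[36] ^ b[45]
            ^ b[64] ^ b[73] ^ b[89])


def clock(b, s):
    """Advance both registers by one step; return (new_b, new_s, y_i)."""
    y = pre_output(b, s)
    new_b = b[1:] + [nlfsr_feedback(b, s)]
    new_s = s[1:] + [lfsr_feedback(s)]
    return new_b, new_s, y
```

Addition over bits is XOR (`^`) and multiplication is AND (`&`). Lists of 0/1 integers keep the indices directly comparable with the formulas; a practical implementation would more likely pack the state into machine words.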

=== Initialisation ===

Upon initialisation we define an $IV$ of 96 bits, where $IV_0$ dictates the mode of operation.

The LFSR is initialised as:

$s_i = IV_i$ for $0 \leq i \leq 95$

$s_i = 1$ for $96 \leq i \leq 126$

$s_{127} = 0$

The last 0 bit ensures that similar key-IV pairs do not produce shifted versions of each other.

The NLFSR is initialised by copying the entire 128 bit key ($k$) into the NLFSR:

$b_i = k_i$ for $0 \leq i \leq 127$
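The register loading described above can be sketched as follows. The function name `init_registers` and the use of lists of 0/1 integers are assumptions made for illustration only.

```python
def init_registers(key_bits, iv_bits):
    """Return (b, s): the NLFSR and LFSR contents before start-up clocking.

    key_bits: 128 bits k_0..k_127, iv_bits: 96 bits IV_0..IV_95.
    """
    if len(key_bits) != 128:
        raise ValueError("key must be 128 bits")
    if len(iv_bits) != 96:
        raise ValueError("IV must be 96 bits")
    b = list(key_bits)                  # b_i = k_i for 0 <= i <= 127
    s = list(iv_bits) + [1] * 31 + [0]  # s_96..s_126 = 1, s_127 = 0
    return b, s
```

With this layout, `iv_bits[0]` is $IV_0$, so the caller selects the mode of operation by setting the first IV bit.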

=== Start up clocking ===

Before the pre-output function can begin to output its pre-output stream, it has to be clocked 256 times to warm up. During this stage, the pre-output stream is fed back into the feedback polynomials $g$ and $f$, that is, each pre-output bit is added to the feedback of both the NLFSR and the LFSR.
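A minimal sketch of the start-up phase is given below. To keep it self-contained, the pre-output function and the two feedback functions are passed in as callables; these parameters and the name `warm_up` are hypothetical and chosen for this illustration only.

```python
def warm_up(b, s, pre_output, nlfsr_feedback, lfsr_feedback, rounds=256):
    """Clock both registers `rounds` times, feeding y back into each.

    pre_output(b, s) -> y_i, nlfsr_feedback(b, s) -> b_{i+128},
    lfsr_feedback(s) -> s_{i+128} are assumed to be supplied by the caller.
    """
    for _ in range(rounds):
        y = pre_output(b, s)
        new_b = nlfsr_feedback(b, s) ^ y  # y added to the NLFSR feedback
        new_s = lfsr_feedback(s) ^ y      # y added to the LFSR feedback
        b = b[1:] + [new_b]
        s = s[1:] + [new_s]
    return b, s
```

No output is produced during these rounds; only the final register contents are returned, ready for key stream generation.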

== Key stream ==

The key stream ($z$) and the MAC functionality in Grain 128a share the same pre-output stream ($y$). As authentication is optional, the key stream definition depends on $IV_0$.

When authentication is enabled, the MAC functionality uses the first $2w$ bits (where $w$ is the tag size) after the start-up clocking to initialise. Because the pre-output stream is shared with the MAC, the key stream is then assigned every other bit.

If authentication is enabled:

$z_i = y_{2w+2i}$

If authentication is disabled:

$z_i = y_i$
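The two cases can be sketched as a simple index selection over an already generated pre-output sequence. The function name `key_stream` and its parameters are assumptions for illustration; `y` is taken to be the pre-output stream after start-up clocking.

```python
def key_stream(y, n, auth, w=32):
    """Return the first n key stream bits z_0..z_{n-1} taken from y.

    auth=True  : z_i = y_{2w+2i} (first 2w bits are reserved for the MAC)
    auth=False : z_i = y_i
    """
    if auth:
        return [y[2 * w + 2 * i] for i in range(n)]
    return list(y[:n])
```

With authentication enabled and $w=32$, producing $n$ key stream bits consumes pre-output bits up to index $64+2(n-1)$.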

== MAC ==

Grain 128a supports tags of size $w$ up to 32 bits. To do this, two registers of size $w$ are used: a shift register ($r$) and an accumulator ($a$). To create a tag for a message $m$ of length $L$, with bits $m_0,\ldots,m_{L-1}$, we set $m_L = 1$, so that the padded message has length $L+1$. This padding ensures that, for example, $m1 = 1$ and $m2 = 10$ have different tags, and it also makes it impossible to generate a tag that completely ignores the input from the shift register after initialisation.

We denote bit $j$ of the accumulator at time $i$ as $a_{i}^{j}$, where $0 \leq j \leq 31$ and $0 \leq i \leq L+1$.

=== Initialisation ===

When authentication is enabled, Grain 128a uses the first $2w$ bits of the pre-output stream ($y$) to initialise the shift register and the accumulator. This is done by:

Shift register:

$r_i = y_{32+i}$ for $0\leq i \leq 31$

Accumulator:

$a_0^j = y_j$ for $0 \leq j \leq 31$

=== Tag generation ===

Shift register:

The shift register is fed all the odd bits of the pre-output stream ($y$) that follow the initialisation bits:

$r_{i+32} = y_{64+2i+1}$

Accumulator:

$a_{i+1}^j = a_i^j + m_i r_{i+j}$ for $0 \leq i \leq L$

=== Final tag ===

When the cipher has processed all $L+1$ bits of the padded message, the final tag ($t$) is the content of the accumulator:

$t_i = a_{L+1}^i$ for $0 \leq i \leq 31$
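The MAC steps can be sketched end to end as follows, for the full tag size $w=32$. This is an illustrative sketch under stated assumptions, not a reference implementation: the name `compute_tag` is hypothetical, `y` is a precomputed pre-output sequence, the accumulator is assumed to take $y_0,\ldots,y_{31}$ and the shift register the next 32 bits $y_{32},\ldots,y_{63}$ (so that together they consume the first $2w=64$ bits), and each new shift register bit is assumed to enter at position $i+32$.

```python
def compute_tag(y, message_bits):
    """Return the 32-bit tag (list of bits) for message_bits.

    y must contain at least 64 + 2 * (len(message_bits) + 1) pre-output bits.
    """
    w = 32
    m = list(message_bits) + [1]  # padding: m_L = 1
    a = list(y[:w])               # assumption: a_0^j = y_j
    r = list(y[w:2 * w])          # assumption: r_i = y_{32+i}
    for i in range(len(m)):       # i = 0 .. L
        if m[i]:
            # a_{i+1}^j = a_i^j + m_i * r_{i+j}; here r[j] holds r_{i+j}
            a = [a[j] ^ r[j] for j in range(w)]
        # shift in the next odd pre-output bit (assumed position i + 32)
        r = r[1:] + [y[2 * w + 2 * i + 1]]
    return a                      # t_j = a_{L+1}^j
```

The sliding list `r` always holds the 32 shift register bits visible at time $i$, so a message bit of 0 leaves the accumulator unchanged while the register still advances. This is why the padding bit matters: without it, a message and the same message followed by a 0 would produce identical accumulators.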
