# Elliptic curve point multiplication

Elliptic curve point multiplication is the operation of successively adding a point along an elliptic curve to itself repeatedly. It is used in elliptic curve cryptography (ECC) as a means of producing a trapdoor function. The literature presents this operation as scalar multiplication, thus the most common name is "elliptic curve scalar multiplication", as written in Hessian form of an elliptic curve.

## Basics

Given a curve ξ defined along some equation in a finite field (such as ξ : y2 = x3 + ax + b), point multiplication is defined as the repeated addition of a point along that curve. Denote as nP = P + P + P + ... + P for some scalar (integer) n and a point P = (x, y) that lies on the curve ξ. This type of curve is known as a Weierstrass curve.

The security of modern ECC depends on the intractability of determining n from $Q = nP$ given known values of Q and P. It is known as the elliptic curve discrete logarithm problem.

Point addition is defined as taking two points along a curve E and computing where a line through them intersects the curve. The negative of the intersection point is used as the result of the addition.

The operation is denoted by $P + Q = R$, or $(x_p, y_p) + (x_q, y_q) = (x_r, y_r)$. This can algebraically be calculated by:

\begin{align} \lambda &= \frac{y_q-y_p}{x_q-x_p}\\ x_r &= \lambda^2 - x_p - x_q\\ y_r &= \lambda(x_p - x_r) - y_p \end{align}

Note that we assume that the elliptic curve is given by y2 = x3 + ax + b.

## Point doubling

Point doubling is similar to point addition, except one takes the tangent of a single point and finds the intersection with the tangent line.

\begin{align} \lambda &= \frac{3x_p^2 + a}{2y_p}\\ x_r &= \lambda^2 - 2x_p\\ y_r &= \lambda(x_p - x_r) - y_p \end{align}

Where a is the multiplication factor of x in the elliptic curve equation y2 = x3 + ax + b. Note that only λ has changed with respect to the point addition problem.

## Point multiplication

The straightforward way of computing a point multiplication is through repeated addition. However, this is a fully exponential approach to computing the multiplication.

The simplest method is the double-and-add method, similar to multiply-and-square in modular exponentiation. The algorithm works as follows:

To compute dP, start with the binary representation for d: $d = d_0 + 2d_1 + 2^2d_2 + \cdots + 2^md_m$, where [$d_0 .. d_m$] ∈ {0,1}

  * Q := 0
* for i from m to 0 do
* Q := 2Q (using point doubling)
* if di = 1 then Q := Q + P (using point addition)
* Return Q


An alternative way of writing the above as a recursive function is

f(P, n) is
if n = 0 then return 0           # computation complete
else if n mod 2 = 1 then
P + f(P, n-1)                 # addition when n is odd
else
f(2P, n/2)                    # doubling when n is even


where f is the function for doubling, P is the coordinate to double, n is the number of times to double the coordinate. Example: 100P can be written as 2(2(P+2(2(2(P+2P))))) and thus requires six doublings and two additions. 100P would be equal to f(P,100).

This algorithm requires log2(n) iterations of point doubling and addition to compute the full-point multiplication. There are many variations of this algorithm such as using a window, sliding window, NAF, NAF-w, vector chains, and Montgomery ladder.

### Windowed method

In the windowed version of this algorithm, one selects a window size w and computes all $2^w$ values of $dP$ for $d = 0, 1, 2, \dots, 2^w - 1$. The algorithm now uses the representation $d = d_0 + 2^wd_1 + 2^{2w}d_2 + \cdots + 2^{mw}d_m$ and becomes

  * Q = 0
* for i from m to 0 do
* Q := 2wQ (using repeated point doubling)
* if di > 0 then Q := Q + diP (using a single point addition with the pre-computed value of diP)
* Return Q


This algorithm has the same complexity as the double-and-add approach with the benefit of using fewer point additions (which in practice are slower than doubling). Typically, the value of w is chosen to be fairly small making the pre-computation stage a trivial component of the algorithm. For the NIST recommended curves, $w = 4$ is usually the best selection. The entire complexity for a n-bit number is measured as $n+1$ point doubles and $2^w - 2 + {n \over w}$ point additions.

### Sliding-window method

In the sliding-window version, we look to trade off point additions for point doubles. We compute a similar table as in the windowed version except we only compute the points $dP$ for $d = 2^{w-1}, 2^{w-1}+1, \dots, 2^w - 1$. Effectively, we are only computing the values for which the most significant bit of the window is set. The algorithm then uses the original double-and-add representation of $d = d_0 + 2d_1 + 2^2d_2 + \cdots + 2^md_m$.

  * Q = 0
* for i from 0 to m do
* if di = 0 then
* Q := 2Q (point double)
* else
* Grab up to w - 1 additional bits from d to store into (including di) t and decrement i suitably
* If fewer than w bits were grabbed
* Return Q
* else
* Q := 2wQ (repeated point double)
* Q := Q + tP (point addition)
* Return Q


This algorithm has the benefit that the pre-computation stage is roughly half as complex as the normal windowed method while also trading slower point additions for point doublings. In effect, there is little reason to use the windowed method over this approach. The algorithm requires $w - 1 + n$ point doubles and at most $2^{w-1} - 1 + {n \over w}$ point additions.

### wNAF method

In the Non-Adjacent Form we aim to make use of the fact that point subtraction is just as easy as point addition to perform fewer (of either) as compared to a sliding-window method. The NAF of the multiplicand $d$ must be computed first with the following algorithm

   * i = 0
* While (d > 0) do
* if (d mod 2) == 1 then
* di = d mods 2w
* d = d - di
* else
* di = 0
* d = d/2
* i = i + 1
* return (di-1, di-2, ..., d0)


Where the mods function is defined as

   * if (d mod 2w) >= 2w-1
* return (d mod 2w) - 2w
* else
* return d mod 2w


This produces the NAF needed to now perform the multiplication. This algorithm requires the pre-computation of the points $\lbrace 1, 3, 5, \dots, 2^{w-1}-1 \rbrace P$ and their negatives, where $P$ is the point to be multiplied. On typical Weierstrass curves, if $P = \lbrace x, y \rbrace$ then $-P = \lbrace x, -y \rbrace$. So in essence the negatives are cheap to compute. Next, the following algorithm computes the multiplication $dP$:

   * Q = 0
* for j = i-1 downto 0 do
* Q = 2Q
* if (dj != 0)
* Q = Q + djG
* return Q


The wNAF guarantees that on average there will be a density of $1 \over {w + 1}$ point additions (slightly better than the unsigned window). It requires 1 point doubling and $2^{w-2}-1$ point additions for precomputation. The algorithm then requires $n$ point doublings and $n \over {w + 1}$ point additions for the rest of the multiplication.

One property of the NAF is that we are guaranteed that every non-zero element $d_i$ is followed by at least $w-1$ additional zeroes. This is because the algorithm clears out the lower $w$ bits of $d$ with every subtraction of the output of the mods function. This observation can be used for several purposes. After every non-zero element the additional zeroes can be implied and do not need to be stored. Secondly, the multiple serial divisions by 2 can be replaced by a division by $2^w$ after every non-zero $d_i$ element and divide by 2 after every zero.

The Montgomery ladder approach computes the point multiplication in a fixed amount of time. This can be beneficial when timing or power consumption measurements are exposed to an attacker performing a side channel attack. The algorithm uses the same representation as from double-and-add.

  * R0 := 0
* R1 := P
* for i from m downto 0 do
* if di = 0 then
* R1 := R0 + R1
* R0 := 2R0
* else
* R0 := R0 + R1
* R1 := 2R1
* Return R0


This algorithm is in effect the same speed as the double-and-add approach except that it computes the same number of point additions and doubles regardless of the value of the multiplicand d. This means that at this level the algorithm does not leak any information through timing or power. However, it was shown by Yuval Yarom and Naomi Benger that through application of FLUSH+RELOAD side channel attack the full private key can be revealed in only one multiplication operation.

## References

Yuval Yarom and Naomi Benger. Recovering OpenSSL ECDSA Nonces Using the FLUSH+RELOAD Cache Side-channel Attack. Cryptology ePrint Archive, Report 2014/140, 2014. http://eprint.iacr.org/2014/140.pdf