Strassen algorithm

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In the mathematical discipline of linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm used for matrix multiplication. It is faster than the standard matrix multiplication algorithm and is useful in practice for large matrices, but would be slower than the fastest known algorithm for extremely large matrices.


Volker Strassen published the Strassen algorithm in 1969. Although his algorithm is only slightly faster than the standard algorithm for matrix multiplication, he was the first to point out that the standard approach is not optimal. His paper started the search for even faster algorithms such as the more complex Coppersmith–Winograd algorithm published in 1987.


The left column represents 2x2 matrix multiplication. Naïve matrix multiplication requires one multiplication for each "1" of the left column. Each of the other columns represents a single one of the 7 multiplications in the algorithm, and the sum of the columns gives the full matrix multiplication on the left.

Let A, B be two square matrices over a ring R. We want to calculate the matrix product C as

\mathbf{C} = \mathbf{A} \mathbf{B} \qquad \mathbf{A},\mathbf{B},\mathbf{C} \in R^{2^n \times 2^n}

If the matrices A, B are not of type 2n x 2n we fill the missing rows and columns with zeros.

We partition A, B and C into equally sized block matrices

\mathbf{A} =
\mathbf{A}_{1,1} & \mathbf{A}_{1,2} \\
\mathbf{A}_{2,1} & \mathbf{A}_{2,2}
\mbox { , }
\mathbf{B} =
\mathbf{B}_{1,1} & \mathbf{B}_{1,2} \\
\mathbf{B}_{2,1} & \mathbf{B}_{2,2}
\mbox { , }
\mathbf{C} =
\mathbf{C}_{1,1} & \mathbf{C}_{1,2} \\
\mathbf{C}_{2,1} & \mathbf{C}_{2,2}


\mathbf{A}_{i,j}, \mathbf{B}_{i,j}, \mathbf{C}_{i,j} \in R^{2^{n-1} \times 2^{n-1}}


\mathbf{C}_{1,1} = \mathbf{A}_{1,1} \mathbf{B}_{1,1} + \mathbf{A}_{1,2} \mathbf{B}_{2,1}
\mathbf{C}_{1,2} = \mathbf{A}_{1,1} \mathbf{B}_{1,2} + \mathbf{A}_{1,2} \mathbf{B}_{2,2}
\mathbf{C}_{2,1} = \mathbf{A}_{2,1} \mathbf{B}_{1,1} + \mathbf{A}_{2,2} \mathbf{B}_{2,1}
\mathbf{C}_{2,2} = \mathbf{A}_{2,1} \mathbf{B}_{1,2} + \mathbf{A}_{2,2} \mathbf{B}_{2,2}

With this construction we have not reduced the number of multiplications. We still need 8 multiplications to calculate the Ci,j matrices, the same number of multiplications we need when using standard matrix multiplication.

Now comes the important part. We define new matrices

\mathbf{M}_{1} := (\mathbf{A}_{1,1} + \mathbf{A}_{2,2}) (\mathbf{B}_{1,1} + \mathbf{B}_{2,2})
\mathbf{M}_{2} := (\mathbf{A}_{2,1} + \mathbf{A}_{2,2}) \mathbf{B}_{1,1}
\mathbf{M}_{3} := \mathbf{A}_{1,1} (\mathbf{B}_{1,2} - \mathbf{B}_{2,2})
\mathbf{M}_{4} := \mathbf{A}_{2,2} (\mathbf{B}_{2,1} - \mathbf{B}_{1,1})
\mathbf{M}_{5} := (\mathbf{A}_{1,1} + \mathbf{A}_{1,2}) \mathbf{B}_{2,2}
\mathbf{M}_{6} := (\mathbf{A}_{2,1} - \mathbf{A}_{1,1}) (\mathbf{B}_{1,1} + \mathbf{B}_{1,2})
\mathbf{M}_{7} := (\mathbf{A}_{1,2} - \mathbf{A}_{2,2}) (\mathbf{B}_{2,1} + \mathbf{B}_{2,2})

only using 7 multiplications (one for each Mk) instead of 8. We may now express the Ci,j in terms of Mk, like this:

\mathbf{C}_{1,1} = \mathbf{M}_{1} + \mathbf{M}_{4} - \mathbf{M}_{5} + \mathbf{M}_{7}
\mathbf{C}_{1,2} = \mathbf{M}_{3} + \mathbf{M}_{5}
\mathbf{C}_{2,1} = \mathbf{M}_{2} + \mathbf{M}_{4}
\mathbf{C}_{2,2} = \mathbf{M}_{1} - \mathbf{M}_{2} + \mathbf{M}_{3} + \mathbf{M}_{6}

We iterate this division process n times (recursively) until the submatrices degenerate into numbers (elements of the ring R). The resulting product will be padded with zeroes just like A and B, and should be stripped of the corresponding rows and columns.

Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which those algorithms are more efficient. The particular crossover point for which Strassen's algorithm is more efficient depends on the specific implementation and hardware. Earlier authors had estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations.[1] However, it has been observed that this crossover point has been increasing in recent years, and a 2010 study found that even a single step of Strassen's algorithm is often not beneficial on current architectures, compared to a highly optimized traditional multiplication, until matrix sizes exceed 1000 or more, and even for matrix sizes of several thousand the benefit is typically marginal at best (around 10% or less).[2]

Asymptotic complexity[edit]

The standard matrix multiplication takes approximately 2N3 (where N = 2n) arithmetic operations (additions and multiplications); the asymptotic complexity is O(N3).

The number of additions and multiplications required in the Strassen algorithm can be calculated as follows: let f(n) be the number of operations for a 2n × 2n matrix. Then by recursive application of the Strassen algorithm, we see that f(n) = 7f(n-1) + l4n, for some constant l that depends on the number of additions performed at each application of the algorithm. Hence f(n) = (7 + o(1))n, i.e., the asymptotic complexity for multiplying matrices of size N = 2n using the Strassen algorithm is

O([7+o(1)]^n) = O(N^{\log_{2}7+o(1)}) \approx O(N^{2.8074}).

The reduction in the number of arithmetic operations however comes at the price of a somewhat reduced numerical stability,[3]and the algorithm also requires significantly more memory compared to the naive algorithm. Both initial matrices must have their dimensions expanded to the next power of 2, which results in storing up to four times as many elements, and the seven auxiliary matrices each contain a quarter of the elements in the expanded ones.

Rank or bilinear complexity[edit]

The bilinear complexity or rank of a bilinear map is an important concept in the asymptotic complexity of matrix multiplication. The rank of a bilinear map \phi:\mathbf A \times \mathbf B \rightarrow \mathbf C over a field F is defined as (somewhat of an abuse of notation)

R(\phi/\mathbf F) = \min \left\{r\left|\exists f_i\in \mathbf A^*,g_i\in\mathbf B^*,w_i\in\mathbf C , \forall \mathbf a\in\mathbf A, \mathbf b\in\mathbf B, \phi(\mathbf a,\mathbf b) = \sum_{i=1}^r f_i(\mathbf a)g_i(\mathbf b)w_i \right.\right\}

In other words, the rank of a bilinear map is the length of its shortest bilinear computation.[4] The existence of Strassen's algorithm shows that the rank of 2×2 matrix multiplication is no more than seven. To see this, let us express this algorithm (alongside the standard algorithm) as such a bilinear computation. In the case of matrices, the dual spaces A* and B* consist of maps into the field F induced by a scalar double-dot product, (i.e. in this case the sum of all the entries of a Hadamard product.)

Standard algorithm Strassen algorithm
i fi(a) gi(b) wi fi(a) gi(b) wi
1 \begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf a \begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf b \begin{bmatrix}1&0\\0&0\end{bmatrix} \begin{bmatrix}1&0\\0&1\end{bmatrix}:\mathbf a \begin{bmatrix}1&0\\0&1\end{bmatrix}:\mathbf b \begin{bmatrix}1&0\\0&1\end{bmatrix}
2 \begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf a \begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf b \begin{bmatrix}1&0\\0&0\end{bmatrix} \begin{bmatrix}0&0\\1&1\end{bmatrix}:\mathbf a \begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf b \begin{bmatrix}0&0\\1&-1\end{bmatrix}
3 \begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf a \begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf b \begin{bmatrix}0&1\\0&0\end{bmatrix} \begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf a \begin{bmatrix}0&1\\0&-1\end{bmatrix}:\mathbf b \begin{bmatrix}0&1\\0&1\end{bmatrix}
4 \begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf a \begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf b \begin{bmatrix}0&1\\0&0\end{bmatrix} \begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf a \begin{bmatrix}-1&0\\1&0\end{bmatrix}:\mathbf b \begin{bmatrix}1&0\\1&0\end{bmatrix}
5 \begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf a \begin{bmatrix}1&0\\0&0\end{bmatrix}:\mathbf b \begin{bmatrix}0&0\\1&0\end{bmatrix} \begin{bmatrix}1&1\\0&0\end{bmatrix}:\mathbf a \begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf b \begin{bmatrix}-1&1\\0&0\end{bmatrix}
6 \begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf a \begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf b \begin{bmatrix}0&0\\1&0\end{bmatrix} \begin{bmatrix}-1&0\\1&0\end{bmatrix}:\mathbf a \begin{bmatrix}1&1\\0&0\end{bmatrix}:\mathbf b \begin{bmatrix}0&0\\0&1\end{bmatrix}
7 \begin{bmatrix}0&0\\1&0\end{bmatrix}:\mathbf a \begin{bmatrix}0&1\\0&0\end{bmatrix}:\mathbf b \begin{bmatrix}0&0\\0&1\end{bmatrix} \begin{bmatrix}0&1\\0&-1\end{bmatrix}:\mathbf a \begin{bmatrix}0&0\\1&1\end{bmatrix}:\mathbf b \begin{bmatrix}1&0\\0&0\end{bmatrix}
8 \begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf a \begin{bmatrix}0&0\\0&1\end{bmatrix}:\mathbf b \begin{bmatrix}0&0\\0&1\end{bmatrix}
\mathbf a\mathbf b = \sum_{i=1}^8 f_i(\mathbf a)g_i(\mathbf b)w_i \mathbf a\mathbf b = \sum_{i=1}^7 f_i(\mathbf a)g_i(\mathbf b)w_i

It can be shown that the total number of elementary multiplications L required for matrix multiplication is tightly asymptotically bound to the rank R, i.e. L = \Theta(R), or more specifically, since the constants are known, \frac 1 2 R\le L\le R. One useful property of the rank is that it is submultiplicative for tensor products, and this enables one to show that 2n×2n×2n matrix multiplication can be accomplished with no more than 7n elementary multiplications for any n. (This n-fold tensor product of the 2×2×2 matrix multiplication map with itself—an nth tensor power—is realized by the recursive step in the algorithm shown.)

See also[edit]


  1. ^ Skiena, Steven S. (1998), "§8.2.3 Matrix multiplication", The Algorithm Design Manual, Berlin, New York: Springer-Verlag, ISBN 978-0-387-94860-7 .
  2. ^ P. D'Alberto and A. Nicolau, "Using recursion to boost ATLAS's performance," Lecture Notes in Computer Science, vol. 4759, pp. 142-151 (2010).
  3. ^ Webb, Miller (1975). "Computational complexity and numerical stability". SIAM J. Comput: 97-107. 
  4. ^ Burgisser, Clausen, and Shokrollahi. Algebraic Complexity Theory. Springer-Verlag 1997.

External links[edit]