# Block Wiedemann algorithm

The block Wiedemann algorithm for computing kernel vectors of a matrix over a finite field is a generalisation of an algorithm due to Don Coppersmith.

## Coppersmith's algorithm

Let ${\displaystyle M}$ be an ${\displaystyle n\times n}$ square matrix over some finite field ${\displaystyle F}$, let ${\displaystyle x_{\mathrm {base} }}$ be a random vector of length ${\displaystyle n}$, and let ${\displaystyle x=Mx_{\mathrm {base} }}$. Consider the sequence of vectors ${\displaystyle S=\left[x,Mx,M^{2}x,\ldots \right]}$ obtained by repeatedly multiplying the vector by the matrix ${\displaystyle M}$; let ${\displaystyle y}$ be any other vector of length ${\displaystyle n}$, and consider the sequence of finite-field elements ${\displaystyle S_{y}=\left[y\cdot x,y\cdot Mx,y\cdot M^{2}x,\ldots \right]}$.
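The construction of ${\displaystyle S_{y}}$ can be sketched over a small prime field as follows (the matrix, vectors, and prime below are illustrative choices, not taken from any particular computation):

```python
def matvec(M, v, p):
    """Multiply matrix M by vector v over GF(p)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def krylov_scalars(M, x, y, p, length):
    """Return S_y = [y.x, y.Mx, y.M^2 x, ...]: repeatedly apply M to x
    and record the dot product with the fixed vector y each time."""
    seq = []
    for _ in range(length):
        seq.append(sum(a * b for a, b in zip(y, x)) % p)
        x = matvec(M, x, p)
    return seq

p = 7
M = [[1, 2], [3, 4]]
x_base = [1, 1]
x = matvec(M, x_base, p)     # x = M * x_base
y = [1, 0]
print(krylov_scalars(M, x, y, p, 4))   # → [3, 3, 0, 6]
```

In a real instance ${\displaystyle M}$ would be large and sparse, so each step costs only one pass over the non-zero entries.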

We know that the matrix ${\displaystyle M}$ has a minimal polynomial; by the Cayley–Hamilton theorem we know that its degree, which we will call ${\displaystyle n_{0}}$, is no more than ${\displaystyle n}$. Say ${\displaystyle \sum _{r=0}^{n_{0}}p_{r}M^{r}=0}$. Then ${\displaystyle \sum _{r=0}^{n_{0}}p_{r}\left(y\cdot M^{r}x\right)=0}$; so the minimal polynomial of the matrix annihilates the sequence ${\displaystyle S}$, and hence also ${\displaystyle S_{y}}$.

But the Berlekamp–Massey algorithm allows us to calculate relatively efficiently some sequence ${\displaystyle q_{0}\ldots q_{L}}$ with ${\displaystyle \sum _{i=0}^{L}q_{i}S_{y}[{i+r}]=0\;\forall \;r}$. Our hope is that this sequence, which by construction annihilates ${\displaystyle S_{y}}$, actually annihilates ${\displaystyle S}$; if so, we have ${\displaystyle \sum _{i=0}^{L}q_{i}M^{i}x=0}$. We then take advantage of the initial definition of ${\displaystyle x}$ to say ${\displaystyle M\sum _{i=0}^{L}q_{i}M^{i}x_{\mathrm {base} }=0}$, and so ${\displaystyle \sum _{i=0}^{L}q_{i}M^{i}x_{\mathrm {base} }}$ is, with luck, a non-zero kernel vector of ${\displaystyle M}$.
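The whole scalar pipeline can be sketched compactly over a small prime field. The sketch below uses a textbook form of Berlekamp–Massey over GF(p); the singular matrix is an illustrative choice, and the "random" vectors are fixed as unit vectors so the run is deterministic (the algorithm proper calls for random ones):

```python
def matvec(M, v, p):
    """Multiply matrix M by vector v over GF(p)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def berlekamp_massey(s, p):
    """Minimal connection polynomial C (with C[0] = 1) over GF(p) such that
    sum_{i=0}^{L} C[i] * s[n - i] == 0 for all n >= L; returns (C, L)."""
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for n in range(len(s)):
        # discrepancy between s[n] and what the current recurrence predicts
        d = s[n]
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        coef = d * pow(b, p - 2, p) % p      # d / b in GF(p)
        if len(B) + m > len(C):
            C = C + [0] * (len(B) + m - len(C))
        T = C[:]                             # old polynomial, before the update
        for i, Bi in enumerate(B):
            C[i + m] = (C[i + m] - coef * Bi) % p
        if 2 * L <= n:                       # the recurrence length must grow
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1], L

def wiedemann_kernel(M, p, iters):
    """Try to find a kernel vector of a singular matrix M over GF(p)."""
    n = len(M)
    x_base = [1] + [0] * (n - 1)   # illustrative stand-in for a random vector
    x = matvec(M, x_base, p)
    # scalar sequence S_y with y the first unit vector: dot = first entry
    s, v = [], x
    for _ in range(iters):
        s.append(v[0])
        v = matvec(M, v, p)
    C, L = berlekamp_massey(s, p)
    q = list(reversed(C))          # q_i with sum_i q_i * S_y[i + r] = 0
    # candidate kernel vector w = sum_i q_i M^i x_base
    w, v = [0] * n, x_base
    for qi in q:
        w = [(wk + qi * vk) % p for wk, vk in zip(w, v)]
        v = matvec(M, v, p)
    return w

p = 7
M = [[1, 2, 3], [2, 4, 6], [1, 1, 1]]   # singular: row 2 = 2 * row 1
w = wiedemann_kernel(M, p, 8)
print(w, matvec(M, w, p))               # → [5, 4, 5] [0, 0, 0]
```

If the recurrence found for ${\displaystyle S_{y}}$ fails to annihilate ${\displaystyle S}$, the result may be the zero vector, in which case one retries with fresh random vectors.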

## The block Wiedemann algorithm

The natural implementation of sparse matrix arithmetic on a computer makes it easy to compute the sequence S in parallel for a number of vectors equal to the width of a machine word – indeed, it will normally take no longer to compute for that many vectors than for one. If you have several processors, you can compute the sequence S for a different set of random vectors in parallel on each of them.
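Over GF(2), for instance, one machine word can carry one entry of 64 different vectors, so a single pass over the sparse matrix advances all 64 sequences at once. A minimal sketch (the sparse row format and names are our own):

```python
import functools
import operator

def block_step(rows, X):
    """One step X -> M*X over GF(2) for 64 packed vectors.
    rows[i] lists the columns where row i of the sparse matrix M has a 1;
    X[j] is a 64-bit word whose bit k is entry j of the k-th vector.
    XOR-ing the packed words performs 64 independent GF(2) dot products
    in the same single pass over the matrix that one vector would need."""
    return [functools.reduce(operator.xor, (X[j] for j in row), 0)
            for row in rows]

# M = [[1,1,0],[0,1,1],[1,0,0]] as sparse rows; two packed test vectors
rows = [[0, 1], [1, 2], [0]]
X = [0b01, 0b11, 0b10]        # bit 0 packs [1,1,0], bit 1 packs [0,1,1]
print(block_step(rows, X))    # → [2, 1, 1]: bit 0 now [0,1,1], bit 1 now [1,0,0]
```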

It turns out, by a generalization of the Berlekamp–Massey algorithm that produces a sequence of small matrices, that you can take the sequences produced for a large number of vectors and generate a kernel vector of the original large matrix. You need to compute ${\displaystyle y_{i}\cdot M^{t}x_{j}}$ for some ${\displaystyle i=0\ldots i_{\max }}$, ${\displaystyle j=0\ldots j_{\max }}$, ${\displaystyle t=0\ldots t_{\max }}$, where ${\displaystyle i_{\max },j_{\max },t_{\max }}$ need to satisfy ${\displaystyle t_{\max }>{\frac {d}{i_{\max }}}+{\frac {d}{j_{\max }}}+O(1)}$ (with ${\displaystyle d}$ the dimension of the matrix) and where the ${\displaystyle y_{i}}$ are a set of vectors of length n; but in practice you can take the ${\displaystyle y_{i}}$ to be unit vectors and simply write out the first ${\displaystyle i_{\max }}$ entries of your vectors at each time t.
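When the ${\displaystyle y_{i}}$ are unit vectors, ${\displaystyle y_{i}\cdot M^{t}x_{j}}$ is simply entry ${\displaystyle i}$ of ${\displaystyle M^{t}x_{j}}$, so collecting the matrix sequence amounts to writing out the leading coordinates of the block at each step. A sketch with our own names, over an illustrative small prime field:

```python
def matvec(M, v, p):
    """Multiply matrix M by vector v over GF(p)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def coefficient_sequence(M, X, p, i_max, t_max):
    """Return a[t][i][j] = y_i . M^t x_j for unit vectors y_i: this is just
    the first i_max entries of each column x_j of the block at each step t."""
    cols = [list(x) for x in X]        # X given as a list of columns x_j
    a = []
    for _ in range(t_max + 1):
        a.append([[col[i] for col in cols] for i in range(i_max)])
        cols = [matvec(M, col, p) for col in cols]
    return a

p = 5
M = [[1, 2], [3, 4]]
X = [[1, 0], [0, 1]]                   # columns x_0, x_1: the unit vectors
a = coefficient_sequence(M, X, p, i_max=2, t_max=2)
print(a[1])   # entries of M * x_j: here just the columns of M mod 5
```

The sequence of small ${\displaystyle i_{\max }\times j_{\max }}$ matrices `a` is what the matrix generalization of Berlekamp–Massey then consumes.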

## References

Villard's 1997 research report 'A study of Coppersmith's block Wiedemann algorithm using matrix polynomials' (the cover material is in French but the content is in English) is a reasonable description.

Thomé's paper 'Subquadratic computation of vector generating polynomials and improvement of the block Wiedemann algorithm' uses a more sophisticated FFT-based algorithm for computing the vector generating polynomials, and describes a practical implementation with ${\displaystyle i_{\max }=j_{\max }=4}$ used to compute a kernel vector of a ${\displaystyle 484603\times 484603}$ matrix of entries modulo ${\displaystyle 2^{607}-1}$, and hence to compute discrete logarithms in the field ${\displaystyle \mathrm {GF} (2^{607})}$.