Freivalds' algorithm (named after Rūsiņš Mārtiņš Freivalds) is a randomized algorithm used to verify matrix multiplication. Given three n × n matrices $A$, $B$, and $C$, the problem is to verify whether $A \times B = C$. A naïve algorithm would compute the product $A \times B$ explicitly and compare it term by term with $C$. However, the best known matrix multiplication algorithm runs in $O(n^{2.3729})$ time.[1] Freivalds' algorithm uses randomization to reduce this time bound to $O(n^{2})$[2] with high probability. In $O(kn^{2})$ time the algorithm can verify a matrix product with probability of failure less than $2^{-k}$.
The algorithm
Input: Three n × n matrices $A$, $B$, and $C$.
Output: "Yes" if $A \times B = C$; "No" otherwise.
Procedure
1. Generate an n × 1 random 0/1 vector $\vec{r}$.
2. Compute $\vec{P} = A \times (B\vec{r}) - C\vec{r}$.
3. Output "Yes" if $\vec{P} = (0,0,\ldots,0)^{T}$; "No" otherwise.
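The three steps above can be sketched in Python using plain list-of-rows matrices (the helper names `mat_vec` and `freivalds_trial` are illustrative, not from the source):

```python
import random

def mat_vec(M, v):
    """Multiply an n x n matrix (list of rows) by an n-vector."""
    n = len(v)
    return [sum(M[i][k] * v[k] for k in range(n)) for i in range(len(M))]

def freivalds_trial(A, B, C):
    """One trial of Freivalds' algorithm.
    Returns True ('Yes') if A x (B r) - C r is the zero vector."""
    n = len(A)
    r = [random.randint(0, 1) for _ in range(n)]  # step 1: random 0/1 vector
    abr = mat_vec(A, mat_vec(B, r))               # step 2: A x (B r), two O(n^2) products
    cr = mat_vec(C, r)                            #         C r, one more O(n^2) product
    return abr == cr                              # step 3: is P = ABr - Cr zero?
```

Note that the trial never forms the product $A \times B$: it only does three matrix-vector multiplications, each $O(n^{2})$.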
If $A \times B = C$, then the algorithm always returns "Yes". If $A \times B \neq C$, then the probability that the algorithm returns "Yes" is less than or equal to one half. This is called one-sided error.
By iterating the algorithm $k$ times and returning "Yes" only if every iteration yields "Yes", a runtime of $O(kn^{2})$ and an error probability of at most $1/2^{k}$ are achieved.
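The $k$-fold repetition can be sketched as follows (a self-contained sketch; the function name is illustrative):

```python
import random

def freivalds_verify(A, B, C, k=20):
    """Run k independent trials; return True iff all of them output 'Yes'.
    If A*B == C this always returns True; if A*B != C, the probability
    of a false True is at most 2**-k."""
    n = len(A)
    for _ in range(k):
        r = [random.randint(0, 1) for _ in range(n)]
        br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        abr = [sum(A[i][j] * br[j] for j in range(n)) for i in range(n)]
        cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if abr != cr:
            return False  # a nonzero entry of P certifies A*B != C
    return True
```

With the default $k = 20$ the false-positive probability is below one in a million, while the total work remains $O(kn^{2})$.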
Example
Suppose one wished to determine whether:
$$AB={\begin{bmatrix}2&3\\3&4\end{bmatrix}}{\begin{bmatrix}1&0\\1&2\end{bmatrix}}{\stackrel {?}{=}}{\begin{bmatrix}6&5\\8&7\end{bmatrix}}=C.$$
A random two-element vector with entries equal to 0 or 1 is selected, say $\vec{r}={\begin{bmatrix}1\\1\end{bmatrix}}$, and used to compute:
$${\begin{aligned}A\times (B{\vec {r}})-C{\vec {r}}&={\begin{bmatrix}2&3\\3&4\end{bmatrix}}\left({\begin{bmatrix}1&0\\1&2\end{bmatrix}}{\begin{bmatrix}1\\1\end{bmatrix}}\right)-{\begin{bmatrix}6&5\\8&7\end{bmatrix}}{\begin{bmatrix}1\\1\end{bmatrix}}\\&={\begin{bmatrix}2&3\\3&4\end{bmatrix}}{\begin{bmatrix}1\\3\end{bmatrix}}-{\begin{bmatrix}11\\15\end{bmatrix}}\\&={\begin{bmatrix}11\\15\end{bmatrix}}-{\begin{bmatrix}11\\15\end{bmatrix}}\\&={\begin{bmatrix}0\\0\end{bmatrix}}.\end{aligned}}$$
This yields the zero vector, suggesting the possibility that AB = C. However, if in a second trial the vector $\vec{r}={\begin{bmatrix}1\\0\end{bmatrix}}$ is selected, the result becomes:
$$A\times (B{\vec {r}})-C{\vec {r}}={\begin{bmatrix}2&3\\3&4\end{bmatrix}}\left({\begin{bmatrix}1&0\\1&2\end{bmatrix}}{\begin{bmatrix}1\\0\end{bmatrix}}\right)-{\begin{bmatrix}6&5\\8&7\end{bmatrix}}{\begin{bmatrix}1\\0\end{bmatrix}}={\begin{bmatrix}-1\\-1\end{bmatrix}}.$$
The result is nonzero, proving that in fact AB ≠ C.
There are four two-element 0/1 vectors, and half of them give the zero vector in this case ($\vec{r}={\begin{bmatrix}0\\0\end{bmatrix}}$ and $\vec{r}={\begin{bmatrix}1\\1\end{bmatrix}}$), so the chance of randomly selecting these in two trials (and falsely concluding that AB = C) is $1/2^{2} = 1/4$. In the general case, the proportion of $\vec{r}$ yielding the zero vector may be less than 1/2, and a larger number of trials (such as 20) would be used, rendering the probability of error very small.
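The claim that exactly half of the four 0/1 vectors mask the error in this example can be checked by exhaustive enumeration (a sketch using the matrices from the example above):

```python
from itertools import product

A = [[2, 3], [3, 4]]
B = [[1, 0], [1, 2]]
C = [[6, 5], [8, 7]]  # wrong: the true product A*B is [[5, 6], [7, 8]]

def mv(M, v):
    """2x2 matrix times 2-vector."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

# Collect the vectors r in {0,1}^2 for which A(Br) - Cr is the zero vector.
masking = [r for r in product([0, 1], repeat=2)
           if mv(A, mv(B, list(r))) == mv(C, list(r))]
print(masking)  # [(0, 0), (1, 1)] -- exactly half of the four candidates
```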
Error analysis
Let $p$ equal the probability of error. We claim that if $A \times B = C$, then $p = 0$, and if $A \times B \neq C$, then $p \leq 1/2$.
Case A × B = C
$${\begin{aligned}{\vec {P}}&=A\times (B{\vec {r}})-C{\vec {r}}\\&=(A\times B){\vec {r}}-C{\vec {r}}\\&=(A\times B-C){\vec {r}}\\&={\vec {0}}\end{aligned}}$$
This holds regardless of the value of $\vec{r}$, since it uses only that $A\times B-C=0$. Hence the probability of error in this case is:
$$\Pr[{\vec {P}}\neq 0]=0$$
Case A × B ≠ C
Let $D=A\times B-C=(d_{ij})$, so that
$${\vec {P}}=D\times {\vec {r}}=(p_{1},p_{2},\dots ,p_{n})^{T}.$$
Since $A\times B\neq C$, some element of $D$ is nonzero; suppose that $d_{ij}\neq 0$. By the definition of matrix multiplication, we have:
$$p_{i}=\sum _{k=1}^{n}d_{ik}r_{k}=d_{i1}r_{1}+\cdots +d_{ij}r_{j}+\cdots +d_{in}r_{n}=d_{ij}r_{j}+y$$
for some $y$ that does not depend on $r_{j}$.
Using the law of total probability, we can partition over $y$:
$$\Pr[p_{i}=0]=\Pr[p_{i}=0\mid y=0]\cdot \Pr[y=0]\,+\,\Pr[p_{i}=0\mid y\neq 0]\cdot \Pr[y\neq 0] \qquad (1)$$
We use that:
$$\Pr[p_{i}=0\mid y=0]=\Pr[r_{j}=0]={\frac {1}{2}}$$
$$\Pr[p_{i}=0\mid y\neq 0]=\Pr[r_{j}=1\land d_{ij}=-y]\leq \Pr[r_{j}=1]={\frac {1}{2}}$$
Plugging these into equation (1), we get:
$${\begin{aligned}\Pr[p_{i}=0]&\leq {\frac {1}{2}}\cdot \Pr[y=0]+{\frac {1}{2}}\cdot \Pr[y\neq 0]\\&={\frac {1}{2}}\cdot \Pr[y=0]+{\frac {1}{2}}\cdot (1-\Pr[y=0])\\&={\frac {1}{2}}\end{aligned}}$$
Therefore,
$$\Pr[{\vec {P}}=0]=\Pr[p_{1}=0\land \dots \land p_{i}=0\land \dots \land p_{n}=0]\leq \Pr[p_{i}=0]\leq {\frac {1}{2}}.$$
This completes the proof.
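The $\leq 1/2$ bound can also be checked empirically: run many independent single trials against a fixed wrong $C$ and confirm the observed false-"Yes" rate stays near or below one half (a sketch using the example matrices; for these matrices the masking probability is exactly 1/2):

```python
import random

random.seed(1)  # seeded so the sketch is reproducible

A = [[2, 3], [3, 4]]
B = [[1, 0], [1, 2]]
C = [[6, 5], [8, 7]]  # A*B != C

def mv(M, v):
    """2x2 matrix times 2-vector."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

trials = 100_000
false_yes = 0
for _ in range(trials):
    r = [random.randint(0, 1), random.randint(0, 1)]
    if mv(A, mv(B, r)) == mv(C, r):  # trial wrongly outputs "Yes"
        false_yes += 1

print(false_yes / trials)  # close to 0.5, matching the analysis
```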
Ramifications
Simple algorithmic analysis shows that the running time of this algorithm is $O(n^{2})$ (in big O notation). This beats the classical deterministic algorithm's runtime of $O(n^{3})$ (or $O(n^{2.3729})$ if using fast matrix multiplication). The error analysis also shows that if the algorithm is run $k$ times, an error bound of less than $1/2^{k}$ can be achieved, an exponentially small quantity. The algorithm is also fast in practice because fast implementations of matrix-vector products are widely available. Utilizing randomization thus speeds up verification dramatically compared with the deterministic approach.
Freivalds' algorithm frequently appears in introductions to probabilistic algorithms because of its simplicity and because it illustrates how randomization can outperform deterministic methods in practice for some problems.
References
Freivalds, R. (1977), "Probabilistic Machines Can Use Less Running Time", IFIP Congress 1977, pp. 839–842.
Mitzenmacher, Michael; Upfal, Eli (2005), Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, pp. 8–12, ISBN 0521835402.