Power iteration

In mathematics, power iteration (also known as the power method) is an eigenvalue algorithm: given a diagonalizable matrix ${\displaystyle A}$, the algorithm will produce a number ${\displaystyle \lambda }$, which is the greatest (in absolute value) eigenvalue of ${\displaystyle A}$, and a nonzero vector ${\displaystyle v}$, the corresponding eigenvector of ${\displaystyle \lambda }$, such that ${\displaystyle Av=\lambda v}$. The algorithm is also known as the Von Mises iteration.[1]

Power iteration is a very simple algorithm, but it may converge slowly. It does not compute a matrix decomposition and needs only matrix-vector products with ${\displaystyle A}$, so it can be used when ${\displaystyle A}$ is a very large sparse matrix.

The method

The power iteration algorithm starts with a vector ${\displaystyle b_{0}}$, which may be an approximation to the dominant eigenvector or a random vector. The method is described by the recurrence relation

${\displaystyle b_{k+1}={\frac {Ab_{k}}{\|Ab_{k}\|}}}$

So, at every iteration, the vector ${\displaystyle b_{k}}$ is multiplied by the matrix ${\displaystyle A}$ and normalized.

If we assume ${\displaystyle A}$ has an eigenvalue that is strictly greater in magnitude than its other eigenvalues and the starting vector ${\displaystyle b_{0}}$ has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue, then a subsequence ${\displaystyle \left(b_{k}\right)}$ converges to an eigenvector associated with the dominant eigenvalue.
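
The role of the starting-vector assumption can be seen in a small experiment. The following is a minimal sketch, not part of the original description: the helper name `iterate` and the diagonal test matrix are illustrative assumptions, chosen so that the eigenvectors are simply the coordinate axes.

```python
import numpy as np

def iterate(A, b0, num_iterations):
    """Apply the recurrence b <- A b / ||A b|| a fixed number of times."""
    b = np.asarray(b0, dtype=float)
    for _ in range(num_iterations):
        b = A @ b
        b = b / np.linalg.norm(b)
    return b

# Dominant eigenvalue 2 (eigenvector e1) and eigenvalue 1 (eigenvector e2).
A = np.diag([2.0, 1.0])

good = iterate(A, [1.0, 1.0], 50)  # has a nonzero component along e1
bad = iterate(A, [0.0, 1.0], 50)   # has no component along e1

print(good)  # close to [1, 0], the dominant eigenvector
print(bad)   # remains [0, 1], a non-dominant eigenvector
```

In floating-point arithmetic on a general matrix, rounding errors often reintroduce a small component along the dominant eigenvector, so this failure mode tends to show up as slow convergence rather than as convergence to the wrong vector.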

Without the two assumptions above, the sequence ${\displaystyle \left(b_{k}\right)}$ does not necessarily converge. In this sequence,

${\displaystyle b_{k}=e^{i\phi _{k}}v_{1}+r_{k}}$,

where ${\displaystyle v_{1}}$ is an eigenvector associated with the dominant eigenvalue, and ${\displaystyle \|r_{k}\|\rightarrow 0}$. The presence of the term ${\displaystyle e^{i\phi _{k}}}$ implies that ${\displaystyle \left(b_{k}\right)}$ does not converge unless ${\displaystyle e^{i\phi _{k}}=1}$. Under the two assumptions listed above, the sequence ${\displaystyle \left(\mu _{k}\right)}$ defined by

${\displaystyle \mu _{k}={\frac {b_{k}^{*}Ab_{k}}{b_{k}^{*}b_{k}}}}$

converges to the dominant eigenvalue.
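
For a real matrix with a negative dominant eigenvalue, the factor ${\displaystyle e^{i\phi _{k}}}$ alternates between ${\displaystyle +1}$ and ${\displaystyle -1}$, so the iterates flip sign at every step while ${\displaystyle \mu _{k}}$ still settles down. The sketch below illustrates this under assumed inputs (the matrix and the starting vector are arbitrary choices for demonstration).

```python
import numpy as np

# Assumed example matrix: the dominant eigenvalue is -2.
A = np.array([[-2.0, 0.0],
              [0.0, 1.0]])

b = np.array([1.0, 1.0])
b = b / np.linalg.norm(b)

history = []
for _ in range(40):
    b = A @ b
    b = b / np.linalg.norm(b)
    mu = (b @ A @ b) / (b @ b)  # Rayleigh quotient mu_k
    history.append((b.copy(), mu))

b_prev, _ = history[-2]
b_last, mu_last = history[-1]

print(b_prev, b_last)  # nearly opposite vectors: b_k does not converge
print(mu_last)         # close to -2: mu_k does converge
```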

One may compute this with the following algorithm (shown in Python with NumPy):

#!/usr/bin/python

import numpy as np

def power_iteration(A, num_simulations):
    # Ideally choose a random vector to decrease the chance
    # that our vector is orthogonal to the eigenvector
    b_k = np.random.rand(A.shape[1])

    for _ in range(num_simulations):
        # calculate the matrix-by-vector product Ab
        b_k1 = np.dot(A, b_k)

        # calculate the norm
        b_k1_norm = np.linalg.norm(b_k1)

        # renormalize the vector
        b_k = b_k1 / b_k1_norm

    return b_k

power_iteration(np.array([[0.5, 0.5], [0.2, 0.8]]), 10)


The vector ${\displaystyle b_{k}}$ converges to an associated eigenvector. Ideally, one should use the Rayleigh quotient in order to get the associated eigenvalue.
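
As a rough illustration of that suggestion, the sketch below extends the iteration so that it also returns the Rayleigh quotient of the final iterate. The function name `power_iteration_with_eigenvalue` and the `seed` parameter are assumptions introduced here for reproducibility; they do not appear in the listing above.

```python
import numpy as np

def power_iteration_with_eigenvalue(A, num_iterations, seed=0):
    """Return (eigenvalue estimate, eigenvector estimate) for the dominant eigenpair."""
    rng = np.random.default_rng(seed)
    b = rng.random(A.shape[1])
    b = b / np.linalg.norm(b)

    for _ in range(num_iterations):
        b = A @ b
        b = b / np.linalg.norm(b)

    # Rayleigh quotient of the final iterate
    eigenvalue = (b @ A @ b) / (b @ b)
    return eigenvalue, b

A = np.array([[0.5, 0.5], [0.2, 0.8]])
eigenvalue, eigenvector = power_iteration_with_eigenvalue(A, 100)

print(eigenvalue)   # close to 1, the dominant eigenvalue of this matrix
print(eigenvector)  # close to [1, 1] / sqrt(2)
```

The eigenvalues of this example matrix are 1 and 0.3, so the estimate can be checked by hand.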

This algorithm is used, for example, to calculate the Google PageRank.

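The method can also be used to calculate the spectral radius (the largest magnitude among the eigenvalues of a square matrix) by computing the Rayleigh quotient

${\displaystyle {\frac {b_{k}^{\top }Ab_{k}}{b_{k}^{\top }b_{k}}}=\|Ab_{k}\|\,{\frac {b_{k+1}^{\top }b_{k}}{b_{k}^{\top }b_{k}}},}$

where the factor ${\displaystyle \|Ab_{k}\|}$ appears because ${\displaystyle b_{k+1}}$ is normalized. The absolute value of this quotient approximates the spectral radius.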

Analysis

Let ${\displaystyle A}$ be decomposed into its Jordan canonical form: ${\displaystyle A=VJV^{-1}}$, where the first column of ${\displaystyle V}$ is an eigenvector of ${\displaystyle A}$ corresponding to the dominant eigenvalue ${\displaystyle \lambda _{1}}$. Since the dominant eigenvalue of ${\displaystyle A}$ is unique, the first Jordan block of ${\displaystyle J}$ is the ${\displaystyle 1\times 1}$ matrix ${\displaystyle {\begin{bmatrix}\lambda _{1}\end{bmatrix}}}$, where ${\displaystyle \lambda _{1}}$ is the largest eigenvalue of ${\displaystyle A}$ in magnitude. The starting vector ${\displaystyle b_{0}}$ can be written as a linear combination of the columns of ${\displaystyle V}$: ${\displaystyle b_{0}=c_{1}v_{1}+c_{2}v_{2}+\cdots +c_{n}v_{n}}$. By assumption, ${\displaystyle b_{0}}$ has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue, so ${\displaystyle c_{1}\neq 0}$.

The computationally useful recurrence relation for ${\displaystyle b_{k+1}}$ can be rewritten as ${\displaystyle b_{k+1}={\frac {Ab_{k}}{\|Ab_{k}\|}}={\frac {A^{k+1}b_{0}}{\|A^{k+1}b_{0}\|}}}$, where the last expression is more amenable to the following analysis:
${\displaystyle \displaystyle {\begin{array}{lcl}b_{k}&=&{\frac {A^{k}b_{0}}{\|A^{k}b_{0}\|}}\\&=&{\frac {\left(VJV^{-1}\right)^{k}b_{0}}{\|\left(VJV^{-1}\right)^{k}b_{0}\|}}\\&=&{\frac {VJ^{k}V^{-1}b_{0}}{\|VJ^{k}V^{-1}b_{0}\|}}\\&=&{\frac {VJ^{k}V^{-1}\left(c_{1}v_{1}+c_{2}v_{2}+\cdots +c_{n}v_{n}\right)}{\|VJ^{k}V^{-1}\left(c_{1}v_{1}+c_{2}v_{2}+\cdots +c_{n}v_{n}\right)\|}}\\&=&{\frac {VJ^{k}\left(c_{1}e_{1}+c_{2}e_{2}+\cdots +c_{n}e_{n}\right)}{\|VJ^{k}\left(c_{1}e_{1}+c_{2}e_{2}+\cdots +c_{n}e_{n}\right)\|}}\\&=&\left({\frac {\lambda _{1}}{|\lambda _{1}|}}\right)^{k}{\frac {c_{1}}{|c_{1}|}}{\frac {v_{1}+{\frac {1}{c_{1}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{k}\left(c_{2}e_{2}+\cdots +c_{n}e_{n}\right)}{\|v_{1}+{\frac {1}{c_{1}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{k}\left(c_{2}e_{2}+\cdots +c_{n}e_{n}\right)\|}}\end{array}}}$
The expression above simplifies as ${\displaystyle k\rightarrow \infty }$:
${\displaystyle \left({\frac {1}{\lambda _{1}}}J\right)^{k}={\begin{bmatrix}[1]&&&&\\&\left({\frac {1}{\lambda _{1}}}J_{2}\right)^{k}&&&\\&&\ddots &\\&&&\left({\frac {1}{\lambda _{1}}}J_{m}\right)^{k}\\\end{bmatrix}}\rightarrow {\begin{bmatrix}1&&&&\\&0&&&\\&&\ddots &\\&&&0\\\end{bmatrix}}}$ as ${\displaystyle k\rightarrow \infty }$.
The limit follows from the fact that the eigenvalue of ${\displaystyle {\frac {1}{\lambda _{1}}}J_{i}}$ is less than 1 in magnitude, so ${\displaystyle \left({\frac {1}{\lambda _{1}}}J_{i}\right)^{k}\rightarrow 0}$ as ${\displaystyle k\rightarrow \infty }$.
It follows that:
${\displaystyle {\frac {1}{c_{1}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{k}\left(c_{2}e_{2}+\cdots +c_{n}e_{n}\right)\rightarrow 0}$ as ${\displaystyle k\rightarrow \infty }$
Using this fact, ${\displaystyle b_{k}}$ can be written in a form that emphasizes its relationship with ${\displaystyle v_{1}}$ when k is large:
${\displaystyle {\begin{matrix}b_{k}&=&\left({\frac {\lambda _{1}}{|\lambda _{1}|}}\right)^{k}{\frac {c_{1}}{|c_{1}|}}{\frac {v_{1}+{\frac {1}{c_{1}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{k}\left(c_{2}e_{2}+\cdots +c_{n}e_{n}\right)}{\|v_{1}+{\frac {1}{c_{1}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{k}\left(c_{2}e_{2}+\cdots +c_{n}e_{n}\right)\|}}&=&e^{i\phi _{k}}{\frac {c_{1}}{|c_{1}|}}{\frac {v_{1}}{\|v_{1}\|}}+r_{k}\end{matrix}}}$ where ${\displaystyle e^{i\phi _{k}}=\left(\lambda _{1}/|\lambda _{1}|\right)^{k}}$ and ${\displaystyle \|r_{k}\|\rightarrow 0}$ as ${\displaystyle k\rightarrow \infty }$
The sequence ${\displaystyle \left(b_{k}\right)}$ is bounded, so it contains a convergent subsequence. Note that the eigenvector corresponding to the dominant eigenvalue is only unique up to a scalar, so although the sequence ${\displaystyle \left(b_{k}\right)}$ may not converge, ${\displaystyle b_{k}}$ is nearly an eigenvector of A for large k.
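
Because this argument uses the Jordan form, it also covers matrices that are not diagonalizable, provided the dominant eigenvalue has a ${\displaystyle 1\times 1}$ block. The sketch below checks this numerically on an assumed example in which the non-dominant eigenvalue 1 has a ${\displaystyle 2\times 2}$ Jordan block; the matrix, starting vector, and iteration count are illustrative choices.

```python
import numpy as np

# Assumed non-diagonalizable example: eigenvalue 1 has a 2x2 Jordan block,
# while the dominant eigenvalue 3 has a 1x1 block.
A = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

b = np.ones(3) / np.sqrt(3.0)
for _ in range(60):
    b = A @ b
    b = b / np.linalg.norm(b)

# Size of A b - lambda_1 b; small when b is nearly an eigenvector for lambda_1 = 3.
residual = np.linalg.norm(A @ b - 3.0 * b)

print(b)         # close to [1, 0, 0]
print(residual)  # close to 0
```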

Alternatively, if ${\displaystyle A}$ is diagonalizable, then the following proof yields the same result.

Let ${\displaystyle \lambda _{1},\lambda _{2},\ldots ,\lambda _{m}}$ be the ${\displaystyle m}$ eigenvalues (counted with multiplicity) of ${\displaystyle A}$ and let ${\displaystyle v_{1},v_{2},\ldots ,v_{m}}$ be the corresponding eigenvectors. Suppose that ${\displaystyle \lambda _{1}}$ is the dominant eigenvalue, so that ${\displaystyle |\lambda _{1}|>|\lambda _{j}|}$ for ${\displaystyle j>1}$.

The initial vector ${\displaystyle b_{0}}$ can be written:

${\displaystyle b_{0}=c_{1}v_{1}+c_{2}v_{2}+\cdots +c_{m}v_{m}.}$

If ${\displaystyle b_{0}}$ is chosen randomly (with uniform probability), then ${\displaystyle c_{1}\neq 0}$ with probability 1. Now,

${\displaystyle {\begin{array}{lcl}A^{k}b_{0}&=&c_{1}A^{k}v_{1}+c_{2}A^{k}v_{2}+\cdots +c_{m}A^{k}v_{m}\\&=&c_{1}\lambda _{1}^{k}v_{1}+c_{2}\lambda _{2}^{k}v_{2}+\cdots +c_{m}\lambda _{m}^{k}v_{m}\\&=&c_{1}\lambda _{1}^{k}\left(v_{1}+{\frac {c_{2}}{c_{1}}}\left({\frac {\lambda _{2}}{\lambda _{1}}}\right)^{k}v_{2}+\cdots +{\frac {c_{m}}{c_{1}}}\left({\frac {\lambda _{m}}{\lambda _{1}}}\right)^{k}v_{m}\right).\end{array}}}$

The expression within parentheses converges to ${\displaystyle v_{1}}$ because ${\displaystyle |\lambda _{j}/\lambda _{1}|<1}$ for ${\displaystyle j>1}$. On the other hand, we have

${\displaystyle b_{k}={\frac {A^{k}b_{0}}{\|A^{k}b_{0}\|}}.}$

Therefore, ${\displaystyle b_{k}}$ converges to (a multiple of) the eigenvector ${\displaystyle v_{1}}$. The convergence is geometric, with ratio

${\displaystyle \left|{\frac {\lambda _{2}}{\lambda _{1}}}\right|,}$

where ${\displaystyle \lambda _{2}}$ denotes the second dominant eigenvalue. Thus, the method converges slowly if there is an eigenvalue close in magnitude to the dominant eigenvalue.
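
The geometric rate can be observed numerically. The following sketch uses an assumed symmetric ${\displaystyle 2\times 2}$ matrix with eigenvalues 3 and 1, measures the distance from each iterate to the line spanned by the dominant eigenvector, and compares successive errors; the matrix and the error measure are choices made here for illustration.

```python
import numpy as np

# Assumed example: eigenvalues 3 and 1, so |lambda_2 / lambda_1| = 1/3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)  # unit dominant eigenvector

b = np.array([1.0, 0.0])
errors = []
for _ in range(12):
    b = A @ b
    b = b / np.linalg.norm(b)
    # Distance from b to the line spanned by v1
    errors.append(np.linalg.norm(b - (b @ v1) * v1))

# Ratio of consecutive errors
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

print(ratios[-1])  # close to 1/3
```

With eigenvalues 3 and 2.9 instead, the same ratio would be about 0.97, which is why nearly equal leading eigenvalues make the method slow.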

Applications

Although the power iteration method approximates only one eigenvalue of a matrix, it remains useful for certain computational problems. For instance, Google uses it to calculate the PageRank of documents in their search engine,[2] and Twitter uses it to show users recommendations of whom to follow.[3] The power iteration method is especially suitable for sparse matrices, such as the web matrix, or as a matrix-free method that does not require storing the coefficient matrix ${\displaystyle A}$ explicitly, but instead only needs access to a function evaluating matrix-vector products ${\displaystyle Ax}$. For non-symmetric matrices that are well-conditioned, the power iteration method can outperform the more complex Arnoldi iteration. For symmetric matrices, the power iteration method is rarely used, since its convergence speed can be easily increased without sacrificing the small cost per iteration; see, e.g., Lanczos iteration and LOBPCG.
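
A matrix-free variant might look like the following sketch, in which the iteration only ever calls a user-supplied function for the product ${\displaystyle Ax}$. The names `power_iteration_matrix_free` and `periodic_stencil` are hypothetical, and the operator (a periodic three-point stencil whose dominant eigenvalue is 4) is an assumed example rather than anything taken from the applications above.

```python
import numpy as np

def power_iteration_matrix_free(matvec, n, num_iterations):
    """Power iteration that only needs a function computing x -> A x."""
    # Deterministic starting vector with a nonzero mean (an arbitrary choice).
    b = np.arange(1.0, n + 1.0)
    b = b / np.linalg.norm(b)

    for _ in range(num_iterations):
        b = matvec(b)
        b = b / np.linalg.norm(b)

    # Rayleigh quotient; b has unit norm, so no denominator is needed.
    eigenvalue = b @ matvec(b)
    return eigenvalue, b

def periodic_stencil(x):
    """Return A x where (A x)_i = x_{i-1} + 2 x_i + x_{i+1}, indices taken cyclically."""
    return np.roll(x, 1) + 2.0 * x + np.roll(x, -1)

eigenvalue, b = power_iteration_matrix_free(periodic_stencil, n=8, num_iterations=300)

print(eigenvalue)  # close to 4
print(b)           # close to the constant vector with entries 1 / sqrt(8)
```

The matrix is never formed: memory use is a few vectors of length ${\displaystyle n}$, and the cost per iteration is one call to `matvec`.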

Some of the more advanced eigenvalue algorithms can be understood as variations of the power iteration. For instance, the inverse iteration method applies power iteration to the matrix ${\displaystyle A^{-1}}$. Other algorithms look at the whole subspace generated by the vectors ${\displaystyle b_{k}}$. This subspace is known as the Krylov subspace. It can be computed by Arnoldi iteration or Lanczos iteration.
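
As a sketch of the inverse iteration idea, the code below applies the power iteration recurrence to ${\displaystyle A^{-1}}$ by solving a linear system at each step instead of forming the inverse. The function name `inverse_iteration` and the test matrix are assumptions made for illustration; practical implementations commonly add a shift and reuse a single factorization, which is omitted here.

```python
import numpy as np

def inverse_iteration(A, b0, num_iterations):
    """Power iteration applied to A^{-1}; finds the eigenvalue of A smallest in magnitude."""
    b = np.asarray(b0, dtype=float)
    b = b / np.linalg.norm(b)

    for _ in range(num_iterations):
        # b <- A^{-1} b, computed by solving A y = b rather than inverting A
        b = np.linalg.solve(A, b)
        b = b / np.linalg.norm(b)

    # Rayleigh quotient with respect to A itself
    eigenvalue = (b @ A @ b) / (b @ b)
    return eigenvalue, b

# Assumed example: eigenvalues 3 and 1.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalue, b = inverse_iteration(A, [1.0, 0.0], 50)

print(eigenvalue)  # close to 1, the eigenvalue of A smallest in magnitude
print(b)           # close to [1, -1] / sqrt(2)
```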