Rayleigh quotient

In mathematics, for a given complex Hermitian matrix M and nonzero vector x, the Rayleigh quotient[1] R(M, x) is defined as:[2][3]

R(M,x) := {x^{*} M x \over x^{*} x}.

For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the conjugate transpose x^{*} to the usual transpose x'. Note that R(M, c x) = R(M,x) for any non-zero scalar c. Recall that a Hermitian (or real symmetric) matrix has real eigenvalues. It can be shown that, for a given matrix, the Rayleigh quotient reaches its minimum value \lambda_\min (the smallest eigenvalue of M) when x is v_\min (the corresponding eigenvector). Similarly, R(M, x) \leq \lambda_\max and R(M, v_\max) = \lambda_\max.
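The properties above can be checked numerically on a small example. The following is a minimal sketch in plain Python for the real symmetric case; the function name rayleigh_quotient and the 2×2 matrix (eigenvalues 1 and 3, eigenvectors (1, -1) and (1, 1)) are illustrative assumptions, not part of any library.

```python
def rayleigh_quotient(M, x):
    """Return x^T M x / x^T x for a real symmetric matrix M given as a list of rows."""
    Mx = [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]
    numerator = sum(x_i * mx_i for x_i, mx_i in zip(x, Mx))
    denominator = sum(x_i * x_i for x_i in x)
    return numerator / denominator


# Hypothetical example matrix: eigenvalues 1 and 3,
# with eigenvectors (1, -1) and (1, 1) respectively.
M = [[2.0, 1.0], [1.0, 2.0]]

r_min = rayleigh_quotient(M, [1.0, -1.0])     # attained at v_min
r_max = rayleigh_quotient(M, [1.0, 1.0])      # attained at v_max
r_mid = rayleigh_quotient(M, [1.0, 0.0])      # a generic vector
r_scaled = rayleigh_quotient(M, [-5.0, 0.0])  # the same vector scaled by c = -5
```

Here r_min and r_max reproduce the two eigenvalues, r_mid lies between them, and r_scaled equals r_mid, illustrating the invariance under scaling.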

The Rayleigh quotient is used in the min-max theorem to get exact values of all eigenvalues. It is also used in eigenvalue algorithms to obtain an eigenvalue approximation from an eigenvector approximation. Specifically, this is the basis for Rayleigh quotient iteration.
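As a rough illustration of obtaining an eigenvalue approximation from an eigenvector approximation, the sketch below refines a vector by plain power iteration and then evaluates the Rayleigh quotient at the result. Power iteration is used here only because it is simpler; it is not Rayleigh quotient iteration itself. The function names and the example matrix are assumptions made for illustration.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def rayleigh_quotient(M, x):
    """Return x^T M x / x^T x for a real symmetric matrix M."""
    Mx = matvec(M, x)
    return sum(a * b for a, b in zip(x, Mx)) / sum(a * a for a in x)


def power_iteration_estimate(M, x0, steps=20):
    """Refine x by repeated multiplication with M, then estimate the eigenvalue."""
    x = list(x0)
    for _ in range(steps):
        y = matvec(M, x)
        norm = math.sqrt(sum(a * a for a in y))
        x = [a / norm for a in y]  # renormalize to avoid overflow
    return rayleigh_quotient(M, x), x


# Hypothetical example: the largest eigenvalue is 3, with eigenvector (1, 1).
M = [[2.0, 1.0], [1.0, 2.0]]
estimate, vector = power_iteration_estimate(M, [1.0, 0.0])
```

For this matrix the estimate approaches the largest eigenvalue, 3, and the returned vector approaches a multiple of (1, 1).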

The range of the Rayleigh quotient (for a matrix that is not necessarily Hermitian) is called the numerical range; it contains the spectrum of the matrix. When the matrix is Hermitian, the numerical radius (the largest absolute value attained in the numerical range) is equal to the spectral norm. Still in functional analysis, the largest absolute value of an eigenvalue is known as the spectral radius; it coincides with \lambda_\max when M is positive semi-definite. In the context of C*-algebras or algebraic quantum mechanics, the function that to M associates the Rayleigh–Ritz quotient R(M,x) for a fixed x and M varying through the algebra is referred to as a "vector state" of the algebra.

Bounds

As stated in the introduction, R(M,x) \in \left[\lambda_\min, \lambda_\max \right] for any nonzero vector x. This is immediate after observing that the Rayleigh quotient is a weighted average of the eigenvalues of M:

R(M,x) = \frac{\sum_{i=1}^n \lambda_i |x_i|^2}{\sum_{i=1}^n |x_i|^2}

where (\lambda_i, v_i) is the ith eigenpair after orthonormalization and x_i = v_i^* x is the ith coordinate of x in the eigenbasis. It is easy to verify that the bounds are attained at the corresponding eigenvectors v_\min, v_\max.
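The weighted-average form follows from expanding x in the orthonormal eigenbasis. A short derivation, using only the symbols defined above (the weights w_i are introduced here for illustration; in the real case |x_i|^2 reduces to x_i^2):

```latex
\begin{align}
x &= \sum_{i=1}^n x_i v_i, \qquad M x = \sum_{i=1}^n \lambda_i x_i v_i, \\
x^{*} M x &= \sum_{j=1}^n \sum_{i=1}^n \overline{x_j}\, x_i \lambda_i \, v_j^{*} v_i = \sum_{i=1}^n \lambda_i |x_i|^2, \qquad
x^{*} x = \sum_{i=1}^n |x_i|^2, \\
R(M,x) &= \sum_{i=1}^n w_i \lambda_i, \qquad w_i = \frac{|x_i|^2}{\sum_{j=1}^n |x_j|^2} \geq 0, \qquad \sum_{i=1}^n w_i = 1.
\end{align}
```

Since the weights are non-negative and sum to one, the quotient cannot be smaller than \lambda_\min or larger than \lambda_\max.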

Special case of covariance matrices

An empirical covariance matrix M can be represented as the product A' A of the data matrix A pre-multiplied by its transpose A'. Being a positive semi-definite matrix, M has non-negative eigenvalues, and orthogonal (or orthogonalisable) eigenvectors, which can be demonstrated as follows.

Firstly, that the eigenvalues \lambda_i are non-negative:

M v_i = A' A v_i = \lambda_i v_i
\Rightarrow v_i' A' A v_i = v_i' \lambda_i v_i
\Rightarrow \left\| A v_i \right\|^2 = \lambda_i \left\| v_i \right\|^2
\Rightarrow \lambda_i = \frac{\left\| A v_i \right\|^2}{\left\| v_i \right\|^2} \geq 0.
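The identity x' A' A x = \|A x\|^2 behind this argument holds for every vector x, not only for eigenvectors, and can be checked directly. The sketch below uses a made-up 3×2 data matrix; all names and values are illustrative assumptions.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 3x2 data matrix (rows are observations).
A = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
A_t = [list(col) for col in zip(*A)]               # transpose A'
M = [[dot(row, col) for col in zip(*A)] for row in A_t]  # M = A' A

checks = []
for x in ([1.0, 0.0], [1.0, -1.0], [-3.0, 2.0]):
    quad_form = dot(x, matvec(M, x))  # x' A' A x
    Ax = matvec(A, x)
    squared_norm = dot(Ax, Ax)        # ||A x||^2, never negative
    checks.append((quad_form, squared_norm))
```

Each pair in checks holds two equal, non-negative numbers, so the Rayleigh quotient of M, and in particular every eigenvalue, is non-negative.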

Secondly, that the eigenvectors v_i are orthogonal to one another:

\begin{align}
&\qquad \qquad M v_i = \lambda _i v_i \\
&\Rightarrow v_j' M v_i = \lambda _i v_j' v_i \\
&\Rightarrow \left (M v_j \right )' v_i = \lambda _j v_j' v_i \\
&\Rightarrow \lambda_j v_j ' v_i = \lambda _i v_j' v_i \\
&\Rightarrow \left (\lambda_j - \lambda_i \right ) v_j ' v_i = 0 \\
&\Rightarrow v_j ' v_i = 0
\end{align}

The last step holds when the eigenvalues \lambda_i and \lambda_j are different; when an eigenvalue is repeated, a basis of its eigenspace can be orthogonalized.

To now establish that the Rayleigh quotient is maximised by the eigenvector with the largest eigenvalue, consider decomposing an arbitrary vector x on the basis of the eigenvectors v_i:

x = \sum _{i=1} ^n \alpha _i v_i,

where

\alpha_i = \frac{x'v_i}{v_i'v_i} = \frac{\langle x,v_i\rangle}{\left\| v_i \right\| ^2}

is the coordinate of x orthogonally projected onto v_i. Therefore we have:

R(M,x) = \frac{x' A' A x}{x' x} = \frac{ \left (\sum _{j=1} ^n \alpha _j v_j \right )' \left ( A' A \right ) \left (\sum _{i=1} ^n \alpha _i v_i \right )}{ \left (\sum _{j=1} ^n \alpha _j v_j \right )' \left (\sum _{i=1} ^n \alpha _i v_i \right )}

which, by orthogonality of the eigenvectors (taken here to have unit norm, so that v_i' v_i = 1), becomes:

R(M,x) = \frac{\sum _{i=1} ^n \alpha_i^2 \lambda _i}{\sum _{i=1} ^n \alpha_i^2} = \sum_{i=1}^n \lambda_i \frac{(x'v_i)^2}{ (x'x)( v_i' v_i)}

The last representation establishes that the Rayleigh quotient is the sum of the squared cosines of the angles formed by the vector x and each eigenvector v_i, weighted by the corresponding eigenvalues.

If a vector x maximizes R(M,x), then any non-zero scalar multiple kx also maximizes R, so the problem can be reduced to the Lagrange problem of maximizing \sum _{i=1}^n \alpha_i^2 \lambda _i under the constraint that \sum _{i=1} ^n \alpha _i ^2 = 1.

Define \beta_i = \alpha_i^2. This then becomes a linear program, which always attains its maximum at one of the corners of the domain. A maximum point will have \alpha_1 = \pm 1 and \alpha_i = 0 for all i > 1 (when the eigenvalues are ordered by decreasing magnitude).

Thus, as advertised, the Rayleigh quotient is maximised by the eigenvector with the largest eigenvalue.
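In two dimensions the conclusion can be illustrated by brute force: every unit vector has the form (cos t, sin t), so scanning t over a grid approximates the maximum of the quotient over all directions. The matrix below (eigenvalues 3 and 7, with (1, 1) the eigenvector for 7) and the grid resolution are illustrative assumptions.

```python
import math


def rayleigh_quotient(M, x):
    """Return x^T M x / x^T x for a real symmetric matrix M."""
    Mx = [sum(a * b for a, b in zip(row, x)) for row in M]
    return sum(a * b for a, b in zip(x, Mx)) / sum(a * a for a in x)


# Hypothetical covariance-like matrix with eigenvalues 3 and 7.
M = [[5.0, 2.0], [2.0, 5.0]]

# Scan unit vectors (cos t, sin t) for t in [0, pi) in steps of 0.1 degree.
# Angles beyond pi give the same quotient, since R(M, -x) = R(M, x).
best_value, best_angle = max(
    (rayleigh_quotient(M, [math.cos(t), math.sin(t)]), t)
    for t in (k * math.pi / 1800 for k in range(1800))
)
```

The largest value found is close to 7, at an angle close to pi/4, which is the direction of the eigenvector (1, 1).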

Formulation using Lagrange multipliers

Alternatively, this result can be arrived at by the method of Lagrange multipliers. The problem is to find the critical points of the function

R(M,x) = x^T M x ,

subject to the constraint \|x\|^2 = x^Tx = 1. In other words, the problem is to find the critical points of

\mathcal{L}(x) = x^T M x  -\lambda \left (x^Tx - 1 \right),

where λ is a Lagrange multiplier. The stationary points of \mathcal{L}(x) occur at

\frac{d\mathcal{L}(x)}{dx} = 0
\therefore 2x^T M^T  - 2\lambda x^T = 0
\therefore M x = \lambda x

and

 R(M,x) = \frac{x^T M x}{x^T x} = \lambda \frac{x^Tx}{x^T x} = \lambda.

Therefore, the eigenvectors x_1, \cdots, x_n of M are the critical points of the Rayleigh quotient and their corresponding eigenvalues \lambda_1, \cdots, \lambda_n are the stationary values of R.
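The same conclusion can be checked on the unconstrained quotient. For a real symmetric M, differentiating R(M,x) = x^T M x / x^T x gives the gradient 2 (M x - R(M,x) x) / (x^T x), which vanishes exactly when M x = R(M,x) x. The sketch below evaluates this gradient; the function names and the example matrix are assumptions made for illustration.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def rayleigh_gradient(M, x):
    """Gradient of x^T M x / x^T x for real symmetric M: 2 (M x - R x) / (x^T x)."""
    Mx = matvec(M, x)
    xx = sum(a * a for a in x)
    r = sum(a * b for a, b in zip(x, Mx)) / xx  # the Rayleigh quotient R(M, x)
    return [2.0 * (mx_i - r * x_i) / xx for mx_i, x_i in zip(Mx, x)]


# Hypothetical example: (1, 1) is an eigenvector (eigenvalue 3), (1, 0) is not.
M = [[2.0, 1.0], [1.0, 2.0]]
grad_at_eigenvector = rayleigh_gradient(M, [1.0, 1.0])
grad_elsewhere = rayleigh_gradient(M, [1.0, 0.0])
```

The gradient is zero at the eigenvector and non-zero at the other vector, consistent with the eigenvectors being the critical points.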

This property is the basis for principal components analysis and canonical correlation.

Use in Sturm–Liouville theory

Sturm–Liouville theory concerns the action of the linear operator

L(y) = \frac{1}{w(x)}\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y\right)

on the inner product space defined by

\langle{y_1,y_2}\rangle = \int_a^b w(x)y_1(x)y_2(x) \, dx

of functions satisfying some specified boundary conditions at a and b. In this case the Rayleigh quotient is

\frac{\langle{y,Ly}\rangle}{\langle{y,y}\rangle} = \frac{\int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y(x)\right)dx}{\int_a^b{w(x)y(x)^2}dx}.

This is sometimes presented in an equivalent form, obtained by separating the integral in the numerator and using integration by parts:

\begin{align}
\frac{\langle{y,Ly}\rangle}{\langle{y,y}\rangle} &= \frac{ \left \{ \int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)y'(x)\right]\right) dx \right \}+ \left \{\int_a^b{q(x)y(x)^2} \, dx \right \}}{\int_a^b{w(x)y(x)^2} \, dx} \\
&= \frac{ \left \{\left. -y(x)\left[p(x)y'(x)\right] \right |_a^b \right \} + \left \{\int_a^b y'(x)\left[p(x)y'(x)\right] \, dx \right \} + \left \{\int_a^b{q(x)y(x)^2} \, dx \right \}}{\int_a^b w(x)y(x)^2 \, dx}\\
&= \frac{ \left \{ \left. -p(x)y(x)y'(x) \right |_a^b \right \} + \left \{ \int_a^b \left [p(x)y'(x)^2 + q(x)y(x)^2 \right] \, dx \right \} } {\int_a^b{w(x)y(x)^2} \, dx}.
\end{align}
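The integrated-by-parts form can be evaluated numerically for a trial function. The sketch below assumes the simplest case p = 1, q = 0, w = 1 on [0, pi] with y(0) = y(pi) = 0 (that is, -y'' = \lambda y, whose smallest eigenvalue is 1 with eigenfunction sin x) and the hypothetical trial function y(x) = x(pi - x). The midpoint-rule integrator and all names are illustrative choices.

```python
import math


def integrate(f, a, b, n=2000):
    """Composite midpoint rule with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# Assumed problem data: p = 1, q = 0, w = 1 on [a, b] = [0, pi].
a, b = 0.0, math.pi

def p(x): return 1.0
def q(x): return 0.0
def w(x): return 1.0

# Hypothetical trial function satisfying y(a) = y(b) = 0, and its derivative.
def y(x): return x * (math.pi - x)
def dy(x): return math.pi - 2.0 * x

# Boundary term -p y y' evaluated between a and b (zero for this trial function).
boundary_term = -(p(b) * y(b) * dy(b) - p(a) * y(a) * dy(a))
numerator = boundary_term + integrate(
    lambda x: p(x) * dy(x) ** 2 + q(x) * y(x) ** 2, a, b
)
denominator = integrate(lambda x: w(x) * y(x) ** 2, a, b)
quotient = numerator / denominator
```

For this trial function the exact value is 10/pi^2, about 1.0132, slightly above the smallest eigenvalue 1, as expected from the variational characterization of the smallest eigenvalue.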

Generalization

For a given pair (A, B) of matrices, and a given non-zero vector x, the generalized Rayleigh quotient is defined as:

R(A,B; x) := \frac{x^* A x}{x^* B x}.

The generalized Rayleigh quotient can be reduced to the Rayleigh quotient R(D, C^*x) through the transformation D = C^{-1} A {C^*}^{-1}, where CC^* is the Cholesky decomposition of the Hermitian positive-definite matrix B.
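The reduction can be verified on a small real example, where C^* is simply the transpose C^T. The sketch below hand-codes the 2×2 Cholesky factor and its inverse to stay self-contained; the helper names and the matrices A, B and the vector x are illustrative assumptions.

```python
import math


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def quad(X, v):
    """Return the quadratic form v^T X v."""
    Xv = [sum(a * b for a, b in zip(row, v)) for row in X]
    return sum(a * b for a, b in zip(v, Xv))


def cholesky_2x2(B):
    """Lower-triangular C with C C^T = B, for a 2x2 symmetric positive-definite B."""
    c11 = math.sqrt(B[0][0])
    c21 = B[1][0] / c11
    c22 = math.sqrt(B[1][1] - c21 * c21)
    return [[c11, 0.0], [c21, c22]]


def inverse_lower_2x2(C):
    """Inverse of a 2x2 lower-triangular matrix."""
    (c11, _), (c21, c22) = C
    return [[1.0 / c11, 0.0], [-c21 / (c11 * c22), 1.0 / c22]]


# Hypothetical data: A symmetric, B symmetric positive-definite.
A = [[2.0, 1.0], [1.0, 2.0]]
B = [[4.0, 2.0], [2.0, 3.0]]
x = [1.0, -2.0]

C = cholesky_2x2(B)
C_inv = inverse_lower_2x2(C)
D = matmul(matmul(C_inv, A), transpose(C_inv))  # D = C^{-1} A (C^T)^{-1}
y = [sum(a * b for a, b in zip(row, x)) for row in transpose(C)]  # y = C^T x

generalized = quad(A, x) / quad(B, x)           # R(A, B; x)
reduced = quad(D, y) / sum(v * v for v in y)    # R(D, C^T x)
```

Both quantities agree up to rounding error (0.75 for this data), as the substitution y = C^T x predicts.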

References

  1. Also known as the Rayleigh–Ritz ratio; named after Walther Ritz and Lord Rayleigh.
  2. Horn, R. A. and C. R. Johnson. 1985. Matrix Analysis. Cambridge University Press. pp. 176–180.
  3. Parlett, B. N. 1998. The Symmetric Eigenvalue Problem. SIAM, Classics in Applied Mathematics.