Rayleigh quotient
In mathematics, for a given complex Hermitian matrix M and nonzero vector x, the Rayleigh quotient[1] R(M,x) is defined as[2][3]:
$$R(M,x) := \frac{x^{*} M x}{x^{*} x}.$$
For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the conjugate transpose x* to the usual transpose x'. Note that R(M,cx) = R(M,x) for any nonzero scalar c. Recall that a Hermitian (or real symmetric) matrix has real eigenvalues. It can be shown that, for a given matrix, the Rayleigh quotient reaches its minimum value λmin (the smallest eigenvalue of M) when x is vmin (the corresponding eigenvector). Similarly, R(M,x) ≤ λmax and R(M,vmax) = λmax. The Rayleigh quotient is used in the min-max theorem to get exact values of all eigenvalues. It is also used in eigenvalue algorithms to obtain an eigenvalue approximation from an eigenvector approximation. Specifically, this is the basis for Rayleigh quotient iteration.
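As a small numerical illustration of the definition, the scale invariance and the eigenvalue bounds, the following sketch evaluates the quotient with NumPy. The function name `rayleigh_quotient` and the 2×2 matrix are illustrative choices, not part of any reference implementation.

```python
import numpy as np

def rayleigh_quotient(M, x):
    """Return R(M, x) = (x* M x) / (x* x) for a Hermitian M and nonzero x."""
    x = np.asarray(x)
    # np.vdot conjugates its first argument, so complex vectors are handled too.
    return (np.vdot(x, M @ x) / np.vdot(x, x)).real

# Illustrative symmetric matrix whose eigenvalues are 1 and 3.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v_min = np.array([1.0, -1.0])   # eigenvector for the eigenvalue 1
v_max = np.array([1.0, 1.0])    # eigenvector for the eigenvalue 3
x = np.array([1.0, 0.0])        # an arbitrary nonzero vector

r_min = rayleigh_quotient(M, v_min)        # attains the smallest eigenvalue
r_max = rayleigh_quotient(M, v_max)        # attains the largest eigenvalue
r_x = rayleigh_quotient(M, x)              # lies between the two
r_scaled = rayleigh_quotient(M, -7.5 * x)  # unchanged by rescaling x
```

For any other nonzero vector the value stays between the two extreme eigenvalues, which is what makes the quotient usable as an eigenvalue estimate.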
The range of the Rayleigh quotient is called a numerical range.
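The use of the Rayleigh quotient in eigenvalue algorithms can be sketched with a basic form of Rayleigh quotient iteration for a real symmetric matrix. This is a minimal sketch under stated assumptions: the function name, the stopping tolerance, the iteration cap and the test matrix are all illustrative, and a production solver would need more careful handling of nearly singular shifts.

```python
import numpy as np

def rayleigh_quotient_iteration(M, x0, max_iters=20, tol=1e-10):
    """Refine an eigenpair of a real symmetric M starting from the guess x0."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    identity = np.eye(M.shape[0])
    mu = x @ M @ x                       # Rayleigh quotient of the unit vector x
    for _ in range(max_iters):
        # Stop once (mu, x) is an eigenpair to within the tolerance.
        if np.linalg.norm(M @ x - mu * x) < tol:
            break
        try:
            # Inverse-iteration step using the current quotient as the shift.
            y = np.linalg.solve(M - mu * identity, x)
        except np.linalg.LinAlgError:
            break                        # the shift is numerically an exact eigenvalue
        x = y / np.linalg.norm(y)
        mu = x @ M @ x                   # updated eigenvalue estimate
    return mu, x

# Illustrative symmetric tridiagonal matrix.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
mu, v = rayleigh_quotient_iteration(M, x0=[1.0, 1.0, 1.0])
residual = np.linalg.norm(M @ v - mu * v)
```

Which eigenpair the iteration reaches depends on the starting vector, so the sketch only checks that the returned pair has a small residual.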
Special case of covariance matrices
A covariance matrix M can be represented as the product A'A. Its eigenvalues are non-negative:
- Mvi = A'Avi = λivi
- vi'A'Avi = vi'λivi
- ||Avi||² = λi||vi||²
- λi = ||Avi||² / ||vi||² ≥ 0
The eigenvectors are orthogonal to one another:
- Mvi = λivi
- vj'Mvi = λivj'vi
- (Mvj)'vi = λivj'vi
- λjvj'vi = λivj'vi
- (λj − λi)vj'vi = 0
- vj'vi = 0 (for different eigenvalues; in the case of multiplicity, the basis can be orthogonalized).
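The two properties derived above can be checked numerically. The sketch below assumes a hypothetical random data matrix `A` and uses NumPy's `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as columns.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))    # hypothetical data matrix: 50 rows, 4 columns
M = A.T @ A                         # matrix of the form A'A

eigenvalues, V = np.linalg.eigh(M)  # columns of V are the eigenvectors v_i

# Each eigenvalue equals ||A v_i||^2 / ||v_i||^2, so it cannot be negative.
ratios = np.sum((A @ V) ** 2, axis=0) / np.sum(V ** 2, axis=0)

# Pairwise inner products v_j' v_i: the identity matrix for an orthonormal basis.
gram = V.T @ V
```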
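The Rayleigh quotient can be expressed as a function of the eigenvalues by decomposing any vector x on the basis of eigenvectors:
$$x = \sum_{i=1}^{n} \alpha_i v_i,$$
where
$$\alpha_i = \frac{x'v_i}{v_i'v_i}$$
is the coordinate of x orthogonally projected onto vi. Substituting this decomposition gives
$$R(M,x) = \frac{\left(\sum_{j=1}^{n} \alpha_j v_j\right)' A'A \left(\sum_{i=1}^{n} \alpha_i v_i\right)}{\left(\sum_{j=1}^{n} \alpha_j v_j\right)' \left(\sum_{i=1}^{n} \alpha_i v_i\right)},$$
which, by orthogonality of the eigenvectors, becomes:
$$R(M,x) = \frac{\sum_{i=1}^{n} \alpha_i^2 \lambda_i \, v_i'v_i}{\sum_{i=1}^{n} \alpha_i^2 \, v_i'v_i} = \sum_{i=1}^{n} \lambda_i \frac{(x'v_i)^2}{(x'x)(v_i'v_i)}.$$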
In the last representation we can see that the Rayleigh quotient is the sum of the squared cosines of the angles formed by the vector x and each eigenvector vi, weighted by corresponding eigenvalues.
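The squared-cosine representation can be compared against the direct definition numerically. In this sketch the matrix `M` and the vector `x` are arbitrary illustrative values; the cosine of the angle between x and vi is taken as x'vi / (||x|| ||vi||).

```python
import numpy as np

# Illustrative symmetric positive-definite matrix and test vector.
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, -2.0, 0.5])

eigenvalues, V = np.linalg.eigh(M)   # columns of V are the eigenvectors v_i

# Direct evaluation of x'Mx / x'x.
direct = (x @ M @ x) / (x @ x)

# Squared cosines of the angles between x and each eigenvector.
cos_sq = (V.T @ x) ** 2 / ((x @ x) * np.sum(V ** 2, axis=0))

# Eigenvalue-weighted sum of the squared cosines.
weighted = np.sum(eigenvalues * cos_sq)
```

Because the eigenvectors form an orthogonal basis, the squared cosines sum to one, so the quotient is a weighted average of the eigenvalues.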
If a vector x maximizes R(M,x), then any vector kx (for k ≠ 0) also maximizes it. Taking the eigenvectors to have unit norm, one can therefore reduce to the Lagrange problem of maximizing
$$\sum_{i=1}^{n} \alpha_i^2 \lambda_i$$
under the constraint that
$$\sum_{i=1}^{n} \alpha_i^2 = 1.$$
Since all the eigenvalues are non-negative, the problem is convex and the maximum occurs on the edges of the domain, namely when α1 = 1 and αi = 0 for all i > 1 (when the eigenvalues are ordered in decreasing magnitude).
Alternatively, this result can be arrived at by the method of Lagrange multipliers. The problem is to find the critical points of the function
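$$R(M,x) = x^{T} M x,$$
subject to the constraint
$$\|x\|^2 = x^{T} x = 1,$$
i.e. to find the critical points of
$$\mathcal{L}(x) = x^{T} M x - \lambda \left(x^{T} x - 1\right),$$
where λ is a Lagrange multiplier. The stationary points of $\mathcal{L}(x)$ occur at
$$\frac{d\mathcal{L}(x)}{dx} = 0 \;\Rightarrow\; 2x^{T}M - 2\lambda x^{T} = 0 \;\Rightarrow\; Mx = \lambda x$$
and
$$R(M,x) = \frac{x^{T} M x}{x^{T} x} = \lambda \frac{x^{T} x}{x^{T} x} = \lambda.$$
Therefore, the eigenvectors x1, ..., xn of M are the critical points of the Rayleigh quotient and their corresponding eigenvalues λ1, ..., λn are the stationary values of R.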
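The stationarity claim can be checked by evaluating the gradient of the quotient at the eigenvectors. The sketch below assumes a real symmetric M, for which differentiating x'Mx / x'x gives the gradient 2(Mx − R(M,x)x) / (x'x); the helper names and the matrix are illustrative.

```python
import numpy as np

def rayleigh_quotient(M, x):
    return (x @ M @ x) / (x @ x)

def rayleigh_gradient(M, x):
    """Gradient of R(M, x) with respect to x, for a real symmetric M."""
    return 2.0 * (M @ x - rayleigh_quotient(M, x) * x) / (x @ x)

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, V = np.linalg.eigh(M)

# At each eigenvector the gradient vanishes and R equals the eigenvalue.
grad_norms = [np.linalg.norm(rayleigh_gradient(M, V[:, i])) for i in range(2)]
values = [rayleigh_quotient(M, V[:, i]) for i in range(2)]

# At a vector that is not an eigenvector the gradient is nonzero.
generic_grad = rayleigh_gradient(M, np.array([1.0, 0.0]))
```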
This property is the basis for principal components analysis and canonical correlation.
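The link to principal component analysis can be illustrated as follows: the variance of the data projected onto a direction u is the Rayleigh quotient of the sample covariance matrix at u, so the leading eigenvector gives the direction of largest variance. The data, the mixing matrix and the variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data set: 500 samples of 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.2, 0.3, 0.4]])
X = rng.standard_normal((500, 3)) @ mixing
Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix

eigenvalues, V = np.linalg.eigh(cov)    # eigenvalues in ascending order
first_pc = V[:, -1]                     # unit eigenvector of the largest eigenvalue

# Variance of the data projected onto the first principal component.
projected_variance = np.var(Xc @ first_pc, ddof=1)

# Rayleigh quotients of the covariance matrix along random directions.
trial_dirs = rng.standard_normal((1000, 3))
trial_r = (np.einsum('ij,jk,ik->i', trial_dirs, cov, trial_dirs)
           / np.sum(trial_dirs ** 2, axis=1))
```

No random direction yields a larger quotient than the largest eigenvalue, in line with the maximization property described above.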
Use in Sturm–Liouville theory
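Sturm–Liouville theory concerns the action of the linear operator
$$L(y) = \frac{1}{w(x)}\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y\right)$$
on the inner product space defined by
$$\langle y_1, y_2 \rangle = \int_a^b w(x)\, y_1(x)\, y_2(x) \, dx$$
of functions satisfying some specified boundary conditions at a and b. In this case the Rayleigh quotient is
$$\frac{\langle y, Ly \rangle}{\langle y, y \rangle} = \frac{\int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)\frac{dy}{dx}\right] + q(x)y(x)\right) dx}{\int_a^b w(x)\, y(x)^2 \, dx}.$$
This is sometimes presented in an equivalent form, obtained by separating the integral in the numerator and using integration by parts:
$$\begin{aligned}
\frac{\langle y, Ly \rangle}{\langle y, y \rangle}
&= \frac{\int_a^b y(x)\left(-\frac{d}{dx}\left[p(x)y'(x)\right]\right) dx + \int_a^b q(x)\, y(x)^2 \, dx}{\int_a^b w(x)\, y(x)^2 \, dx} \\
&= \frac{-y(x)\left[p(x)y'(x)\right]\Big|_a^b + \int_a^b y'(x)\left[p(x)y'(x)\right] dx + \int_a^b q(x)\, y(x)^2 \, dx}{\int_a^b w(x)\, y(x)^2 \, dx} \\
&= \frac{-p(x)\, y(x)\, y'(x)\Big|_a^b + \int_a^b \left[p(x)\, y'(x)^2 + q(x)\, y(x)^2\right] dx}{\int_a^b w(x)\, y(x)^2 \, dx}.
\end{aligned}$$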
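The integrated-by-parts form, boundary term −p y y' evaluated from a to b plus the integral of p y'² + q y², all divided by the integral of w y², is convenient for numerical estimates. As a hedged illustration, the sketch below assumes the simplest case p = 1, q = 0, w = 1 on [0, π] with y(0) = y(π) = 0, whose smallest eigenvalue is 1 (eigenfunction sin x), and the trial function y = x(π − x). The exact quotient for this trial function is 10/π² ≈ 1.0132. The grid size and the helper `trapezoid` are illustrative choices.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoidal rule for samples `values` on the points `grid`."""
    return np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0

a, b = 0.0, np.pi
x = np.linspace(a, b, 20001)

# Assumed coefficient functions: p = 1, q = 0, w = 1.
p = np.ones_like(x)
q = np.zeros_like(x)
w = np.ones_like(x)

# Trial function satisfying y(a) = y(b) = 0, with its exact derivative.
y = x * (np.pi - x)
dy = np.pi - 2.0 * x

# Boundary term -p*y*y' evaluated between a and b (zero for this trial function).
boundary_term = -(p[-1] * y[-1] * dy[-1] - p[0] * y[0] * dy[0])

numerator = boundary_term + trapezoid(p * dy ** 2 + q * y ** 2, x)
denominator = trapezoid(w * y ** 2, x)
quotient = numerator / denominator
```

The estimate lies slightly above the smallest eigenvalue, as expected for a trial function that only approximates the true eigenfunction.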
Generalization
For a given pair (A,B) of real symmetric positive-definite matrices, and a given non-zero vector x, the generalized Rayleigh quotient is defined as:
$$R(A,B;x) := \frac{x^{T} A x}{x^{T} B x}.$$
The generalized Rayleigh quotient can be reduced to the Rayleigh quotient R(D,Cx) through the transformation
$$D = C^{-T} A \, C^{-1},$$
where $B = C^{T} C$ is the Cholesky decomposition of the matrix B.
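The reduction can be verified numerically. The sketch below assumes the convention B = CᵀC with C upper triangular, so that D = C⁻ᵀAC⁻¹ and R(A,B;x) = R(D,Cx). Note that `numpy.linalg.cholesky` returns the lower-triangular factor, so it is transposed here. The matrices and the vector are illustrative values.

```python
import numpy as np

def generalized_rayleigh_quotient(A, B, x):
    """Return x'Ax / x'Bx for symmetric A and symmetric positive-definite B."""
    return (x @ A @ x) / (x @ B @ x)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[4.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, -2.0])

lower = np.linalg.cholesky(B)   # lower-triangular factor, B = lower @ lower.T
C = lower.T                     # upper-triangular factor, B = C.T @ C
C_inv = np.linalg.inv(C)
D = C_inv.T @ A @ C_inv         # transformed symmetric matrix

lhs = generalized_rayleigh_quotient(A, B, x)

# Ordinary Rayleigh quotient of D at the transformed vector y = Cx.
y_vec = C @ x
rhs = (y_vec @ D @ y_vec) / (y_vec @ y_vec)
```

For larger problems one would normally solve triangular systems instead of forming the explicit inverse; the inverse is used here only to keep the sketch short.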
References
- [1] Also known as the Rayleigh–Ritz ratio; named after Walther Ritz and Lord Rayleigh.
- [2] Horn, R. A. and C. R. Johnson. 1985. Matrix Analysis. Cambridge University Press. pp. 176–180.
- [3] Parlett, B. N. 1998. The Symmetric Eigenvalue Problem. SIAM, Classics in Applied Mathematics.
Further reading
- Shi Yu, Léon-Charles Tranchevent, Bart De Moor, Yves Moreau, Kernel-based Data Fusion for Machine Learning: Methods and Applications in Bioinformatics and Text Mining, Ch. 2, Springer, 2011.









