
Cayley–Hamilton theorem


In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over the real or complex field satisfies its own characteristic equation.

More precisely, if A is the given square n×n matrix and In is the n×n identity matrix, then the characteristic polynomial of A is defined as

    p(λ) = det(λ In − A),

where "det" is the determinant function. The Cayley–Hamilton theorem states that substituting the matrix A for λ in the characteristic polynomial (which involves multiplying its constant term by In, since that is the "zero-th" power of A) results in the zero matrix:

    p(A) = 0.

The Cayley–Hamilton theorem also holds for square matrices over commutative rings and for N-partitioned matrices.[1]

An important corollary of the Cayley–Hamilton theorem is that the minimal polynomial of a given matrix is a divisor of its characteristic polynomial. This is very useful in finding the Jordan form of a matrix.
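A simple example: the n×n identity matrix In has characteristic polynomial p(λ) = (λ − 1)^n, while its minimal polynomial is just λ − 1, which indeed divides p(λ); and substituting In into p gives (In − In)^n, which is plainly the zero matrix, as the theorem requires.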

Example

For two dimensional matrices, the theorem gives the identity

    A^2 − tr(A) A + det(A) I2 = 0,

where tr(A) denotes the trace of A. In three dimensions, the expression is

    A^3 − tr(A) A^2 + (1/2)(tr(A)^2 − tr(A^2)) A − det(A) I3 = 0,

and in four

    A^4 − tr(A) A^3 + (1/2)(tr(A)^2 − tr(A^2)) A^2 − (1/6)(tr(A)^3 − 3 tr(A) tr(A^2) + 2 tr(A^3)) A + det(A) I4 = 0.

Similar expressions can be obtained for the higher dimensional cases.
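These identities lend themselves to a quick numerical check. The following sketch is purely illustrative; it assumes the NumPy library, and the 3×3 sample matrix is an arbitrary choice:

    import numpy as np

    # Check the three-dimensional identity
    # A^3 - tr(A) A^2 + 1/2 (tr(A)^2 - tr(A^2)) A - det(A) I3 = 0.
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    t1 = np.trace(A)
    t2 = np.trace(A @ A)
    lhs = (np.linalg.matrix_power(A, 3)
           - t1 * np.linalg.matrix_power(A, 2)
           + 0.5 * (t1 ** 2 - t2) * A
           - np.linalg.det(A) * np.eye(3))
    print(np.allclose(lhs, np.zeros((3, 3))))   # prints: True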

Consider for example the matrix

    A = ( 1  2 )
        ( 3  4 ).

The characteristic polynomial is given by

    p(λ) = det(λ I2 − A) = (λ − 1)(λ − 4) − (−2)(−3) = λ^2 − 5λ − 2.

The Cayley–Hamilton theorem then claims that

    A^2 − 5A − 2 I2 = 0,

which one can quickly verify in this case.

As a result of this, the Cayley–Hamilton theorem allows us to calculate powers of matrices more simply than by direct multiplication.

Taking the result above,

    A^2 − 5A − 2 I2 = 0,  that is,  A^2 = 5A + 2 I2.

Then, for example, to calculate A^4, observe

    A^3 = (5A + 2 I2) A = 5 A^2 + 2 A = 5(5A + 2 I2) + 2 A = 27A + 10 I2,
    A^4 = (27A + 10 I2) A = 27 A^2 + 10 A = 27(5A + 2 I2) + 10 A = 145A + 54 I2.
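The reduction can be checked numerically. The following sketch is only an illustration; it assumes the NumPy library and reuses the example matrix above:

    import numpy as np

    # Verify that A^4, computed by repeated matrix multiplication, agrees with
    # the linear combination 145 A + 54 I2 obtained from A^2 = 5A + 2 I2.
    A = np.array([[1, 2], [3, 4]])
    direct = np.linalg.matrix_power(A, 4)
    reduced = 145 * A + 54 * np.eye(2)
    print(np.allclose(direct, reduced))   # prints: True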

The theorem is also an important tool in calculating eigenvectors.

Note that for a general n×n matrix A, when its inverse exists, i.e. when det(A) ≠ 0, A^(−1) can be written as a finite sum of powers of A: the form of the characteristic polynomial is

    p(λ) = λ^n + c_(n−1) λ^(n−1) + … + c_1 λ + c_0,  with c_0 = p(0) = det(−A) = (−1)^n det(A),

and multiplying both sides of p(A) = 0 by A^(−1) gives

    A^(−1) = −(1/c_0) (A^(n−1) + c_(n−1) A^(n−2) + … + c_2 A + c_1 In).
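As an illustration of this formula only, the following sketch assembles the inverse from the characteristic-polynomial coefficients; it assumes the NumPy library (whose np.poly function returns those coefficients, highest degree first) and reuses the example matrix from above:

    import numpy as np

    # A^(-1) = -(1/c_0) (A^(n-1) + c_(n-1) A^(n-2) + ... + c_1 In),
    # where p(t) = t^n + c_(n-1) t^(n-1) + ... + c_0 is the characteristic polynomial.
    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    n = A.shape[0]
    c = np.poly(A)                       # c[0] = 1, c[k] is the coefficient of t^(n-k)
    powers = [np.linalg.matrix_power(A, k) for k in range(n)]   # In, A, ..., A^(n-1)
    inv = -sum(c[n - 1 - k] * powers[k] for k in range(n)) / c[n]
    print(np.allclose(inv, np.linalg.inv(A)))   # prints: True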

Proving the Cayley–Hamilton theorem

Erroneous arguments

The biggest difficulty about the Cayley–Hamilton theorem is that there are many wrong ways to prove it; it is crucial to understand what is wrong about them before attempting a correct proof. In most cases errors involve an application of arguments that might be valid in a commutative context to the non-commutative setting of matrix arithmetic. The simplest erroneous argument is even more crude however: it consists of taking the defining equation p(t) = det(t In − A) of the characteristic polynomial, and simply substituting A for t, giving p(A) = det(A In − A) = det(A − A) = 0. Here the error is of course that on the right hand side the substitution makes no sense: t In is an n×n matrix with the scalar t on the diagonal, so substituting A for t would make a matrix with matrices on the diagonal, and whatever meaning (if any) is given to that, it is certainly not equal to the null matrix A − A as the argument suggests (which in fact silently replaces the scalar multiplication in t In by a matrix multiplication after the substitution). Note by the way that the conclusion of this argument equates p(A) to the scalar 0 rather than to the zero matrix as it should.

It may be noted that the very statement of the theorem does something somewhat similar, by taking the matrix A and plugging it into the polynomial p with scalar coefficients; in this case however this is a precisely defined operation that replaces monomials t^i by A^i and any constant term c by c In, and evaluates the resulting linear combination of matrices in the vector space of n×n matrices (or, more abstractly formulated, it takes the image of p under the unique ring homomorphism from k[t] to the ring of n×n matrices that sends 1 to In and t to A).
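For instance, evaluating the polynomial q(t) = t^2 − 3t + 2 (chosen here only as an illustration) at A in this sense yields the single n×n matrix q(A) = A^2 − 3A + 2 In.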

A somewhat more involved but still erroneous argument starts with a statement that is indeed a key ingredient for a correct proof, namely that for any matrix M there exists an explicitly determined matrix Adj(M), its adjugate matrix, for which M Adj(M) = Adj(M) M = det(M) In holds (this is a form of Cramer's rule from linear algebra). Since this is true for matrices with entries in any commutative ring, we can apply it to the matrix t In − A, which has entries in the polynomial ring k[t]. This gives

    (t In − A) Adj(t In − A) = det(t In − A) In = p(t) In.   (*)

Now as in the first argument given, we may be tempted to substitute A for t, and observe that the first factor, and therefore the left hand side, becomes the null matrix, while the right hand side becomes p(A). Apart from the fact that we have now confused scalar and matrix multiplication on both sides of the equation, so that the result at least has matrices on both sides, the substitution of A for t in t In − A still makes as little sense as in the original argument.

A more sophisticated argument continues from (*), remarking that it can be viewed as an identity between polynomials (of degree n) in t with matrices as coefficients, in which setting the substitution of A for t (still reinterpreting scalar multiplications of the form t^i M as matrix multiplications A^i M) would be justified. Considering polynomials over the non-commutative ring of n×n matrices may seem strange, but it arises naturally here when writing a matrix with entries polynomial in t as a sum of terms, each consisting of the scalar multiplication of a constant matrix by a power of t, which is certainly possible. That point of view also dictates how such polynomials should be multiplied, the basic case being (t^i M)(t^j N) = t^(i+j) (M N) for any constant matrices M, N. However, a fundamental difference of such polynomials over a non-commutative ring R with the case of polynomials over a commutative ring is that for an element a of R (a constant matrix in our case) there is no evaluation homomorphism from the ring R[t] to the ring R that sends t to a (unless a lies in the center of R). This is because the multiplication in R[t] implies that t commutes with all elements of R, and there is no way to define substitution of a non-central element a for t that respects this (i.e., gives a ring homomorphism). Therefore the argument involving substitution of A for t is erroneous.
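A concrete instance (the matrices are chosen purely for illustration) shows why no such evaluation homomorphism can exist. Write E12 for the 2×2 matrix whose only non-zero entry is a 1 in row 1, column 2, and E21 for the one with its 1 in row 2, column 1, and take a = E12. A ring homomorphism from R[t] to R that is the identity on constant matrices and sends t to a would have to send the single element t E21 = E21 t both to a E21 = E12 E21 = E11 and to E21 a = E21 E12 = E22; since E11 ≠ E22, no such homomorphism exists.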

The following example illustrates what is going on, in the case of 2×2 matrices. Take any such matrix A, whose characteristic polynomial is easily seen to be p(t) = t^2 − tr(A) t + det(A), while the adjugate matrix of t I2 − A is t I2 − Adj(A), where Adj(A) = tr(A) I2 − A; indeed we get as instance of (*) the equation

    (t I2 − A)(t I2 − Adj(A)) = (t^2 − tr(A) t + det(A)) I2,

which is valid for any scalar t (since A + Adj(A) = tr(A) I2 and A Adj(A) = det(A) I2). The given argument now claims that it is legitimate to substitute a matrix for t into this equation, interpreting t^2 and the multiples of t as matrix products. Of course one cannot refute this claim by substituting A itself (after all the Cayley–Hamilton theorem is true), but by substituting some other matrix B for t instead, we would get

    (B − A)(B − Adj(A)) = B^2 − tr(A) B + det(A) I2,

or after expanding and simplifying A B = B A, which is false in general (take for instance for B the transpose of A, for any A that does not commute with its transpose). The upshot is that there is no way to justify any argument in which scalar multiplications are at some point simply replaced by matrix multiplications.
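A fully numeric instance (the matrices here are chosen purely for illustration) makes the failure visible. Take for A the matrix with rows (1, 1) and (0, 1), so that Adj(A) has rows (1, −1) and (0, 1) and p(t) = (t − 1)^2, and take for B the transpose of A, with rows (1, 0) and (1, 1). Then B − A has rows (0, −1) and (1, 0), and B − Adj(A) has rows (0, 1) and (1, 0), so the left hand side (B − A)(B − Adj(A)) is the matrix with rows (−1, 0) and (0, 1), whereas the right hand side p(B) = (B − I2)^2 is the zero matrix; equivalently, A B and B A are different matrices here.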

Here is an even more abstract approach, which nevertheless is still erroneous.

Let k[t] be the polynomial ring over k, where k denotes the real or complex numbers, and let k[t]^n be the set of length n column vectors whose entries are elements of k[t]. This is a k[t]-module: an element of k[t]^n can be multiplied by a polynomial in k[t] to yield an element of k[t]^n.

The matrix A provides a homomorphism from k[t] to the ring of n×n matrices by evaluating at t = A. (For example, a polynomial f(t) = t^2 + c t + d in k[t] evaluates at t = A to f(A) = A^2 + c A + d In.)

There is also an evaluation map evA at t = A from k[t]^n to k^n: if v(t) = v0 + t v1 + … + t^m vm for column vectors vj with entries in k, then evA(v(t)) is defined to be v0 + A v1 + … + A^m vm.

(Erroneous) Proof of Cayley–Hamilton theorem. Apply both sides of identity (*) to an arbitrary column vector v in k^n to obtain the identity

(t In − A) Adj(t In − A)v = p(t)v,

which is an equation in k[t]^n since applying a matrix with polynomial entries to a vector with scalar entries gives a vector with polynomial entries. Now apply the evaluation map evA to both sides. Notice that the left hand side evaluates to

evA((t In − A) Adj(t In − A) v) = (A In − A) evA(Adj(t In − A) v) = 0.

On the other hand, the right hand side evaluates to p(A)v. Thus p(A)v = 0 for every column vector v, and therefore p(A) = 0, QED.

Notice where this argument succumbs to the temptation of replacing scalar products t^j w by matrix products A^j w after substitution. Note also that nothing was proved about the value of evA(M(t) v(t)) when M(t) is a matrix and v(t) a column vector both having polynomial entries, but some such rule was used in the computation. Confusingly, it is actually possible to define a second map ev′A, on matrices with polynomial entries, that sends t In − A to the null matrix and such that for every constant vector v one has evA(M(t) v) = ev′A(M(t)) v, namely by ev′A(M0 + t M1 + … + t^m Mm) = M0 + A M1 + … + A^m Mm (note the order in the right hand side, where the powers of A multiply the coefficient matrices from the left, as matrix products). But calling this second map evaluation is misleading (as it is for the first one), since (as explained before) it is not a homomorphism of (non-commutative) rings. And similarly it is not true that evA(M(t) v(t)) = ev′A(M(t)) evA(v(t)) when v(t) has non-constant entries. Therefore the "improved" argument still fails.
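To see the remaining failure concretely (again with matrices chosen purely for illustration), write E12 and E21 for the 2×2 matrix units with a single entry 1 in positions (1,2) and (2,1) respectively, let e1, e2 be the standard basis column vectors, and take A = E12, M(t) = t E21 and v(t) = e1 + t e2. Then M(t) v(t) = t E21 e1 = t e2, so evA(M(t) v(t)) = A e2 = e1, whereas ev′A(M(t)) = A E21 = E11 and evA(v(t)) = e1 + A e2 = 2 e1, so that ev′A(M(t)) evA(v(t)) = 2 e1, a different vector.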

A proof

Given that all attempts at a slick proof fail, one must do some real work somewhere. We start again with (*), and write the adjugate matrix as a polynomial in t with constant matrices as coefficients,

    Adj(t In − A) = t^(n−1) B_(n−1) + … + t B_1 + B_0,

with n×n matrices B_i, and also

    p(t) In = t^n In + c_(n−1) t^(n−1) In + … + c_1 t In + c_0 In,

with scalar coefficients c_i. The left hand side

    (t In − A)(t^(n−1) B_(n−1) + … + t B_1 + B_0)

can now be expanded by bilinearity of matrix multiplication, terms with a fixed power t^i collected, and its (matrix) coefficient equated to the matrix c_i In on the right. The left hand side of the equation of the coefficients of t^i then is B_(n−1) for i = n, it is B_(i−1) − A B_i for 0 < i < n, and it is −A B_0 for i = 0. Multiplying the equation of the coefficients of t^i from the left by A^i and adding up all equations so formed, the left hand sides cancel out completely (each product A^(i+1) B_i occurs exactly twice, once with each sign), and the right hand sides add up to

    A^n + c_(n−1) A^(n−1) + … + c_1 A + c_0 In = p(A),

so that p(A) = 0, which proves the theorem.
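The matrices B_i and the equations of coefficients can also be illustrated numerically. The sketch below is only an illustration; it assumes the NumPy library and an arbitrary sample matrix, with np.poly supplying the coefficients c_i:

    import numpy as np

    # Build the B_i from the coefficient equations read off above:
    # B_(n-1) = In and B_(i-1) = A B_i + c_i In for 0 < i < n.
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [4.0, 0.0, 5.0]])
    n = A.shape[0]
    c = np.poly(A)                      # c[k] is the coefficient of t^(n-k)
    B = [np.zeros((n, n))] * n
    B[n - 1] = np.eye(n)
    for i in range(n - 1, 0, -1):
        B[i - 1] = A @ B[i] + c[n - i] * np.eye(n)

    # The remaining equation of coefficients, -A B_0 = c_0 In, is the theorem itself:
    print(np.allclose(-A @ B[0], c[n] * np.eye(n)))            # prints: True

    # And for any sample scalar t0, (t0 In - A) (sum_i t0^i B_i) = p(t0) In:
    t0 = 2.5
    Q = sum(t0 ** i * B[i] for i in range(n))
    print(np.allclose((t0 * np.eye(n) - A) @ Q, np.polyval(c, t0) * np.eye(n)))   # prints: True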

Some further observations about the proof

Although the given proof completely avoids the dubious practice of substituting a matrix for t in a non-commutative context, its manipulations can be interpreted as doing something rather close to just that, namely splitting the equation into components according to powers of t, left-multiplying each component by the same power of A, and then adding everything back together again. In fact the reasoning that the left hand sides cancel can be interpreted as a proof that the operation described above, which we shall now call left-evaluation at A to stress the fact that multiplication by t^i gets interpreted as left-multiplication by A^i, sends the left hand side of (*) to the null matrix. It does so by first working out the matrix product, thereby not relying on the (false) assumption that left-evaluation is a ring homomorphism, but yet it does not use any particular property of the second factor of the product, indicating that something more general could be said about this.

In the setting of a ring R[t] of polynomials in a commuting indeterminate t over a non-commutative base ring R, there is a well defined notion of Euclidean left-division by a monic polynomial d: for every P in R[t] there exist Q and S in R[t] such that P = d Q + S and deg S < deg d, and the pair (Q, S) is unique. As in the commutative case this is proved by induction on deg P, with the coefficients of Q being determined by decreasing degree (if deg P ≥ deg d, the leading coefficient of Q is chosen so that d Q has the same leading term as P, and the remaining coefficients are determined by induction). When moreover deg d = 1, so that d is of the form t − a with a in R, the remainder S is a constant polynomial; in the commutative case it can be interpreted as the image of P under the homomorphism of evaluation at a, and in the non-commutative case it is still equal to the (non-homomorphic) left-evaluation of P at a. To see this, one may expand (t − a) Q and apply the definition of left-evaluation to show that it annihilates that product, basically as is done in the given proof of the theorem; or one may prefer to show first that, although in general the left-evaluation of a product is not the product of the left-evaluations of its factors, this equation does hold whenever a commutes with (just) the coefficients of the left factor, as is the case for the factor t − a. (There is also an operation of Euclidean right-division, and the remainder of right-division by t − a is the similarly defined but generally different right-evaluation at a.)
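As a small worked instance, left-dividing the polynomial t^2 M (with M a constant matrix in R) by t − a gives

    t^2 M = (t − a)(t M + a M) + a^2 M:

expanding the right hand side, the two terms involving t(a M) cancel (recall that t commutes with a in R[t]), and the constant remainder a^2 M is precisely the left-evaluation of t^2 M at a.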

These considerations do not really simplify the proof given, but allow us to make the following general statements that easily imply the theorem: a polynomial in t with n×n matrices as coefficients is left-divisible by t In − A if and only if its left-evaluation at A is the null matrix, and if in a product the left factor is annihilated by left-evaluation at A, then so is the product itself (since the left factor, and therefore the product, is left-divisible by t In − A). Applied to (*), whose left factor t In − A certainly has left-evaluation zero at A, this gives p(A) = 0, the left-evaluation of the right hand side p(t) In. While not justifying the final erroneous proof above, this does show that it is not extremely far from something that is true.

One thing we may deduce from these remarks is that Adj(t In − A) is the quotient in the remainderless Euclidean left-division, in the ring of polynomials in t with n×n matrices as coefficients, of p(t) In by t In − A. That division can actually be performed within the commutative subring k[A][t] (where k[A] denotes the k-linear span of the powers of A inside the ring of n×n matrices), in which case Euclidean left-division just becomes usual Euclidean division of polynomials over a commutative ring. This in turn implies that the coefficients B_i of Adj(t In − A) lie in k[A]; in particular this is true for its constant term B_0, which up to a sign equals the adjugate of A itself. Thus the adjugate of any matrix can be expressed as a polynomial in that matrix (with coefficients that depend on it), something that is not easy to deduce directly from the definition of the adjugate matrix. In fact the Euclidean division can be performed treating A as an unknown, using only the coefficients c_i of the characteristic polynomial, and doing so shows that the coefficients of the polynomial giving the adjugate are essentially the coefficients of the characteristic polynomial, but shifted down one degree: one has

    Adj(A) = (−1)^(n−1) (A^(n−1) + c_(n−1) A^(n−2) + … + c_2 A + c_1 In).

Note that from this identity we can obtain the statement of the Cayley–Hamilton theorem by moving the adjugate term to the other side and multiplying the resulting equation (on the left or on the right) by A, since

    A Adj(A) = Adj(A) A = det(A) In  and  (−1)^(n−1) det(A) = −c_0;

what results is precisely A^n + c_(n−1) A^(n−1) + … + c_1 A + c_0 In = p(A) = 0.
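This expression for the adjugate can likewise be checked numerically. The sketch below is only an illustration; it assumes the NumPy library and an arbitrary invertible sample matrix, and compares the polynomial in A with det(A) A^(−1), which equals Adj(A) when A is invertible:

    import numpy as np

    # Adj(A) = (-1)^(n-1) (A^(n-1) + c_(n-1) A^(n-2) + ... + c_1 In)
    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 1.0, 4.0]])
    n = A.shape[0]
    c = np.poly(A)                      # c[k] is the coefficient of t^(n-k)
    poly_in_A = sum(c[k] * np.linalg.matrix_power(A, n - 1 - k) for k in range(n))
    adj_from_poly = (-1) ** (n - 1) * poly_in_A
    adj_direct = np.linalg.det(A) * np.linalg.inv(A)
    print(np.allclose(adj_from_poly, adj_direct))   # prints: True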

Abstraction and generalizations

The preceding proof made use of the fact that Cramer's rule holds for matrices with entries in the ring of polynomials k[t] with k equal to the real or complex numbers. In fact, Cramer's rule remains true for matrices with entries in any commutative ring B. Using this, one can generalize the Cayley–Hamilton theorem to matrices with entries in any commutative ring R, using Cramer's rule for matrices with entries in B = R[t].

More abstractly, the Cayley–Hamilton theorem holds for finitely generated free modules M over a commutative ring R (for example, for vector spaces V over any field k). If φ:M→M is an endomorphism of R-modules (for example, a linear map from a vector space V to itself), then φ has a characteristic polynomial p defined as follows. Let ei for i=1,2,...,n be a basis of generators of M. In terms of these generators, φ(ej) = a1j e1 + … + anj en for some square matrix A = (aij) with entries in R. Now define p(t) = det(t In − A): it can be shown that p is independent of the choice of basis of M. Similarly, one can define the adjugate endomorphism of an endomorphism of M, and establish a generalized version of Cramer's rule.

The Cayley–Hamilton theorem then states that p(φ) = 0. The proof given above applies without change, since it does not involve division by (nonzero) elements of R. This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.

See also

References

  1. ^ Warwick, Kevin, "Using the Cayley–Hamilton theorem with N partitioned matrices", IEEE Transactions on Automatic Control, Vol. AC-28, No. 12, pp. 1127–1128, Dec. 1983.