
Cayley–Hamilton theorem


In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over the real or complex field satisfies its own characteristic equation.

More precisely, if A is the given square n×n matrix and In is the n×n identity matrix, then the characteristic polynomial of A is defined as

    p(λ) = det(λ In − A),

where "det" is the determinant function. The Cayley–Hamilton theorem states that substituting the matrix A for λ in the characteristic polynomial (which involves multiplying its constant term by In, since that is the "zero-th" power of A) results in the zero matrix:

    p(A) = 0.

The Cayley–Hamilton theorem also holds for square matrices over commutative rings and for N-partitioned matrices.[1]

An important corollary of the Cayley–Hamilton theorem is that the minimal polynomial of a given matrix is a divisor of its characteristic polynomial. This is very useful in finding the Jordan form of a matrix.
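A simple example: the n×n identity matrix In has characteristic polynomial p(λ) = (λ − 1)^n, while its minimal polynomial is just λ − 1, which indeed divides p(λ); and substituting In into p gives (In − In)^n, which is plainly the zero matrix, as the theorem requires.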

Example

For two dimensional matrices, the theorem gives the identity

    A^2 − tr(A) A + det(A) I2 = 0,

where tr(A) denotes the trace of A. In three dimensions, the expression is

    A^3 − tr(A) A^2 + (1/2)(tr(A)^2 − tr(A^2)) A − det(A) I3 = 0,

and in four

    A^4 − tr(A) A^3 + (1/2)(tr(A)^2 − tr(A^2)) A^2 − (1/6)(tr(A)^3 − 3 tr(A) tr(A^2) + 2 tr(A^3)) A + det(A) I4 = 0.

Similar expressions can be obtained for the higher dimensional cases.
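These identities lend themselves to a quick numerical check. The following sketch is purely illustrative; it assumes the NumPy library, and the 3×3 sample matrix is an arbitrary choice:

    import numpy as np

    # Check the three-dimensional identity
    # A^3 - tr(A) A^2 + 1/2 (tr(A)^2 - tr(A^2)) A - det(A) I3 = 0.
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    t1 = np.trace(A)
    t2 = np.trace(A @ A)
    lhs = (np.linalg.matrix_power(A, 3)
           - t1 * np.linalg.matrix_power(A, 2)
           + 0.5 * (t1 ** 2 - t2) * A
           - np.linalg.det(A) * np.eye(3))
    print(np.allclose(lhs, np.zeros((3, 3))))   # prints: True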

Consider for example the matrix

    A = ( 1  2 )
        ( 3  4 ).

The characteristic polynomial is given by

    p(λ) = det(λ I2 − A) = (λ − 1)(λ − 4) − (−2)(−3) = λ^2 − 5λ − 2.

The Cayley–Hamilton theorem then claims that

    A^2 − 5A − 2 I2 = 0,

which one can quickly verify in this case.

As a result of this, the Cayley–Hamilton theorem allows us to calculate powers of matrices more simply than by direct multiplication.

Taking the result above,

    A^2 − 5A − 2 I2 = 0,  that is,  A^2 = 5A + 2 I2.

Then, for example, to calculate A^4, observe

    A^3 = (5A + 2 I2) A = 5 A^2 + 2 A = 5(5A + 2 I2) + 2 A = 27A + 10 I2,
    A^4 = (27A + 10 I2) A = 27 A^2 + 10 A = 27(5A + 2 I2) + 10 A = 145A + 54 I2.
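The reduction can be checked numerically. The following sketch is only an illustration; it assumes the NumPy library and reuses the example matrix above:

    import numpy as np

    # Verify that A^4, computed by repeated matrix multiplication, agrees with
    # the linear combination 145 A + 54 I2 obtained from A^2 = 5A + 2 I2.
    A = np.array([[1, 2], [3, 4]])
    direct = np.linalg.matrix_power(A, 4)
    reduced = 145 * A + 54 * np.eye(2)
    print(np.allclose(direct, reduced))   # prints: True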

The theorem is also an important tool in calculating eigenvectors.

Note that for a general n×n matrix A, when its inverse exists, i.e. when det(A) ≠ 0, A^(−1) can be written as a finite sum of powers of A: the form of the characteristic polynomial is

    p(λ) = λ^n + c_(n−1) λ^(n−1) + … + c_1 λ + c_0,  with c_0 = p(0) = det(−A) = (−1)^n det(A),

and multiplying both sides of p(A) = 0 by A^(−1) gives

    A^(−1) = −(1/c_0) (A^(n−1) + c_(n−1) A^(n−2) + … + c_2 A + c_1 In).
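As an illustration of this formula only, the following sketch assembles the inverse from the characteristic-polynomial coefficients; it assumes the NumPy library (whose np.poly function returns those coefficients, highest degree first) and reuses the example matrix from above:

    import numpy as np

    # A^(-1) = -(1/c_0) (A^(n-1) + c_(n-1) A^(n-2) + ... + c_1 In),
    # where p(t) = t^n + c_(n-1) t^(n-1) + ... + c_0 is the characteristic polynomial.
    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    n = A.shape[0]
    c = np.poly(A)                       # c[0] = 1, c[k] is the coefficient of t^(n-k)
    powers = [np.linalg.matrix_power(A, k) for k in range(n)]   # In, A, ..., A^(n-1)
    inv = -sum(c[n - 1 - k] * powers[k] for k in range(n)) / c[n]
    print(np.allclose(inv, np.linalg.inv(A)))   # prints: True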

Proving the Cayley–Hamilton theorem

Erroneous arguments

The biggest difficulty about the Cayley–Hamilton theorem is that there are many wrong ways to prove it; it is crucial to understand what is wrong about them before attempting a correct proof. In most cases errors involve an application of arguments that might be valid in a commutative context to the non-commutative setting of matrix arithmetic. The simplest erroneous argument is even more crude however: it consists of taking the defining equation p(t) = det(t In − A) of the characteristic polynomial, and simply substituting A for t, giving p(A) = det(A In − A) = det(A − A) = 0. Here the error is of course that on the right hand side the substitution makes no sense: t In is an n×n matrix with the scalar t on the diagonal, so substituting A for t would make a matrix with matrices on the diagonal, and whatever meaning (if any) is given to that, it is certainly not equal to the null matrix A − A as the argument suggests (which in fact silently replaces the scalar multiplication in t In by a matrix multiplication after the substitution). Note by the way that the conclusion of this argument equates p(A) to the scalar 0 rather than to the zero matrix as it should.

It may be noted that the very statement of the theorem does something somewhat similar, by taking the matrix A and plugging it into the polynomial p with scalar coefficients; in this case however this is a precisely defined operation that replaces monomials t^i by A^i and any constant term c by c In, and evaluates the resulting linear combination of matrices in the vector space of n×n matrices (or, more abstractly formulated, it takes the image of p under the unique ring homomorphism from k[t] to the ring of n×n matrices that sends 1 to In and t to A).
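For instance, evaluating the polynomial q(t) = t^2 − 3t + 2 (chosen here only as an illustration) at A in this sense yields the single n×n matrix q(A) = A^2 − 3A + 2 In.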

A somewhat more involved but still erroneous argument starts with a statement that is indeed a key ingredient for a correct proof, namely that for any matrix M there exists an explicitly determined matrix Adj(M), its adjugate matrix, for which M Adj(M) = Adj(M) M = det(M) In holds (this is a form of Cramer's rule from linear algebra). Since this is true for matrices with entries in any commutative ring, we can apply it to the matrix t In − A, which has entries in the polynomial ring k[t]. This gives

    (t In − A) Adj(t In − A) = det(t In − A) In = p(t) In.   (*)

Now as in the first argument given, we may be tempted to substitute A for t, and observe that the first factor, and therefore the left hand side, becomes the null matrix, while the right hand side becomes p(A). Apart from the fact that we have now confused scalar and matrix multiplication on both sides of the equation, so that the result at least has matrices on both sides, the substitution of A for t in t In − A still makes as little sense as in the original argument.

A more sophisticated argument continues from (*), remarking that it can be viewed as an identity between polynomials (of degree n) in t with matrices as coefficients, in which setting the substitution of A for t (still reinterpreting scalar multiplications of the form t^i M as matrix multiplications A^i M) would be justified. Considering polynomials over the non-commutative ring of n×n matrices may seem strange, but it arises naturally here when writing a matrix with entries polynomial in t as a sum of terms, each consisting of the scalar multiplication of a constant matrix by a power of t, which is certainly possible. That point of view also dictates how such polynomials should be multiplied, the basic case being (t^i M)(t^j N) = t^(i+j) (M N) for any constant matrices M, N. However, a fundamental difference of such polynomials over a non-commutative ring R with the case of polynomials over a commutative ring is that for an element a of R (a constant matrix in our case) there is no evaluation homomorphism from the ring R[t] to the ring R that sends t to a (unless a lies in the center of R). This is because the multiplication in R[t] implies that t commutes with all elements of R, and there is no way to define substitution of a non-central element a for t that respects this (i.e., gives a ring homomorphism). Therefore the argument involving substitution of A for t is erroneous.
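A concrete instance (the matrices are chosen purely for illustration) shows why no such evaluation homomorphism can exist. Write E12 for the 2×2 matrix whose only non-zero entry is a 1 in row 1, column 2, and E21 for the one with its 1 in row 2, column 1, and take a = E12. A ring homomorphism from R[t] to R that is the identity on constant matrices and sends t to a would have to send the single element t E21 = E21 t both to a E21 = E12 E21 = E11 and to E21 a = E21 E12 = E22; since E11 ≠ E22, no such homomorphism exists.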

The following example illustrates what is going on, in the case of 2×2 matrices. Take any such matrix A, whose characteristic polynomial is easily seen to be p(t) = t^2 − tr(A) t + det(A), while the adjugate matrix of t I2 − A is t I2 − Adj(A), where Adj(A) = tr(A) I2 − A; indeed we get as instance of (*) the equation

    (t I2 − A)(t I2 − Adj(A)) = (t^2 − tr(A) t + det(A)) I2,

which is valid for any scalar t (since A + Adj(A) = tr(A) I2 and A Adj(A) = det(A) I2). The given argument now claims that it is legitimate to substitute a matrix for t into this equation, interpreting t^2 and the multiples of t as matrix products. Of course one cannot refute this claim by substituting A itself (after all the Cayley–Hamilton theorem is true), but by substituting some other matrix B for t instead, we would get

    (B − A)(B − Adj(A)) = B^2 − tr(A) B + det(A) I2,

or after expanding and simplifying A B = B A, which is false in general (take for instance for B the transpose of A, for any A that does not commute with its transpose). The upshot is that there is no way to justify any argument in which scalar multiplications are at some point simply replaced by matrix multiplications.
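A fully numeric instance (the matrices here are chosen purely for illustration) makes the failure visible. Take for A the matrix with rows (1, 1) and (0, 1), so that Adj(A) has rows (1, −1) and (0, 1) and p(t) = (t − 1)^2, and take for B the transpose of A, with rows (1, 0) and (1, 1). Then B − A has rows (0, −1) and (1, 0), and B − Adj(A) has rows (0, 1) and (1, 0), so the left hand side (B − A)(B − Adj(A)) is the matrix with rows (−1, 0) and (0, 1), whereas the right hand side p(B) = (B − I2)^2 is the zero matrix; equivalently, A B and B A are different matrices here.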

Here is an even more abstract approach, which nevertheless is still erroneous.

Let k[t] be the polynomial ring over k, where k denotes the real or complex numbers, and let k[t]^n be the set of length n column vectors whose entries are elements of k[t]. This is a k[t]-module: an element of k[t]^n can be multiplied by a polynomial in k[t] to yield an element of k[t]^n.

The matrix A provides a homomorphism from k[t] to the ring of n×n matrices by evaluating at t = A. (For example, a polynomial f(t) = t^2 + c t + d in k[t] evaluates at t = A to f(A) = A^2 + c A + d In.)

There is also an evaluation map evA at t = A from k[t]^n to k^n: if v(t) = v0 + t v1 + … + t^m vm for column vectors vj with entries in k, then evA(v(t)) is defined to be v0 + A v1 + … + A^m vm.

(Erroneous) Proof of Cayley–Hamilton theorem. Apply both sides of identity (*) to an arbitrary column vector v in k^n to obtain the identity

(t In − A) Adj(t In − A)v = p(t)v,

which is an equation in k[t]^n since applying a matrix with polynomial entries to a vector with scalar entries gives a vector with polynomial entries. Now apply the evaluation map evA to both sides. Notice that the left hand side evaluates to

evA((t In − A) Adj(t In − A) v) = (A In − A) evA(Adj(t In − A) v) = 0.

On the other hand, the right hand side evaluates to p(A)v. Thus p(A)v = 0 for every column vector v, and therefore p(A) = 0, QED.

Notice where this argument succumbs to the temptation of replacing scalar products t^j w by matrix products A^j w after substitution. Note also that nothing was proved about the value of evA(M(t) v(t)) when M(t) is a matrix and v(t) a column vector both having polynomial entries, but some such rule was used in the computation. Confusingly, it is actually possible to define a second map ev′A, on matrices with polynomial entries, that sends t In − A to the null matrix and such that for every constant vector v one has evA(M(t) v) = ev′A(M(t)) v, namely by ev′A(M0 + t M1 + … + t^m Mm) = M0 + A M1 + … + A^m Mm (note the order in the right hand side, where the powers of A multiply the coefficient matrices from the left, as matrix products). But calling this second map evaluation is misleading (as it is for the first one), since (as explained before) it is not a homomorphism of (non-commutative) rings. And similarly it is not true that evA(M(t) v(t)) = ev′A(M(t)) evA(v(t)) when v(t) has non-constant entries. Therefore the "improved" argument still fails.
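To see the remaining failure concretely (again with matrices chosen purely for illustration), write E12 and E21 for the 2×2 matrix units with a single entry 1 in positions (1,2) and (2,1) respectively, let e1, e2 be the standard basis column vectors, and take A = E12, M(t) = t E21 and v(t) = e1 + t e2. Then M(t) v(t) = t E21 e1 = t e2, so evA(M(t) v(t)) = A e2 = e1, whereas ev′A(M(t)) = A E21 = E11 and evA(v(t)) = e1 + A e2 = 2 e1, so that ev′A(M(t)) evA(v(t)) = 2 e1, a different vector.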

A proof

Given that all attempts at a slick proof fail, one must do some real work somewhere. We start again with (*), and write the adjugate matrix as a polynomial in t with constant matrices as coefficients,

    Adj(t In − A) = t^(n−1) B_(n−1) + … + t B_1 + B_0,

with n×n matrices B_i, and also

    p(t) In = t^n In + c_(n−1) t^(n−1) In + … + c_1 t In + c_0 In,

with scalar coefficients c_i. The left hand side

    (t In − A)(t^(n−1) B_(n−1) + … + t B_1 + B_0)

can now be expanded by bilinearity of matrix multiplication, terms with a fixed power t^i collected, and its (matrix) coefficient equated to the matrix c_i In on the right. The left hand side of the equation of the coefficients of t^i then is B_(n−1) for i = n, it is B_(i−1) − A B_i for 0 < i < n, and it is −A B_0 for i = 0. Multiplying the equation of the coefficients of t^i from the left by A^i and adding up all equations so formed, the left hand sides cancel out completely (each product A^(i+1) B_i occurs exactly twice, once with each sign), and the right hand sides add up to

    A^n + c_(n−1) A^(n−1) + … + c_1 A + c_0 In = p(A),

so that p(A) = 0, which proves the theorem.
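The matrices B_i and the equations of coefficients can also be illustrated numerically. The sketch below is only an illustration; it assumes the NumPy library and an arbitrary sample matrix, with np.poly supplying the coefficients c_i:

    import numpy as np

    # Build the B_i from the coefficient equations read off above:
    # B_(n-1) = In and B_(i-1) = A B_i + c_i In for 0 < i < n.
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [4.0, 0.0, 5.0]])
    n = A.shape[0]
    c = np.poly(A)                      # c[k] is the coefficient of t^(n-k)
    B = [np.zeros((n, n))] * n
    B[n - 1] = np.eye(n)
    for i in range(n - 1, 0, -1):
        B[i - 1] = A @ B[i] + c[n - i] * np.eye(n)

    # The remaining equation of coefficients, -A B_0 = c_0 In, is the theorem itself:
    print(np.allclose(-A @ B[0], c[n] * np.eye(n)))            # prints: True

    # And for any sample scalar t0, (t0 In - A) (sum_i t0^i B_i) = p(t0) In:
    t0 = 2.5
    Q = sum(t0 ** i * B[i] for i in range(n))
    print(np.allclose((t0 * np.eye(n) - A) @ Q, np.polyval(c, t0) * np.eye(n)))   # prints: True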

Some further observations about the proof

Although the given proof completely avoids the dubious practice of substituting a matrix for t in a non-commutative context, its manipulations can be interpreted as doing something rather close to just that, namely splitting the equation into components according to powers of t, left-multiplying each component by the same power of A, and then adding everything back together again. In fact the reasoning that the left hand sides cancel can be interpreted as a proof that the operation described above, which we shall now call left-evaluation at A to stress the fact that multiplication by t^i gets interpreted as left-multiplication by A^i, sends the left hand side of (*) to the null matrix. It does so by first working out the matrix product, thereby not relying on the (false) assumption that left-evaluation is a ring homomorphism, but yet it does not use any particular property of the second factor of the product, indicating that something more general could be said about this.

In the setting of a ring R[t] of polynomials in a commuting indeterminate t over a non-commutative base ring R, there is a well defined notion of Euclidean left-division by a monic polynomial d: for every P in R[t] there exist Q and S in R[t] such that P = d Q + S and deg S < deg d, and the pair (Q, S) is unique. As in the commutative case this is proved by induction on deg P, with the coefficients of Q being determined by decreasing degree (if deg P ≥ deg d, the leading coefficient of Q is chosen so that d Q has the same leading term as P, and the remaining coefficients are determined by induction). When moreover deg d = 1, so that d is of the form t − a with a in R, the remainder S is a constant polynomial; in the commutative case it can be interpreted as the image of P under the homomorphism of evaluation at a, and in the non-commutative case it is still equal to the (non-homomorphic) left-evaluation of P at a. To see this, one may expand (t − a) Q and apply the definition of left-evaluation to show that it annihilates that product, basically as is done in the given proof of the theorem; or one may prefer to show first that, although in general the left-evaluation of a product is not the product of the left-evaluations of its factors, this equation does hold whenever a commutes with (just) the coefficients of the left factor, as is the case for the factor t − a. (There is also an operation of Euclidean right-division, and the remainder of right-division by t − a is the similarly defined but generally different right-evaluation at a.)
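As a small worked instance, left-dividing the polynomial t^2 M (with M a constant matrix in R) by t − a gives

    t^2 M = (t − a)(t M + a M) + a^2 M:

expanding the right hand side, the two terms involving t(a M) cancel (recall that t commutes with a in R[t]), and the constant remainder a^2 M is precisely the left-evaluation of t^2 M at a.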

These considerations do not really simplify the proof given, but allow us to make the following general statements that easily imply the theorem: a polynomial in t with n×n matrices as coefficients is left-divisible by t In − A if and only if its left-evaluation at A is the null matrix, and if in a product the left factor is annihilated by left-evaluation at A, then so is the product itself (since the left factor, and therefore the product, is left-divisible by t In − A). Applied to (*), whose left factor t In − A certainly has left-evaluation zero at A, this gives p(A) = 0, the left-evaluation of the right hand side p(t) In. While not justifying the final erroneous proof above, this does show that it is not extremely far from something that is true.

One thing we may deduce from these remarks is that Adj(t In − A) is the quotient in the remainderless Euclidean left-division, in the ring of polynomials in t with n×n matrices as coefficients, of p(t) In by t In − A. That division can actually be performed within the commutative subring k[A][t] (where k[A] denotes the k-linear span of the powers of A inside the ring of n×n matrices), in which case Euclidean left-division just becomes usual Euclidean division of polynomials over a commutative ring. This in turn implies that the coefficients B_i of Adj(t In − A) lie in k[A]; in particular this is true for its constant term B_0, which up to a sign equals the adjugate of A itself. Thus the adjugate of any matrix can be expressed as a polynomial in that matrix (with coefficients that depend on it), something that is not easy to deduce directly from the definition of the adjugate matrix. In fact the Euclidean division can be performed treating A as an unknown, using only the coefficients c_i of the characteristic polynomial, and doing so shows that the coefficients of the polynomial giving the adjugate are essentially the coefficients of the characteristic polynomial, but shifted down one degree: one has

    Adj(A) = (−1)^(n−1) (A^(n−1) + c_(n−1) A^(n−2) + … + c_2 A + c_1 In).

Note that from this identity we can obtain the statement of the Cayley–Hamilton theorem by moving the adjugate term to the other side and multiplying the resulting equation (on the left or on the right) by A, since

    A Adj(A) = Adj(A) A = det(A) In  and  (−1)^(n−1) det(A) = −c_0;

what results is precisely A^n + c_(n−1) A^(n−1) + … + c_1 A + c_0 In = p(A) = 0.
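This expression for the adjugate can likewise be checked numerically. The sketch below is only an illustration; it assumes the NumPy library and an arbitrary invertible sample matrix, and compares the polynomial in A with det(A) A^(−1), which equals Adj(A) when A is invertible:

    import numpy as np

    # Adj(A) = (-1)^(n-1) (A^(n-1) + c_(n-1) A^(n-2) + ... + c_1 In)
    A = np.array([[2.0, 0.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 1.0, 4.0]])
    n = A.shape[0]
    c = np.poly(A)                      # c[k] is the coefficient of t^(n-k)
    poly_in_A = sum(c[k] * np.linalg.matrix_power(A, n - 1 - k) for k in range(n))
    adj_from_poly = (-1) ** (n - 1) * poly_in_A
    adj_direct = np.linalg.det(A) * np.linalg.inv(A)
    print(np.allclose(adj_from_poly, adj_direct))   # prints: True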

Abstraction and generalizations

The preceding proof made use of the fact that Cramer's rule holds for matrices with entries in the ring of polynomials k[t] with k equal to the real or complex numbers. In fact, Cramer's rule remains true for matrices with entries in any commutative ring B. Using this, one can generalize the Cayley–Hamilton theorem to matrices with entries in any commutative ring R, using Cramer's rule for matrices with entries in B = R[t].

More abstractly, the Cayley–Hamilton theorem holds for finitely generated free modules M over a commutative ring R (for example, for vector spaces V over any field k). If φ:M→M is an endomorphism of R-modules (for example, a linear map from a vector space V to itself), then φ has a characteristic polynomial p defined as follows. Let ei for i=1,2,...,n be a basis of generators of M. In terms of these generators, φ(ej) = a1j e1 + … + anj en for some square matrix A = (aij) with entries in R. Now define p(t) = det(t In − A): it can be shown that p is independent of the choice of basis of M. Similarly, one can define the adjugate endomorphism of an endomorphism of M, and establish a generalized version of Cramer's rule.

The Cayley–Hamilton theorem then states that p(φ) = 0. The proof given above applies without change, since it does not involve division by (nonzero) elements of R. This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.

See also

References

  1. ^ Warwick, Kevin, "Using the Cayley–Hamilton theorem with N partitioned matrices", IEEE Transactions on Automatic Control, Vol. AC-28, No. 12, pp. 1127–1128, Dec. 1983.