Generalized eigenvector


In linear algebra, for a matrix A, there may not always exist a full set of linearly independent eigenvectors that form a complete basis – the matrix may not be diagonalizable. This happens when the algebraic multiplicity of at least one eigenvalue λ is greater than its geometric multiplicity (the nullity of the matrix (A-\lambda I), that is, the dimension of its nullspace). In such cases, a generalized eigenvector of A associated with an eigenvalue λ of algebraic multiplicity k ≥ 1 is a nonzero vector v satisfying

(A-\lambda I)^k\mathbf{v} = \mathbf{0}.

The span of all generalized eigenvectors for a given λ forms the generalized eigenspace for λ.

Ordinary eigenvectors and eigenspaces are obtained for k=1.

For defective matrices

Generalized eigenvectors are needed to form a complete basis of a defective matrix, which is a matrix in which there are fewer linearly independent eigenvectors than eigenvalues (counting multiplicity). Over an algebraically closed field, the generalized eigenvectors do allow choosing a complete basis, as follows from the Jordan form of a matrix.

In particular, suppose that an eigenvalue λ of a matrix A has an algebraic multiplicity m but fewer corresponding eigenvectors. We form a sequence of m eigenvectors and generalized eigenvectors x_1, x_2, \ldots, x_m that are linearly independent and satisfy

(A - \lambda I) x_k = \alpha_{k,1}x_1+\cdots+\alpha_{k,k-1}x_{k-1}

for some coefficients \alpha_{k,1},\ldots,\alpha_{k,k-1}, for k=1,\ldots,m. It follows that

(A - \lambda I)^k x_k = 0.

The vectors x_1, x_2, \ldots, x_m can always be chosen, but are not uniquely determined by the above relations. If the geometric multiplicity (dimension of the eigenspace) of λ is p, one can choose the first p vectors to be eigenvectors, but the remaining m − p vectors are only generalized eigenvectors.

Examples

Example 1

Suppose

 A = \begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix}.

Then there is one eigenvalue λ=1 with an algebraic multiplicity m of 2.

There are several ways to see that one generalized eigenvector will be necessary. The easiest is to notice that this matrix is in Jordan normal form but is not diagonal, meaning that it is not a diagonalizable matrix. Since there is one superdiagonal entry, there will be one generalized eigenvector (or one could note that the vector space is of dimension 2, so there can be only one generalized eigenvector). Alternatively, one could compute the dimension of the nullspace of  A-I to be p=1, and thus there are m-p=1 generalized eigenvectors.

Computing the ordinary eigenvector  v_1=\begin{bmatrix}1 \\0 \end{bmatrix} is left to the reader (see the eigenvector page for examples). Using this eigenvector, we compute the generalized eigenvector  v_2 by solving

 (A-\lambda I)v_2 = v_1.

Writing out the values:

 \left(\begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix}- \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}\right)\begin{bmatrix}v_{21} \\v_{22} \end{bmatrix} = \begin{bmatrix}1 \\0 \end{bmatrix}.

This simplifies to

 \begin{matrix} v_{21}+v_{22}-v_{21} = 1 \\ v_{22}- v_{22} = 0. \end{matrix}

which reduces to

 v_{22}= 1.

The component v_{21} has no restrictions and can be any scalar, so the generalized eigenvector is  v_2=\begin{bmatrix}* \\1 \end{bmatrix}, where the * indicates that any value is fine. Usually picking 0 is easiest.
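
As a numerical check (an illustrative sketch, not part of the standard presentation, assuming NumPy is available), the system (A − I)v_2 = v_1 can be solved by least squares, which returns the minimum-norm choice, i.e. * = 0:

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam = 1.0
v1 = np.array([1.0, 0.0])   # the ordinary eigenvector
# (A - lam*I) is singular, so solve in the least-squares sense;
# lstsq returns the minimum-norm solution, the choice * = 0 above.
v2, *_ = np.linalg.lstsq(A - lam * np.eye(2), v1, rcond=None)
print(v2)                                                   # [0. 1.]
print(np.allclose((A - lam * np.eye(2)) @ v2, v1))          # True
print(np.allclose(np.linalg.matrix_power(A - lam * np.eye(2), 2) @ v2, 0))  # True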

Example 2

The matrix

A = \begin{bmatrix} 
1 & 0 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 0 \\
6 & 3 & 2 & 0 & 0 \\
10 & 6 & 3 & 2 & 0 \\
15 & 10 & 6 & 3 & 2
\end{bmatrix}

has eigenvalues of 1 and 2 with algebraic multiplicities of 2 and 3, but geometric multiplicities of 1 and 1.

The generalized eigenspaces of A are calculated below.

(A-1 I) \begin{bmatrix}
0 \\ 1 \\ -3 \\ 3 \\ -1
\end{bmatrix} = \begin{bmatrix} 
0 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 \\
6 & 3 & 1 & 0 & 0 \\
10 & 6 & 3 & 1 & 0 \\
15 & 10 & 6 & 3 & 1
\end{bmatrix}\begin{bmatrix}
0 \\ 1 \\ -3 \\ 3 \\ -1
\end{bmatrix} = \begin{bmatrix}
0 \\ 0 \\ 0 \\ 0 \\ 0
\end{bmatrix}
(A - 1 I) \begin{bmatrix}
1 \\ -15 \\ 30 \\ -1 \\ -45
\end{bmatrix} = \begin{bmatrix} 
0 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 \\
6 & 3 & 1 & 0 & 0 \\
10 & 6 & 3 & 1 & 0 \\
15 & 10 & 6 & 3 & 1
\end{bmatrix} \begin{bmatrix}
1 \\ -15 \\ 30 \\ -1 \\ -45
\end{bmatrix} = 3\begin{bmatrix}
0 \\ 1 \\ -3 \\ 3 \\ -1
\end{bmatrix}
(A - 2 I) \begin{bmatrix}
0 \\ 0 \\ 0 \\ 0 \\ 1
\end{bmatrix} = \begin{bmatrix} 
-1 & 0 & 0 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 \\
6 & 3 & 0 & 0 & 0 \\
10 & 6 & 3 & 0 & 0 \\
15 & 10 & 6 & 3 & 0
\end{bmatrix} \begin{bmatrix}
0 \\ 0 \\ 0 \\ 0 \\ 1
\end{bmatrix} = \begin{bmatrix}
0 \\ 0 \\ 0 \\ 0 \\ 0
\end{bmatrix}
(A - 2 I) \begin{bmatrix}
0 \\ 0 \\ 0 \\ 1 \\ 0
\end{bmatrix} = \begin{bmatrix} 
-1 & 0 & 0 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 \\
6 & 3 & 0 & 0 & 0 \\
10 & 6 & 3 & 0 & 0 \\
15 & 10 & 6 & 3 & 0
\end{bmatrix} \begin{bmatrix}
0 \\ 0 \\ 0 \\ 1 \\ 0
\end{bmatrix} = 3 \begin{bmatrix}
0 \\ 0 \\ 0 \\ 0 \\ 1
\end{bmatrix}
(A - 2 I) \begin{bmatrix}
0 \\ 0 \\ 1 \\ -2 \\ 0
\end{bmatrix} = \begin{bmatrix} 
-1 & 0 & 0 & 0 & 0 \\
3 & -1 & 0 & 0 & 0 \\
6 & 3 & 0 & 0 & 0 \\
10 & 6 & 3 & 0 & 0 \\
15 & 10 & 6 & 3 & 0
\end{bmatrix} \begin{bmatrix}
0 \\ 0 \\ 1 \\ -2 \\ 0
\end{bmatrix} = 3 \begin{bmatrix}
0 \\ 0 \\ 0 \\ 1 \\ 0
\end{bmatrix}

This results in a basis for each of the generalized eigenspaces of A. Together they span the space of all 5-dimensional column vectors.


\left\{
\begin{bmatrix} 0 \\ 1 \\ -3 \\ 3 \\ -1 \end{bmatrix}
\begin{bmatrix} 1 \\ -15 \\ 30 \\ -1 \\ -45 \end{bmatrix} 
\right\},
\left\{ 
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 0 \end{bmatrix}
\right\}

The Jordan canonical form is obtained.


T = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
3 & -15 & 0 & 0 & 0 \\
-9 & 30 & 0 & 0 & 1 \\
9 & -1 & 0 & 3 & -2 \\
-3 & -45 & 9 & 0 & 0
\end{bmatrix} \quad J = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 1 & 0 \\
0 & 0 & 0 & 2 & 1 \\
0 & 0 & 0 & 0 & 2
\end{bmatrix}

where

AT = TJ
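
A computer algebra system can produce such a decomposition directly. The following sketch (an illustration assuming SymPy; jordan_form may order the blocks, and hence the columns of its transformation matrix, differently) checks the decomposition:

from sympy import Matrix

A = Matrix([[1, 0, 0, 0, 0],
            [3, 1, 0, 0, 0],
            [6, 3, 2, 0, 0],
            [10, 6, 3, 2, 0],
            [15, 10, 6, 3, 2]])
P, J = A.jordan_form()   # returns P, J with A = P*J*P**-1, i.e. A*P = P*J
print(J)                 # a 2x2 block for eigenvalue 1, a 3x3 block for 2
print(A * P == P * J)    # True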

Other meanings of the term

The term generalized eigenvector is also used for a different notion: in the generalized eigenvalue problem, one seeks nonzero vectors v and scalars λ satisfying

 Av = \lambda B v,

for a given pair of matrices A and B.

The Nullity of (A − λI)^k

Introduction

In this section it is shown that, when λ is an eigenvalue of a matrix A with algebraic multiplicity k, the null space of (A - \lambda I)^k has dimension k.

Existence of Eigenvalues

Consider an n × n matrix A. The determinant of A has the fundamental properties of being n-linear and alternating. Additionally det(I) = 1, for I the n × n identity matrix. From the determinant's definition it can be seen that for a triangular matrix T = (t_ij), det(T) = ∏ t_ii; in other words, the determinant is the product of the diagonal entries.

There are three elementary row operations: scalar multiplication, interchange of two rows, and the addition of a scalar multiple of one row to another. Multiplication of a row of A by α results in a new matrix whose determinant is α det(A). Interchange of two rows changes the sign of the determinant, and the addition of a scalar multiple of one row to another does not affect the determinant. The following simple theorem holds, but requires a little proof.

Theorem: The equation A x = 0 has a solution x ≠ 0 if and only if det(A) = 0.

Proof: Given the equation A x = 0, attempt to solve it using only the elementary row operations of addition of a scalar multiple of one row to another and row interchanges, until an equivalent equation U x = 0 has been reached, with U an upper triangular matrix. Since det(U) = ±det(A) and det(U) = ∏ u_ii, we have that det(A) = 0 if and only if at least one u_ii = 0. The back substitution procedure, as performed after Gaussian elimination, allows placing at least one nonzero element in x when there is a u_ii = 0. When all u_ii ≠ 0, back substitution requires x = 0. QED

Theorem: The equation A x = λ x has a solution x ≠ 0 if and only if det(λI − A) = 0.

Proof: The equation A x = λ x is equivalent to (λI − A) x = 0. QED.

Constructive proof of Schur's triangular form

The proof of the main result of this section will rely on the similarity transformation as stated and proven next.

Theorem: (Schur Transformation to Triangular Form Theorem) For any n × n matrix A, there exists a triangular matrix T and a unitary matrix Q, such that A Q = Q T. (The transformations are not unique, but are related.)

Proof: Let λ1 be an eigenvalue of the n × n matrix A and x be an associated eigenvector, so that A x = λ1x. Normalize the length of x so that |x| = 1.

For

x=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},

construct a unitary matrix

Q=\begin{bmatrix}
x_1 & q_{1\,2} & q_{1\,3} & \cdots & q_{1\,n} \\
x_2 & q_{2\,2} & q_{2\,3} & \cdots & q_{2\,n} \\
\vdots & \vdots & \vdots & & \vdots \\
x_n & q_{n\,2} & q_{n\,3} & \cdots & q_{n\,n}
\end{bmatrix}

Q should have x as its first column and have its columns an orthonormal basis for C^n. Now, A Q = Q U_1, with U_1 of the block form:

U_1=\begin{bmatrix} \lambda_1 & * \\ 0 & U_0 \end{bmatrix}

where U_0 is an (n−1) × (n−1) matrix and * stands for the remaining entries of the first row.

Let the induction hypothesis be that the theorem holds for all (n−1) × (n−1) matrices. From the construction so far, it holds for n = 2. Choose a unitary Q_0, so that U_0 Q_0 = Q_0 U_2, with U_2 of the upper triangular form. Define Q_1 by:

Q_1=\begin{bmatrix} 1 & 0 \\ 0 & Q_0 \end{bmatrix}

Now, carrying out the block multiplications,

U_1 Q_1 = Q_1 U_3

with:

U_3=\begin{bmatrix}
\lambda_1 & z_{1\,2} & z_{1\,3} & \cdots & z_{1\,n} \\
0 & \lambda_2 & z_{2\,3} & \cdots & z_{2\,n} \\
0 & 0 & \lambda_3 & \cdots & z_{3\,n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}

Now, A Q = Q U1 and U1 Q1 = Q1 U3, where Q and Q1 are unitary and U3 is upper triangular. Thus A Q Q1 = Q Q1 U3. Since the product of two unitary matrices is unitary, the proof is done. QED.

Nullity Theorem's Proof

Starting from A Q = Q U, we can solve for A to obtain A = Q U Q^*, since Q Q^* = I for unitary Q. Now, after subtracting xI from both sides, we find

xI − A = Q (xI − U) Q^*

and hence

det(xI − A) = det(xI − U).

So, the characteristic polynomial of A is the same as that for U and is given by

p(x) = (x − λ1)(x − λ2)...(x − λn),

where the λ_i are the eigenvalues of A and U.

Observe that the construction used in the proof above allows choosing any order for the eigenvalues of A that will end up as the diagonal elements of the upper triangular matrix U obtained. The algebraic multiplicity of an eigenvalue is the number of times it occurs on the diagonal.

Now it can be supposed, for a given eigenvalue λ of algebraic multiplicity k, that U has been contrived so that λ occurs as the first k diagonal elements.

Place U − λI in block form as below:

U-\lambda I=\begin{bmatrix} B & M \\ 0 & T \end{bmatrix}

The lower left block has only elements of zero, and the diagonal elements of T are β_i = λ_i − λ ≠ 0 for i = k+1, ..., n. Here B is the k × k subtriangular matrix, with all elements on or below the diagonal equal to 0, T is the (n−k) × (n−k) upper triangular matrix, and M is the remaining k × (n−k) block of (U − λI). It is easy to verify that

(U-\lambda I)^k=\begin{bmatrix} B^k & M_k \\ 0 & T^k \end{bmatrix}

for some k × (n−k) block M_k. Now, almost trivially, B^k = 0 and T^k is triangular with all nonzero diagonal elements. Observe that if a column vector v = [v_1, v_2, ..., v_k]^T is multiplied by B, then after the first multiplication the last, kth, component is zero. After the second multiplication the second to last, (k−1)th, component is zero also, and so on.

The conclusion that (U − λI)^k has rank n − k and nullity k follows.

It is only left to observe, since (A − λI)^k = Q (U − λI)^k Q^*, that (A − λI)^k has rank n − k and nullity k as well. A similarity transformation by a unitary, or any other non-singular matrix, preserves rank.

The main result is now proven.

Theorem:
If λ is an eigenvalue of a matrix A with algebraic multiplicity k, then the null space of (A − λI)^k has dimension k.

An important observation is that raising the power of (A − λI) above k will not affect the rank and nullity any further.
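
For the matrix of Example 2 this can be observed numerically; a small sketch (assuming NumPy) computes the rank and nullity of (A − λI)^k for each eigenvalue:

import numpy as np

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)
n = A.shape[0]
for lam, k in [(1.0, 2), (2.0, 3)]:
    M = np.linalg.matrix_power(A - lam * np.eye(n), k)
    rank = np.linalg.matrix_rank(M)
    # the nullity n - rank equals the algebraic multiplicity k
    print(lam, k, rank, n - rank)   # 1.0 2 3 2  and  2.0 3 2 3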

Motivation of the Procedure

Introduction

In the preceding section it was shown that, when an n × n matrix A has an eigenvalue λ of algebraic multiplicity k, the null space of (A − λI)^k has dimension k.

The Generalized Eigenspace of A, λ will be defined to be the null space of (A − λI)^k. Many authors prefer to call this the kernel of (A − λI)^k.

Notice that if an n × n matrix has eigenvalues λ1, λ2, ..., λr with algebraic multiplicities k1, k2, ..., kr, then k1 + k2 + ... + kr = n.

It will turn out that any two generalized eigenspaces of A, associated with different eigenvalues, have the trivial intersection {0}. From this it follows that the generalized eigenspaces of A combined span C^n, the set of all n-dimensional column vectors of complex numbers.

The motivation for a recursive procedure that starts with the eigenvectors of A and solves for a basis of the generalized eigenspace of A, λ using the matrix (A − λI) will be expounded on.

Notation

Some notation is introduced to help abbreviate statements.

  • C^n is the vector space of all n-dimensional column vectors of complex numbers.
  • The Null Space of A, N(A) = {x: A x = 0}.
  • V ⊆ W denotes V is a subset of W.
  • V ⊂ W denotes V is a proper subset of W.
  • The Range of A over V, is A(V) = {y: y = A x, for some x ∈ V}.
  • W \ V denotes the set {x: x ∈ W and x is not in V}.
  • The Range of A is A(C^n) and will be denoted by R(A).
  • dim(V) denotes the dimension of V.
  • {0} is the trivial subspace of C^n.

Preliminary Observations

Throughout this discussion it is assumed that A is an n × n matrix of complex numbers.

Since A^m x = A (A^{m−1} x), the inclusions

N(A) ⊆ N(A^2) ⊆ ... ⊆ N(A^{m−1}) ⊆ N(A^m),

are obvious. Since A^m x = A^{m−1}(A x), the inclusions

R(A) ⊇ R(A^2) ⊇ ... ⊇ R(A^{m−1}) ⊇ R(A^m),

are clear as well.

Theorem:

When the more trivial case N(A^2) = N(A) does not hold, there exists k ≥ 2 such that the inclusions

N(A) ⊂ N(A^2) ⊂ ... ⊂ N(A^{k−1}) ⊂ N(A^k) = N(A^{k+1}) = ...,

and

R(A) ⊃ R(A^2) ⊃ ... ⊃ R(A^{k−1}) ⊃ R(A^k) = R(A^{k+1}) = ...,

are proper.

Proof: 0 ≤ dim(R(A^{m+1})) ≤ dim(R(A^m)), so eventually dim(R(A^{m+1})) = dim(R(A^m)) for some m. From the inclusion R(A^{m+1}) ⊆ R(A^m) and the equality of dimensions, it is seen that a basis for R(A^{m+1}) is a basis for R(A^m) as well. That is, R(A^{m+1}) = R(A^m). Since R(A^{m+1}) = A(R(A^m)), when R(A^{m+1}) = R(A^m) it will be that R(A^{m+2}) = A(R(A^{m+1})) = A(R(A^m)) = R(A^{m+1}). By the rank nullity theorem, it will also be the case that dim(N(A^{m+2})) = dim(N(A^{m+1})) = dim(N(A^m)) for the same m. From the inclusions N(A^{m+2}) ⊆ N(A^{m+1}) ⊆ N(A^m), it is clear that a basis for N(A^{m+2}) is also a basis for N(A^{m+1}) and N(A^m). So N(A^{m+2}) = N(A^{m+1}) = N(A^m). Now, k is the first m for which this happens. QED
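
The first such k can be found numerically by watching the nullities grow and then stabilize. A sketch (assuming NumPy), applied to A − 2I for the matrix of Example 2 in place of A:

import numpy as np

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)
B = A - 2.0 * np.eye(5)
prev, m = -1, 1
while True:
    nullity = 5 - np.linalg.matrix_rank(np.linalg.matrix_power(B, m))
    print(m, nullity)        # prints 1 1, 2 2, 3 3, 4 3: stabilizes at k = 3
    if nullity == prev:
        break
    prev, m = nullity, m + 1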

Since certain expressions will occur many times in the following, some more notation will be introduced.

  • Aλ,k = (A − λI)^k
  • Nλ,k = N((A − λI)^k) = N(Aλ,k)
  • Rλ,k = R((A − λI)^k) = R(Aλ,k)

From the inclusions Nλ,1 ⊂ Nλ,2 ⊂ ... ⊂ Nλ,k-1 ⊂ Nλ,k = Nλ,k+1 = ..., it follows that Nλ,k \ {0} = ∪ (Nλ,m \ Nλ,m-1), for m = 1, ..., k and Nλ,0 = {0}.

When λ is an eigenvalue of A, in the statement above, k will not exceed the algebraic multiplicity of λ, and can be less. In fact, k = 1 exactly when there is a full set of linearly independent eigenvectors for λ. Consider now the case k ≥ 2.

Now, x ∈ Nλ,m \ Nλ,m-1 if and only if Aλ,m x = 0 and Aλ,m-1 x ≠ 0.

Make the observation that Aλ,m x = 0 and Aλ,m-1 x ≠ 0, if and only if Aλ,m-1 Aλ,1 x = 0 and Aλ,m-2 Aλ,1 x ≠ 0.

So, x ∈ Nλ,m \ Nλ,m-1 if and only if Aλ,1 x ∈ Nλ,m-1 \ Nλ,m-2.

Recursive Procedure

Consider a matrix A, with an eigenvalue λ of algebraic multiplicity k ≥ 2, such that there are not k linearly independent eigenvectors associated with λ.

It is desired to extend the eigenvectors to a basis for Nλ,k, that is, a basis of generalized eigenvectors associated with λ.

There exists some 2 ≤ r ≤ k, such that

Nλ,1 ⊂ Nλ,2 ⊂ ... ⊂ Nλ,r-1 ⊂ Nλ,r = Nλ,r+1 = ..., and Nλ,r \ {0} = ∪ (Nλ,m \ Nλ,m-1), for m = 1, ..., r with Nλ, 0 = {0}.

The eigenvectors are the elements of Nλ,1 \ {0}, so let x1, ..., xr1 be a basis for Nλ,1.

Note that each Nλ,m is a subspace and so a basis for Nλ,m-1 can be extended to a basis for Nλ,m.

Because of this we can expect to find some r2 = dim(Nλ,2) − dim(Nλ,1) linearly independent vectors xr1+1, ..., xr1+r2 such that x1, ..., xr1 , xr1+1, ..., xr1+r2 is a basis for Nλ,2.

Now, x ∈ Nλ,2 \ Nλ,1 if and only if Aλ,1 x ∈ Nλ,1 \ {0}.

Thus we can expect that for each x ∈ {xr1+1, ..., xr1+r2}, Aλ,1 x = α1 x1 + ... + αr1 xr1, for some α1, ..., αr1, depending on x.

Suppose we have reached the stage in the construction so that m-1 sets,

{x1, ..., xr1}, {xr1+1, ..., xr1+r2}, ..., {xr1+ ... + rm-2+1, ..., xr1+ ... + rm-1}

such that

x1, ..., xr1, xr1+1, ..., xr1+r2, ..., xr1+ ... + rm-2 + 1, ..., xr1+ ... + rm-1

is a basis for Nλ,m-1, have been found.

We can expect to find some

rm = dim(Nλ,m) − dim(Nλ,m-1)

linearly independent vectors

xr1+ ... + rm-1+1, ..., xr1+ ... + rm

such that

x1, ..., xr1, xr1+1, ..., xr1+ r2, ..., xr1+ ... + rm-1 + 1, ..., xr1+ ... + rm

is a basis for Nλ, m.

Again, x ∈ Nλ,m \ Nλ,m-1 if and only if Aλ,1 x ∈ Nλ,m-1 \ Nλ,m-2.

Thus we can expect that for each x ∈ {xr1+ ... + rm-1 + 1, ..., xr1+ ... + rm}, Aλ,1 x = α1 x1 + ... + αr1+ ... + rm-1 xr1+ ... + rm-1, for some α1, ..., αr1+ ... + rm-1, depending on x.

Some of the αr1+ ... + rm-2 + 1, ..., αr1+ ... + rm-1 will be nonzero, since Aλ,1 x must lie in Nλ,m-1 \ Nλ,m-2.

The procedure is continued until m = r.

The αi are not truly arbitrary; they must be chosen so that the sums α1 x1 + α2 x2 + ... are in the range of Aλ,1.
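
The procedure can be mirrored numerically by extending a basis of Nλ,m-1 to a basis of Nλ,m at each stage. A rough sketch (assuming SciPy is available, and glossing over the careful choice of the αi) for λ = 2 in Example 2:

import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)
B = A - 2.0 * np.eye(5)
basis = np.zeros((5, 0))
for m in (1, 2, 3):                  # nullities stabilize at m = 3 for lambda = 2
    Nm = null_space(np.linalg.matrix_power(B, m))
    for col in Nm.T:                 # extend the current basis by new directions
        trial = np.column_stack([basis, col])
        if np.linalg.matrix_rank(trial) > basis.shape[1]:
            basis = trial
print(basis.shape)                   # (5, 3): a basis of the generalized eigenspace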

Generalized Eigenspace Decomposition

As was stated in the Introduction, if an n × n matrix has eigenvalues λ1, λ2, ..., λr with algebraic multiplicities k1, k2, ..., kr, then k1 + k2 + ... + kr = n.

When V1 and V2 are two subspaces satisfying V1 ∩ V2 = {0}, their direct sum is defined and notated by

  • V1 ⊕ V2 = {v1 + v2 : v1 ∈ V1 and v2 ∈ V2} .

V1 ⊕ V2 is also a subspace and dim(V1 ⊕ V2) = dim(V1) + dim(V2).

Since dim(Nλi,ki) = ki, for i = 1, 2, ..., r, after it is shown that Nλi,ki ∩ Nλj,kj = {0}, for i ≠ j, we have the main result.

Theorem: Generalized Eigenspace Decomposition Theorem

C^n = Nλ1,k1 ⊕ Nλ2,k2 ⊕ ... ⊕ Nλr,kr.

This follows easily after we prove the theorem below.

Theorem:
Let λ be an eigenvalue of A and β ≠ λ. Then Aβ,r(Nλ,m \ Nλ,m-1) = Nλ,m \ Nλ,m-1, for any positive integers m and r.

Proof:
If x ∈ Nλ,1 \ {0}, then Aλ,1 x = (A − λI)x = 0, so A x = λ x and Aβ,1 x = (A − βI)x = (λ − β)x.

So Aβ,1 x ∈ Nλ,1 \ {0} and Aβ,1 (λ − β)^{-1}x = x.

It holds that Aβ,1 (Nλ,1 \ {0}) = Nλ,1 \ {0}.

Now, x ∈ Nλ,m \ Nλ,m-1 if and only if Aλ,m x = (A − λI)Aλ,m-1 x = 0 and Aλ,m-1 x ≠ 0.

In the case x ∈ Nλ,m \ Nλ,m-1, it holds Aλ,m-1 x ∈ Nλ,1 \ {0}, and Aβ,1 Aλ,m-1 x = (λ − β) Aλ,m-1 x ≠ 0. The operators Aβ,1 and Aλ,m-1 commute. Thus Aλ,m (Aβ,1 x) = 0 and Aλ,m-1 (Aβ,1 x) ≠ 0, which means Aβ,1 x ∈ Nλ,m \ Nλ,m-1.

Now, let our induction hypothesis be, Aβ,1(Nλ,m \ Nλ,m-1) = Nλ,m \ Nλ,m-1.

The relation Aβ,1 x = (λ − β) x + Aλ,1 x holds.

For y ∈ Nλ,m+1 \ Nλ,m, let x = (λ − β)^{-1} y + z.

Then Aβ,1 x = y + (λ − β)^{-1}Aλ,1 y + (λ − β) z + Aλ,1 z = y + (λ − β)^{-1}Aλ,1 y + Aβ,1 z.

Now, Aλ,1 y ∈ Nλ,m \ Nλ,m-1 and, by the induction hypothesis, there exists z ∈ Nλ,m \ Nλ,m-1 that solves Aβ,1 z = −(λ − β)^{-1}Aλ,1 y.

It follows that x ∈ Nλ,m+1 \ Nλ,m and solves Aβ,1 x = y.

So Aβ,1(Nλ,m+1 \ Nλ,m) = Nλ,m+1 \ Nλ,m.

Repeatedly applying Aβ,r = Aβ,1 Aβ,r-1 finishes the proof.

In fact, from the theorem just proved, for i ≠ j, Aλi,ki(Nλj,kj)= Nλj,kj.

Now, suppose that Nλi,ki ∩ Nλj,kj ≠ {0}, for some i ≠ j.

Choose x ∈ Nλi,ki ∩ Nλj,kj with x ≠ 0.

Since x ∈ Nλi,ki, it follows Aλi,ki x = 0.

Since x ∈ Nλj,kj and x ≠ 0, it follows Aλi,ki x ≠ 0, because Aλi,ki preserves dimension on Nλj,kj.

So it must be that Nλi,ki ∩ Nλj,kj = {0}, for i ≠ j.

This concludes the proof of the Generalized Eigenspace Decomposition Theorem.

Powers of a Matrix

Using generalized eigenvectors

Assume A is an n × n matrix with eigenvalues λ1, λ2, ..., λr of algebraic multiplicities k1, k2, ..., kr.

For notational convenience  Aλ, 0  =  I.

Note that  Aβ, 1  =  (λ − β)I + Aλ, 1, and apply the binomial theorem.

A_{\beta,s}=((\lambda-\beta)I+A_{\lambda,1})^s=\sum_{m=0}^s\binom{s}{m}(\lambda-\beta)^{s-m}A_{\lambda,m}

When λ is an eigenvalue of algebraic multiplicity  k, and x ∈ Nλ, k,
 then Aλ, m x = 0, for m ≥ k, so in this case:

A_{\beta,s}x=\sum_{m=0}^{\min(s,k-1)}\binom{s}{m}(\lambda-\beta)^{s-m}A_{\lambda,m}x

Since   C^n  =  Nλ1, k1 ⊕ Nλ2, k2 ⊕ ... ⊕ Nλr, kr,
any x in C^n can be expressed as  x = x1 + x2 + ... + xr ,
with each xi ∈ Nλi, ki.   Hence:

A_{\beta,s}x=\sum_{i=1}^r\sum_{m=0}^{\min(s,k_i-1)}\binom{s}{m}(\lambda_i-\beta)^{s-m}A_{\lambda_i,m}x_i

The columns of Aβ, s are obtained by letting  x  vary across the standard basis vectors.

The case β = 0, A0, s, is the power  A^s  of  A.
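
A sketch of this computation (assuming NumPy) for the matrix of Example 2 with β = 0, splitting a test vector over the generalized eigenspace bases found there:

import numpy as np
from math import comb

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)
# columns: the basis of the generalized eigenspace for 1, then for 2
Bmat = np.array([[0, 1, 0, 0, 0],
                 [1, -15, 0, 0, 0],
                 [-3, 30, 0, 0, 1],
                 [3, -1, 0, 1, -2],
                 [-1, -45, 1, 0, 0]], dtype=float)

def component_power(lam, k, xi, s):
    # A^s xi via the binomial sum, using (A - lam*I)^k xi = 0
    out = np.zeros(5)
    for m in range(min(s, k - 1) + 1):
        out += comb(s, m) * lam**(s - m) * \
               (np.linalg.matrix_power(A - lam * np.eye(5), m) @ xi)
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # arbitrary test vector
c = np.linalg.solve(Bmat, x)                     # split x over the two eigenspaces
x1, x2 = Bmat[:, :2] @ c[:2], Bmat[:, 2:] @ c[2:]
s = 7
approx = component_power(1.0, 2, x1, s) + component_power(2.0, 3, x2, s)
print(np.allclose(approx, np.linalg.matrix_power(A, s) @ x))   # True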

The minimal polynomial of a matrix

Assume A is an n × n matrix with eigenvalues λ1, λ2, ..., λr
of algebraic multiplicities k1, k2, ..., kr.

For each i define  α(λi), the null index of λi, to be the
smallest positive integer α such that  Nλi, α  =  Nλi, ki.

It is often the case that α(λi) < ki.

Then  p(x) = ∏ (x − λi)^{α(λi)}  is the minimal polynomial for A.

To see this note  p(A) = ∏ Aλi,α(λi)  and the factors can be commuted in any order.

So  p(A) (Nλj, kj ) = {0},  because  Aλj,α(λj) (Nλj, kj ) = {0}.  Being that

C^n  =  Nλ1, k1 ⊕ Nλ2, k2 ⊕ ... ⊕ Nλr, kr,  it is clear p(A) = 0.

Now p(x) can not be of less degree because  Aβ, 1 (Nλj, kj ) =  Nλj, kj ,

when β ≠ λj, and so  Aλj,α(λj)  must be a factor of p(A), for each  j.
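
The null indices can be computed by the same rank computations as before. A sketch (assuming NumPy) for the matrix of Example 2, where each null index turns out to equal the algebraic multiplicity, so the minimal polynomial coincides with the characteristic polynomial:

import numpy as np

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)

def null_index(lam, k):
    # smallest alpha with nullity((A - lam*I)^alpha) equal to k
    B = A - lam * np.eye(5)
    for alpha in range(1, k + 1):
        if 5 - np.linalg.matrix_rank(np.linalg.matrix_power(B, alpha)) == k:
            return alpha

print(null_index(1.0, 2), null_index(2.0, 3))   # 2 3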

Using confluent Vandermonde matrices

An alternative strategy is to use the characteristic polynomial of matrix A.

Let   p(x) = a0 + a1 x + a2 x^2 + ... + an-1 x^{n-1} + x^n

be the characteristic polynomial of A.

The minimal polynomial of A can be substituted for p(x) in this discussion, if it is known,
and different, to reduce the degree n and the multiplicities of the eigenvalues.

Then p(A) = 0  and  A^n = −(a0 I + a1 A + a2 A^2 + ... + an-1 A^{n-1}).

So A^{n+m} = bm, 0 I + bm, 1 A + bm, 2 A^2 + ... + bm, n-1 A^{n-1},

where the   bm, 0, bm, 1, bm, 2, ..., bm, n-1  satisfy the recurrence relation


bm, 0 = −a0 bm-1, n-1
bm, 1 = bm-1, 0 − a1 bm-1, n-1
bm, 2 = bm-1, 1 − a2 bm-1, n-1
...,
bm, n-1 = bm-1, n-2 − an-1 bm-1, n-1

with   b0, 0 = b0, 1 = b0, 2 = ... = b0, n-2 = 0,  and  b0, n-1 = 1.

This alone will reduce the number of multiplications needed to calculate a higher
power of A by a factor of n^2, as compared to simply multiplying A^{n+m} by A.

In fact the   bm, 0, bm, 1, bm, 2, ..., bm, n-1  can be calculated by a formula.

Consider first when A has distinct eigenvalues   λ1, λ2, ..., λn.
Since p(λi) = 0, for each i, the λi satisfy the recurrence relation also. So:

\begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{n-1} \\
1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \lambda_n & \lambda_n^2 & \cdots & \lambda_n^{n-1}
\end{bmatrix}\begin{bmatrix}
b_{m,0} \\ b_{m,1} \\ \vdots \\ b_{m,n-1}
\end{bmatrix}=\begin{bmatrix}
\lambda_1^{n+m} \\ \lambda_2^{n+m} \\ \vdots \\ \lambda_n^{n+m}
\end{bmatrix}

The matrix V in the equation is the well-studied Vandermonde matrix,
for which formulas for the determinant and inverse are known.

\det(V(\lambda_1,\lambda_2,\ldots,\lambda_n))=\prod_{1\le i<j\le n}(\lambda_j-\lambda_i)

In the case that λ2 = λ1 , consider instead when  λ1  is near λ2 , and
subtract row 1 from row  2, which does not affect the determinant.

\begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{n-1} \\
0 & \lambda_2-\lambda_1 & \lambda_2^2-\lambda_1^2 & \cdots & \lambda_2^{n-1}-\lambda_1^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \lambda_n & \lambda_n^2 & \cdots & \lambda_n^{n-1}
\end{bmatrix}\begin{bmatrix}
b_{m,0} \\ b_{m,1} \\ \vdots \\ b_{m,n-1}
\end{bmatrix}=
\begin{bmatrix}
\lambda_1^{n+m} \\
\lambda_2^{n+m}-\lambda_1^{n+m} \\
\vdots \\
\lambda_n^{n+m}
\end{bmatrix}

After dividing the second row by  (λ2 − λ1)  the determinant will be affected by
the removal of this factor and still be non-zero.

\begin{bmatrix}
0 &
\frac{\lambda_2-\lambda_1}{(\lambda_2-\lambda_1)} &
\frac{\lambda_2^2-\lambda_1^2}{(\lambda_2-\lambda_1)} &
\cdots &
\frac{\lambda_2^{n-1}-\lambda_1^{n-1} }{(\lambda_2-\lambda_1)}
\end{bmatrix}
\quad \begin{bmatrix}\frac{\lambda_2^{n+m}-\lambda_1^{n+m} }{(\lambda_2-\lambda_1)}\end{bmatrix}

Taking the limit as λ1→ λ2, the new system has the second row differentiated.

\begin{bmatrix}
1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{n-1} \\
0 & 1 & 2\lambda_2 & \cdots & (n-1)\lambda_2^{n-2} \\
1 & \lambda_3 & \lambda_3^2 & \cdots & \lambda_3^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \lambda_n & \lambda_n^2 & \cdots & \lambda_n^{n-1}
\end{bmatrix}\begin{bmatrix}
b_{m,0} \\ b_{m,1} \\ \vdots \\ b_{m,n-1}
\end{bmatrix}=
\begin{bmatrix}
\lambda_2^{n+m} \\
(n+m)\lambda_2^{n+m-1} \\
\lambda_3^{n+m} \\
\vdots \\
\lambda_n^{n+m}
\end{bmatrix}

The new system has determinant:

\det(V(\lambda_2,\ldots,\lambda_n))=\prod_{3\le j\le n}(\lambda_j-\lambda_2)^2\prod_{3\le i<j\le n}(\lambda_j-\lambda_i)

In the case that λ3 = λ2 also, consider like before when  λ2  is near λ3, and
subtract row 1 from row 3, which does not affect the determinant. Next divide
row three by  (λ3 − λ2),  then subtract row 2 from the new row 3 and
follow by dividing the resulting row 3 by  (λ3 − λ2)  again. This will affect the
determinant by removing a factor of  (λ3 − λ2)^2.

Each element of row 3 is now of the form

((f(\lambda_3)-f(\lambda_2))/(\lambda_3-\lambda_2)-f'(\lambda_2))/(\lambda_3-\lambda_2)

and

((f(\lambda_3)-f(\lambda_2))/(\lambda_3-\lambda_2)-f'(\lambda_2))/(\lambda_3-\lambda_2)\rightarrow\tfrac{1}{2}f''(\lambda_3)\text{  as }\lambda_2\rightarrow\lambda_3

The effect is to differentiate twice and multiply by one half.

\begin{bmatrix}
1 & \lambda_3 & \lambda_3^2 & \lambda_3^3 & \cdots & \lambda_3^{n-1} \\
0 & 1 & 2\lambda_3 & 3\lambda_3^2 & \cdots & (n-1)\lambda_3^{n-2} \\
0 & 0 & 1 & 3\lambda_3 & \cdots & \tfrac{1}{2}(n-1)(n-2)\lambda_3^{n-3} \\
1 & \lambda_4 & \lambda_4^2 & \lambda_4^3 & \dots & \lambda_4^{n-1} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \lambda_n & \lambda_n^2 & \lambda_n^3 & \cdots & \lambda_n^{n-1}
\end{bmatrix}
\begin{bmatrix} b_{m,0} \\ b_{m,1} \\ \vdots \\ b_{m,n-1} \end{bmatrix} =
\begin{bmatrix}
\lambda_3^{n+m} \\
(n+m)\lambda_3^{n+m-1} \\
\tfrac{1}{2}(n+m)(n+m-1)\lambda_3^{n+m-2} \\
\lambda_4^{n+m} \\
\vdots \\
\lambda_n^{n+m}
\end{bmatrix}

The new system has determinant:

\det(V(\lambda_3,\ldots,\lambda_n))=\prod_{4\le j\le n}(\lambda_j-\lambda_3)^3\prod_{4\le i<j\le n}(\lambda_j-\lambda_i)

If the multiplicity of the eigenvalue were even higher, then the next row would
be differentiated three times and multiplied by 1/3!. The progression is (1/s!) f^{(s)}, with the
constant coming from the coefficients of the derivatives in the Taylor expansion. This
is done for each eigenvalue of algebraic multiplicity greater than 1.

Example

The matrix A=\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 0 \\
6 & 3 & 2 & 0 & 0 \\
10 & 6 & 3 & 2 & 0 \\
15 & 10 & 6 & 3 & 2
\end{bmatrix}

has characteristic polynomial  p(x) = (x − 1)^2(x − 2)^3.

The   bm, 0, bm, 1, bm, 2,  bm, 3, bm, 4   for which

A^{5+m}  =  bm, 0 I + bm, 1 A + bm, 2 A^2 + bm, 3 A^3 + bm, 4 A^4,

satisfy the confluent Vandermonde system next.

\begin{bmatrix}
1 & 1 & 1^2 & 1^3 & 1^4 \\
0 & 1 & 2\cdot 1 & 3 \cdot 1^2 & 4\cdot 1^3 \\
1 & 2 & 2^2 & 2^3 & 2^4 \\
0 & 1 & 2\cdot 2 & 3\cdot 2^2 & 4\cdot 2^3 \\
0 & 0 & 1 & 3\cdot 2 & 6\cdot 2^2
\end{bmatrix}
\begin{bmatrix} b_{m,0} \\ b_{m,1} \\ b_{m,2} \\ b_{m,3} \\ b_{m,4} \end{bmatrix} =
\begin{bmatrix}
1^{5+m} \\
(5+m)\cdot 1^{5+m-1} \\
2^{5+m} \\
(5+m)\cdot 2^{5+m-1} \\
\tfrac{1}{2}(5+m)(5+m-1)\cdot 2^{5+m-2}
\end{bmatrix}
\begin{bmatrix} b_{m,0} \\ b_{m,1} \\ b_{m,2} \\ b_{m,3} \\ b_{m,4} \end{bmatrix} =
\begin{bmatrix}
-16 & -8 & 17 & -10 & 4 \\
48 & 20 & -48 & 29 & -12 \\
-48 & -18 & 48 & -30 & 13 \\
20 & 7 & -20 & 13 & -6 \\
-3 & -1 & 3 & -2 & 1
\end{bmatrix}
\begin{bmatrix}
1 \\
(5+m) \\
32 \cdot 2^m \\
16(5+m) \cdot 2^m \\
4(5+m)(5+m-1) \cdot 2^m
\end{bmatrix}
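
A sketch (assuming NumPy) that solves this confluent Vandermonde system for a chosen m and checks A^{5+m} against the resulting combination of powers:

import numpy as np

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)
V = np.array([[1, 1, 1, 1, 1],
              [0, 1, 2, 3, 4],
              [1, 2, 4, 8, 16],
              [0, 1, 4, 12, 32],
              [0, 0, 1, 6, 24]], dtype=float)
m = 3
rhs = np.array([1.0,
                5.0 + m,
                2.0**(5 + m),
                (5 + m) * 2.0**(4 + m),
                0.5 * (5 + m) * (4 + m) * 2.0**(3 + m)])
b = np.linalg.solve(V, rhs)                       # b_{m,0}, ..., b_{m,4}
combo = sum(bi * np.linalg.matrix_power(A, i) for i, bi in enumerate(b))
print(np.allclose(combo, np.linalg.matrix_power(A, 5 + m)))   # True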

Using difference equations

Returning to the recurrence relation for  bm, 0, bm, 1, bm, 2, ..., bm, n-1,
bm, 0 = −a0 bm-1, n-1
bm, 1 = bm-1, 0 − a1 bm-1, n-1
bm, 2 = bm-1, 1 − a2 bm-1, n-1
...,
bm, n-1 = bm-1, n-2 − an-1 bm-1, n-1
with   b0, 0 = b0, 1 = b0, 2 = ... = b0, n-2 = 0,  and  b0, n-1 = 1.

Upon substituting the first relation into the second,
bm, 1 =  −a0 bm-2, n-1 − a1 bm-1, n-1
and now this one into the next bm, 2 = bm-1, 1 − a2 bm-1, n-1
bm, 2 =  −a0 bm-3, n-1 − a1 bm-2, n-1 − a2 bm-1, n-1
...,  and so on, the following difference equation is found.
bm, n-1 =
 −a0 bm-n, n-1 − a1 bm-n+1, n-1 − a2 bm-n+2, n-1  − ... − an-2 bm-2, n-1 − an-1 bm-1, n-1
with   b0, n-1 = b1, n-1 = b2, n-1 = ... = bn-2, n-1 = 0,  and  bn-1, n-1 = 1.

See the section on ordinary linear difference equations below for more explanation.

Chains of generalized eigenvectors

Some notation and results from previous sections are restated.

  • A is an n × n matrix of complex numbers.
  • Aλ, k = (A − λ I)^k
  • Nλ, k = N((A − λ I)^k)  = N(Aλ, k)
  • For  V1 ∩ V2 = {0},  V1 ⊕ V2 = {v1 + v2 : v1 ∈ V1 and v2 ∈ V2}.

Assume A has eigenvalues λ1, λ2, ..., λr of algebraic multiplicities k1, k2, ..., kr.

For each i define  α(λi), the null index of λi, to be the smallest positive integer α such that  Nλi, α  =  Nλi, ki.

It is always the case that α(λi) ≤ ki.

When α(λ) ≥ 2,

Nλ, 1 ⊂  Nλ, 2 ⊂ ... ⊂  Nλ, α-1 ⊂ Nλ, α  = Nλ, α+1 = ...,
 Nλ, α \ {0} = ∪  (Nλ, m \ Nλ, m-1), for m = 1, ..., α and Nλ, 0 = {0}.

x ∈ Nλ, m \ Nλ, m-1  if and only if  Aλ, 1 x ∈ Nλ, m-1 \ Nλ, m-2.

Define a  chain of generalized eigenvectors to be a set
{ x1, x2, ..., xm }  such that  x1 ∈ Nλ, m \ Nλ, m-1,  and  xi+1 = Aλ, 1 xi.

Then xm ≠ 0 and  Aλ, 1 xm = 0.

When  x1 ∈ Nλ, 1 \ {0},  {x1}  can be, for the sake of not requiring extra terminology, considered trivially a chain.

When a disjoint collection of chains combined forms a basis set for Nλ, α(λ), the chains are often referred to as Jordan chains and are the vectors used for the columns of a transformation matrix in the Jordan canonical form.

When a disjoint collection of chains whose combination forms a basis set is needed that satisfies  βi+1xi+1 = Aλ, 1 xi, for some scalars βi, chains as already defined can be scaled for this purpose.

What will be proven here is that such a disjoint collection of chains
can always be constructed.

Before the proof is started, recall a few facts about direct sums.
When the notation V1 ⊕ V2 is used, it is assumed V1 ∩ V2 = {0}.
For  x = v1 + v2  with v1 ∈ V1 and v2 ∈ V2,  x = 0
if and only if  v1 = v2 = 0.

In the discussion below
δi  =  dim(Nλ, i)  −  dim(Nλ, i−1),  with  δ1  =  dim(Nλ, 1).

First consider when  Nλ, 2 \ Nλ, 1 ≠ {0}. Then a basis for  Nλ, 1  can be
extended to a basis for  Nλ, 2.  If  δ2 = 1, then there exists x1 ∈ Nλ, 2 \ Nλ, 1
such that  Nλ, 2 =  Nλ, 1 ⊕ span{x1}.  Let  x2 = Aλ, 1 x1.  Then
x2 ∈ Nλ, 1 \ {0},  with x1 and x2 linearly independent. If dim(Nλ, 2) = 2,
then since {x1, x2}  is a chain we are through. Otherwise x1, x2 can be extended
to a basis x1, x2, ..., xδ1+1 for Nλ, 2. The sets {x1, x2}, {x3}, ..., {xδ1+1}
form a disjoint collection of chains. In the case that δ2 > 1, there exist
linearly independent  x1, x2, ..., xδ2 ∈ Nλ, 2 \ Nλ, 1  such that
 Nλ, 2 =  Nλ, 1 ⊕ span{x1, x2, ..., xδ2}.  Let  yi = Aλ, 1 xi.
Then   yi ∈ Nλ, 1 \ {0},  for  i = 1, 2, ..., δ2.  To see that the y1, y2, ..., yδ2
are linearly independent, assume that  β1y1 + β2y2 + ... + βδ2yδ2 = 0  for some
 β1, β2, ..., βδ2.  Then for  x = β1x1 + β2x2 + ... + βδ2xδ2,
x ∈ Nλ, 1  and x ∈ span{x1, x2, ..., xδ2}, which implies that x = 0  and
 β1= β2= ... = βδ2 = 0.  Since span{y1, y2, ..., yδ2} ⊆ Nλ, 1, the vectors
x1, x2, ..., xδ2 , y1, y2, ..., yδ2  are a linearly independent set.
If δ2 = δ1, then the sets {x1, y1}, {x2, y2}, ..., {xδ2, yδ2}  form a
disjoint collection of chains that when combined are a basis set for Nλ, 2.
If δ1 > δ2, then  x1, x2, ..., xδ2 , y1, y2, ..., yδ2  can be extended to a basis
for Nλ, 2 by some vectors xδ2+1, ..., xδ1  in Nλ, 1, so that
{x1, y1}, {x2, y2}, ..., {xδ2, yδ2} , {xδ2+1}, ..., {xδ1}
forms a disjoint collection of chains.

To reduce redundancy, in the next paragraph, when δ = 1 the notation
x1, x2, ..., xδ will be understood simply to mean just x1 and when  δ = 2 
to mean  x1, x2.

So far it has been shown that, if linearly independent
x1, x2, ..., xδ2 ∈ Nλ, 2 \ Nλ, 1,  are chosen, such that
 Nλ, 2 =  Nλ, 1 ⊕ span{x1, x2, ..., xδ2},  then there exists a disjoint
collection of chains with each of the x1, x2, ..., xδ2  being the first member or top
of one of the chains. Furthermore, this collection of vectors, when combined,
forms a basis for  Nλ, 2.

Now, let the induction hypothesis be that, if linearly independent
x1, x2, ..., xδm ∈ Nλ, m \ Nλ, m−1,  are chosen, such that
 Nλ, m =  Nλ, m−1 ⊕ span{x1, x2, ..., xδm},  then there exists a disjoint
collection of chains with each of the x1, x2, ..., xδm  being the first member or top
of one of the chains. Furthermore, this collection of vectors, when combined,
forms a basis for  Nλ, m.

Consider m < α(λ). A basis for  Nλ, m can always be extended to a basis for
 Nλ, m+1. So linearly independent  x1, x2, ..., xδm+1 ∈ Nλ, m+1 \ Nλ, m,  such that
 Nλ, m+1 =  Nλ, m ⊕ span{x1, x2, ..., xδm+1},  can be chosen.  Let  yi = Aλ, 1 xi.
Then   yi ∈ Nλ, m \ Nλ, m−1,  for  i = 1, 2, ..., δm+1.  To see that the y1, y2, ..., yδm+1
are linearly independent, assume that  β1y1 + β2y2 + ... + βδm+1yδm+1 = 0  for some
 β1, β2, ..., βδm+1.  Then for
 x = β1x1 + β2x2 + ... + βδm+1xδm+1,   x ∈ Nλ, 1  and
x ∈ span{x1, x2, ..., xδm+1}, which implies that x = 0  and
 β1= β2= ... = βδm+1 = 0.  In addition,  span{y1, y2, ..., yδm+1} ∩ Nλ, m−1 = {0}.
To see this, assume that  β1y1 + β2y2 + ... + βδm+1yδm+1 ∈  Nλ, m−1  for some
 β1, β2, ..., βδm+1.  Then for
 x = β1x1 + β2x2 + ... + βδm+1xδm+1,   x ∈ Nλ, m  and
x ∈ span{x1, x2, ..., xδm+1}, which implies that x = 0  and
 β1= β2= ... = βδm+1 = 0.  The proof is nearly done.

At this point suppose that  b1, b2, ..., bdm−1  is any basis for Nλ, m−1.
Then  B =  span{b1, b2, ..., bdm−1} ⊕ span{y1, y2, ..., yδm+1}
is a subspace of Nλ, m.  If  B ≠ Nλ, m,  then
b1, b2, ..., bdm−1, y1, y2, ..., yδm+1 can be extended to a basis for Nλ, m
by some set of vectors  z1, z2, ..., z(δm − δm+1),  in which case
Nλ, m =  Nλ, m−1 ⊕ span{y1, y2, ..., yδm+1} ⊕ span{z1, z2, ..., z(δm − δm+1)}.

If δm = δm+1, then
 Nλ, m =  Nλ, m−1 ⊕ span{y1, y2, ..., yδm+1},
or if δm > δm+1, then
 Nλ, m =  Nλ, m−1 ⊕ span{z1, z2, ..., z(δm − δm+1), y1, y2, ..., yδm+1}.
In either case apply the induction hypothesis to get that there exists a disjoint
collection of chains with each of the y1, y2, ..., yδm+1  being the first member or top
of one of the chains. Furthermore, this collection of vectors, when combined,
forms a basis for  Nλ, m. Now,  yi = Aλ, 1 xi,  for  i = 1, 2, ..., δm+1,  so each of the
chains beginning with yi can be extended upwards into  Nλ, m+1 \ Nλ, m to a chain
beginning with xi. Since  Nλ, m+1 =  Nλ, m ⊕ span{x1, x2, ..., xδm+1},
the combined vectors of the new chains form a basis for  Nλ, m+1.
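
For Example 2 with λ = 2 the construction collapses to a single chain of length 3. A sketch (assuming SciPy) picks a top vector in Nλ,3 \ Nλ,2 and applies Aλ,1 repeatedly:

import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 0, 0, 0, 0],
              [3, 1, 0, 0, 0],
              [6, 3, 2, 0, 0],
              [10, 6, 3, 2, 0],
              [15, 10, 6, 3, 2]], dtype=float)
B = A - 2.0 * np.eye(5)
N2 = null_space(np.linalg.matrix_power(B, 2))
N3 = null_space(np.linalg.matrix_power(B, 3))
for col in N3.T:                 # find a top vector: in N3 but outside N2
    if np.linalg.matrix_rank(np.column_stack([N2, col])) > N2.shape[1]:
        x1 = col
        break
chain = [x1, B @ x1, B @ (B @ x1)]       # x_{i+1} = A_{lambda,1} x_i
print(np.allclose(B @ chain[2], 0))      # True: the chain ends in an eigenvector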

Differential equations y′ = Ay

Let A be an n × n matrix of complex numbers and  λ  an eigenvalue of A, with
associated eigenvector x. Suppose y(t) is an n-dimensional vector-valued
function, sufficiently smooth, so that y′(t) is continuous. The restriction that y(t)
be smooth can be relaxed somewhat, but is not the main focus of this discussion.

The solutions to the equation  y′(t) = Ay(t)  are sought. The first observation is that
 y(t) = e^{λt}x  will be a solution. When A does not have n linearly independent
eigenvectors, solutions of this kind will not provide the total of n needed for a
fundamental basis set.

In view of the existence of chains of generalized eigenvectors, seek a solution of
the form  y(t) = e^{λt}x1 + t e^{λt}x2.  Then
 y′(t) = λ e^{λt}x1 + e^{λt}x2 + λ t e^{λt}x2 = e^{λt}(λ x1 + x2) + t e^{λt}(λ x2)
and
 Ay(t) = e^{λt}A x1 + t e^{λt}A x2.

In view of this, y(t) will be a solution to  y′(t) = Ay(t)  when  A x1 = λ x1 + x2  and
 A x2 = λ x2.  That is, when  (A − λI)x1 = x2  and  (A − λI)x2 = 0.  Equivalently,
when  {x1, x2}  is a chain of generalized eigenvectors.

Continuing with this reasoning, seek a solution of the form
 y(t) = e^{λt}x1 + t e^{λt}x2 + t^2 e^{λt}x3.  Then
 y′(t) = λ e^{λt}x1 + e^{λt}x2 + λ t e^{λt}x2 + 2 t e^{λt}x3 + λ t^2 e^{λt}x3
= e^{λt}(λ x1 + x2) + t e^{λt}(λ x2 + 2 x3) + t^2 e^{λt}(λ x3)  and
 Ay(t) = e^{λt}A x1 + t e^{λt}A x2 + t^2 e^{λt}A x3.

Like before, y(t) will be a solution to  y′(t) = Ay(t)  when  A x1 = λ x1 + x2,
 A x2 = λ x2 + 2 x3,  and  A x3 = λ x3.  That is, when  (A − λI)x1 = x2,
 (A − λI)x2 = 2 x3,  and  (A − λI)x3 = 0.  Since it will hold  (A − λI)(2 x3) = 0
also, equivalently, when  {x1, x2, 2 x3}  is a chain of generalized eigenvectors.

More generally, to find the progression, seek a solution of the form
 y(t) = e^{λt}x1 + t e^{λt}x2 + t^2 e^{λt}x3 + t^3 e^{λt}x4 + ... + t^{m−2} e^{λt}xm−1 + t^{m−1} e^{λt}xm,
then
 y′(t) = λ e^{λt}x1 + e^{λt}x2 + λ t e^{λt}x2 + 2 t e^{λt}x3 + λ t^2 e^{λt}x3 + 3 t^2 e^{λt}x4 + λ t^3 e^{λt}x4
+ ... + (m−2)t^{m−3}e^{λt}xm−1 + λ t^{m−2} e^{λt}xm−1 + (m−1)t^{m−2} e^{λt}xm + λ t^{m−1} e^{λt}xm
= e^{λt}(λ x1 + x2) + t e^{λt}(λ x2 + 2 x3) + t^2 e^{λt}(λ x3 + 3 x4) + t^3 e^{λt}(λ x4 + 4 x5)
+ ...
+ t^{m−3} e^{λt}(λ xm−2 + (m−2) xm−1) + t^{m−2} e^{λt}(λ xm−1 + (m−1) xm) + t^{m−1} e^{λt}(λ xm)
and
Ay(t) =
  e^{λt}A x1 + t e^{λt}A x2
+ t^2 e^{λt}A x3 + t^3 e^{λt}A x4 + ... + t^{m−2} e^{λt}A xm−1 + t^{m−1} e^{λt}A xm.

Again, y(t) will be a solution to  y′(t) = Ay(t)  when
 A x1 = λ x1 + x2,  A x2 = λ x2 + 2 x3,  A x3 = λ x3 + 3 x4,  A x4 = λ x4 + 4 x5,
 ...,  A xm−2 = λ xm−2 + (m−2) xm−1,  A xm−1 = λ xm−1 + (m−1) xm,
and  A xm = λ xm.
That is, when
 (A − λI)x1 = x2,  (A − λI)x2 = 2 x3,  (A − λI)x3 = 3 x4,  (A − λI)x4 = 4 x5,
 ...,
 (A − λI)xm−2 = (m−2) xm−1,  (A − λI)xm−1 = (m−1) xm,  and
 (A − λI)xm = 0.
Since it will hold  (A − λI)((m−1)! xm) = 0  also, equivalently, when
 {x1, 1! x2, 2! x3, 3! x4, ..., (m−2)! xm−1, (m−1)! xm}
is a chain of generalized eigenvectors.

Now, the basis set for all solutions will be found through a disjoint collection of chains of generalized eigenvectors of the matrix A.

Assume A has eigenvalues λ1, λ2, ..., λr of algebraic multiplicities k1, k2, ..., kr.

For a given eigenvalue λi there is a collection of s, with s depending on i,
disjoint chains of generalized eigenvectors
Ci,1 = {1z1, 1z2, ...,1zj1}, Ci,2 = {2z1, 2z2, ...,2zj2}, ..., Ci,s = {sz1, sz2, ...,szjs},
that when combined form a basis set for Nλi, ki. The total number of vectors
in this set will be j1 + j2 + ... + js = ki. Sets in this collection may have only one
or two members, so in this discussion understand the notation {βz1, βz2, ...,βzjβ}
to mean {βz1} when jβ = 1, {βz1, βz2} when jβ = 2, and so forth.

Being that this notation is cumbersome with many indices, in the next paragraphs
any particular Ci,β, when more explanation is not needed, may just be notated as
C = {z1, z2, ..., zj}.

For each of these chain sets, C = {z1, z2, ..., zj},
the sets {zj}, {zj−1, zj}, {zj−2, zj−1, zj}, ..., {z2, z3, ..., zj}, {z1, z2, ..., zj}
are also chains. This notation is understood to mean: when
C = {z1}, just {z1}; when C = {z1, z2}, just {z2}, {z1, z2}; and when
C = {z1, z2, z3}, just {z3}, {z2, z3}, {z1, z2, z3}; and so on.

The conclusion of the discussion above was that
y(t) = e^{λt}x1 is a solution when {x1} is a chain;
y(t) = e^{λt}x1 + t e^{λt}x2 is a solution when {x1, 1! x2} is a chain;
y(t) = e^{λt}x1 + t e^{λt}x2 + t^2 e^{λt}x3 is a solution when {x1, 1! x2, 2! x3} is a chain.
The progression continues to:
y(t) = e^{λt}x1 + t e^{λt}x2 + t^2 e^{λt}x3 + t^3 e^{λt}x4 + ... + t^{m−2} e^{λt}xm−1 + t^{m−1} e^{λt}xm
is a solution when {x1, 1! x2, 2! x3, 3! x4, ..., (m−2)! xm−1, (m−1)! xm}
is a chain of generalized eigenvectors.

In light of the preceding calculations, all that must be done is to provide the proper
scaling for each of the chains arising from the set C = {z1, z2, ..., zj}.
The progression for the solutions is given by
y(t) = e^{λt}zj, for chain {zj};
y(t) = e^{λt}zj−1 + (1 ⁄ 1!) t e^{λt}zj, for chain {zj−1, 1!(1 ⁄ 1!) zj};
y(t) = e^{λt}zj−2 + (1 ⁄ 1!) t e^{λt}zj−1 + (1 ⁄ 2!) t^2 e^{λt}zj,
for chain {zj−2, 1!(1 ⁄ 1!) zj−1, 2!(1 ⁄ 2!) zj};
y(t) = e^{λt}zj−3 + (1 ⁄ 1!) t e^{λt}zj−2 + (1 ⁄ 2!) t^2 e^{λt}zj−1 + (1 ⁄ 3!) t^3 e^{λt}zj,
for chain {zj−3, 1!(1 ⁄ 1!) zj−2, 2!(1 ⁄ 2!) zj−1, 3!(1 ⁄ 3!) zj};
and so on until
y(t) = e^{λt}z1 + (1 ⁄ 1!) t e^{λt}z2 + (1 ⁄ 2!) t^2 e^{λt}z3 + ... + (1 ⁄ (j−1)!) t^{j−1} e^{λt}zj,
for the chain of generalized eigenvectors
{z1, 1!(1 ⁄ 1!) z2, 2!(1 ⁄ 2!) z3, ..., (j−2)!(1 ⁄ (j−2)!) zj−1, (j−1)!(1 ⁄ (j−1)!) zj}.

What is left to show is that when all the solutions constructed from the chain sets,
as described, are considered, they form a fundamental set of solutions.
To do this it has to be shown that there are n of them and that they are
linearly independent.

Reiterating, for a given eigenvalue λi there is a collection of s, with s depending on i,
disjoint chains of generalized eigenvectors
Ci,1 = {1z1, 1z2, ...,1zj1(i)}, Ci,2 = {2z1, 2z2, ...,2zj2(i)},
..., Ci,s = {sz1, sz2, ...,szjs(i)},
that when combined form a basis set for Nλi, ki. The total number of vectors
in this set will be j1(i) + j2(i) + ... + js(i) = ki.

Thus the total number of all such basis vectors, and so solutions, is
k1 + k2 + ... + kr = n.

Each solution is one of the forms y(t) = e^{λt}x1, y(t) = e^{λt}x1 + t e^{λt}x2,
y(t) = e^{λt}x1 + t e^{λt}x2 + t^2 e^{λt}x3, and so on.
Now each basis vector vj, for j = 1, 2, ..., n, of the combined set of
generalized eigenvectors occurs as x1 in one of the expressions immediately
above precisely once. That is, for each j, there is one yj(t) = e^{λt}vj + ...
Since yj(0) = e^{λ·0}vj = vj, the set of solutions is linearly independent at t = 0.
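
A quick numerical check of the two-vector case, using the matrix of Example 1 (a sketch assuming NumPy and SciPy; it compares y(t) = e^{λt}x1 + t e^{λt}x2 with the matrix-exponential solution):

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam = 1.0
x1 = np.array([0.0, 1.0])          # generalized eigenvector (top of the chain)
x2 = (A - lam * np.eye(2)) @ x1    # = [1, 0], an eigenvector; {x1, x2} is a chain
t = 0.7
y = np.exp(lam * t) * x1 + t * np.exp(lam * t) * x2
print(np.allclose(y, expm(A * t) @ x1))   # True: y solves y' = Ay with y(0) = x1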

Revisiting the powers of a matrix

As a notational convenience Aλ, 0 = I.

Note that  A  =  λI + Aλ, 1, and apply the binomial theorem.

A^s=(\lambda I+A_{\lambda,1})^s=\sum_{r=0}^s\binom{s}{r}\lambda^{s-r}A_{\lambda,r}

Assume λ is an eigenvalue of A, and let  { x1, x2, ..., xm }
be a chain of generalized eigenvectors such that  x1 ∈ Nλ, m \ Nλ, m-1,
 xi+1 = Aλ, 1 xi,  xm ≠ 0,  and  Aλ, 1 xm = 0.

Then  xr+1 = Aλ, r x1,   for  r = 0, 1, ..., m-1.

A^s x_1=\sum_{r=0}^s\binom{s}{r}\lambda^{s-r}A_{\lambda,r}x_1=\sum_{r=0}^s\binom{s}{r}\lambda^{s-r}x_{r+1}

So for s ≤ m − 1

A^s x_1=\sum_{r=0}^s\binom{s}{r}\lambda^{s-r}x_{r+1}

and for s ≥ m − 1,  since  Aλ, m x1 = 0,

A^s x_1=\sum_{r=0}^{m-1}\binom{s}{r}\lambda^{s-r}x_{r+1}
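
For Example 1 the chain has length m = 2 and the sum gives A^s x1 = x1 + s x2; a sketch (assuming NumPy):

import numpy as np
from math import comb

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam = 1.0
x1 = np.array([0.0, 1.0])          # top of the chain
x2 = (A - lam * np.eye(2)) @ x1    # x_2 = A_{lambda,1} x_1
s = 9
total = sum(comb(s, r) * lam**(s - r) * x for r, x in enumerate([x1, x2]))
print(total)                                   # [9. 1.]
print(np.linalg.matrix_power(A, s) @ x1)       # [9. 1.], in agreement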

Ordinary linear difference equations

Ordinary linear difference equations are equations of the sort:
yn = a yn−1 +  b
yn = a yn−1 + b yn−2 + c
or more generally,
yn = amyn−1 + am−1yn−2 + ... +  a2yn−m + 1 + a1yn−m + a0
with initial conditions
y0,  y1,  y2,  ...,  ym−2,  ym−1.

The case a1 = 0 can be excluded, since it represents an equation of lower degree.

They have a characteristic polynomial
p(x) =  xm − amxm−1 − am−1xm−2 − ... − a2x − a1.
To solve a difference equation it is first observed, if  yn  and zn are both solutions,
then  (yn − zn)  is a solution of the homogeneous equation:
yn = amyn−1 + am−1yn−2 + ... +  a2yn−m + 1 + a1yn−m.

So a particular solution to the difference equation must be found together with
all solutions of the homogeneous equation to get the general solution for the
difference equation. Another observation to make is that, if yn is a solution to
the inhomogeneous equation, then
zn = yn+1 − yn
is also a solution to the homogeneous equation.
So all solutions of the homogeneous equation will be found first.

When β is a root of p(x) = 0, then it is easily seen that
yn = β^n  is a solution to the homogeneous equation, since
yn − amyn−1 − am−1yn−2 − ... −  a2yn−m + 1 − a1yn−m
becomes, upon the substitution yn = β^n,
β^n − amβ^{n−1} − am−1β^{n−2} − ... − a2β^{n−m+1} − a1β^{n−m}
= β^{n−m}(β^m − amβ^{m−1} − am−1β^{m−2} − ... − a2β − a1)
 = β^{n−m}p(β) = 0.

When β is a repeated root of p(x) = 0, then
yn = nβ^{n−1}  is a solution to the homogeneous equation, since
nβ^{n−1} − am(n−1)β^{n−2} − am−1(n−2)β^{n−3} − ... − a2(n−m+1)β^{n−m} − a1(n−m)β^{n−m−1}
= (n−m)β^{n−m−1}(β^m − amβ^{m−1} − am−1β^{m−2} − ... − a2β − a1)
+ β^{n−m}(mβ^{m−1} − (m−1)amβ^{m−2} − (m−2)am−1β^{m−3} − ... − 2a3β − a2)
= (n−m)β^{n−m−1}p(β) + β^{n−m}p′(β) = 0.

After reaching this point in the calculation the mystery is solved. Just notice that when
β  is a root of p(x) = 0 with multiplicity  k,  then for s = 1, 2, ..., k−1
d^s(β^{n−m}p(β))/dβ^s = 0.
Referring this back to the original expression
β^n − amβ^{n−1} − am−1β^{n−2} − ... − a2β^{n−m+1} − a1β^{n−m},
it is seen that
yn = d^s(β^n)/dβ^s
are solutions to the homogeneous equation. For example, if β is a root of
multiplicity 3, then yn = n(n−1)β^{n−2}  is a solution. In any case this gives  m
linearly independent solutions to the homogeneous equation.

To look for a particular solution, first consider the simplest equation
yn = a yn−1 +  b.
It has a particular solution yp,n given by
yp,0 = 0, yp,1 = b, yp,2 = (1 + a)b, ..., yp,n = (1 + a + a^2 + ... + a^{n−1})b, ... .
Its homogeneous equation yn = a yn−1 has solutions yn = a^n y0.
So zn = yn+1 − yn = a^n b
can be telescoped to get
yn = (yn − yn−1) + (yn−1 − yn−2) + ... + (y2 − y1) + (y1 − y0) + y0
= zn−1 + zn−2 + ... + z1 + z0 + y0
= (1 + a + a^2 + ... + a^{n−1})b ,
the particular solution with y0 = 0.

Now, returning to the general problem, consider the equation
yn = amyn−1 + am−1yn−2 + ... +  a2yn−m + 1 + a1yn−m + a0.
When yp,n is a particular solution with yp,0 = 0,  then
 zn = yp,n+1 − yp,n 
is a solution to the homogeneous equation with  z0 = yp,1 .
So zn = yp,n+1 − yp,n
can be telescoped to get
yp,n = (yp,n − yp,n−1) + (yp,n−1 − yp,n−2) + ... + (yp,2 − yp,1) + (yp,1 − yp,0) + yp,0
= zn−1 + zn−2 + ... + z1 + z0 
Considering
yp,m = amyp,m−1 + am−1yp,m−2 + ... +  a2yp,1 + a1yp,0 + a0.
and rewriting the equation in the  zi
zm−1 + zm−2 + ... + z1 + z0 
 =  (am) ( zm−2 + zm−3 + ... + z1 + z0)   +  (am−1) ( zm−3 + zm−4 + ... + z1 + z0
 +  (am−2) ( zm−4 + zm−5 + ... + z1 + z0
 +  · · ·
 +  (a3) ( z1 + z0)  +  (a2) ( z0)  +  (a0)
and
zm−1
 =  (am − 1) zm−2  +  (am + am−1 − 1) zm−3  +  (am + am−1 + am−2 − 1) zm−4
 +  · · ·
 +  (am + am−1 + ... + a4 + a3 − 1) z1  +  (am + am−1 + ... + a3 + a2 − 1) z0
 +  (a0).

Since a solution of the homogeneous equation can be found for any initial conditions
z0,  z1,  z2,  ...,  zm−2,  zm−1,
reasoning conversely, find such zi satisfying the equation
just above, and define yp,n by the relation
 yp,0 = 0,  yp,n  = zn−1 + zn−2 + ... + z1 + z0 .

One choice is, for example,  zm−1 = a0,   z0  =  z1  =  z2  = ... =  zm−2  =  0.
This solution solves the problem for all initial values equal to zero.

The general solution to the inhomogeneous equation is given by
yn = yp,n + γ1 w(1)n + γ2 w(2)n + ... + γm−1 w(m−1)n + γm w(m)n
where
w(1)n,  w(2)n,  ...,  w(m−1)n, w(m)n
are a basis for the homogeneous equation, and
γ1,  γ2,  ...,  γm−1,  γm
are scalars.

Example

yn = 8 yn−1 − 25 yn−2 + 38 yn−3 − 28 yn−4  + 8 yn−5 +  1
with initial conditions
y0 = 0,  y1 = 0,  y2 = 0,   y3 = 0,  and  y4 = 0.

The characteristic polynomial for the equation is
p(x) =  x^5 − 8x^4 + 25x^3 − 38x^2 + 28x − 8   =  (x − 1)^2(x − 2)^3.

The homogeneous equation has independent solutions
w1n  =  1^n = 1,   w2n  =  n·1^{n−1} = n,   and
w3n  =  2^n,   w4n  =  n·2^{n−1},   w5n  =  n(n−1)·2^{n−2}.
The solution to the homogeneous equation
zn = −3 w1n − w2n +  3 w3n − 2 w4n + ½ w5n  
satisfies the initial conditions
z4 = 1,   z0  =  z1  =  z2  =  z3  =  0.
A particular solution can be found by
yp,0 = 0,   yp,n  = zn−1 + zn−2 + ... + z1 + z0 .

Calculating sums:
∑w1  =  w1n−1 + w1n−2 + ... + w11 + w10   =  n .
∑w2  =  w2n−1 + w2n−2 + ... + w21 + w20   =  (n−1)n / 2 .
∑w3  =  w3n−1 + w3n−2 + ... + w31 + w30   =  2^n − 1 .
Sums of these kinds are found by differentiating (x^n − 1) / (x − 1).
∑w4  =  w4n−1 + w4n−2 + ... + w41 + w40   =  (n−2)2^{n−1} + 1 .
∑w5  =  w5n−1 + w5n−2 + ... + w51 + w50   =  (n^2 − 5n + 8)2^{n−2} − 2 .

Now,
yp,n = −3 ∑w1n − ∑w2n +  3 ∑w3n − 2 ∑w4n + ½ ∑w5n
solves the initial value problem of this example.

At this point it is worthwhile to notice that all the terms that are combinations of
scalar multiples of basis elements can be removed. These are any multiples of
1,  n,  2^n,  n·2^{n−1},  and  n^2·2^{n−2}.
So instead the next particular solution may be preferred:
yp,n  =  −½ n^2 .
This solution has nonzero initial values, which must be taken into account:
y0 = 0,  y1 = −1 ⁄ 2,  y2 = −2,   y3 = −9 ⁄ 2,  and  y4 = −8.
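
A sketch (plain Python) that iterates the recurrence directly and compares it with the closed form assembled from the sums above:

# y_n = 8y_{n-1} - 25y_{n-2} + 38y_{n-3} - 28y_{n-4} + 8y_{n-5} + 1
y = [0, 0, 0, 0, 0]
for n in range(5, 20):
    y.append(8*y[-1] - 25*y[-2] + 38*y[-3] - 28*y[-4] + 8*y[-5] + 1)

def yp(n):
    # y_{p,n} = -3*sum(w1) - sum(w2) + 3*sum(w3) - 2*sum(w4) + (1/2)*sum(w5)
    return (-3*n - (n - 1)*n/2 + 3*(2**n - 1)
            - 2*((n - 2)*2**(n - 1) + 1)
            + ((n*n - 5*n + 8)*2**(n - 2) - 2)/2)

print(all(y[n] == yp(n) for n in range(20)))   # True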
