# Sylvester equation

In mathematics, in the field of control theory, a Sylvester equation is a matrix equation of the form:[1]

${\displaystyle AX+XB=C.}$

Given matrices A, B, and C, the problem is to find the possible matrices X that obey this equation. All matrices are assumed to have coefficients in the complex numbers. For the equation to make sense, the matrices must have appropriate sizes; for example, they could all be square matrices of the same size. More generally, A and B must be square matrices of sizes n and m respectively, and then X and C both have n rows and m columns.

A Sylvester equation has a unique solution for X exactly when there are no common eigenvalues of A and −B. More generally, the equation AX + XB = C has been considered as an equation of bounded operators on a (possibly infinite-dimensional) Banach space. In this case, the condition for the uniqueness of a solution X is almost the same: There exists a unique solution X exactly when the spectra of A and −B are disjoint.[2]

## Existence and uniqueness of the solutions

Using the Kronecker product notation and the vectorization operator ${\displaystyle \operatorname {vec} }$, we can rewrite Sylvester's equation in the form

${\displaystyle (I_{m}\otimes A+B^{T}\otimes I_{n})\operatorname {vec} X=\operatorname {vec} C,}$

where ${\displaystyle A}$ is of dimension ${\displaystyle n\!\times \!n}$, ${\displaystyle B}$ is of dimension ${\displaystyle m\!\times \!m}$, ${\displaystyle X}$ of dimension ${\displaystyle n\!\times \!m}$ and ${\displaystyle I_{k}}$ is the ${\displaystyle k\times k}$ identity matrix. In this form, the equation can be seen as a linear system of dimension ${\displaystyle mn\times mn}$.[3]
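The vectorized form can be transcribed directly into code. The following is a minimal sketch assuming NumPy; `solve_sylvester_kron` is a hypothetical helper name, and the matrices are arbitrary illustrative values. Since vec stacks the columns of a matrix, it corresponds to NumPy's column-major reshaping (`order="F"`). As note 3 warns, this is for illustration only: the dense mn × mn system is expensive and can be ill-conditioned.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve AX + XB = C through the mn x mn Kronecker system.

    Illustrative only: the dense system needs O((mn)^2) memory and
    O((mn)^3) operations, so this is not a practical solver.
    """
    n, m = A.shape[0], B.shape[0]
    # K = I_m (x) A + B^T (x) I_n acts on vec(X)
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    # vec stacks columns, i.e. column-major ("F") order
    x = np.linalg.solve(K, C.reshape(-1, order="F"))
    return x.reshape((n, m), order="F")

A = np.array([[1.0, 2.0], [0.0, 3.0]])       # n = 2, eigenvalues 1, 3
B = np.array([[4.0, 0.0, 1.0],
              [0.0, 5.0, 0.0],
              [0.0, 0.0, 6.0]])             # m = 3, -B has eigenvalues -4, -5, -6
C = np.arange(6.0).reshape(2, 3)            # n x m right-hand side
X = solve_sylvester_kron(A, B, C)
print(X.shape)                               # (2, 3)
print(np.allclose(A @ X + X @ B, C))         # True
```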

Theorem. Given matrices ${\displaystyle A\in \mathbb {C} ^{n\times n}}$ and ${\displaystyle B\in \mathbb {C} ^{m\times m}}$, the Sylvester equation ${\displaystyle AX+XB=C}$ has a unique solution ${\displaystyle X\in \mathbb {C} ^{n\times m}}$ for any ${\displaystyle C\in \mathbb {C} ^{n\times m}}$ if and only if ${\displaystyle A}$ and ${\displaystyle -B}$ do not share any eigenvalue.
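The criterion in the theorem can be tested numerically by comparing the two spectra. The sketch below assumes NumPy; `has_unique_solution` and its tolerance `tol` are illustrative choices (in floating point, eigenvalues are only approximate, so "sharing an eigenvalue" must be decided up to some tolerance). It also cross-checks against the Kronecker form above, whose matrix should be singular exactly when the test fails.

```python
import numpy as np

def has_unique_solution(A, B, tol=1e-10):
    """Return True if A and -B share no eigenvalue (up to the tolerance
    `tol`), i.e. if AX + XB = C is uniquely solvable for every C."""
    eig_A = np.linalg.eigvals(A)
    eig_neg_B = np.linalg.eigvals(-B)
    # smallest distance between an eigenvalue of A and one of -B
    gap = np.min(np.abs(eig_A[:, None] - eig_neg_B[None, :]))
    return bool(gap > tol)

A = np.diag([1.0, 2.0])
B_ok = np.diag([3.0, 4.0])     # -B_ok has eigenvalues -3, -4
B_bad = np.diag([-2.0, 5.0])   # -B_bad has eigenvalues 2, -5: 2 is shared with A
print(has_unique_solution(A, B_ok))    # True
print(has_unique_solution(A, B_bad))   # False

# Cross-check with the Kronecker form: here it is diag(-1, 0, 6, 7), singular
K_bad = np.kron(np.eye(2), A) + np.kron(B_bad.T, np.eye(2))
print(np.linalg.matrix_rank(K_bad) < 4)  # True
```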

Proof. The equation ${\displaystyle AX+XB=C}$ is a linear system with ${\displaystyle mn}$ unknowns and the same number of equations. Hence it is uniquely solvable for any given ${\displaystyle C}$ if and only if the homogeneous equation ${\displaystyle AX+XB=0}$ admits only the trivial solution ${\displaystyle 0}$.

(i) Assume that ${\displaystyle A}$ and ${\displaystyle -B}$ do not share any eigenvalue. Let ${\displaystyle X}$ be a solution to the above-mentioned homogeneous equation. Then ${\displaystyle AX=X(-B)}$, which can be lifted to ${\displaystyle A^{k}X=X(-B)^{k}}$ for each ${\displaystyle k\geq 0}$ by mathematical induction. Consequently, ${\displaystyle p(A)X=Xp(-B)}$ for any polynomial ${\displaystyle p}$. In particular, let ${\displaystyle p}$ be the characteristic polynomial of ${\displaystyle A}$. Then ${\displaystyle p(A)=0}$ by the Cayley–Hamilton theorem, so ${\displaystyle Xp(-B)=p(A)X=0}$; meanwhile, the spectral mapping theorem tells us ${\displaystyle \sigma (p(-B))=p(\sigma (-B)),}$ where ${\displaystyle \sigma (\cdot )}$ denotes the spectrum of a matrix. The zeros of ${\displaystyle p}$ are exactly the eigenvalues of ${\displaystyle A}$, and since ${\displaystyle A}$ and ${\displaystyle -B}$ do not share any eigenvalue, ${\displaystyle p(\sigma (-B))}$ does not contain zero; hence ${\displaystyle p(-B)}$ is nonsingular. Multiplying ${\displaystyle Xp(-B)=0}$ on the right by ${\displaystyle p(-B)^{-1}}$ gives ${\displaystyle X=0}$, as desired. This proves the "if" part of the theorem.
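The two ingredients of part (i), the Cayley–Hamilton theorem and the spectral mapping theorem, can be observed numerically. This sketch assumes NumPy; `polyval_matrix` is a hypothetical helper written here (Horner's rule with matrix arguments), and the matrices are arbitrary examples whose spectra {2, 3} and {−1, 5} (for A and −B) are disjoint.

```python
import numpy as np

def polyval_matrix(coeffs, M):
    """Evaluate the polynomial with coefficients `coeffs` (highest degree
    first, as returned by np.poly) at the square matrix M by Horner's rule."""
    I = np.eye(M.shape[0])
    P = np.zeros(M.shape, dtype=complex)
    for c in coeffs:
        P = P @ M + c * I
    return P

A = np.array([[2.0, 1.0], [0.0, 3.0]])    # eigenvalues 2, 3
B = np.array([[1.0, 0.0], [4.0, -5.0]])   # -B has eigenvalues -1, 5
p = np.poly(A)                            # characteristic polynomial of A: z^2 - 5z + 6

pA = polyval_matrix(p, A)
p_neg_B = polyval_matrix(p, -B)
print(np.allclose(pA, 0))                 # Cayley-Hamilton: p(A) = 0, prints True
# Spectral mapping: eigenvalues of p(-B) are p(-1) = 12 and p(5) = 6
print(np.sort_complex(np.linalg.eigvals(p_neg_B)))
print(abs(np.linalg.det(p_neg_B)) > 1e-8) # p(-B) is nonsingular, prints True
```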

(ii) Now assume that ${\displaystyle A}$ and ${\displaystyle -B}$ share an eigenvalue ${\displaystyle \lambda }$. Let ${\displaystyle u}$ be a corresponding right eigenvector for ${\displaystyle A}$, ${\displaystyle v}$ be a corresponding left eigenvector for ${\displaystyle -B}$, and ${\displaystyle X=u{v}^{*}}$. Then ${\displaystyle X\neq 0}$, and ${\displaystyle AX+XB=A(uv^{*})-(uv^{*})(-B)=\lambda uv^{*}-\lambda uv^{*}=0.}$ Hence ${\displaystyle X}$ is a nontrivial solution to the aforesaid homogeneous equation, justifying the "only if" part of the theorem. Q.E.D.
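The construction in part (ii) can be reproduced numerically. The sketch below assumes NumPy; the helper name `nontrivial_homogeneous_solution` and its tolerance are illustrative. It relies on the fact that a left eigenvector ${\displaystyle v}$ of ${\displaystyle -B}$ for the eigenvalue ${\displaystyle \lambda }$ (that is, ${\displaystyle v^{*}(-B)=\lambda v^{*}}$) is a right eigenvector of ${\displaystyle (-B)^{*}}$ for the eigenvalue ${\displaystyle {\bar {\lambda }}}$.

```python
import numpy as np

def nontrivial_homogeneous_solution(A, B, tol=1e-8):
    """If A and -B share an eigenvalue lam, return X = u v^* with
    A u = lam u and v^* (-B) = lam v^*, so that AX + XB = 0.
    Return None if no shared eigenvalue is found (within tol)."""
    lam_A, U = np.linalg.eig(A)
    # left eigenvectors of -B = right eigenvectors of (-B)^*, conjugated eigenvalues
    mu, V = np.linalg.eig(-B.conj().T)
    for i, lam in enumerate(lam_A):
        j = np.argmin(np.abs(mu - np.conj(lam)))
        if abs(mu[j] - np.conj(lam)) < tol:
            return np.outer(U[:, i], V[:, j].conj())
    return None

A = np.array([[1.0, 2.0], [0.0, 4.0]])    # eigenvalues 1, 4
B = np.array([[-4.0, 0.0], [3.0, 2.0]])   # -B has eigenvalues 4, -2: 4 is shared
X = nontrivial_homogeneous_solution(A, B)
print(np.linalg.norm(X) > 0, np.allclose(A @ X + X @ B, 0))  # True True
```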

As an alternative to the spectral mapping theorem, the nonsingularity of ${\displaystyle p(-B)}$ in part (i) of the proof can also be demonstrated by Bézout's identity for coprime polynomials. Let ${\displaystyle q}$ be the characteristic polynomial of ${\displaystyle -B}$. Since ${\displaystyle A}$ and ${\displaystyle -B}$ do not share any eigenvalue, ${\displaystyle p}$ and ${\displaystyle q}$ have no common root and are therefore coprime. Hence there exist polynomials ${\displaystyle f}$ and ${\displaystyle g}$ such that ${\displaystyle p(z)f(z)+q(z)g(z)\equiv 1}$. By the Cayley–Hamilton theorem, ${\displaystyle q(-B)=0}$. Thus ${\displaystyle p(-B)f(-B)=I}$, implying that ${\displaystyle p(-B)}$ is nonsingular.
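A small worked instance of this argument, with matrices chosen here purely for illustration: take ${\displaystyle A=\operatorname {diag} (1,2)}$ and the ${\displaystyle 1\times 1}$ matrix ${\displaystyle B=[-3]}$, so that ${\displaystyle -B=[3]}$ and the spectra ${\displaystyle \{1,2\}}$ and ${\displaystyle \{3\}}$ are disjoint. Then

${\displaystyle p(z)=(z-1)(z-2)=z^{2}-3z+2,\qquad q(z)=z-3,}$

and one choice of Bézout coefficients is ${\displaystyle f(z)={\tfrac {1}{2}}}$, ${\displaystyle g(z)=-{\tfrac {z}{2}}}$, since

${\displaystyle p(z)f(z)+q(z)g(z)={\tfrac {z^{2}-3z+2}{2}}-{\tfrac {z^{2}-3z}{2}}\equiv 1.}$

Substituting ${\displaystyle -B}$ and using ${\displaystyle q(-B)=0}$ gives

${\displaystyle p(-B)f(-B)=p(3)\cdot {\tfrac {1}{2}}=2\cdot {\tfrac {1}{2}}=1=I_{1}.}$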

The theorem remains true for real matrices with the caveat that one considers their complex eigenvalues. The proof for the "if" part is still applicable; for the "only if" part, note that since ${\displaystyle A}$ and ${\displaystyle B}$ are real, both ${\displaystyle \mathrm {Re} (uv^{*})}$ and ${\displaystyle \mathrm {Im} (uv^{*})}$ satisfy the homogeneous equation ${\displaystyle AX+XB=0}$, and they cannot both be zero because ${\displaystyle uv^{*}\neq 0}$.
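The real case can be checked the same way. In this sketch (NumPy assumed; the matrices are arbitrary examples), the real matrices A and −B share only the complex eigenvalues ±i, yet the real and imaginary parts of the complex solution ${\displaystyle uv^{*}}$ are each real solutions of the homogeneous equation.

```python
import numpy as np

# Real matrices: A has eigenvalues +-i, and so does -B = [[1, -2], [1, -1]]
A = np.array([[0.0, -1.0], [1.0, 0.0]])
B = np.array([[-1.0, 2.0], [-1.0, 1.0]])

lam_A, U = np.linalg.eig(A)              # complex eigenpairs of A
mu, V = np.linalg.eig(-B.conj().T)       # left eigenvectors of -B (conjugated eigenvalues)
i = 0                                    # pick one eigenvalue of A, lam_A[0]
j = np.argmin(np.abs(mu - np.conj(lam_A[i])))
X = np.outer(U[:, i], V[:, j].conj())    # complex solution u v^* of AX + XB = 0

X_re, X_im = X.real, X.imag
print(np.allclose(A @ X_re + X_re @ B, 0))   # True
print(np.allclose(A @ X_im + X_im @ B, 0))   # True
print(np.linalg.norm(X_re) + np.linalg.norm(X_im) > 0)  # True: not both zero
```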

## Roth's removal rule

Given two square complex matrices A and B, of sizes n and m, and a matrix C of size n by m, one can ask when the following two square matrices of size n + m are similar to each other: ${\displaystyle {\begin{bmatrix}A&C\\0&B\end{bmatrix}}}$ and ${\displaystyle {\begin{bmatrix}A&0\\0&B\end{bmatrix}}}$. The answer is that these two matrices are similar exactly when there exists a matrix X such that AX − XB = C. In other words, X is a solution to a Sylvester equation. This is known as Roth's removal rule.[4]

One easily checks one direction: If AX − XB = C then

${\displaystyle {\begin{bmatrix}I_{n}&X\\0&I_{m}\end{bmatrix}}{\begin{bmatrix}A&C\\0&B\end{bmatrix}}{\begin{bmatrix}I_{n}&-X\\0&I_{m}\end{bmatrix}}={\begin{bmatrix}A&0\\0&B\end{bmatrix}}.}$
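The easy direction can be verified numerically. The sketch below assumes NumPy; `roth_similarity` is a hypothetical helper name. It solves AX − XB = C through a Kronecker system (the sign of the B term is flipped relative to the vectorized form of AX + XB = C), which is uniquely solvable only when A and B share no eigenvalue, an assumption of this example; it then applies the similarity transform from the identity above.

```python
import numpy as np

def roth_similarity(A, B, C):
    """Solve AX - XB = C (assuming A and B share no eigenvalue) and return X
    together with the two block matrices of Roth's removal rule."""
    n, m = A.shape[0], B.shape[0]
    # vec(AX - XB) = (I_m (x) A - B^T (x) I_n) vec(X)
    K = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))
    X = np.linalg.solve(K, C.reshape(-1, order="F")).reshape((n, m), order="F")
    M = np.block([[A, C], [np.zeros((m, n)), B]])                   # [[A, C], [0, B]]
    D = np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), B]])   # [[A, 0], [0, B]]
    return X, M, D

A = np.array([[1.0, 1.0], [0.0, 2.0]])   # eigenvalues 1, 2
B = np.array([[3.0]])                    # eigenvalue 3, disjoint from A's
C = np.array([[1.0], [5.0]])
X, M, D = roth_similarity(A, B, C)
n, m = A.shape[0], B.shape[0]
S = np.block([[np.eye(n), X], [np.zeros((m, n)), np.eye(m)]])
S_inv = np.block([[np.eye(n), -X], [np.zeros((m, n)), np.eye(m)]])
print(np.allclose(A @ X - X @ B, C))     # True
print(np.allclose(S @ M @ S_inv, D))     # True
```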

Roth's removal rule does not generalize to infinite-dimensional bounded operators on a Banach space.[5]

## Numerical solutions

A classical algorithm for the numerical solution of the Sylvester equation is the Bartels–Stewart algorithm, which consists of transforming ${\displaystyle A}$ and ${\displaystyle B}$ into Schur form by the QR algorithm, and then solving the resulting triangular system via back-substitution. This algorithm, whose computational cost is ${\displaystyle {\mathcal {O}}(n^{3})}$ arithmetic operations,[citation needed] is used, among others, by LAPACK and the lyap function in GNU Octave.[6] See also the sylvester function in that language.[7][8] In some specific image processing applications, the derived Sylvester equation has a closed-form solution.[9]
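The two stages of the algorithm can be sketched in Python, assuming NumPy and SciPy. This is a simplified variant, not the reference implementation: it uses the complex Schur form from `scipy.linalg.schur` (the original algorithm works with real Schur forms, which contain 2 × 2 diagonal blocks for complex eigenvalue pairs) and then solves the transformed triangular equation one column at a time with `scipy.linalg.solve_triangular`. The name `bartels_stewart` is chosen here; SciPy's `scipy.linalg.solve_sylvester` is used only as a cross-check.

```python
import numpy as np
from scipy.linalg import schur, solve_triangular, solve_sylvester

def bartels_stewart(A, B, C):
    """Simplified Bartels-Stewart sketch for AX + XB = C, using complex
    Schur forms A = U R U^*, B = V S V^* with R, S upper triangular."""
    R, U = schur(A, output="complex")
    S, V = schur(B, output="complex")
    # Transformed equation: R Y + Y S = D, with Y = U^* X V and D = U^* C V
    D = U.conj().T @ C @ V
    n, m = D.shape
    Y = np.zeros((n, m), dtype=complex)
    for k in range(m):
        # Column k: (R + S[k,k] I) y_k = d_k - sum_{j<k} S[j,k] y_j
        rhs = D[:, k] - Y[:, :k] @ S[:k, k]
        Y[:, k] = solve_triangular(R + S[k, k] * np.eye(n), rhs)  # upper triangular
    X = U @ Y @ V.conj().T
    # For real data the (unique) solution is real; drop rounding noise
    if np.isrealobj(A) and np.isrealobj(B) and np.isrealobj(C):
        X = X.real
    return X

rng = np.random.default_rng(0)
# Shifted so that the spectra of A and -B are well separated
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)
B = rng.standard_normal((3, 3)) + 4.0 * np.eye(3)
C = rng.standard_normal((4, 3))
X = bartels_stewart(A, B, C)
print(np.allclose(A @ X + X @ B, C))             # residual check, should print True
print(np.allclose(X, solve_sylvester(A, B, C)))  # agreement with SciPy, should print True
```

The two Schur decompositions dominate the cost; the column-by-column triangular solves add only ${\displaystyle {\mathcal {O}}(n^{2}m+nm^{2})}$ operations.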

## Notes

1. ^ This equation is also commonly written in the equivalent form of AX − XB = C.
2. ^ Bhatia and Rosenthal, 1997
3. ^ However, rewriting the equation in this form is not advised for the numerical solution since this version is costly to solve and can be ill-conditioned.
4. ^ Gerrish, F; Ward, A.G.B (Nov 1998). "Sylvester's matrix equation and Roth's removal rule". The Mathematical Gazette. 82 (495): 423–430. doi:10.2307/3619888. JSTOR 3619888.
5. ^ Bhatia and Rosenthal, p. 3
6. ^
7. ^
8. ^ The syl command has been deprecated since GNU Octave version 4.0.
9. ^ Wei, Q.; Dobigeon, N.; Tourneret, J.-Y. (2015). "Fast Fusion of Multi-Band Images Based on Solving a Sylvester Equation". IEEE Transactions on Image Processing. 24 (11): 4109–4121. arXiv:1502.03121. Bibcode:2015ITIP...24.4109W. doi:10.1109/TIP.2015.2458572. PMID 26208345.