Block matrix pseudoinverse


In mathematics, the block matrix pseudoinverse is a formula for the pseudoinverse of a partitioned matrix. It is useful for decomposing or approximating many algorithms that update parameters in signal processing and are based on the least squares method.

Derivation

Consider a column-wise partitioned matrix:

[\mathbf A, \mathbf B], \qquad \mathbf A \in \mathbb{R}^{m\times n}, \qquad \mathbf B \in \mathbb{R}^{m\times p}, \qquad m \geq n+p.

If the above matrix has full column rank, the pseudoinverses of it and of its transpose are as follows.

 
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^{+} = \left([\mathbf A, \mathbf B]^T [\mathbf A, \mathbf B]\right)^{-1} [\mathbf A, \mathbf B]^T,

\begin{bmatrix}\mathbf A^T \\ \mathbf B^T\end{bmatrix}^{+} = [\mathbf A, \mathbf B] \left([\mathbf A, \mathbf B]^T [\mathbf A, \mathbf B]\right)^{-1}.

Computing the pseudoinverse in this form requires the inversion of an (n + p)-by-(n + p) matrix.

To reduce computational cost and introduce parallelism, we derive the following decomposed formula. From the block matrix inverse ([\mathbf A, \mathbf B]^T [\mathbf A, \mathbf B])^{-1}, we obtain

  
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^{+} = \left[\mathbf P_B^\perp \mathbf A (\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1}, \quad \mathbf P_A^\perp \mathbf B (\mathbf B^T \mathbf P_A^\perp \mathbf B)^{-1}\right]^T,

\begin{bmatrix}\mathbf A^T \\ \mathbf B^T\end{bmatrix}^{+} = \left[\mathbf P_B^\perp \mathbf A (\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1}, \quad \mathbf P_A^\perp \mathbf B (\mathbf B^T \mathbf P_A^\perp \mathbf B)^{-1}\right],

where the orthogonal projection matrices are defined by


\begin{align}
\mathbf P_A^\perp & = \mathbf I - \mathbf A (\mathbf A^T \mathbf A)^{-1} \mathbf A^T, \\ \mathbf P_B^\perp & = \mathbf I - \mathbf B (\mathbf B^T \mathbf B)^{-1} \mathbf B^T.
\end{align}

Interestingly, owing to the idempotence of the projection matrices, we can verify that the pseudoinverse of the block matrix consists of the pseudoinverses of the projected matrices:

  
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^{+} = \begin{bmatrix}(\mathbf P_B^{\perp}\mathbf A)^{+} \\ (\mathbf P_A^{\perp}\mathbf B)^{+}\end{bmatrix},

\begin{bmatrix}\mathbf A^T \\ \mathbf B^T\end{bmatrix}^{+} = \left[(\mathbf A^T \mathbf P_B^{\perp})^{+}, \quad (\mathbf B^T \mathbf P_A^{\perp})^{+}\right].

Thus, we have decomposed the block matrix pseudoinverse into two submatrix pseudoinverses, which require only n-by-n and p-by-p matrix inversions, respectively.
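
This identity is easy to check numerically. The following is a minimal NumPy sketch (random full-column-rank \mathbf A and \mathbf B, arbitrarily chosen dimensions, and numpy.linalg.pinv as the reference); it is an illustration only, not part of the derivation:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, p = 12, 4, 3                                   # m >= n + p
    A = rng.standard_normal((m, n))                      # random A, B are full column rank almost surely
    B = rng.standard_normal((m, p))

    # Orthogonal projectors onto the complements of range(A) and range(B)
    P_A_perp = np.eye(m) - A @ np.linalg.inv(A.T @ A) @ A.T
    P_B_perp = np.eye(m) - B @ np.linalg.inv(B.T @ B) @ B.T

    # Pseudoinverse of the block equals the stacked pseudoinverses of the projected matrices
    lhs = np.linalg.pinv(np.hstack([A, B]))              # (n+p) x m
    rhs = np.vstack([np.linalg.pinv(P_B_perp @ A),       # n x m
                     np.linalg.pinv(P_A_perp @ B)])      # p x m
    assert np.allclose(lhs, rhs)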

Note that the above formulae are not necessarily valid if [\mathbf A, \mathbf B] does not have full column rank – for example, if \mathbf A \neq 0, then


\begin{bmatrix}\mathbf A & \mathbf A\end{bmatrix}^{+} = \frac{1}{2}\begin{bmatrix}\mathbf A^{+} \\ \mathbf A^{+}\end{bmatrix} \neq \begin{bmatrix}(\mathbf P_A^{\perp}\mathbf A)^{+} \\ (\mathbf P_A^{\perp}\mathbf A)^{+}\end{bmatrix} = \mathbf 0.

Application to least squares problems

Given the same matrices as above, we consider the following least squares problems, which appear as multiple-objective optimizations or constrained problems in signal processing. Eventually, a parallel algorithm for least squares can be implemented based on the following results.

Column-wise partitioning in over-determined least squares

Suppose a solution \mathbf x = \begin{bmatrix}\mathbf x_1 \\ \mathbf x_2\end{bmatrix} solves an over-determined system:

 
\begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}\begin{bmatrix}\mathbf x_1 \\ \mathbf x_2\end{bmatrix} = \mathbf d, \qquad \mathbf d \in \mathbb{R}^{m\times 1}.

Using the block matrix pseudoinverse, we have


\mathbf x = \begin{bmatrix}\mathbf A & \mathbf B\end{bmatrix}^{+}\,\mathbf d = \begin{bmatrix}(\mathbf P_B^{\perp} \mathbf A)^{+} \\ (\mathbf P_A^{\perp} \mathbf B)^{+}\end{bmatrix}\mathbf d.

Therefore, we have a decomposed solution:


\mathbf x_1 = (\mathbf P_B^{\perp} \mathbf A)^{+}\,\mathbf d, \qquad \mathbf x_2 = (\mathbf P_A^{\perp} \mathbf B)^{+}\,\mathbf d.
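
As with the derivation above, this decomposed solution can be checked with a short NumPy sketch (random data, numpy.linalg.lstsq as the reference; names and dimensions are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n, p = 12, 4, 3
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((m, p))
    d = rng.standard_normal(m)

    # Orthogonal projectors, as defined in the derivation section
    P_A_perp = np.eye(m) - A @ np.linalg.inv(A.T @ A) @ A.T
    P_B_perp = np.eye(m) - B @ np.linalg.inv(B.T @ B) @ B.T

    # x1 and x2 can be computed independently (and hence in parallel)
    x1 = np.linalg.pinv(P_B_perp @ A) @ d
    x2 = np.linalg.pinv(P_A_perp @ B) @ d

    # Reference: solve the full over-determined system at once
    x_ref = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)[0]
    assert np.allclose(np.concatenate([x1, x2]), x_ref)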

Row-wise partitioning in under-determined least squares

Suppose a solution  \mathbf x solves an under-determined system:

 
\begin{bmatrix}\mathbf A^T \\ \mathbf B^T\end{bmatrix}\mathbf x = \begin{bmatrix}\mathbf e \\ \mathbf f\end{bmatrix}, \qquad \mathbf e \in \mathbb{R}^{n\times 1}, \qquad \mathbf f \in \mathbb{R}^{p\times 1}.

The minimum-norm solution is given by


\mathbf x = \begin{bmatrix}\mathbf A^T \\ \mathbf B^T\end{bmatrix}^{+}\,\begin{bmatrix}\mathbf e \\ \mathbf f\end{bmatrix}.

Using the block matrix pseudoinverse, we have


\mathbf x = \left[(\mathbf A^T\mathbf P_B^{\perp})^{+}, \quad (\mathbf B^T\mathbf P_A^{\perp})^{+}\right]\begin{bmatrix}\mathbf e \\ \mathbf f\end{bmatrix} = (\mathbf A^T\mathbf P_B^{\perp})^{+}\,\mathbf e + (\mathbf B^T\mathbf P_A^{\perp})^{+}\,\mathbf f.
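
A corresponding sketch for the row-wise partitioned, under-determined case (numpy.linalg.pinv of the stacked matrix provides the reference minimum-norm solution; again purely illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n, p = 12, 4, 3
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((m, p))
    e = rng.standard_normal(n)
    f = rng.standard_normal(p)

    P_A_perp = np.eye(m) - A @ np.linalg.inv(A.T @ A) @ A.T
    P_B_perp = np.eye(m) - B @ np.linalg.inv(B.T @ B) @ B.T

    # Minimum-norm solution assembled from the two projected pseudoinverses
    x = np.linalg.pinv(A.T @ P_B_perp) @ e + np.linalg.pinv(B.T @ P_A_perp) @ f

    # Reference: pseudoinverse of the stacked under-determined system
    x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
    assert np.allclose(x, x_ref)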

Comments on matrix inversion

Instead of ([\mathbf A, \mathbf B]^T [\mathbf A, \mathbf B])^{-1}, we need to calculate, directly or indirectly,

 
(\mathbf A^T \mathbf A)^{-1}, \quad (\mathbf B^T \mathbf B)^{-1}, \quad (\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1}, \quad (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}.

For a small, dense system, we can use the singular value decomposition, QR decomposition, or Cholesky decomposition to replace the explicit matrix inversions with numerical routines. For a large system, we may employ iterative methods such as Krylov subspace methods.

Considering parallel algorithms, we can compute (\mathbf A^T \mathbf A)^{-1} and (\mathbf B^T \mathbf B)^{-1} in parallel. Then, we can also compute (\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1} and (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1} in parallel.
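
One possible arrangement of this two-stage computation is sketched below (using SciPy's cho_factor and cho_solve so that no explicit inverse is formed; the two factorizations in each stage are independent and could run in parallel; all names and dimensions are illustrative):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(3)
    m, n, p = 12, 4, 3
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((m, p))
    d = rng.standard_normal(m)

    # Stage 1 (parallelizable): factor the two small Gram matrices A^T A and B^T B
    cAA = cho_factor(A.T @ A)
    cBB = cho_factor(B.T @ B)

    # Apply the projectors without forming them explicitly:
    # P_B_perp A = A - B (B^T B)^{-1} (B^T A), and similarly for P_A_perp B
    PBperp_A = A - B @ cho_solve(cBB, B.T @ A)
    PAperp_B = B - A @ cho_solve(cAA, A.T @ B)

    # Stage 2 (parallelizable): factor the two projected Gram matrices
    cA = cho_factor(A.T @ PBperp_A)          # A^T P_B_perp A
    cB = cho_factor(B.T @ PAperp_B)          # B^T P_A_perp B

    # Decomposed least-squares solution of [A, B] x = d
    x1 = cho_solve(cA, PBperp_A.T @ d)
    x2 = cho_solve(cB, PAperp_B.T @ d)
    x_ref = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)[0]
    assert np.allclose(np.concatenate([x1, x2]), x_ref)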

Block matrix inversion

Let a block matrix be

\begin{bmatrix}
A & B \\
C & D
\end{bmatrix}.

We can get an inverse formula by combining the previous results.[1]

\begin{bmatrix}
A & B \\
C & D
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(A - BD^{-1}C)^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\
-D^{-1}C(A - BD^{-1}C)^{-1} & (D - CA^{-1}B)^{-1}
\end{bmatrix}
=
\begin{bmatrix}
S_D^{-1} & -A^{-1}B S_A^{-1} \\
-D^{-1}C S_D^{-1} & S_A^{-1}
\end{bmatrix},

where S_A and S_D, the Schur complements of A and D, respectively, are defined by S_A = D - C A^{-1}B and S_D = A - BD^{-1}C, assuming that A and D are invertible. This relation is derived by using block triangular decomposition and is called simple block matrix inversion.[2]
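
A quick numerical check of the simple block inversion formula (a NumPy sketch with random blocks; the blk suffix in the names only avoids a clash with the \mathbf A and \mathbf B used earlier, and invertibility of the diagonal blocks and Schur complements is assumed):

    import numpy as np

    rng = np.random.default_rng(4)
    na, nd = 4, 3
    Ablk = rng.standard_normal((na, na))
    Bblk = rng.standard_normal((na, nd))
    Cblk = rng.standard_normal((nd, na))
    Dblk = rng.standard_normal((nd, nd))
    M = np.block([[Ablk, Bblk], [Cblk, Dblk]])

    inv = np.linalg.inv
    S_A = Dblk - Cblk @ inv(Ablk) @ Bblk           # Schur complement of A
    S_D = Ablk - Bblk @ inv(Dblk) @ Cblk           # Schur complement of D

    # Simple block matrix inversion, entry by entry as in the formula above
    M_inv = np.block([[inv(S_D),                      -inv(Ablk) @ Bblk @ inv(S_A)],
                      [-inv(Dblk) @ Cblk @ inv(S_D),   inv(S_A)]])
    assert np.allclose(M_inv, inv(M))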

Now we can obtain the inverse of the symmetric block matrix:


\begin{bmatrix}
\mathbf A^T \mathbf A & \mathbf A^T \mathbf B \\
\mathbf B^T \mathbf A & \mathbf B^T \mathbf B
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(\mathbf A^T \mathbf A - \mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A)^{-1} & -(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B(\mathbf B^T \mathbf B - \mathbf B^T \mathbf A(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B)^{-1} \\
-(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A(\mathbf A^T \mathbf A - \mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A)^{-1} & (\mathbf B^T \mathbf B - \mathbf B^T \mathbf A(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B)^{-1}
\end{bmatrix}
=
\begin{bmatrix}
(\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1} & -(\mathbf A^T \mathbf A)^{-1}\mathbf A^T \mathbf B(\mathbf B^T \mathbf P_A^\perp \mathbf B)^{-1} \\
-(\mathbf B^T \mathbf B)^{-1}\mathbf B^T \mathbf A(\mathbf A^T \mathbf P_B^\perp \mathbf A)^{-1} & (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\end{bmatrix}.

Since the block matrix is symmetric, we also have


\begin{bmatrix}
\mathbf A^T \mathbf A & \mathbf A^T \mathbf B \\
\mathbf B^T \mathbf A & \mathbf B^T \mathbf B
\end{bmatrix}^{-1}
=
\begin{bmatrix}
(\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1} & -(\mathbf A^T \mathbf P_B^{\perp} \mathbf A)^{-1} \mathbf A^T \mathbf B(\mathbf B^T \mathbf B)^{-1} \\
-(\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1} \mathbf B^T \mathbf A (\mathbf A^T \mathbf A)^{-1} & (\mathbf B^T \mathbf P_A^{\perp} \mathbf B)^{-1}
\end{bmatrix}.

Then, we can see how the Schur complements are connected to the projection matrices of the symmetric, partitioned matrix.
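
Concretely, comparing the entries of the two expressions for the inverse above, the Schur complements of \mathbf A^T \mathbf A and \mathbf B^T \mathbf B in the Gram matrix are exactly the projected Gram matrices:

\begin{align}
\mathbf B^T \mathbf B - \mathbf B^T \mathbf A (\mathbf A^T \mathbf A)^{-1} \mathbf A^T \mathbf B & = \mathbf B^T \mathbf P_A^{\perp} \mathbf B, \\
\mathbf A^T \mathbf A - \mathbf A^T \mathbf B (\mathbf B^T \mathbf B)^{-1} \mathbf B^T \mathbf A & = \mathbf A^T \mathbf P_B^{\perp} \mathbf A.
\end{align}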

References