Centering matrix

In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component of that vector.

Definition

The centering matrix of size n is defined as the n-by-n matrix

C_{n}=I_{n}-{\tfrac {1}{n}}J_{n}

where $I_{n}\,$ is the identity matrix of size n and $J_{n}$ is an n-by-n matrix of all 1's.

For example,

C_{1}={\begin{bmatrix}0\end{bmatrix}},

C_{2}=\left[{\begin{array}{rr}1&0\\0&1\end{array}}\right]-{\frac {1}{2}}\left[{\begin{array}{rr}1&1\\1&1\end{array}}\right]=\left[{\begin{array}{rr}{\frac {1}{2}}&-{\frac {1}{2}}\\-{\frac {1}{2}}&{\frac {1}{2}}\end{array}}\right],

C_{3}=\left[{\begin{array}{rrr}1&0&0\\0&1&0\\0&0&1\end{array}}\right]-{\frac {1}{3}}\left[{\begin{array}{rrr}1&1&1\\1&1&1\\1&1&1\end{array}}\right]=\left[{\begin{array}{rrr}{\frac {2}{3}}&-{\frac {1}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&{\frac {2}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&-{\frac {1}{3}}&{\frac {2}{3}}\end{array}}\right].
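As a small illustration of the definition, the following Python sketch builds C_n entry by entry as the identity minus 1/n. The function name `centering_matrix` is a hypothetical helper chosen for this example, and exact fractions are used only so that the output matches the matrices above without rounding.

```python
from fractions import Fraction

def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

C3 = centering_matrix(3)
for row in C3:
    print([str(x) for x in row])
# ['2/3', '-1/3', '-1/3']
# ['-1/3', '2/3', '-1/3']
# ['-1/3', '-1/3', '2/3']
```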

Properties

Given a column vector $\mathbf {v}$ of size n, the centering property of $C_{n}$ can be expressed as

C_{n}\,\mathbf {v} =\mathbf {v} -({\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v} )J_{n,1}

where $J_{n,1}$ is a column vector of ones and ${\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v}$ is the mean of the components of $\mathbf {v}$.
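The centering property can be checked numerically. The sketch below multiplies an arbitrary example vector by C_n and compares the result with direct mean subtraction; `centering_matrix` and `matvec` are hypothetical helper names introduced here, and the vector [1, 2, 6] is made-up illustrative data.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

v = [1, 2, 6]                       # illustrative data; its mean is 3
n = len(v)

centered = matvec(centering_matrix(n), v)   # C_n v
mean = Fraction(sum(v), n)                  # (1/n) J_{n,1}^T v
direct = [x - mean for x in v]              # v - mean * J_{n,1}

print([str(x) for x in centered])   # ['-2', '-1', '3']
print(centered == direct)           # True
```

Both routes give the same vector, and its components sum to zero.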

$C_{n}\,$ is symmetric positive semi-definite.

$C_{n}\,$ is idempotent, so that $C_{n}^{k}=C_{n}$ for $k=1,2,\ldots$. Once the mean has been removed, it is zero and removing it again has no effect.

$C_{n}\,$ is singular. The effects of applying the transformation $C_{n}\,\mathbf {v}$ cannot be reversed.

$C_{n}\,$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.

$C_{n}\,$ has a nullspace of dimension 1, spanned by the vector $J_{n,1}$.

$C_{n}\,$ is an orthogonal projection matrix. That is, $C_{n}\mathbf {v}$ is the projection of $\mathbf {v}$ onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace spanned by $J_{n,1}$. (This is the subspace of all n-vectors whose components sum to zero.)

The trace of $C_{n}$ is $n(n-1)/n=n-1$, since each of its n diagonal entries equals $(n-1)/n$.
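Several of the properties above can be verified directly for a particular n. The following sketch checks symmetry, idempotence, the null vector of ones, and the trace for n = 4; the helper names `centering_matrix`, `matmul` and `transpose` are hypothetical, and a check for one value of n is an illustration rather than a proof.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

n = 4
C = centering_matrix(n)

is_symmetric = C == transpose(C)          # C_n^T = C_n
is_idempotent = matmul(C, C) == C         # C_n^2 = C_n
ones_image = [sum(row) for row in C]      # C_n J_{n,1}, i.e. the row sums
trace = sum(C[i][i] for i in range(n))    # sum of diagonal entries

print(is_symmetric, is_idempotent)        # True True
print([str(x) for x in ones_image])       # ['0', '0', '0', '0']
print(trace)                              # 3
```

Because the vector of ones is mapped to zero, the matrix has a nontrivial nullspace and is therefore singular.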

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix $X$.

Left multiplication by $C_{m}$ subtracts the corresponding mean value from each of the n columns, so that each column of the product $C_{m}\,X$ has zero mean. Similarly, multiplication by $C_{n}$ on the right subtracts the corresponding mean value from each of the m rows, so that each row of the product $X\,C_{n}$ has zero mean. Multiplication on both sides creates a doubly centered matrix $C_{m}\,X\,C_{n}$, whose row and column means are all equal to zero.
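The three products can be illustrated on a small matrix. In the sketch below, the 2-by-3 matrix X is made-up example data and `centering_matrix` and `matmul` are hypothetical helper names; the code forms the column-centered, row-centered and doubly centered versions and inspects their sums.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[1, 2, 3],
     [4, 6, 11]]                    # illustrative data, m = 2 rows, n = 3 columns
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)       # C_m X
row_centered = matmul(X, centering_matrix(n))       # X C_n
doubly_centered = matmul(col_centered, centering_matrix(n))  # C_m X C_n

col_sums = [sum(col) for col in zip(*col_centered)]  # zero for every column
row_sums = [sum(row) for row in row_centered]        # zero for every row

print([[str(x) for x in row] for row in row_centered])
# [['-1', '0', '1'], ['-3', '-1', '4']]
```

A zero sum is equivalent to a zero mean, so checking sums avoids an extra division.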

In particular, the centering matrix gives a succinct expression for the scatter matrix $S=(X-\mu J_{n,1}^{\mathrm {T} })(X-\mu J_{n,1}^{\mathrm {T} })^{\mathrm {T} }$ of a data sample $X$ whose n columns are the observations, where $\mu ={\tfrac {1}{n}}XJ_{n,1}$ is the sample mean. Because $X\,C_{n}=X-\mu J_{n,1}^{\mathrm {T} }$, the scatter matrix can be written more compactly as

S=X\,C_{n}(X\,C_{n})^{\mathrm {T} }=X\,C_{n}\,C_{n}\,X^{\mathrm {T} }=X\,C_{n}\,X^{\mathrm {T} }.
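The equivalence of the two expressions for the scatter matrix can be checked on a small example. The sketch below assumes, as the formula for the sample mean implies, that the n columns of X are the observations; the data and the helper names `centering_matrix`, `matmul` and `transpose` are illustrative assumptions.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

X = [[1, 2, 3],
     [4, 6, 11]]                    # illustrative data: 3 observations of 2 variables
n = len(X[0])

# Explicit route: subtract the sample mean from every column, then form D D^T.
mu = [Fraction(sum(row), n) for row in X]                  # (1/n) X J_{n,1}
D = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]  # X - mu J_{n,1}^T
S_direct = matmul(D, transpose(D))

# Compact route: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print([[str(x) for x in row] for row in S_compact])   # [['2', '7'], ['7', '26']]
print(S_direct == S_compact)                          # True
```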

$C_{n}$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are $k=n$ and $p_{1}=p_{2}=\cdots =p_{n}={\frac {1}{n}}$.
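This identity can be checked with the standard multinomial covariance formula, Cov(X_i, X_j) = N p_i (δ_ij − p_j) for N trials. The sketch below reads the special case as n trials over n equally likely categories, which is an interpretation of the statement above rather than something it spells out; `multinomial_covariance` and `centering_matrix` are hypothetical helper names.

```python
from fractions import Fraction

def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact rational entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]

def multinomial_covariance(trials, p):
    """Covariance matrix with entries trials * p_i * (delta_ij - p_j)."""
    size = len(p)
    return [[trials * p[i] * (int(i == j) - p[j]) for j in range(size)]
            for i in range(size)]

n = 5
p = [Fraction(1, n)] * n                  # n equally likely categories
cov = multinomial_covariance(n, p)        # assumption: number of trials is also n

print([str(x) for x in cov[0]])           # ['4/5', '-1/5', '-1/5', '-1/5', '-1/5']
print(cov == centering_matrix(n))         # True
```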

References

[1] John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.
