In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix which, when multiplied by a vector, has the same effect as subtracting the mean of the components of the vector from every component.
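The centering matrix of size n is defined as the n-by-n matrix
$$C_n = I_n - \tfrac{1}{n} J_n$$
where $I_n$ is the identity matrix of size n and $J_n$ is an n-by-n matrix of all 1's. This can also be written as:
$$C_n = I_n - \tfrac{1}{n} \mathbf{1}\mathbf{1}^\top$$
where $\mathbf{1}$ is the column-vector of n ones and where $\top$ denotes matrix transpose.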
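The two forms of the definition can be checked against each other numerically. The following is a minimal sketch using NumPy; the helper name `centering_matrix` and the choice n = 3 are illustrative assumptions, not part of the definition.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Hypothetical helper: build C_n = I_n - (1/n) J_n as a dense array."""
    return np.eye(n) - np.ones((n, n)) / n

n = 3
ones = np.ones((n, 1))                          # column vector of n ones

C_from_J = centering_matrix(n)                  # I_n - (1/n) J_n
C_from_outer = np.eye(n) - (ones @ ones.T) / n  # I_n - (1/n) 1 1^T

# Both constructions should give the same matrix.
print(np.allclose(C_from_J, C_from_outer))
print(C_from_J)
```

For n = 3 the diagonal entries are 2/3 and the off-diagonal entries are -1/3, so every row and every column sums to zero.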
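Given a column-vector $\mathbf{v}$ of size n, the centering property of $C_n$ can be expressed as
$$C_n \mathbf{v} = \mathbf{v} - \left(\tfrac{1}{n} \mathbf{1}^\top \mathbf{v}\right) \mathbf{1}$$
where $\tfrac{1}{n} \mathbf{1}^\top \mathbf{v}$ is the mean of the components of $\mathbf{v}$.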
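As a small illustration of the centering property, the sketch below applies the centering matrix to a concrete vector and compares the result with direct mean subtraction. The vector values are arbitrary and chosen only so that the mean is easy to read off.

```python
import numpy as np

n = 4
C = np.eye(n) - np.ones((n, n)) / n   # centering matrix of size 4

v = np.array([1.0, 2.0, 3.0, 6.0])    # arbitrary example; its mean is 3.0

centered = C @ v                       # mean removal via the centering matrix
direct = v - v.mean()                  # mean removal done directly

print(centered)                        # components -2, -1, 0, 3
print(np.allclose(centered, direct))
```

The centered vector has components that sum to zero, which is another way of saying its mean is zero.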
$C_n$ is symmetric positive semi-definite.
$C_n$ is idempotent, so that $C_n^k = C_n$ for $k = 1, 2, \ldots$. Once the mean has been removed, it is zero and removing it again has no effect.
$C_n$ is singular. The effects of applying the transformation $C_n \mathbf{v}$ cannot be reversed.
$C_n$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.
$C_n$ has a nullspace of dimension 1, along the vector $\mathbf{1}$.
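These properties can be spot-checked numerically for a particular size. The sketch below uses n = 5 as an arbitrary choice; a numerical check for one size is of course only an illustration, not a proof.

```python
import numpy as np

n = 5
C = np.eye(n) - np.ones((n, n)) / n
ones = np.ones(n)

is_symmetric = np.allclose(C, C.T)        # C equals its transpose
is_idempotent = np.allclose(C @ C, C)     # applying C twice equals applying it once

# eigvalsh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(C)

rank = np.linalg.matrix_rank(C)           # n - 1, so C is singular
null_image = C @ ones                     # the all-ones vector is mapped to zero

print(is_symmetric, is_idempotent)
print(np.round(eigenvalues, 12))          # one eigenvalue near 0, the rest near 1
print(rank)
print(np.allclose(null_image, 0.0))
```

Since all eigenvalues are 0 or 1, none is negative, which is consistent with positive semi-definiteness.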
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix $X$, the multiplication $C_m X$ removes the means from each of the n columns, while $X C_n$ removes the means from each of the m rows.
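The left and right multiplications can be compared with the cheaper direct computation. In this sketch the helper `centering_matrix` and the 3-by-4 test data are illustrative assumptions; the direct versions use NumPy's `mean` along an axis, which avoids forming the centering matrix at all.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Hypothetical helper: dense centering matrix of size n."""
    return np.eye(n) - np.ones((n, n)) / n

m, n = 3, 4
X = np.arange(m * n, dtype=float).reshape(m, n) ** 2   # arbitrary 3-by-4 data

cols_centered = centering_matrix(m) @ X   # C_m X: removes the mean of each column
rows_centered = X @ centering_matrix(n)   # X C_n: removes the mean of each row

# Direct equivalents that never build the centering matrix.
cols_direct = X - X.mean(axis=0, keepdims=True)
rows_direct = X - X.mean(axis=1, keepdims=True)

print(np.allclose(cols_centered, cols_direct))
print(np.allclose(rows_centered, rows_direct))
```

In practice the direct form is usually preferable for computation, while the matrix form is convenient for manipulating expressions on paper.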
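The centering matrix provides in particular a succinct way to express the scatter matrix, $S = (X - \boldsymbol{\mu}\mathbf{1}^\top)(X - \boldsymbol{\mu}\mathbf{1}^\top)^\top$, of a data sample $X$, where $\boldsymbol{\mu} = \tfrac{1}{n} X \mathbf{1}$ is the sample mean. The centering matrix allows us to express the scatter matrix more compactly as
$$S = X C_n (X C_n)^\top = X C_n C_n X^\top = X C_n X^\top.$$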
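The compact expression can be checked against the explicit one on random data. This sketch assumes the sample is an m-by-n array with the n observations stored as columns, which matches the form of the sample mean used here; the sizes and the random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6                          # m variables, n observations (as columns)
X = rng.normal(size=(m, n))

ones = np.ones((n, 1))
mu = X @ ones / n                    # sample mean as an m-by-1 column

# Explicit form: subtract the mean from every observation, then multiply.
D = X - mu @ ones.T
S_direct = D @ D.T

# Compact form using the centering matrix of size n.
C = np.eye(n) - np.ones((n, n)) / n
S_compact = X @ C @ X.T

print(np.allclose(S_direct, S_compact))
```

The simplification relies on two of the properties of the centering matrix: symmetry turns the transpose of $X C_n$ into $C_n X^\top$, and idempotence collapses the product $C_n C_n$ into $C_n$.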
- John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.