# Scatter matrix

For the notion in quantum mechanics, see scattering matrix.

In multivariate statistics and probability theory, the scatter matrix is a statistic used to estimate the covariance matrix, for instance that of the multivariate normal distribution.

## Definition

Given n samples of m-dimensional data, represented as the m-by-n matrix ${\displaystyle X=[\mathbf {x} _{1},\mathbf {x} _{2},\ldots ,\mathbf {x} _{n}]}$, the sample mean is

${\displaystyle {\overline {\mathbf {x} }}={\frac {1}{n}}\sum _{j=1}^{n}\mathbf {x} _{j}}$

where ${\displaystyle \mathbf {x} _{j}}$ is the j-th column of ${\displaystyle X}$.

The scatter matrix is the m-by-m positive semi-definite matrix

${\displaystyle S=\sum _{j=1}^{n}(\mathbf {x} _{j}-{\overline {\mathbf {x} }})(\mathbf {x} _{j}-{\overline {\mathbf {x} }})^{T}=\sum _{j=1}^{n}(\mathbf {x} _{j}-{\overline {\mathbf {x} }})\otimes (\mathbf {x} _{j}-{\overline {\mathbf {x} }})=\left(\sum _{j=1}^{n}\mathbf {x} _{j}\mathbf {x} _{j}^{T}\right)-n{\overline {\mathbf {x} }}{\overline {\mathbf {x} }}^{T}}$

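where ${\displaystyle T}$ denotes the matrix transpose and ${\displaystyle \otimes }$ denotes the outer product. The last form follows from expanding the product and using ${\displaystyle \sum _{j=1}^{n}\mathbf {x} _{j}=n{\overline {\mathbf {x} }}}$. The scatter matrix may be expressed more succinctly as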

${\displaystyle S=X\,C_{n}\,X^{T}}$

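where ${\displaystyle \,C_{n}=I_{n}-{\tfrac {1}{n}}\mathbf {1} \mathbf {1} ^{T}}$ is the n-by-n centering matrix, ${\displaystyle I_{n}}$ is the n-by-n identity matrix and ${\displaystyle \mathbf {1} }$ is the n-dimensional column vector of ones.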
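The equivalent expressions for S can be checked numerically. The following is a minimal NumPy sketch, assuming the data matrix `X` holds one sample per column as in the definition and building the centering matrix as the identity minus the all-ones matrix divided by n; the function name `scatter_matrix_forms` is hypothetical and the random data are for illustration only.

```python
import numpy as np


def scatter_matrix_forms(X):
    """Return the scatter matrix of an m-by-n data matrix X (one sample per
    column), computed in three algebraically equivalent ways."""
    m, n = X.shape
    x_bar = X.mean(axis=1)  # sample mean, shape (m,)

    # 1. Sum of outer products of the centred samples.
    S_outer = np.zeros((m, m))
    for j in range(n):
        d = X[:, j] - x_bar
        S_outer += np.outer(d, d)

    # 2. Sum of raw outer products (X X^T) minus the correction n * x_bar x_bar^T.
    S_expanded = X @ X.T - n * np.outer(x_bar, x_bar)

    # 3. Compact form X C_n X^T, with C_n = I_n - (1/n) * (matrix of ones).
    C_n = np.eye(n) - np.ones((n, n)) / n
    S_centering = X @ C_n @ X.T

    return S_outer, S_expanded, S_centering


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))  # m = 3 dimensions, n = 10 samples
S1, S2, S3 = scatter_matrix_forms(X)
print(np.allclose(S1, S2), np.allclose(S1, S3))  # True True
```

Forming C_n explicitly costs O(n²) memory, so in practice one usually subtracts the mean from each column and multiplies the centred matrix by its transpose. The expanded form is algebraically equal but can lose precision to cancellation when the mean is large relative to the spread of the data.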

## Application

Given n samples, the maximum likelihood estimate of the covariance matrix of a multivariate normal distribution is the normalized scatter matrix

${\displaystyle C_{ML}={\frac {1}{n}}S.}$
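As a sketch of this estimate, the snippet below simulates data from a bivariate normal distribution, forms S/n and compares it with `numpy.cov`. The values of `mu` and `Sigma` are assumptions chosen only for illustration. The extra comparison with `S / (n - 1)` shows the related unbiased sample covariance, which is what `numpy.cov` returns by default.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 500

# Hypothetical "true" parameters, used only to generate example data.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw n samples; transpose so each sample is a column of the m-by-n matrix X.
X = rng.multivariate_normal(mu, Sigma, size=n).T

Xc = X - X.mean(axis=1, keepdims=True)  # subtract the sample mean from every column
S = Xc @ Xc.T                            # scatter matrix
C_ml = S / n                             # maximum likelihood estimate

# np.cov treats rows as variables by default; bias=True divides by n,
# while the default (bias=False) divides by n - 1.
print(np.allclose(C_ml, np.cov(X, bias=True)))   # True
print(np.allclose(S / (n - 1), np.cov(X)))       # True
print(C_ml)                                      # close to Sigma for large n
```

The maximum likelihood estimate is biased for finite n, which is why libraries often default to the n − 1 normalization; the two differ only by the factor n/(n − 1).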

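When the columns of ${\displaystyle X}$ are independently sampled from a multivariate normal distribution ${\displaystyle {\mathcal {N}}({\boldsymbol {\mu }},\Sigma )}$, ${\displaystyle S}$ has a Wishart distribution with scale matrix ${\displaystyle \Sigma }$ and ${\displaystyle n-1}$ degrees of freedom.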
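The Wishart property can be checked numerically with a short Monte Carlo sketch. It relies on the standard fact that a Wishart distribution with k degrees of freedom and scale matrix Σ has mean kΣ, and that the mean-centred scatter matrix of n normal samples has k = n − 1, so averaging many simulated scatter matrices should approach (n − 1)Σ. The covariance `Sigma`, the sample size and the number of trials are arbitrary illustrative choices, and the check compares only the mean, not the full distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 2, 5, 20_000

# Hypothetical covariance used only for the simulation.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
mu = np.zeros(m)

# Draw `trials` independent data sets of n samples each; shape (trials, n, m).
# Here each sample is a row, so the scatter matrix of one data set is Xc^T Xc.
samples = rng.multivariate_normal(mu, Sigma, size=(trials, n))
Xc = samples - samples.mean(axis=1, keepdims=True)
S = np.swapaxes(Xc, 1, 2) @ Xc            # shape (trials, m, m)

S_mean = S.mean(axis=0)                   # Monte Carlo estimate of E[S]
expected = (n - 1) * Sigma                # Wishart mean with n - 1 degrees of freedom
print(S_mean)
print(expected)
```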