# Whitening transformation

A whitening transformation or sphering transformation is a linear transformation that maps a vector of random variables with a known covariance matrix to a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.[1] The transformation is called "whitening" because it changes the input vector into a white noise vector.

Several other transformations are closely related to whitening:

1. the decorrelation transform removes only the correlations but leaves variances intact,
2. the standardization transform sets variances to 1 but leaves correlations intact,
3. a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.[2]
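As an illustration of item 3, a coloring transformation can be sketched with NumPy. The target covariance `Sigma`, the sample size and the use of a Cholesky factor as the coloring matrix are assumptions made for this example; any matrix `A` with `A @ A.T == Sigma` would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target covariance matrix, chosen only for illustration.
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

# Coloring matrix: Sigma = A A^T with A lower triangular (Cholesky factor).
A = np.linalg.cholesky(Sigma)

# White input: uncorrelated components with unit variance, one sample per row.
Z = rng.standard_normal(size=(200_000, 3))

# Color each sample: x = A z, written row-wise as X = Z A^T.
X = Z @ A.T

# The sample covariance of X should be close to Sigma (up to sampling error).
Sigma_sample = np.cov(X, rowvar=False)
```

Because the covariance of `A z` is `A I A^T`, the coloring step is the inverse operation of whitening with `W = inv(A)`.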

## Definition

Suppose ${\displaystyle X}$ is a random (column) vector with non-singular covariance matrix ${\displaystyle \Sigma }$ and mean ${\displaystyle 0}$. Then the transformation ${\displaystyle Y=WX}$ with a whitening matrix ${\displaystyle W}$ satisfying the condition ${\displaystyle W^{\mathrm {T} }W=\Sigma ^{-1}}$ yields the whitened random vector ${\displaystyle Y}$, whose covariance matrix is the identity matrix.
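The reason this condition suffices can be written out in one line. The sketch below uses only the assumptions already stated (zero mean, non-singular ${\displaystyle \Sigma }$); the condition itself forces ${\displaystyle W}$ to be invertible, so that ${\displaystyle \Sigma =(W^{\mathrm {T} }W)^{-1}=W^{-1}(W^{\mathrm {T} })^{-1}}$:

$$
\operatorname{cov}(Y) = \operatorname{E}\left[YY^{\mathrm{T}}\right] = W \operatorname{E}\left[XX^{\mathrm{T}}\right] W^{\mathrm{T}} = W \Sigma W^{\mathrm{T}} = W W^{-1} (W^{\mathrm{T}})^{-1} W^{\mathrm{T}} = I .
$$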

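There are infinitely many possible whitening matrices ${\displaystyle W}$ that all satisfy the above condition, because if ${\displaystyle W}$ is a whitening matrix then so is ${\displaystyle QW}$ for any orthogonal matrix ${\displaystyle Q}$. Commonly used choices are ${\displaystyle W=\Sigma ^{-1/2}}$ (Mahalanobis or ZCA whitening), ${\displaystyle W=L^{\mathrm {T} }}$ where ${\displaystyle L}$ is the Cholesky factor of ${\displaystyle \Sigma ^{-1}}$, that is ${\displaystyle \Sigma ^{-1}=LL^{\mathrm {T} }}$ (Cholesky whitening),[3] or ${\displaystyle W=\Lambda ^{-1/2}U^{\mathrm {T} }}$ where ${\displaystyle \Sigma =U\Lambda U^{\mathrm {T} }}$ is the eigendecomposition of ${\displaystyle \Sigma }$ (PCA whitening).[4]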
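The three choices can be compared numerically. The following is a minimal sketch with NumPy, assuming a small hypothetical covariance matrix `Sigma` and taking PCA whitening in its usual form, the inverse square root of the eigenvalue matrix times the transposed eigenvector matrix; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix, chosen only for illustration.
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Eigendecomposition Sigma = U diag(lam) U^T (Sigma is symmetric positive definite).
lam, U = np.linalg.eigh(Sigma)

# ZCA (Mahalanobis) whitening: W = Sigma^{-1/2}, a symmetric matrix.
W_zca = U @ np.diag(lam ** -0.5) @ U.T

# PCA whitening: W = Lambda^{-1/2} U^T (rotate to principal axes, then rescale).
W_pca = np.diag(lam ** -0.5) @ U.T

# Cholesky whitening: Sigma^{-1} = L L^T with L lower triangular, W = L^T.
L = np.linalg.cholesky(Sigma_inv)
W_chol = L.T

whitening_matrices = {"ZCA": W_zca, "PCA": W_pca, "Cholesky": W_chol}
for name, W in whitening_matrices.items():
    # Both conditions should hold up to floating-point error.
    assert np.allclose(W.T @ W, Sigma_inv), name
    assert np.allclose(W @ Sigma @ W.T, np.eye(3)), name
```

The three matrices differ from one another only by an orthogonal factor, which is why all of them pass the same two checks.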

Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of ${\displaystyle X}$ and ${\displaystyle Y}$.[3] For example, the unique optimal whitening transformation achieving maximal component-wise correlation between original ${\displaystyle X}$ and whitened ${\displaystyle Y}$ is produced by the whitening matrix ${\displaystyle W=P^{-1/2}V^{-1/2}}$ where ${\displaystyle P}$ is the correlation matrix of ${\displaystyle X}$ and ${\displaystyle V}$ is the diagonal matrix of its variances.
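This correlation-based whitening matrix can be sketched in the same way. The example below assumes a hypothetical covariance matrix `Sigma`, builds the variance and correlation matrices from it, and compares the summed component-wise cross-correlation with that of plain ZCA whitening; the helper `cross_correlation` is introduced here for illustration and is not taken from the cited paper.

```python
import numpy as np

# Hypothetical covariance matrix, chosen only for illustration.
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

# V^{-1/2}: inverse square root of the diagonal variance matrix.
V_inv_sqrt = np.diag(np.diag(Sigma) ** -0.5)

# Correlation matrix P = V^{-1/2} Sigma V^{-1/2}.
P = V_inv_sqrt @ Sigma @ V_inv_sqrt


def inv_sqrt(M):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cross_correlation(W):
    """Cross-correlation matrix between X and Y = W X (Y has unit variances)."""
    return V_inv_sqrt @ Sigma @ W.T


# Correlation-based whitening matrix W = P^{-1/2} V^{-1/2}.
W_cor = inv_sqrt(P) @ V_inv_sqrt

# Plain ZCA whitening matrix W = Sigma^{-1/2}, for comparison.
W_zca = inv_sqrt(Sigma)

# Sum of the correlations between each X_i and its whitened counterpart Y_i.
score_cor = np.trace(cross_correlation(W_cor))
score_zca = np.trace(cross_correlation(W_zca))
```

For this choice of `Sigma`, `score_cor` should be at least as large as `score_zca`, in line with the optimality statement above.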

## Whitening a data matrix

Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
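A minimal sketch of the empirical procedure follows, assuming a data matrix with observations in rows, a maximum-likelihood covariance estimate and Cholesky whitening. The synthetic data and the mixing matrix `A` are hypothetical and serve only to produce correlated columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n observations (rows) of d correlated variables (columns).
n, d = 500, 3
A = np.array([[2.0, 0.0, 0.0],
              [0.6, 1.2, 0.0],
              [0.3, 0.1, 0.9]])   # fixed mixing matrix, illustration only
X = rng.standard_normal(size=(n, d)) @ A.T

# Center the columns, since the definition assumes zero mean.
Xc = X - X.mean(axis=0)

# Maximum-likelihood covariance estimate (divides by n rather than n - 1).
Sigma_hat = Xc.T @ Xc / n

# Estimated Cholesky whitening matrix: inv(Sigma_hat) = L L^T, W_hat = L^T.
L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
W_hat = L.T

# Rows are observations, so y_i = W_hat x_i becomes Y = Xc W_hat^T.
Y = Xc @ W_hat.T

# The empirical covariance of the whitened data is the identity matrix.
Sigma_Y = Y.T @ Y / n
```

Note that the whitened data has exactly identity covariance only with respect to the same estimator that was used to build `W_hat`; new data whitened with `W_hat` will be white only approximately.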

## R implementation

Several whitening procedures, including ZCA whitening, PCA whitening and CCA whitening, are implemented in the "whitening" R package[5] published on CRAN.

## References

1. ^ Koivunen, A.C.; Kostinski, A.B. (1999). "The Feasibility of Data Whitening to Improve Performance of Weather Radar". Journal of Applied Meteorology. 38 (6): 741–749. Bibcode:1999JApMe..38..741K. doi:10.1175/1520-0450(1999)038<0741:TFODWT>2.0.CO;2. ISSN 1520-0450.
2. ^ Hossain, Miliha. "Whitening and Coloring Transforms for Multivariate Gaussian Random Variables". Project Rhea. Retrieved 21 March 2016.
3. ^ a b Kessy, A.; Lewin, A.; Strimmer, K. (2018). "Optimal whitening and decorrelation". The American Statistician. 72 (4): 309–314. arXiv:1512.00809. doi:10.1080/00031305.2016.1277159. S2CID 55075085.
4. ^ Friedman, J. (1987). "Exploratory Projection Pursuit". Journal of the American Statistical Association. 82 (397): 249–266. doi:10.1080/01621459.1987.10478427. ISSN 0162-1459. JSTOR 2289161. OSTI 1447861.
5. ^ "whitening R package". Retrieved 2018-11-25.