# Hotelling's T-squared distribution


In statistics, Hotelling's T-squared distribution is a univariate distribution proportional to the F-distribution. It arises as the distribution of a set of statistics that are natural generalizations of the statistics underlying Student's t-distribution. In particular, it appears in multivariate statistics when testing for differences between the (multivariate) means of different populations, where the corresponding univariate problem would use a t-test.

The distribution is named for Harold Hotelling, who developed it[1] as a generalization of Student's t-distribution.

## The distribution

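If the ${\displaystyle p\times 1}$ vector ${\displaystyle \mathbf {d} }$ is Gaussian multivariate-distributed with zero mean and unit covariance matrix, ${\displaystyle \mathbf {d} \sim {\mathcal {N}}_{p}(\mathbf {0} ,\mathbf {I} _{p})}$, and ${\displaystyle \mathbf {M} }$ is an independent ${\displaystyle p\times p}$ matrix with a Wishart distribution with unit scale matrix and m degrees of freedom, ${\displaystyle \mathbf {M} \sim W_{p}(\mathbf {I} _{p},m)}$, then ${\displaystyle m\,\mathbf {d} '\mathbf {M} ^{-1}\mathbf {d} }$ has a Hotelling T² distribution with dimensionality parameter p and m degrees of freedom.[2]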
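The definition can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming NumPy; the helper name `sample_hotelling_t2` is hypothetical. It draws d and M independently as in the definition (a W_p(I_p, m) matrix is generated as Z'Z for an m × p matrix Z of iid standard normals) and compares the sample mean of m d'M⁻¹d with pm/(m − p − 1), the mean implied by the F relation for this distribution.

```python
import numpy as np

def sample_hotelling_t2(p, m, size, rng):
    """Draw `size` variates of m * d' M^{-1} d, where d ~ N_p(0, I_p) and
    M ~ W_p(I_p, m) are independent. Requires m >= p so that M is invertible."""
    d = rng.standard_normal((size, p))
    # A W_p(I_p, m) matrix is Z'Z, with Z an m x p matrix of iid N(0, 1) entries.
    z = rng.standard_normal((size, m, p))
    M = np.swapaxes(z, 1, 2) @ z                      # shape (size, p, p)
    # Solve M u = d for every draw instead of forming M^{-1}.
    u = np.linalg.solve(M, d[..., None])[..., 0]      # shape (size, p)
    return m * np.sum(d * u, axis=1)

rng = np.random.default_rng(0)
p, m = 3, 20
draws = sample_hotelling_t2(p, m, 20000, rng)
# Mean implied by the F relation: E[T^2_{p,m}] = p m / (m - p - 1), for m > p + 1.
print(draws.mean(), p * m / (m - p - 1))
```

With 20,000 draws the two printed numbers should agree to within Monte Carlo error (a few hundredths here).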

If ${\displaystyle T_{p,m}^{2}}$ denotes Hotelling's T-squared distribution with parameters p and m, and a random variable X has this distribution,

${\displaystyle X\sim T_{p,m}^{2}}$

then[1]

${\displaystyle {\frac {m-p+1}{pm}}X\sim F_{p,m-p+1}}$

where ${\displaystyle F_{p,m-p+1}}$ is the F-distribution with parameters p and m−p+1.
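In practice this relation is how T² tail probabilities are computed. A minimal sketch, assuming SciPy is available (`scipy.stats.f.sf` is SciPy's F survival function); the helper names `t2_to_f` and `t2_upper_tail` are hypothetical. The example uses p = 2, where the F(2, ν) upper tail has the closed form (ν/(ν + 2x))^{ν/2}, so the result can be checked by hand.

```python
from scipy import stats

def t2_to_f(x, p, m):
    """Scale a T^2_{p,m} value so that it follows F_{p, m-p+1}."""
    return (m - p + 1) / (p * m) * x

def t2_upper_tail(x, p, m):
    """P(T^2_{p,m} > x), computed through the F distribution."""
    return stats.f.sf(t2_to_f(x, p, m), p, m - p + 1)

f_stat = t2_to_f(10.0, p=2, m=10)      # (9 / 20) * 10 = 4.5
pval = t2_upper_tail(10.0, p=2, m=10)  # F(2, 9) tail at 4.5: (9 / (9 + 9))**4.5 = 0.5**4.5
print(f_stat, pval)
```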

## Hotelling's T-squared statistic

Hotelling's T-squared statistic is a generalization of Student's t statistic that is used in multivariate hypothesis testing, and is defined as follows.[1]

Let ${\displaystyle {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\mathbf {\Sigma } })}$ denote a p-variate normal distribution with location ${\displaystyle {\boldsymbol {\mu }}}$ and covariance ${\displaystyle {\mathbf {\Sigma } }}$. Let

${\displaystyle {\mathbf {x} }_{1},\dots ,{\mathbf {x} }_{n}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\mathbf {\Sigma } })}$

be n independent random vectors, each represented as a ${\displaystyle p\times 1}$ column vector of real numbers. Define

${\displaystyle {\overline {\mathbf {x} }}={\frac {\mathbf {x} _{1}+\cdots +\mathbf {x} _{n}}{n}}}$

to be the sample mean. It can be shown that

${\displaystyle n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'{\mathbf {\Sigma } }^{-1}({\overline {\mathbf {x} }}-{\boldsymbol {\mathbf {\mu } }})\sim \chi _{p}^{2},}$

where ${\displaystyle \chi _{p}^{2}}$ is the chi-squared distribution with p degrees of freedom. To show this, use the fact that ${\displaystyle {\overline {\mathbf {x} }}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\mathbf {\Sigma } }/n)}$ and derive the characteristic function of the random variable ${\displaystyle \mathbf {y} =n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'{\mathbf {\Sigma } }^{-1}({\overline {\mathbf {x} }}-{\boldsymbol {\mathbf {\mu } }})}$. This is done below:

${\displaystyle \phi _{\mathbf {y} }(\theta )=\operatorname {E} e^{i\theta \mathbf {y} },}$
${\displaystyle =\operatorname {E} e^{i\theta n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'{\mathbf {\Sigma } }^{-1}({\overline {\mathbf {x} }}-{\boldsymbol {\mathbf {\mu } }})}}$
${\displaystyle =\int e^{i\theta n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'{\mathbf {\Sigma } }^{-1}({\overline {\mathbf {x} }}-{\boldsymbol {\mathbf {\mu } }})}(2\pi )^{-{\frac {p}{2}}}|{\boldsymbol {\Sigma }}/n|^{-{\frac {1}{2}}}\,e^{-{\frac {1}{2}}n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'{\boldsymbol {\Sigma }}^{-1}({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})}\,dx_{1}...dx_{p}}$
${\displaystyle =\int (2\pi )^{-{\frac {p}{2}}}|{\boldsymbol {\Sigma }}/n|^{-{\frac {1}{2}}}\,e^{-{\frac {1}{2}}n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'({\boldsymbol {\Sigma }}^{-1}-2i\theta {\boldsymbol {\Sigma }}^{-1})({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})}\,dx_{1}...dx_{p},}$
${\displaystyle =|({\boldsymbol {\Sigma }}^{-1}-2i\theta {\boldsymbol {\Sigma }}^{-1})^{-1}/n|^{\frac {1}{2}}|{\boldsymbol {\Sigma }}/n|^{-{\frac {1}{2}}}\int (2\pi )^{-{\frac {p}{2}}}|({\boldsymbol {\Sigma }}^{-1}-2i\theta {\boldsymbol {\Sigma }}^{-1})^{-1}/n|^{-{\frac {1}{2}}}\,e^{-{\frac {1}{2}}n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'({\boldsymbol {\Sigma }}^{-1}-2i\theta {\boldsymbol {\Sigma }}^{-1})({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})}\,dx_{1}...dx_{p},}$[citation needed]
${\displaystyle =|(\mathbf {I} _{p}-2i\theta \mathbf {I} _{p})|^{-{\frac {1}{2}}},}$
${\displaystyle =(1-2i\theta )^{-{\frac {p}{2}}}.~~\blacksquare }$
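The result can also be checked numerically. The sketch below assumes NumPy; the values of μ and Σ are arbitrary choices for illustration. It draws many independent samples of size n, computes n(x̄ − μ)'Σ⁻¹(x̄ − μ) for each, and compares the empirical mean and variance with those of χ²_p, which are p and 2p.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 5, 50000
mu = np.array([1.0, -2.0, 0.5])
# An arbitrary positive-definite covariance, chosen only for illustration.
A = rng.standard_normal((p, p))
sigma = A @ A.T + p * np.eye(p)

# reps independent samples of size n, array shape (reps, n, p).
x = rng.multivariate_normal(mu, sigma, size=(reps, n))
diff = x.mean(axis=1) - mu                          # x-bar minus mu, per replicate
q = n * np.einsum("ki,ij,kj->k", diff, np.linalg.inv(sigma), diff)

# chi^2_p has mean p and variance 2p.
print(q.mean(), q.var())
```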

In practice, however, ${\displaystyle {\mathbf {\Sigma } }}$ is usually unknown, and we still wish to test hypotheses about the location ${\displaystyle {\boldsymbol {\mu }}}$.

### Sum of multiple squared t's

Define

${\displaystyle {\mathbf {W} }={\frac {1}{n-1}}\sum _{i=1}^{n}(\mathbf {x} _{i}-{\overline {\mathbf {x} }})(\mathbf {x} _{i}-{\overline {\mathbf {x} }})'}$

to be the sample covariance, where an apostrophe denotes the transpose. It can be shown that ${\displaystyle \mathbf {W} }$ is positive semi-definite (positive definite with probability 1 when n > p) and that ${\displaystyle (n-1)\mathbf {W} }$ follows a p-variate Wishart distribution with scale matrix ${\displaystyle {\mathbf {\Sigma } }}$ and n−1 degrees of freedom, independently of ${\displaystyle {\overline {\mathbf {x} }}}$.[3] Hotelling's T-squared statistic is then defined[4] to be

${\displaystyle t^{2}=n({\overline {\mathbf {x} }}-{\boldsymbol {\mu }})'{\mathbf {W} }^{-1}({\overline {\mathbf {x} }}-{\boldsymbol {\mathbf {\mu } }})}$

and, by the definition of the distribution given above (with m = n − 1),

${\displaystyle t^{2}\sim T_{p,n-1}^{2}}$

i.e.

${\displaystyle {\frac {n-p}{p(n-1)}}t^{2}\sim F_{p,n-p},}$

where ${\displaystyle F_{p,n-p}}$ is the F-distribution with parameters p and n−p. To calculate a p-value, multiply ${\displaystyle t^{2}}$ by the constant above and evaluate the upper tail of ${\displaystyle F_{p,n-p}}$ at the result.
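Putting the pieces together gives a one-sample test. This is a sketch, not a reference implementation: it assumes NumPy and SciPy, and the function name `hotelling_one_sample` is hypothetical. It computes W with divisor n − 1 exactly as defined above (equivalent to `np.cov(x, rowvar=False)`), forms t², scales it to F_{p,n−p}, and returns the upper-tail probability.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(x, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0 for an n x p data matrix x
    (rows are observations). Returns (t2, F statistic, p-value); needs n > p."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xbar = x.mean(axis=0)
    diff = xbar - np.asarray(mu0, dtype=float)
    # Sample covariance W with divisor n - 1.
    centered = x - xbar
    W = centered.T @ centered / (n - 1)
    # Solve W u = diff rather than inverting W explicitly.
    t2 = n * diff @ np.linalg.solve(W, diff)
    # Scale to F_{p, n-p} and take the upper tail.
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, f_stat, p_value

rng = np.random.default_rng(3)
sample = rng.normal(loc=[0.3, 0.0], scale=1.0, size=(25, 2))
t2, f_stat, p_value = hotelling_one_sample(sample, mu0=[0.0, 0.0])
print(t2, f_stat, p_value)
```

For p = 1, t² reduces to the square of the one-sample Student t statistic, and the F(1, n − 1) tail equals the two-sided t-test p-value, which is a convenient sanity check.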

## Hotelling's two-sample T-squared statistic

If ${\displaystyle {\mathbf {x} }_{1},\dots ,{\mathbf {x} }_{n_{x}}\sim N_{p}({\boldsymbol {\mu }},{\mathbf {V} })}$ and ${\displaystyle {\mathbf {y} }_{1},\dots ,{\mathbf {y} }_{n_{y}}\sim N_{p}({\boldsymbol {\mu }},{\mathbf {V} })}$, with the two samples drawn independently from multivariate normal distributions with the same mean and covariance, and we define

${\displaystyle {\overline {\mathbf {x} }}={\frac {1}{n_{x}}}\sum _{i=1}^{n_{x}}\mathbf {x} _{i}\qquad {\overline {\mathbf {y} }}={\frac {1}{n_{y}}}\sum _{i=1}^{n_{y}}\mathbf {y} _{i}}$

as the sample means, and

${\displaystyle {\mathbf {W} }={\frac {\sum _{i=1}^{n_{x}}(\mathbf {x} _{i}-{\overline {\mathbf {x} }})(\mathbf {x} _{i}-{\overline {\mathbf {x} }})'+\sum _{i=1}^{n_{y}}(\mathbf {y} _{i}-{\overline {\mathbf {y} }})(\mathbf {y} _{i}-{\overline {\mathbf {y} }})'}{n_{x}+n_{y}-2}}}$

as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-squared statistic is

${\displaystyle t^{2}={\frac {n_{x}n_{y}}{n_{x}+n_{y}}}({\overline {\mathbf {x} }}-{\overline {\mathbf {y} }})'{\mathbf {W} }^{-1}({\overline {\mathbf {x} }}-{\overline {\mathbf {y} }})\sim T_{p,n_{x}+n_{y}-2}^{2}}$

and it can be related to the F-distribution by[3]

${\displaystyle {\frac {n_{x}+n_{y}-p-1}{(n_{x}+n_{y}-2)p}}t^{2}\sim F(p,n_{x}+n_{y}-1-p).}$
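The two-sample statistic can be sketched the same way. This assumes NumPy and SciPy, and the function name `hotelling_two_sample` is hypothetical. It builds the pooled covariance W from both scatter matrices, computes t², applies the scaling constant above, and takes the upper tail of F(p, n_x + n_y − 1 − p).

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(x, y):
    """Two-sample Hotelling T^2 test of equal means, assuming a common covariance.
    x is n_x x p, y is n_y x p. Returns (t2, F statistic, p-value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, p = x.shape
    ny = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance: sum of both scatter matrices over n_x + n_y - 2.
    cx = x - x.mean(axis=0)
    cy = y - y.mean(axis=0)
    W = (cx.T @ cx + cy.T @ cy) / (nx + ny - 2)
    t2 = nx * ny / (nx + ny) * diff @ np.linalg.solve(W, diff)
    f_stat = (nx + ny - p - 1) / ((nx + ny - 2) * p) * t2
    p_value = stats.f.sf(f_stat, p, nx + ny - 1 - p)
    return t2, f_stat, p_value

rng = np.random.default_rng(6)
x = rng.normal(size=(20, 3))
y = rng.normal(loc=0.5, size=(18, 3))
t2, f_stat, p_value = hotelling_two_sample(x, y)
print(t2, f_stat, p_value)
```

As in the one-sample case, with p = 1 this reduces to the square of the pooled-variance two-sample t statistic and reproduces its two-sided p-value.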

The non-null distribution of the scaled statistic is the noncentral F-distribution (the ratio of a noncentral chi-squared random variable and an independent central chi-squared random variable, each divided by its degrees of freedom):

${\displaystyle {\frac {n_{x}+n_{y}-p-1}{(n_{x}+n_{y}-2)p}}t^{2}\sim F(p,n_{x}+n_{y}-1-p;\delta ),}$

with

${\displaystyle \delta ={\frac {n_{x}n_{y}}{n_{x}+n_{y}}}{\boldsymbol {\nu }}'\mathbf {V} ^{-1}{\boldsymbol {\nu }},}$

where ${\displaystyle {\boldsymbol {\nu }}}$ is the difference vector between the population means.
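The noncentral distribution gives the power of the test directly. A minimal sketch, assuming SciPy's `stats.f` (central F) and `stats.ncf` (noncentral F, with noncentrality defined as the sum of squared means of the underlying normals, matching δ above); the function name `two_sample_power` is hypothetical, and ν and V are treated as known inputs for the calculation.

```python
import numpy as np
from scipy import stats

def two_sample_power(nu, V, nx, ny, alpha=0.05):
    """Power of the two-sample T^2 test at level alpha when the true mean
    difference is nu and the common covariance is V."""
    nu = np.asarray(nu, dtype=float)
    V = np.asarray(V, dtype=float)
    p = nu.size
    dfn, dfd = p, nx + ny - 1 - p
    # Noncentrality delta = n_x n_y / (n_x + n_y) * nu' V^{-1} nu.
    delta = nx * ny / (nx + ny) * nu @ np.linalg.solve(V, nu)
    crit = stats.f.isf(alpha, dfn, dfd)          # null critical value of the scaled statistic
    return stats.ncf.sf(crit, dfn, dfd, delta)   # P(noncentral F exceeds it)

V = np.array([[1.0, 0.5], [0.5, 1.0]])
power = two_sample_power([0.5, 0.0], V, nx=20, ny=20)
print(power)
```

Power grows with both the size of ν (measured in the V⁻¹ metric) and the sample sizes, since both enter only through δ.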

More robust and powerful tests than Hotelling's two-sample test have been proposed in the literature, for example tests based on interpoint distances, which can also be applied when the number of variables is comparable to, or even larger than, the number of subjects.[5][6]

In the two-variable case, the formula simplifies in a way that shows how the correlation ${\displaystyle \rho }$ between the variables affects ${\displaystyle t^{2}}$. If we define

${\displaystyle d_{1}={\overline {x}}_{.1}-{\overline {y}}_{.1},\qquad d_{2}={\overline {x}}_{.2}-{\overline {y}}_{.2}}$

and

${\displaystyle s_{1}={\sqrt {W_{11}}},\qquad s_{2}={\sqrt {W_{22}}},\qquad \rho ={\frac {W_{12}}{s_{1}s_{2}}}}$

then

${\displaystyle t^{2}={\frac {n_{x}n_{y}}{(n_{x}+n_{y})(1-\rho ^{2})}}\left[\left({\frac {d_{1}}{s_{1}}}\right)^{2}+\left({\frac {d_{2}}{s_{2}}}\right)^{2}-2\rho \left({\frac {d_{1}}{s_{1}}}\right)\left({\frac {d_{2}}{s_{2}}}\right)\right]}$

Thus, if the two components of the difference vector ${\displaystyle ({\overline {\mathbf {x} }}-{\overline {\mathbf {y} }})}$ have the same sign, ${\displaystyle t^{2}}$ generally becomes smaller as ${\displaystyle \rho }$ becomes more positive. If they have opposite signs, ${\displaystyle t^{2}}$ becomes larger as ${\displaystyle \rho }$ becomes more positive.
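The two-variable formula can be checked against the matrix form. A sketch assuming NumPy, with simulated data chosen only for illustration, and taking ρ to be the pooled sample correlation W₁₂/(s₁s₂) implied by W: both routes should give the same t² up to round-off.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal((15, 2))
y = rng.standard_normal((12, 2)) + [0.4, 0.1]   # illustrative shift in the means
nx, ny = len(x), len(y)

d = x.mean(axis=0) - y.mean(axis=0)
cx, cy = x - x.mean(axis=0), y - y.mean(axis=0)
W = (cx.T @ cx + cy.T @ cy) / (nx + ny - 2)      # pooled covariance

# Matrix form of the two-sample statistic.
t2_matrix = nx * ny / (nx + ny) * d @ np.linalg.solve(W, d)

# Two-variable closed form with s_k = sqrt(W_kk) and rho = W_12 / (s_1 s_2).
s1, s2 = np.sqrt(W[0, 0]), np.sqrt(W[1, 1])
rho = W[0, 1] / (s1 * s2)
z1, z2 = d[0] / s1, d[1] / s2
t2_closed = nx * ny / ((nx + ny) * (1 - rho**2)) * (z1**2 + z2**2 - 2 * rho * z1 * z2)

print(np.isclose(t2_matrix, t2_closed))   # True
```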