Hotelling's T-squared distribution
In statistics, Hotelling's T-squared distribution ($T^2$) is a univariate distribution proportional to the F-distribution. It arises as the distribution of a set of statistics that are natural generalizations of the statistics underlying Student's t-distribution. In particular, it arises in multivariate statistics in tests of the differences between the (multivariate) means of different populations, where the corresponding univariate problems would use a t-test.
If the $p\times 1$ vector $\mathbf d$ is multivariate Gaussian with zero mean and unit covariance matrix, $\mathbf d \sim N_p(\mathbf 0, I_p)$, and $\mathbf M$ is a $p\times p$ random matrix, independent of $\mathbf d$, with a Wishart distribution with unit scale matrix and $m$ degrees of freedom, $\mathbf M \sim W_p(I_p, m)$, then $m\,\mathbf d'\mathbf M^{-1}\mathbf d$ has a Hotelling $T^2$ distribution with dimensionality parameter $p$ and $m$ degrees of freedom.
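The definition can be checked by simulation. The following is a minimal Monte Carlo sketch in standard-library Python; the helper names `solve`, `sample_t2` and `mean_t2` are hypothetical. It draws $\mathbf d \sim N_p(\mathbf 0, I_p)$, builds $\mathbf M$ as the sum of $m$ outer products of independent $N_p(\mathbf 0, I_p)$ vectors (which gives a $W_p(I_p, m)$ matrix), and averages $m\,\mathbf d'\mathbf M^{-1}\mathbf d$. For $m > p+1$ the average should be close to $pm/(m-p-1)$, the mean of the Hotelling $T^2$ distribution with parameters $p$ and $m$ implied by its relation to the F-distribution.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def sample_t2(p, m, rng):
    """One draw of m * d' M^{-1} d with d ~ N_p(0, I) independent of M ~ W_p(I, m)."""
    d = [rng.gauss(0.0, 1.0) for _ in range(p)]
    M = [[0.0] * p for _ in range(p)]
    for _ in range(m):  # M = sum of m outer products z z', z ~ N_p(0, I)
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for i in range(p):
            for j in range(p):
                M[i][j] += z[i] * z[j]
    v = solve(M, d)  # v = M^{-1} d, without forming the inverse
    return m * sum(di * vi for di, vi in zip(d, v))

def mean_t2(p, m, trials=5000, seed=0):
    """Monte Carlo average of sample_t2 over a fixed-seed run."""
    rng = random.Random(seed)
    return sum(sample_t2(p, m, rng) for _ in range(trials)) / trials

p, m = 2, 20
print(mean_t2(p, m), p * m / (m - p - 1))  # simulated mean vs. theoretical mean 40/17
```

Solving $\mathbf M\mathbf v = \mathbf d$ avoids computing $\mathbf M^{-1}$ explicitly; in practice a numerical library would replace the hand-written elimination.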
If the notation $T^2_{p,m}$ is used to denote a random variable having Hotelling's T-squared distribution with parameters $p$ and $m$, then a random variable $X \sim T^2_{p,m}$ satisfies
$$\frac{m-p+1}{pm}\,X \sim F_{p,m-p+1},$$
where $F_{p,m-p+1}$ is the F-distribution with parameters $p$ and $m-p+1$.
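In code, this relation amounts to a rescaling followed by an F-distribution tail probability. The sketch below uses only the Python standard library, so it evaluates the F upper tail through the regularized incomplete beta function, $P(F > f) = I_{d_2/(d_2+d_1 f)}(d_2/2,\, d_1/2)$, computed with the modified Lentz continued fraction popularized by *Numerical Recipes*. The function names (`t2_to_f`, `reg_inc_beta`, `f_sf`, `t2_sf`) are hypothetical. If SciPy is installed, `scipy.stats.f.sf(f, dfn, dfd)` gives the same tail probability.

```python
import math

def t2_to_f(x, p, m):
    """Rescale a T^2_{p,m} value to the F scale; returns (f, d1, d2)."""
    if m < p:
        raise ValueError("need m >= p so that m - p + 1 >= 1")
    return (m - p + 1) * x / (p * m), p, m - p + 1

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        aa = k * (b - k) * x / ((qam + k2) * (a + k2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fast, else the symmetry
    # I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def t2_sf(x, p, m):
    """P(X > x) for X ~ T^2_{p,m}, via the F relation."""
    f, d1, d2 = t2_to_f(x, p, m)
    return f_sf(f, d1, d2)

print(t2_to_f(10.0, 2, 20))  # (4.75, 2, 19)
print(t2_sf(18.75, 2, 3))    # F(2, 2) tail at 6.25; about 0.138 (1/7.25 in theory)
```

Working on the F scale lets a single F-distribution routine serve every $T^2_{p,m}$; the two-case split in `reg_inc_beta` keeps the continued fraction in its fast-converging region.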
Hotelling's T-squared statistic
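Let $\mathbf x_1,\dots,\mathbf x_n \sim N_p(\boldsymbol\mu,\boldsymbol\Sigma)$ be $n$ independent random variables, which may be represented as $p\times 1$ column vectors of real numbers. Define
$$\overline{\mathbf x} = \frac{\mathbf x_1+\cdots+\mathbf x_n}{n}$$
to be the sample mean. It can be shown that
$$n(\overline{\mathbf x}-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\overline{\mathbf x}-\boldsymbol\mu) \sim \chi^2_p,$$
where $\chi^2_p$ is the chi-squared distribution with $p$ degrees of freedom. To show this, use the fact that $\overline{\mathbf x} \sim N_p(\boldsymbol\mu, \boldsymbol\Sigma/n)$ and then derive the characteristic function of the random variable $y = n(\overline{\mathbf x}-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\overline{\mathbf x}-\boldsymbol\mu)$.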
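One way to carry out this characteristic-function argument is sketched here, assuming $\boldsymbol\Sigma$ is positive definite so that it has a symmetric square root $\boldsymbol\Sigma^{1/2}$ with inverse $\boldsymbol\Sigma^{-1/2}$. Standardizing the sample mean turns $y = n(\overline{\mathbf x}-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\overline{\mathbf x}-\boldsymbol\mu)$ into a sum of $p$ squared independent standard normal variables, and the characteristic function factorizes accordingly:
$$
\begin{aligned}
\mathbf z &= \sqrt{n}\,\boldsymbol\Sigma^{-1/2}(\overline{\mathbf x}-\boldsymbol\mu) \sim N_p(\mathbf 0, I_p),
\qquad y = \mathbf z'\mathbf z = \sum_{k=1}^{p} z_k^2,\\
\varphi_y(t) &= \operatorname{E}\!\left[e^{ity}\right]
= \prod_{k=1}^{p}\operatorname{E}\!\left[e^{itz_k^2}\right]
= \prod_{k=1}^{p}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-u^2(1-2it)/2}\,du
= (1-2it)^{-p/2}.
\end{aligned}
$$
The last expression is the characteristic function of $\chi^2_p$, so $y \sim \chi^2_p$. The factorization relies on the components $z_k$ being independent, which follows from $\mathbf z$ being Gaussian with identity covariance.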
However, $\boldsymbol\Sigma$ is often unknown, and we wish to test hypotheses about the location $\boldsymbol\mu$.
Sum of p squared t's
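Define
$$\mathbf W = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf x_i-\overline{\mathbf x})(\mathbf x_i-\overline{\mathbf x})'$$
to be the sample covariance. Here we denote transpose by an apostrophe. It can be shown that, when $n > p$, $\mathbf W$ is positive definite with probability 1, and $(n-1)\mathbf W$ follows a $p$-variate Wishart distribution with $n-1$ degrees of freedom, $(n-1)\mathbf W \sim W_p(\boldsymbol\Sigma, n-1)$, independently of $\overline{\mathbf x}$. Hotelling's T-squared statistic is then defined to be
$$t^2 = n(\overline{\mathbf x}-\boldsymbol\mu)'\mathbf W^{-1}(\overline{\mathbf x}-\boldsymbol\mu)$$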
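and, from the relation above with $m = n-1$ (so that $t^2 \sim T^2_{p,n-1}$),
$$\frac{n-p}{p(n-1)}\,t^2 \sim F_{p,n-p},$$
where $F_{p,n-p}$ is the F-distribution with parameters $p$ and $n-p$. To calculate a p-value, multiply the $t^2$ statistic by the above constant and use the F-distribution.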
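The computation of $t^2$ and its F-scaled value can be sketched as follows. This is a minimal standard-library Python version, assuming the data arrive as a list of $n$ rows of length $p$ with $n > p$; the names `solve` and `one_sample_t2` are hypothetical. It solves $\mathbf W\mathbf v = \overline{\mathbf x}-\boldsymbol\mu_0$ rather than inverting $\mathbf W$, and returns the F value together with its $(p, n-p)$ degrees of freedom, which can then be passed to any F-distribution routine (for example `scipy.stats.f.sf(f, d1, d2)` if SciPy is available).

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def one_sample_t2(data, mu0):
    """Return (t2, f, d1, d2) for testing mu = mu0 with Hotelling's one-sample statistic."""
    n, p = len(data), len(data[0])
    if n <= p:
        raise ValueError("need n > p for W to be invertible")
    xbar = [sum(row[j] for row in data) / n for j in range(p)]
    # Unbiased sample covariance W = sum_i (x_i - xbar)(x_i - xbar)' / (n - 1).
    W = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in data) / (n - 1)
          for j in range(p)] for i in range(p)]
    diff = [xbar[j] - mu0[j] for j in range(p)]
    v = solve(W, diff)  # v = W^{-1} (xbar - mu0)
    t2 = n * sum(a * b for a, b in zip(diff, v))
    f = (n - p) * t2 / (p * (n - 1))  # ~ F(p, n - p) when mu = mu0
    return t2, f, p, n - p

data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
print(one_sample_t2(data, [0.0, 0.0]))  # t2 = 18.75, F = 6.25 on (2, 2) df, up to rounding
```

With $p = 1$ the statistic reduces to the square of the ordinary one-sample t statistic, which is a convenient sanity check.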
Hotelling's two-sample T-squared statistic
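If $\mathbf x_1,\dots,\mathbf x_{n_x} \sim N_p(\boldsymbol\mu_x,\boldsymbol\Sigma)$ and $\mathbf y_1,\dots,\mathbf y_{n_y} \sim N_p(\boldsymbol\mu_y,\boldsymbol\Sigma)$, with the samples independently drawn from two independent multivariate normal distributions with the same covariance matrix, and we define
$$\overline{\mathbf x} = \frac{1}{n_x}\sum_{i=1}^{n_x}\mathbf x_i, \qquad \overline{\mathbf y} = \frac{1}{n_y}\sum_{i=1}^{n_y}\mathbf y_i$$
as the sample means, and
$$\hat{\boldsymbol\Sigma} = \frac{\sum_{i=1}^{n_x}(\mathbf x_i-\overline{\mathbf x})(\mathbf x_i-\overline{\mathbf x})' + \sum_{i=1}^{n_y}(\mathbf y_i-\overline{\mathbf y})(\mathbf y_i-\overline{\mathbf y})'}{n_x+n_y-2}$$
as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-squared statistic is
$$t^2 = \frac{n_x n_y}{n_x+n_y}(\overline{\mathbf x}-\overline{\mathbf y})'\hat{\boldsymbol\Sigma}^{-1}(\overline{\mathbf x}-\overline{\mathbf y})$$
and it can be related to the F-distribution by
$$\frac{n_x+n_y-p-1}{(n_x+n_y-2)p}\,t^2 \sim F(p,\, n_x+n_y-1-p;\, \delta), \qquad \delta = \frac{n_x n_y}{n_x+n_y}\,\mathbf d'\boldsymbol\Sigma^{-1}\mathbf d,$$
a noncentral F-distribution with noncentrality parameter $\delta$, where $\mathbf d = \boldsymbol\mu_x-\boldsymbol\mu_y$ is the difference vector between the population means. Under the null hypothesis $\boldsymbol\mu_x = \boldsymbol\mu_y$, $\delta = 0$ and this is the central F-distribution with parameters $p$ and $n_x+n_y-1-p$.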
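A sketch of the two-sample computation, again in standard-library Python with hypothetical helper names (`solve`, `mean_vector`, `scatter`, `two_sample_t2`). It assumes both samples are lists of rows of the same length $p$ and that $n_x+n_y-1-p \ge 1$. The returned F value is the one to compare with the central F-distribution under the null hypothesis $\boldsymbol\mu_x = \boldsymbol\mu_y$.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mean):
    """Sum of outer products (r - mean)(r - mean)' over the rows."""
    p = len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) for j in range(p)]
            for i in range(p)]

def two_sample_t2(xs, ys):
    """Return (t2, f, d1, d2) for Hotelling's two-sample test with pooled covariance."""
    nx, ny, p = len(xs), len(ys), len(xs[0])
    if nx + ny - 1 - p < 1:
        raise ValueError("need nx + ny - 1 - p >= 1")
    xbar, ybar = mean_vector(xs), mean_vector(ys)
    sx, sy = scatter(xs, xbar), scatter(ys, ybar)
    # Unbiased pooled covariance: (Sx + Sy) / (nx + ny - 2).
    pooled = [[(sx[i][j] + sy[i][j]) / (nx + ny - 2) for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(xbar, ybar)]
    q = sum(a * b for a, b in zip(diff, solve(pooled, diff)))
    t2 = nx * ny / (nx + ny) * q
    f = (nx + ny - p - 1) * t2 / ((nx + ny - 2) * p)  # central F under mu_x = mu_y
    return t2, f, p, nx + ny - 1 - p

xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
ys = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(two_sample_t2(xs, ys))  # t2 = 4.5, F = 1.875 on (2, 5) df, up to rounding
```

Pooling the two scatter matrices before dividing by $n_x+n_y-2$ mirrors the unbiased estimate $\hat{\boldsymbol\Sigma}$; the assumption of a common covariance matrix is what justifies pooling in the first place.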
More robust and powerful tests than Hotelling's two-sample test have been proposed in the literature; see, for example, the interpoint-distance-based tests of Marozzi (2014, 2015), which can also be applied when the number of variables is comparable with, or even larger than, the number of subjects.
See also
- Student's t-test in univariate statistics
- Student's t-distribution in univariate probability theory
- Multivariate Student distribution
- F-distribution (commonly tabulated or available in software libraries, and hence used for testing the T-squared statistic using the relationship given above)
- Wilks' lambda distribution (in multivariate statistics, Wilks's Λ is to Hotelling's T² as Snedecor's F is to Student's t in univariate statistics)
References
- Hotelling, H. (1931). "The generalization of Student's ratio". Annals of Mathematical Statistics 2 (3): 360–378. doi:10.1214/aoms/1177732979.
- Weisstein, E. W. (2003). CRC Concise Encyclopedia of Mathematics (2nd ed.). Chapman & Hall/CRC. p. 1408.
- Mardia, K. V.; Kent, J. T.; Bibby, J. M. (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471250-9.
- Marozzi, M. (2014). "Multivariate tests based on interpoint distances with application to magnetic resonance imaging". Statistical Methods in Medical Research. doi:10.1177/0962280214529104.
- Marozzi, M. (2015). "Multivariate multidistance tests for high-dimensional low sample size case-control studies". Statistics in Medicine 34. doi:10.1002/sim.6418.
- Prokhorov, A. V. (2001). "Hotelling T2-distribution". In Hazewinkel, M. (ed.). Encyclopedia of Mathematics. Springer. ISBN 978-1-55608-010-4.