Hotelling's T-squared distribution
In statistics, Hotelling's T-squared distribution arises as the distribution of a set of statistics that are natural generalisations of the statistics underlying Student's t-distribution. In particular, it arises in multivariate statistics when testing differences between the (multivariate) means of different populations, where the corresponding univariate problem would use a t-test. A random variable with this distribution is a constant multiple of an F-distributed random variable.
The distribution is named for Harold Hotelling, who developed it[1] as a generalization of Student's t-distribution.
The distribution
If the notation $T^2_{p,m}$ is used to denote a random variable having Hotelling's T-squared distribution with parameters p and m then, if a random variable X has Hotelling's T-squared distribution,

$$X \sim T^2_{p,m},$$

then[1]

$$\frac{m-p+1}{pm}\, X \sim F_{p,\,m-p+1},$$

where $F_{p,\,m-p+1}$ is the F-distribution with parameters p and m−p+1.
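The rescaling to the F scale can be sketched in a few lines. The snippet below is a minimal illustration, assuming the standard relation that (m−p+1)/(pm) times a T-squared variable with parameters p and m follows an F-distribution with p and m−p+1 degrees of freedom; the function name `t2_to_f` is hypothetical and chosen only for this example.

```python
def t2_to_f(t2, p, m):
    """Rescale a value of a T-squared(p, m) variable to the F scale.

    Returns (f_stat, dfn, dfd), where f_stat is to be referred to an
    F-distribution with dfn = p and dfd = m - p + 1 degrees of freedom.
    """
    if p < 1 or m - p + 1 < 1:
        raise ValueError("need p >= 1 and m >= p")
    dfn, dfd = p, m - p + 1
    # Scale factor (m - p + 1) / (p * m) from the relation above.
    f_stat = dfd / (p * m) * t2
    return f_stat, dfn, dfd


# Univariate sanity check: with p = 1 the scale factor is 1, matching the
# fact that the square of a Student t variable with m degrees of freedom
# is F-distributed with (1, m) degrees of freedom.
print(t2_to_f(4.0, 1, 10))
```

An upper-tail probability can then be read from any F-distribution routine, for example `scipy.stats.f.sf(f_stat, dfn, dfd)` if SciPy is available.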
Hotelling's T-squared statistic
Hotelling's T-squared statistic is a generalization of Student's t statistic that is used in multivariate hypothesis testing, and is defined as follows.[1]
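Let $N_p(\mu, \Sigma)$ denote a p-variate normal distribution with location $\mu$ and covariance $\Sigma$. Let

$$x_1, \dots, x_n \sim N_p(\mu, \Sigma)$$

be n independent random variables, which may be represented as $p \times 1$ column vectors of real numbers. Define

$$\bar{x} = \frac{x_1 + \cdots + x_n}{n}$$

to be the sample mean. It can be shown that

$$n(\bar{x}-\mu)'\Sigma^{-1}(\bar{x}-\mu) \sim \chi^2_p,$$

where $\chi^2_p$ is the chi-squared distribution with p degrees of freedom. To show this, use the fact that $\bar{x} \sim N_p(\mu, \Sigma/n)$ and then derive the characteristic function of the random variable $y = n(\bar{x}-\mu)'\Sigma^{-1}(\bar{x}-\mu)$. This is done below:

$$
\begin{aligned}
\varphi_y(\theta) &= \operatorname{E} e^{i\theta y} \\
&= \int e^{i\theta n(\bar{x}-\mu)'\Sigma^{-1}(\bar{x}-\mu)}\,(2\pi)^{-p/2}\,|\Sigma/n|^{-1/2}\, e^{-\frac{1}{2} n(\bar{x}-\mu)'\Sigma^{-1}(\bar{x}-\mu)}\, d\bar{x} \\
&= \int (2\pi)^{-p/2}\,|\Sigma/n|^{-1/2}\, e^{-\frac{1}{2} n(1-2i\theta)(\bar{x}-\mu)'\Sigma^{-1}(\bar{x}-\mu)}\, d\bar{x} \\
&= |\Sigma/n|^{-1/2}\,\bigl|(1-2i\theta)^{-1}\Sigma/n\bigr|^{1/2} \\
&= (1-2i\theta)^{-p/2},
\end{aligned}
$$

which is the characteristic function of the chi-squared distribution with p degrees of freedom.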
However, $\Sigma$ is often unknown and we wish to do hypothesis testing on the location $\mu$.
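Define

$$W = \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})(x_i-\bar{x})'$$

to be the sample covariance. Here we denote transpose by an apostrophe. It can be shown that $W$ is positive-definite (with probability one when n > p) and $(n-1)W$ follows a p-variate Wishart distribution with n−1 degrees of freedom.[2] Hotelling's T-squared statistic is then defined to be

$$t^2 = n(\bar{x}-\mu)'W^{-1}(\bar{x}-\mu)$$

because it can be shown that

$$t^2 \sim T^2_{p,\,n-1},$$

i.e.

$$\frac{n-p}{p(n-1)}\, t^2 \sim F_{p,\,n-p},$$

where $F_{p,\,n-p}$ is the F-distribution with parameters p and n−p. To calculate a p-value, multiply the $t^2$ statistic by the above constant and refer the result to the F-distribution.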
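The p-value recipe above can be sketched with NumPy and SciPy. This is an illustrative sketch rather than a reference implementation: it assumes the observations are stored one per row in an (n, p) array, that n > p, and that the statistic is n times the quadratic form of the mean difference in the inverse sample covariance, rescaled by (n−p)/(p(n−1)) to an F-distribution with p and n−p degrees of freedom. The function name `hotelling_t2_one_sample` is hypothetical.

```python
import numpy as np
from scipy import stats


def hotelling_t2_one_sample(x, mu0):
    """One-sample Hotelling T-squared test of H0: population mean == mu0.

    x   : (n, p) array-like, one observation per row
    mu0 : length-p hypothesised mean vector
    Returns (t2, f_stat, p_value).
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    if n <= p:
        raise ValueError("need n > p for an invertible sample covariance")
    diff = x.mean(axis=0) - np.asarray(mu0, dtype=float)
    # Unbiased sample covariance W (divisor n - 1).
    w = np.atleast_2d(np.cov(x, rowvar=False))
    # t^2 = n (xbar - mu0)' W^{-1} (xbar - mu0); solve() avoids forming W^{-1}.
    t2 = n * diff @ np.linalg.solve(w, diff)
    # Rescale to the F scale and take the upper-tail probability.
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, f_stat, p_value


# Small made-up data set, used only to show the call.
data = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(hotelling_t2_one_sample(data, mu0=[2, 2]))
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a numerical-stability choice and does not change the statistic.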
Hotelling's two-sample T-squared statistic
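If

$$x_1, \dots, x_{n_x} \sim N_p(\mu, \Sigma)$$

and

$$y_1, \dots, y_{n_y} \sim N_p(\mu, \Sigma),$$

with the samples independently drawn from two independent multivariate normal distributions with the same mean and covariance, and we define

$$\bar{x} = \frac{1}{n_x}\sum_{i=1}^{n_x} x_i, \qquad \bar{y} = \frac{1}{n_y}\sum_{i=1}^{n_y} y_i$$

as the sample means, and

$$W = \frac{\sum_{i=1}^{n_x}(x_i-\bar{x})(x_i-\bar{x})' + \sum_{i=1}^{n_y}(y_i-\bar{y})(y_i-\bar{y})'}{n_x+n_y-2}$$

as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-squared statistic is

$$t^2 = \frac{n_x n_y}{n_x+n_y}\,(\bar{x}-\bar{y})'\,W^{-1}\,(\bar{x}-\bar{y}) \sim T^2_{p,\,n_x+n_y-2}$$

and it can be related to the F-distribution by[2]

$$\frac{n_x+n_y-p-1}{(n_x+n_y-2)\,p}\, t^2 \sim F_{p,\,n_x+n_y-1-p}.$$

The non-null distribution of this statistic, which applies when the population means differ, is the noncentral F-distribution (the ratio of a non-central chi-squared random variable and an independent central chi-squared random variable, each divided by its degrees of freedom)

$$\frac{n_x+n_y-p-1}{(n_x+n_y-2)\,p}\, t^2 \sim F_{p,\,n_x+n_y-1-p}(\delta),$$

with

$$\delta = \frac{n_x n_y}{n_x+n_y}\,\nu'\,\Sigma^{-1}\,\nu,$$

where $\nu$ is the difference vector between the population means.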
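A two-sample version can be sketched in the same way. The snippet below is a minimal illustration under stated assumptions: both samples are stored one observation per row, they share a common covariance, the pooled covariance uses the divisor n_x + n_y − 2, and the statistic is rescaled by (n_x + n_y − p − 1)/((n_x + n_y − 2)p) to an F-distribution with p and n_x + n_y − 1 − p degrees of freedom. The function name `hotelling_t2_two_sample` is hypothetical, and only the null (central F) p-value is computed.

```python
import numpy as np
from scipy import stats


def hotelling_t2_two_sample(x, y):
    """Two-sample Hotelling T-squared test of H0: equal population means.

    x : (n_x, p) array-like, y : (n_y, p) array-like, one observation per row.
    Returns (t2, f_stat, p_value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, p = x.shape
    ny = y.shape[0]
    dof = nx + ny - 2
    if dof - p + 1 < 1:
        raise ValueError("need n_x + n_y - 1 > p")
    # Centre each sample on its own mean and pool the scatter matrices.
    dx = x - x.mean(axis=0)
    dy = y - y.mean(axis=0)
    w = (dx.T @ dx + dy.T @ dy) / dof
    diff = x.mean(axis=0) - y.mean(axis=0)
    t2 = nx * ny / (nx + ny) * diff @ np.linalg.solve(w, diff)
    # Rescale to F with (p, n_x + n_y - 1 - p) degrees of freedom.
    f_stat = (dof - p + 1) / (dof * p) * t2
    p_value = stats.f.sf(f_stat, p, dof - p + 1)
    return t2, f_stat, p_value


# Made-up samples: the second is the first shifted by (1, 1).
a = [[1, 2], [2, 1], [3, 4], [4, 3]]
b = [[2, 3], [3, 2], [4, 5], [5, 4]]
print(hotelling_t2_two_sample(a, b))
```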
See also
- Student's t-test in univariate statistics
- Student's t-distribution in univariate probability theory
- Multivariate Student distribution
- F-distribution (commonly tabulated or available in software libraries, and hence used for testing the T-squared statistic using the relationship given above)
- Wilks' lambda distribution (in multivariate statistics, Wilks' Λ is to Hotelling's T² as Snedecor's F is to Student's t in univariate statistics)
References
- [1] Hotelling, H. (1931). "The generalization of Student's ratio". Annals of Mathematical Statistics 2 (3): 360–378. doi:10.1214/aoms/1177732979.
- [2] Mardia, K.V.; Kent, J.T.; Bibby, J.M. (1979). Multivariate Analysis. Academic Press.
External links
- Prokhorov, A.V. (2001), "Hotelling T²-distribution", in Hazewinkel, Michiel, Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4