t-statistic

In statistics, the t-statistic is the ratio of the departure of an estimated parameter from its notional value to its standard error. It is used in hypothesis testing, for example in Student’s t-test, in the augmented Dickey–Fuller test, and in bootstrapping.

Definition

Let \hat\beta be an estimator of the parameter β in some statistical model. Then a t-statistic for this parameter is any quantity of the form


    t_{\hat{\beta}} = \frac{\hat\beta - \beta_0}{\mathrm{s.e.}(\hat\beta)},

where β_0 is a non-random, known constant, and \mathrm{s.e.}(\hat\beta) is the standard error of the estimator \hat\beta. By default, statistical packages report the t-statistic with β_0 = 0 (these t-statistics are used to test the significance of the corresponding regressor). However, when a t-statistic is needed to test a hypothesis of the form H_0: β = β_0, a non-zero β_0 may be used.

If \hat\beta is an ordinary least squares estimator in the classical linear regression model (that is, with normally distributed and homoskedastic error terms), and if the true value of the parameter β is equal to β_0, then the sampling distribution of the t-statistic is Student’s t-distribution with (n − k) degrees of freedom, where n is the number of observations and k is the number of regressors (including the intercept).
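As an illustration of the definition and of the (n − k) degrees of freedom, the following minimal Python sketch computes the t-statistic of the slope in a simple linear regression with an intercept (so k = 2). The function name, its arguments, and the sample data are illustrative assumptions and are not taken from any statistical package.

```python
import math


def ols_slope_t_statistic(x, y, beta0=0.0):
    """Return (t, df) for the slope of y = a + b*x, testing H0: b = beta0.

    Hypothetical helper for illustration; assumes the classical linear
    regression model with one regressor plus an intercept.
    """
    n = len(x)
    k = 2  # number of regressors, including the intercept
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx                  # OLS estimate of b
    intercept = y_bar - slope * x_bar  # OLS estimate of a

    # Residual variance estimate uses n - k in the denominator.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - k)

    # Standard error of the slope estimator.
    se_slope = math.sqrt(sigma2_hat / sxx)

    return (slope - beta0) / se_slope, n - k


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(ols_slope_t_statistic(x, y))             # default: H0 slope = 0
print(ols_slope_t_statistic(x, y, beta0=2.0))  # non-zero beta0
```

The first call corresponds to the default significance test with β_0 = 0; the second shows how the same estimate and standard error give a different t-statistic when a non-zero β_0 is tested.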

In the majority of models the estimator \hat\beta is consistent for β and asymptotically normally distributed. If the true value of the parameter β is equal to β_0 and \mathrm{s.e.}(\hat\beta)^2 correctly estimates the asymptotic variance of this estimator, then the t-statistic asymptotically has the standard normal distribution.
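When the standard normal approximation applies, a two-sided p-value can be read directly from the normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name is a hypothetical choice for illustration, and the approximation is only appropriate under the asymptotic conditions stated above.

```python
from statistics import NormalDist


def asymptotic_two_sided_p_value(t_stat):
    """Two-sided p-value for a t-statistic under the standard normal limit."""
    # P(|Z| >= |t|) for Z ~ N(0, 1)
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


print(asymptotic_two_sided_p_value(1.96))
```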

In some models the distribution of the t-statistic is not normal, even asymptotically. For example, when a time series with a unit root is regressed in the augmented Dickey–Fuller test, the test t-statistic asymptotically has one of the Dickey–Fuller distributions (depending on the test setting).

Use

Most frequently, t-statistics are used in Student's t-tests, a form of statistical hypothesis testing, and in the computation of certain confidence intervals.

The key property of the t-statistic is that it is a pivotal quantity: while it is defined in terms of the sample mean, its sampling distribution does not depend on the unknown population parameters, and thus it can be used regardless of what these may be.

One can also divide a residual by the sample standard deviation:

 g(x,X) = \frac{x - \overline{X}}{s}

This gives an estimate of the number of standard deviations a given observation lies from the mean. It is a sample version of the z-score, which requires the population parameters.
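A minimal Python sketch of g(x, X), assuming s is the sample standard deviation with the (n − 1) denominator, as computed by `statistics.stdev`. The function name and the data are illustrative only.

```python
import statistics


def sample_standard_score(x, sample):
    """Return (x - mean(sample)) / s, a sample analogue of the z-score."""
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return (x - x_bar) / s


# Made-up sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_standard_score(9, sample))
```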

Prediction

For more details on this topic, see Prediction interval.

Given a normal distribution N(\mu,\sigma^2) with unknown mean and variance, the t-statistic of a future observation X_{n+1}, after one has made n observations with sample mean \overline{X}_n and sample standard deviation s_n, is an ancillary statistic: a pivotal quantity (its distribution does not depend on the values of μ and σ^2) that is also a statistic (it is computed from observations). This allows one to compute a frequentist prediction interval (a predictive confidence interval) via the following t-distribution:

\frac{X_{n+1}-\overline{X}_n}{s_n\sqrt{1+n^{-1}}} \sim T^{n-1}

Solving for X_{n+1} yields the prediction distribution

\overline{X}_n + s_n\sqrt{1+n^{-1}} \cdot T^{n-1}

from which one may compute predictive confidence intervals: given a probability p, one may compute an interval such that the next observation X_{n+1} falls in it 100p% of the time.
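The prediction distribution translates into a two-sided interval \overline{X}_n ± t* · s_n · \sqrt{1 + 1/n}, where t* is the appropriate quantile of the t-distribution with n − 1 degrees of freedom. The following Python sketch is a hedged illustration: the function name is hypothetical, and because the standard library has no t-distribution quantile function, the critical value is passed in by the caller (for example, about 2.262 for a 95% interval with 9 degrees of freedom, taken from a standard t-table).

```python
import math
import statistics


def prediction_interval(observations, t_crit):
    """Two-sided prediction interval for the next observation X_{n+1}.

    `t_crit` is the two-sided critical value of the t-distribution with
    n - 1 degrees of freedom; it must be supplied by the caller.
    Assumes the observations are i.i.d. normal.
    """
    n = len(observations)
    x_bar = statistics.fmean(observations)
    s_n = statistics.stdev(observations)  # n - 1 denominator
    half_width = t_crit * s_n * math.sqrt(1.0 + 1.0 / n)
    return x_bar - half_width, x_bar + half_width


# Made-up data: n = 10, so 9 degrees of freedom; 2.262 is the tabulated
# 0.975 quantile of the t-distribution with 9 degrees of freedom.
observations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(prediction_interval(observations, t_crit=2.262))
```

The factor \sqrt{1 + 1/n} makes this interval wider than the confidence interval for the mean, since it accounts for the variability of the new observation as well as the uncertainty in \overline{X}_n.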

History

For more details on this topic, see Student's t-test.

The term "t-statistic" is abbreviated from "test statistic",[citation needed] while "Student" was the pen name of William Sealy Gosset, who introduced the t-statistic and t-test in 1908, while working for the Guinness brewery in Dublin, Ireland.

Related concepts

z-score
If the population parameters are known, then rather than computing the t-statistic, one can compute the z-score; analogously, rather than using a t-test, one uses a z-test. This is rare outside of standardized testing.
Studentized residual
In regression analysis, the standard errors of the estimators at different data points vary (compare the middle versus endpoints of a simple linear regression), and thus one must divide the different residuals by different estimates for the error, yielding what are called studentized residuals.
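To make the idea of dividing each residual by its own error estimate concrete, the sketch below computes internally studentized residuals for a simple linear regression. It assumes the standard leverage formula h_i = 1/n + (x_i − x̄)^2 / S_xx for this model and scales each residual by σ̂ · \sqrt{1 − h_i}. The function name and data are illustrative, and other variants (such as externally studentized residuals) are not shown.

```python
import math


def internally_studentized_residuals(x, y):
    """Internally studentized residuals for the fit y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    # Leverage is larger at the endpoints than in the middle of the x range,
    # so endpoint residuals have smaller variance and a smaller divisor.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [e / (sigma_hat * math.sqrt(1.0 - h))
            for e, h in zip(residuals, leverages)]


# Made-up data, for illustration only.
print(internally_studentized_residuals([1, 2, 3, 4, 5],
                                       [2.1, 3.9, 6.2, 7.8, 10.1]))
```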
