Exact statistics

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Exact statistics, such as that described in exact test, is a branch of statistics that was developed to provide more accurate results pertaining to statistical testing and interval estimation by eliminating procedures based on asymptotic and approximate statistical methods. The main characteristic of exact methods is that statistical tests and confidence intervals are based on exact probability statements that are valid for any sample size. Exact statistical methods help avoid some of the unreasonable assumptions of traditional statistical methods, such as the assumption of equal variances in classical ANOVA. They also allow exact inference on variance components of mixed models.

When exact p-values and confidence intervals are computed under a certain distribution, such as the normal distribution, then the underlying methods are referred to as exact parametric methods. The exact methods that do not make any distributional assumptions are referred to as exact nonparametric methods. The latter has the advantage of making fewer assumptions whereas, the former tend to yield more powerful tests when the distributional assumption is reasonable. For advanced methods such as higher-way ANOVA regression analysis, and mixed models, only exact parametric methods are available.

When the sample size is small, asymptotic results given by some traditional methods may not be valid. In such situations, the asymptotic p-values may differ substantially from the exact p-values. Hence asymptotic and other approximate results may lead to unreliable and misleading conclusions.

The approach[edit]

All classical statistical procedures are constructed using statistics which depend only on observable random vectors, whereas generalized estimators, tests, and confidence intervals used in exact statistics take advantage of the observable random vectors and the observed values both, as in the Bayesian approach but without having to treat constant parameters as random variables. For example, in sampling from a normal population with mean \mu and variance \sigma ^2, suppose \overline{X} and S ^2 are the sample mean and the sample variance. Then, it is well known that

 Z = \sqrt{n}(\overline{X} - \mu)/ \sigma \sim N(0,1)

and that

U = n S^2 / \sigma^2 \sim \chi^2 _ {n-1}.

Now suppose the parameter of interest is the coefficient of variation, \rho = \mu /\sigma . Then, we can easily perform exact tests and exact confidence intervals for \rho based on the generalized statistic

R = \frac {\overline{x} S}  {s \sigma} - \frac{\overline{X}- \mu} {\sigma} = \frac {\overline{x}} {s} \frac {\sqrt{U}} {\sqrt{n}} ~-~ \frac {Z} {\sqrt{n}} ,

where \overline{x} is the observed value of \overline{X} and S is the observed value of s. Exact inferences on \rho based on probabilities and expected values of R are possible because its distribution and the observed value are both free of nuisance parameters.

Generalized p-values[edit]

Classical statistical methods do not provide exact tests to many statistical problems such as testing Variance Components and ANOVA under unequal variances. To rectify this situation, the generalized p-values are defined as an extension of the classical p-values so that one can perform tests based on exact probability statements valid for any sample size.

See also[edit]

References[edit]

  • Mehta, C. R. 1995. SPSS 6.1 Exact test for Windows. Englewood Cliffs, NJ: Prentice Hall.
  • Mehta CR and Patel NR. A network algorithm for performing Fisher’s exact test in rxc contingency tables. JASA. 1983; 78(382)427-434.
  • Mehta CR and Patel NR. Exact logistic regression: theory and examples. Statistics in Medicine, 1995; 14:2143-2160.
  • Mehta CR, Patel NR and Gray R. On computing an exact confidence interval for the common odds ratio in several 2 x 2 contingency tables. JASA, 1985; 80(392):969-973.

External links[edit]

  • XPro, Free software package for exact parametric statistics
  • StatXact, Commercial software package of non-parametric exact and Monte Carlo methods for computing p-values for large or unbalanced data sets from Mehta & Patel (Op. Cit.)
  • LogXact, Commercial software package for regression modeling using exact inference (Op. Cit.)