Consistency (statistics)

From Wikipedia, the free encyclopedia

In statistics, consistency of procedures, such as computing confidence intervals or conducting hypothesis tests, is a desired property of their behaviour as the number of items in the data set to which they are applied increases indefinitely. In particular, consistency requires that the outcome of the procedure with unlimited data should identify the underlying truth.[1] Use of the term in statistics derives from Sir Ronald Fisher in 1922.[2]

Use of the terms consistency and consistent in statistics is restricted to cases where essentially the same procedure can be applied to any number of data items. In complicated applications of statistics, there may be several ways in which the number of data items may grow. For example, records for rainfall within an area might increase in three ways: records for additional time periods; records for additional sites within a fixed area; records for extra sites obtained by extending the size of the area. In such cases, the property of consistency may be limited to one or more of the possible ways in which the sample size can grow.

Estimators

Main article: Consistent estimator

A consistent estimator is one for which, when the estimate is considered as a random variable indexed by the number n of items in the data set, the estimates converge in probability, as n increases, to the value that the estimator is designed to estimate.
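
As a rough illustration of this definition, the following sketch estimates by simulation how often the sample mean of n normal draws misses the true mean by more than a fixed tolerance. The function name miss_rate, the tolerance eps, the normal model and the parameter values are all assumptions made for this example only. For a consistent estimator the miss rate should shrink towards zero as n grows.

```python
import random

def miss_rate(n, eps=0.1, mu=2.0, sigma=1.0, trials=500, seed=0):
    """Estimate P(|sample mean - mu| > eps) for samples of size n by simulation."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    misses = 0
    for _ in range(trials):
        # Sample mean of n independent N(mu, sigma^2) draws (assumed model).
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(sample_mean - mu) > eps:
            misses += 1
    return misses / trials

# The miss rate should fall as the sample size n increases.
for n in (10, 100, 1000):
    print(n, miss_rate(n))
```

The tolerance eps is held fixed while n varies, which mirrors convergence in probability: for every fixed tolerance, the chance of a miss tends to zero.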

Main article: Fisher consistency

An estimator that has Fisher consistency is one for which, if the estimator were applied to the entire population rather than a sample, the true value of the estimated parameter would be obtained.
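
The sketch below illustrates this definition under a finite-population reading, using a small made-up population (the data values are hypothetical). It applies three standard-library estimators to the whole population: the mean and the divisor-N variance reproduce the population values, while the divisor-(N-1) variance does not.

```python
import statistics

# A small hypothetical finite population (values invented for illustration).
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(population)

# "True" population parameters, computed directly from their definitions.
true_mean = sum(population) / N
true_var = sum((x - true_mean) ** 2 for x in population) / N

# Apply each estimator to the entire population instead of a sample.
mean_on_population = statistics.mean(population)
plugin_var_on_population = statistics.pvariance(population)    # divisor N
corrected_var_on_population = statistics.variance(population)  # divisor N - 1

print(mean_on_population, true_mean)
print(plugin_var_on_population, true_var)
print(corrected_var_on_population, true_var)
```

In this example the divisor-(N-1) variance overshoots the population variance by the factor N/(N-1), so it is unbiased for random samples but does not return the true value when fed the whole population.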

Tests

A consistent test is one for which the power of the test for a fixed untrue hypothesis increases to one as the number of data items increases.[1]
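
To make the definition concrete, here is a minimal sketch for one particular test chosen for this example: a one-sided z-test of the hypothesis that a normal mean is zero, with known unit variance. The function name z_test_power, the effect size delta and the level alpha are assumptions of the example. Under a fixed true mean delta > 0, the power has the closed form 1 - Φ(z_α - delta·√n), which the code evaluates for increasing n.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1

def z_test_power(n, delta, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = 0 when the true mean is delta.

    Assumes independent N(delta, 1) observations with known unit variance.
    """
    # Reject H0 when sqrt(n) * sample_mean exceeds this critical value.
    critical = STD_NORMAL.inv_cdf(1.0 - alpha)
    # Under the alternative, sqrt(n) * sample_mean ~ N(delta * sqrt(n), 1).
    return 1.0 - STD_NORMAL.cdf(critical - delta * sqrt(n))

# For a fixed false hypothesis (delta = 0.2), power should approach one.
for n in (10, 100, 1000):
    print(n, round(z_test_power(n, delta=0.2), 4))
```

Setting delta to zero returns the significance level alpha, since the hypothesis is then true and the rejection probability does not depend on n.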

Classification

In statistical classification, a consistent classifier is one for which the probability of correct classification, given a training set, approaches, as the size of the training set increases, the best probability that would be theoretically attainable if the population distributions were fully known.
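
The following sketch illustrates the idea in a deliberately simple, assumed setting: two equally likely classes with distributions N(-1, 1) and N(+1, 1), and a classifier that thresholds at the midpoint of the two class sample means. The model, the classifier and all names are assumptions of this example. With the distributions fully known, the best achievable accuracy is Φ(1), attained by the threshold 0; the code measures the average shortfall of the trained classifier from that value.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf        # standard normal CDF
BAYES_ACCURACY = PHI(1.0)     # best accuracy in the assumed two-class model

def accuracy(threshold):
    """Exact accuracy of the rule 'predict class 1 if x > threshold'.

    Assumes class 0 ~ N(-1, 1), class 1 ~ N(+1, 1), equal priors.
    """
    return 0.5 * PHI(threshold + 1.0) + 0.5 * PHI(1.0 - threshold)

def trained_threshold(n_per_class, rng):
    """Midpoint of the two class sample means from a simulated training set."""
    m0 = sum(rng.gauss(-1.0, 1.0) for _ in range(n_per_class)) / n_per_class
    m1 = sum(rng.gauss(1.0, 1.0) for _ in range(n_per_class)) / n_per_class
    return (m0 + m1) / 2.0

def mean_gap(n_per_class, reps=200, seed=0):
    """Average shortfall from the best accuracy over simulated training sets."""
    rng = random.Random(seed)
    gaps = [BAYES_ACCURACY - accuracy(trained_threshold(n_per_class, rng))
            for _ in range(reps)]
    return sum(gaps) / reps

# The shortfall should shrink towards zero as the training set grows.
for n in (5, 50, 500):
    print(n, mean_gap(n))
```

Because the accuracy of a threshold rule is available in closed form under this model, no test set is needed; only the training sets are simulated.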

Sparsistency

Let $\mathbf{b}$ be a vector and define its support $\operatorname{supp}(\mathbf{b}) = \{i : \mathbf{b}_i \neq 0\}$, where $\mathbf{b}_i$ is the $i$th element of $\mathbf{b}$. Let $\hat{\mathbf{b}}$ be an estimator of $\mathbf{b}$ computed from $n$ samples. Then sparsistency is the property that the support of the estimator converges to the true support as the number of samples grows to infinity. More formally, $P(\operatorname{supp}(\hat{\mathbf{b}}) = \operatorname{supp}(\mathbf{b})) \rightarrow 1$ as $n \rightarrow \infty$.[3]
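
As a hedged illustration, the sketch below checks support recovery by simulation for one simple estimator that is not taken from the cited source: the coordinate-wise sample mean of n noisy copies of the vector, hard-thresholded at n^(-1/4). The noise model (independent standard normal errors), the threshold choice, the example vector and the function names are all assumptions of this example. The threshold shrinks to zero, yet more slowly than the noise in the sample mean, so the estimated recovery probability should approach one.

```python
import math
import random

def support(vec):
    """Indices of the nonzero entries of vec."""
    return {i for i, v in enumerate(vec) if v != 0.0}

def hard_threshold_estimate(b, n, rng):
    """Sample mean of n noisy copies of b, with small entries set to zero."""
    tau = n ** -0.25  # assumed threshold: tends to 0, but sqrt(n) * tau grows
    # The mean of n draws from N(b_i, 1) is N(b_i, 1/n), so sample it directly.
    means = [rng.gauss(b_i, 1.0 / math.sqrt(n)) for b_i in b]
    return [m if abs(m) > tau else 0.0 for m in means]

def recovery_rate(b, n, trials=2000, seed=0):
    """Estimate P(supp(b_hat) = supp(b)) for sample size n by simulation."""
    rng = random.Random(seed)
    hits = sum(support(hard_threshold_estimate(b, n, rng)) == support(b)
               for _ in range(trials))
    return hits / trials

TRUE_B = [1.0, 0.0, -0.5, 0.0, 0.0]  # hypothetical sparse vector
for n in (4, 100, 10000):
    print(n, recovery_rate(TRUE_B, n))
```

Note that the check compares index sets only; the estimated nonzero values may still differ from the true ones even when the support is recovered exactly.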

References

  1. Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9 (entries for consistency, consistent estimator, consistent test)
  2. Upton, G.; Cook, I. (2006) Oxford Dictionary of Statistics, 2nd Edition, OUP. ISBN 978-0-19-954145-4
  3. "Consistency, sparsistency and presistency", Normal Deviate (blog), 11 September 2013. http://normaldeviate.wordpress.com/2013/09/11/consistency-sparsistency-and-presistency/