Grubbs's test: Difference between revisions

From Wikipedia, the free encyclopedia
m Reverted edits by 140.141.192.46 (talk) (HG) (3.4.12)
Tags: Huggle Rollback Reverted
Undid revision 1196559200 by Jacona (talk) It's Grubbs's test, not Grubbs' test. Stop miscorrecting it: https://doi.org/10.2307/3315459 and https://www.merriam-webster.com/grammar/what-happens-to-names-when-we-make-them-plural-or-possessive
Line 1: Line 1:
In statistics, '''Grubbs' test'''<!--possessive of the proper noun--> or the '''Grubbs test'''<!--attributive use of the proper name--> (named after [[Frank E. Grubbs]], who published the test in 1950<ref>{{cite journal |last=Grubbs |first=Frank E. |title=Sample criteria for testing outlying observations |journal=[[Annals of Mathematical Statistics]] |volume=21 |issue=1 |pages=27–58 |doi=10.1214/aoms/1177729885 |year=1950 |doi-access=free |hdl=2027.42/182780 |hdl-access=free }}</ref>), also known as the '''maximum normalized [[errors and residuals in statistics|residual]] test''' or '''extreme studentized deviate test''', is a [[Statistical hypothesis testing|test]] used to detect [[outlier]]s in a [[univariate]] data set assumed to come from a [[normal distribution|normally distributed]] population.
In statistics, '''Grubbs's test'''<!--possessive of the proper noun--> or the '''Grubbs test'''<!--attributive use of the proper name--> (named after [[Frank E. Grubbs]], who published the test in 1950<ref>{{cite journal |last=Grubbs |first=Frank E. |title=Sample criteria for testing outlying observations |journal=[[Annals of Mathematical Statistics]] |volume=21 |issue=1 |pages=27–58 |doi=10.1214/aoms/1177729885 |year=1950 |doi-access=free |hdl=2027.42/182780 |hdl-access=free }}</ref>), also known as the '''maximum normalized [[errors and residuals in statistics|residual]] test''' or '''extreme studentized deviate test''', is a [[Statistical hypothesis testing|test]] used to detect [[outlier]]s in a [[univariate]] data set assumed to come from a [[normal distribution|normally distributed]] population.


==Definition==
==Definition==
Grubbs' test is based on the assumption of [[normal distribution|normality]]. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test.<ref>Quoted from the ''Engineering and Statistics Handbook'', paragraph 1.3.5.17, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm</ref>
Grubbs's test is based on the assumption of [[normal distribution|normality]]. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test.<ref>Quoted from the ''Engineering and Statistics Handbook'', paragraph 1.3.5.17, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm</ref>


Grubbs' test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers.<ref>{{Cite journal|last1=Adikaram|first1=K. K. L. B.|last2=Hussein|first2=M. A.|last3=Effenberger|first3=M.|last4=Becker|first4=T.|date=2015-01-14|title=Data Transformation Technique to Improve the Outlier Detection Power of Grubbs' Test for Data Expected to Follow Linear Relation|journal=Journal of Applied Mathematics|volume=2015|pages=1–9|language=en|doi=10.1155/2015/708948|doi-access=free}}</ref>
Grubbs's test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers.<ref>{{Cite journal|last1=Adikaram|first1=K. K. L. B.|last2=Hussein|first2=M. A.|last3=Effenberger|first3=M.|last4=Becker|first4=T.|date=2015-01-14|title=Data Transformation Technique to Improve the Outlier Detection Power of Grubbs's Test for Data Expected to Follow Linear Relation|journal=Journal of Applied Mathematics|volume=2015|pages=1–9|language=en|doi=10.1155/2015/708948|doi-access=free}}</ref>


Grubbs' test is defined for the following [[statistical hypothesis|hypotheses]]:
Grubbs's test is defined for the following [[statistical hypothesis|hypotheses]]:
:H<sub>0</sub>: There are no outliers in the data set
:H<sub>0</sub>: There are no outliers in the data set
:H<sub>a</sub>: There is exactly one outlier in the data set
:H<sub>a</sub>: There is exactly one outlier in the data set
Line 25: Line 25:


===One-sided case===
===One-sided case===
Grubbs' test can also be defined as a one-sided test, replacing α/(2''N'') with α/''N''. To test whether the minimum value is an outlier, the test statistic is
Grubbs's test can also be defined as a one-sided test, replacing α/(2''N'') with α/''N''. To test whether the minimum value is an outlier, the test statistic is
:<math>
:<math>
G = \frac{\bar{Y}-Y_\min}{s}
G = \frac{\bar{Y}-Y_\min}{s}

Revision as of 15:35, 18 January 2024

In statistics, Grubbs's test or the Grubbs test (named after Frank E. Grubbs, who published the test in 1950[1]), also known as the maximum normalized residual test or extreme studentized deviate test, is a test used to detect outliers in a univariate data set assumed to come from a normally distributed population.

Definition

Grubbs's test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test.[2]

Grubbs's test detects one outlier at a time. This outlier is expunged from the dataset and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers.[3]
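The iteration described above can be sketched as a short loop. This is only an illustration: the function name, the `find_outlier` callable (assumed to return the index of the flagged point, or `None` when no outlier is detected) and the `min_size` default are hypothetical names chosen for this sketch. It does not correct for the change in detection probabilities across iterations.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def iterate_single_outlier_test(
    data: Sequence[float],
    find_outlier: Callable[[Sequence[float]], Optional[int]],
    min_size: int = 7,
) -> Tuple[List[float], List[float]]:
    """Apply a single-outlier test repeatedly, removing one flagged point per pass.

    find_outlier is a hypothetical callable: it returns the index of the point
    the test flags, or None when the hypothesis of no outliers is not rejected.
    """
    kept = list(data)
    removed: List[float] = []
    # Stop once the sample would be tested at six or fewer points.
    while len(kept) >= min_size:
        index = find_outlier(kept)
        if index is None:
            break
        removed.append(kept.pop(index))
    return kept, removed
```

Any single-outlier detector with this interface can be plugged in; the loop itself only encodes the "remove and repeat" step and the sample-size floor.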

Grubbs's test is defined for the following hypotheses:

H₀: There are no outliers in the data set
Hₐ: There is exactly one outlier in the data set

The Grubbs test statistic is defined as

:<math>
G = \frac{\max_{i=1,\ldots,N} \left| Y_i - \bar{Y} \right|}{s}
</math>

with <math>\bar{Y}</math> and <math>s</math> denoting the sample mean and standard deviation, respectively. The Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation.

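This is the two-sided test, for which the hypothesis of no outliers is rejected at significance level α if

:<math>
G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),N-2}^2}{N - 2 + t_{\alpha/(2N),N-2}^2}}
</math>

with <math>t_{\alpha/(2N),N-2}</math> denoting the upper critical value of the t-distribution with N − 2 degrees of freedom and a significance level of α/(2N).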
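As an illustration, the two-sided test can be sketched in Python using only the standard library. The function names `t_upper_critical` and `grubbs_test` and the sample values are assumptions made for this sketch. The t critical value is obtained numerically here (Simpson's rule after the substitution t = √ν·tan θ, followed by bisection); a statistics library's quantile function could be used instead.

```python
import math
import statistics


def t_upper_critical(p, df, steps=2000):
    """Approximate t such that P(T > t) = p for Student's t with df degrees of freedom.

    Assumes 0 < p < 0.5 and an even number of Simpson steps.
    """
    # With t = sqrt(df) * tan(theta) the density becomes c * cos(theta)**(df - 1)
    # on [0, pi/2), which is smooth and bounded.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)

    def upper_tail(theta):
        h = theta / steps
        total = 1.0 + math.cos(theta) ** (df - 1)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
        return 0.5 - c * total * h / 3

    lo, hi = 0.0, math.pi / 2
    for _ in range(60):  # bisection on theta; the tail shrinks as theta grows
        mid = (lo + hi) / 2
        if upper_tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)


def grubbs_test(data, alpha=0.05):
    """Return (G, critical value, reject H0?) for the two-sided Grubbs test."""
    n = len(data)
    if n < 3:
        raise ValueError("at least three observations are needed")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (N - 1 denominator)
    g = max(abs(y - mean) for y in data) / s
    t = t_upper_critical(alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return g, g_crit, g > g_crit


# Hypothetical measurements with one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 14.5]
g, g_crit, reject = grubbs_test(sample)
print(f"G = {g:.3f}, critical value = {g_crit:.3f}, reject H0: {reject}")
```

The sketch tests a single suspected outlier only and, in line with the definition above, presumes that the remaining data are approximately normal.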

One-sided case

Grubbs's test can also be defined as a one-sided test, replacing α/(2N) with α/N. To test whether the minimum value is an outlier, the test statistic is

:<math>
G = \frac{\bar{Y}-Y_\min}{s}
</math>

with <math>Y_\min</math> denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is

:<math>
G = \frac{Y_\max - \bar{Y}}{s}
</math>

with <math>Y_\max</math> denoting the maximum value.
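A minimal sketch of the one-sided variant follows. The function name `grubbs_one_sided` and its `tail` parameter are hypothetical, and the t critical value (upper critical value with N − 2 degrees of freedom at significance level α/N) is assumed to be supplied by the caller, for example from a table.

```python
import math
import statistics


def grubbs_one_sided(data, t_crit, tail="min"):
    """Return (G, critical value, reject H0?) for the one-sided Grubbs test.

    t_crit is assumed to be the upper critical value of the t-distribution
    with N - 2 degrees of freedom at significance level alpha / N.
    """
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (N - 1 denominator)
    if tail == "min":
        g = (mean - min(data)) / s
    elif tail == "max":
        g = (max(data) - mean) / s
    else:
        raise ValueError("tail must be 'min' or 'max'")
    # Same critical-value expression as the two-sided case, with alpha / N.
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return g, g_crit, g > g_crit
```

Only the tail being tested and the significance level passed to the t-distribution differ from the two-sided test.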

Several graphical techniques can be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot may also be useful.
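The coordinates of a normal probability plot can be computed without a plotting library, as in the following sketch. The function names are hypothetical, and the Blom plotting positions (i − 3/8)/(N + 1/4) are one of several conventions in use, chosen here as an assumption.

```python
import math
from statistics import NormalDist, fmean


def normal_probability_plot_points(data):
    """Pair each sorted observation with a theoretical standard normal quantile."""
    n = len(data)
    ordered = sorted(data)
    # Blom plotting positions; other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def plot_correlation(points):
    """Pearson correlation of the plotted points; values near 1 suggest a straight line."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Points that fall close to a straight line are consistent with normality, while an outlying observation typically appears as a point far from the line at one end of the plot.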

References

  1. Grubbs, Frank E. (1950). "Sample criteria for testing outlying observations". Annals of Mathematical Statistics. 21 (1): 27–58. doi:10.1214/aoms/1177729885. hdl:2027.42/182780.
  2. Quoted from the Engineering Statistics Handbook, section 1.3.5.17, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm
  3. Adikaram, K. K. L. B.; Hussein, M. A.; Effenberger, M.; Becker, T. (2015-01-14). "Data Transformation Technique to Improve the Outlier Detection Power of Grubbs' Test for Data Expected to Follow Linear Relation". Journal of Applied Mathematics. 2015: 1–9. doi:10.1155/2015/708948.

Further reading

  • Grubbs, Frank (February 1969). "Procedures for Detecting Outlying Observations in Samples". Technometrics. 11 (1): 2–21. doi:10.2307/1266761. JSTOR 1266761.
  • Stefansky, W. (1972). "Rejecting Outliers in Factorial Designs". Technometrics. 14 (2): 469–479. doi:10.2307/1267436. JSTOR 1267436.

This article incorporates public domain material from the National Institute of Standards and Technology.