Grubbs's test for outliers
Grubbs's test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normalized residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population.
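Grubbs's test detects one outlier at a time. The detected outlier is removed from the data set, and the test is repeated until no further outliers are detected. However, repeating the test changes the probabilities of detection: each round is run at the nominal significance level, so the probability of flagging a point somewhere in the whole sequence is no longer that of a single test. The test should not be used for sample sizes of six or fewer, since it frequently tags most of the points as outliers.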
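The iterate-and-remove procedure can be sketched in Python as follows. This is a minimal, standard-library-only sketch, not a reference implementation. It assumes the two-sided statistic G = max|Y_i − Ȳ| / s (sample mean Ȳ, sample standard deviation s) and the usual two-sided rejection rule: the hypothesis of no outliers is rejected at significance level α when G > ((N − 1)/√N)·√(t² / (N − 2 + t²)), where t is the upper α/(2N) critical value of Student's t distribution with N − 2 degrees of freedom. The function names and the `min_size` parameter are hypothetical. The t quantile is approximated numerically (Simpson's rule plus bisection) rather than taken from a statistics library. Stopping before the sample drops to six points is a design choice reflecting the small-sample caution in the text.

```python
import math
import statistics


def _t_upper_quantile(p, df, steps=2000):
    """Approximate t such that P(T > t) = p for Student's t with df degrees of freedom.

    Standard-library-only: Simpson's rule on the density plus bisection.
    """
    # Normalising constant of the t density.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def upper_tail(x):
        # P(T > x) = 0.5 - integral of the density over [0, x] (Simpson's rule).
        h = x / steps
        area = pdf(0.0) + pdf(x)
        for i in range(1, steps):
            area += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 - area * h / 3

    lo, hi = 0.0, 1.0
    while upper_tail(hi) > p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):  # bisection: upper_tail is decreasing for x >= 0
        mid = (lo + hi) / 2
        if upper_tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical_value(n, alpha=0.05):
    """Two-sided critical value for a sample of size n (needs n - 2 >= 1 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least three observations")
    t = _t_upper_quantile(alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_iterative(values, alpha=0.05, min_size=7):
    """Apply the two-sided test repeatedly, removing one outlier per round.

    Returns (kept, removed). Stops when H0 is no longer rejected or when fewer
    than min_size points remain (hypothetical small-sample cut-off).
    """
    kept = list(values)
    removed = []
    while len(kept) >= min_size:
        mean = statistics.mean(kept)
        s = statistics.stdev(kept)  # sample standard deviation (N - 1 denominator)
        if s == 0:
            break  # all remaining values are identical
        suspect = max(kept, key=lambda y: abs(y - mean))
        g = abs(suspect - mean) / s
        if g <= grubbs_critical_value(len(kept), alpha):
            break  # H0 (no outliers) is not rejected
        kept.remove(suspect)
        removed.append(suspect)
    return kept, removed
```

With the hypothetical sample `[9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 10.3, 15.0]`, the first round gives G ≈ 2.83 against a critical value of about 2.29 for N = 10, so 15.0 is removed. In the second round G ≈ 1.64 stays below the critical value for N = 9, and the loop stops.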
Grubbs's test is defined for the hypotheses:
- H0: There are no outliers in the data set
- Ha: There is exactly one outlier in the data set
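The Grubbs test statistic is defined as
$$G = \frac{\max_{i=1,\ldots,N} \lvert Y_i - \bar{Y} \rvert}{s}$$
with $\bar{Y}$ and $s$ denoting the sample mean and the sample standard deviation of the $N$ observations $Y_1, \ldots, Y_N$. In words, $G$ is the largest absolute deviation from the sample mean, measured in units of the sample standard deviation.
This is the two-sided version of the test. The Grubbs test can also be defined as a one-sided test. To test whether the minimum value is an outlier, the test statistic is
$$G = \frac{\bar{Y} - Y_{\min}}{s}$$
with $Y_{\min}$ denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is
$$G = \frac{Y_{\max} - \bar{Y}}{s}$$
with $Y_{\max}$ denoting the maximum value.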
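The two-sided statistic max|Y_i − Ȳ| / s and the one-sided statistics (Ȳ − Y_min) / s and (Y_max − Ȳ) / s can be computed directly from their definitions. The Python sketch below uses only the standard library's `statistics` module; the function name `grubbs_statistic` and its `side` parameter are hypothetical, and it assumes s is the sample standard deviation with N − 1 in the denominator (which is what `statistics.stdev` returns). It computes G only; deciding whether G is significant still requires comparing it with a critical value.

```python
import statistics


def grubbs_statistic(values, side="two-sided"):
    """Grubbs test statistic G for a univariate sample.

    side="two-sided": largest absolute deviation from the mean, divided by s.
    side="min": (mean - minimum) / s, for testing the minimum value.
    side="max": (maximum - mean) / s, for testing the maximum value.
    """
    if len(values) < 3:
        # Hypothetical guard: critical values need N - 2 >= 1 degrees of freedom.
        raise ValueError("need at least three observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    if side == "two-sided":
        return max(abs(y - mean) for y in values) / s
    if side == "min":
        return (mean - min(values)) / s
    if side == "max":
        return (max(values) - mean) / s
    raise ValueError(f"unknown side: {side!r}")
```

For the hypothetical sample `[1, 2, 3, 4, 10]` (mean 4, s = √12.5 ≈ 3.54), the two-sided statistic is 6/√12.5 ≈ 1.70, which here coincides with the "max" statistic, and the "min" statistic is 3/√12.5 ≈ 0.85.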
Several graphical techniques can, and should, be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot may also be useful.
- Grubbs, Frank E. (1950). "Sample criteria for testing outlying observations". Annals of Mathematical Statistics. 21 (1): 27–58. doi:10.1214/aoms/1177729885.
- Quoted from the Engineering and Statistics Handbook, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm
- Grubbs, Frank E. (February 1969). "Procedures for Detecting Outlying Observations in Samples". Technometrics. 11 (1): 1–21. doi:10.2307/1266761. JSTOR 1266761.
- Stefansky, W. (1972). "Rejecting Outliers in Factorial Designs". Technometrics. 14 (2): 469–479. doi:10.2307/1267436. JSTOR 1267436.