Peirce's criterion
In robust statistics, Peirce's criterion is a rule, devised by Benjamin Peirce, for eliminating outliers from data sets.
Outliers removed by Peirce's criterion
The problem of outliers
In data sets of real-valued measurements, suspected outliers are the measured values that appear to lie outside the cluster formed by most of the other values. Outliers can greatly change the estimate of location when the arithmetic mean is used as the summary statistic, because the mean is very sensitive to extreme values; in statistical terminology, the arithmetic mean is not robust.
In the presence of outliers, the statistician has two options. First, the statistician may remove the suspected outliers from the data set and then use the arithmetic mean to estimate the location parameter. Second, the statistician may keep all of the data and use a robust statistic, such as the median.
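The contrast between the two options can be illustrated with a small numerical sketch. The sample values below are invented for illustration and do not come from any cited source. The suspected outlier is removed by hand here, not by Peirce's criterion, and Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical measurements: five values near 10 and one suspected outlier (25.0).
data = [9.8, 9.9, 10.0, 10.1, 10.2, 25.0]

# With the outlier included, the arithmetic mean is pulled strongly toward it.
mean_all = statistics.mean(data)

# Option 1: remove the suspected outlier, then average what remains.
retained = [x for x in data if x != 25.0]
mean_retained = statistics.mean(retained)

# Option 2: keep every value but summarize with a robust statistic, the median.
median_all = statistics.median(data)

print(f"mean with outlier:    {mean_all:.2f}")
print(f"mean without outlier: {mean_retained:.2f}")
print(f"median with outlier:  {median_all:.2f}")
```

For this invented sample, the single extreme value moves the mean from about 10.0 to about 12.5, while the median of the full sample stays near 10.05.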
Peirce's criterion is a statistical procedure for eliminating outliers.
Uses of Peirce's criterion
The statistician and historian of statistics Stephen M. Stigler wrote the following about Benjamin Peirce:[1]
"In 1852 he published the first significance test designed to tell an investigator whether an outlier should be rejected (Peirce 1852, 1878). The test, based on a likelihood ratio type of argument, had the distinction of producing an international debate on the wisdom of such actions (Anscombe, 1960, Rider, 1933, Stigler, 1973a)."
Peirce's criterion is derived from a statistical analysis of the Gaussian distribution. Unlike some other criteria for removing outliers, Peirce's method can be applied to identify two or more outliers.
"It is proposed to determine in a series of m observations the limit of error, beyond which all observations involving so great an error may be rejected, provided there are as many as n such observations. The principle upon which it is proposed to solve this problem is, that the proposed observations should be rejected when the probability of the system of errors obtained by retaining them is less than that of the system of errors obtained by their rejection multiplied by the probability of making so many, and no more, abnormal observations."[2]
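Read as an inequality, the rejection rule in the quotation can be sketched as follows, writing n for the number of doubtful observations. The notation P(·) and the event descriptions are informal shorthand introduced here for illustration, not Peirce's own symbols.

```latex
\[
P(\text{errors} \mid \text{all observations retained})
\;<\;
P(\text{errors} \mid n \text{ observations rejected})
\times
P(\text{exactly } n \text{ abnormal observations})
\]
```

The n doubtful observations are rejected when this inequality holds.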
Hawkins[3] provides a formula for the criterion.
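For readers who want to compute a rejection threshold, the following is a minimal sketch in Python. It assumes the fixed-point formulation usually attributed to B. A. Gould's nineteenth-century tables for the criterion, which may differ in form from the formula given by Hawkins. The function names, parameter names, tolerance, use of the sample standard deviation, and sample data are all hypothetical choices made for this illustration.

```python
import math


def peirce_ratio_squared(num_obs, num_rejected=1, num_unknowns=1,
                         tol=1e-12, max_iter=1000):
    """Squared ratio of the largest admissible deviation to the standard
    deviation, from a Gould-style fixed-point iteration (assumed form)."""
    N, n, m = num_obs, num_rejected, num_unknowns
    if n < 1 or N - m - n <= 0:
        raise ValueError("need num_rejected >= 1 and "
                         "num_obs > num_unknowns + num_rejected")
    # Q**N = n**n * (N - n)**(N - n) / N**N
    q_pow = (n ** n) * ((N - n) ** (N - n)) / (N ** N)
    r = 1.0   # starting guess for the auxiliary ratio R
    x2 = 0.0
    for _ in range(max_iter):
        lam = (q_pow / r ** n) ** (1.0 / (N - n))
        x2 = 1.0 + (N - m - n) / n * (1.0 - lam ** 2)
        if x2 < 0.0:
            return 0.0  # iteration left the valid range; 0 is a sentinel
        r_next = math.exp((x2 - 1.0) / 2.0) * math.erfc(math.sqrt(x2 / 2.0))
        if abs(r_next - r) < tol:
            break
        r = r_next
    return x2


def flag_outliers(values, num_rejected=1, num_unknowns=1):
    """Return the values whose deviation from the mean exceeds the limit."""
    mean = sum(values) / len(values)
    # Sample variance (N - 1 in the denominator) is an assumption here.
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    ratio2 = peirce_ratio_squared(len(values), num_rejected, num_unknowns)
    limit = math.sqrt(ratio2 * var)
    return [v for v in values if abs(v - mean) > limit]


# Invented data: five values near 10 and one suspected outlier.
sample = [9.8, 9.9, 10.0, 10.1, 10.2, 25.0]
print(flag_outliers(sample))
```

With this invented sample and one assumed doubtful observation, only the value 25.0 should be flagged. A fuller procedure would typically repeat the test for two, three, or more doubtful observations; that loop is omitted here to keep the sketch short.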
Peirce's criterion was used for decades at the United States Coast Survey.[4]
"From 1852 to 1867 he served as the director of the longitude determinations of the U. S. Coast Survey and from 1867 to 1874 as superintendent of the Survey. During these years his test was consistently employed by all the clerks of this, the most active and mathematically inclined statistical organization of the era."[1]
Peirce's criterion was discussed in William Chauvenet's A Manual of Spherical and Practical Astronomy.[2]
Notes
1. Stigler (1978), page 246.
2. Quoted in the editorial note on page 516 of the Collected Writings of Peirce (1982 edition). The quotation cites A Manual of Astronomy (2:558) by Chauvenet.
3. Hawkins (1980), page 10.
4. Peirce (1878).
References
- Peirce, Benjamin, "Criterion for the Rejection of Doubtful Observations", Astronomical Journal II 45 (1852) and Errata to the original paper.
- Peirce, Benjamin (May 1877–1878). "On Peirce's criterion". Proceedings of the American Academy of Arts and Sciences 13: 348–351. doi:10.2307/25138498. JSTOR 25138498.
- Peirce, Charles Sanders (1870 [published 1873]). "Appendix No. 21. On the Theory of Errors of Observation". Report of the Superintendent of the United States Coast Survey Showing the Progress of the Survey During the Year 1870: 200–224.
- Peirce, Charles Sanders (1982 [1986 copyright]). "On the Theory of Errors of Observation". In Kloesel, Christian J. W., et alia. Writings of Charles S. Peirce: A Chronological Edition. Volume 3, 1872–1878. Bloomington, Indiana: Indiana University Press. pp. 140–160. ISBN 0-253-37201-1.
- Ross, Stephen, "Peirce's Criterion for the Elimination of Suspect Experimental Data", J. Engr. Technology, vol. 20, no. 2, Fall 2003.
- Stigler, Stephen M. (March 1978). "Mathematical Statistics in the Early States". Annals of Statistics 6 (2): 239–265. doi:10.1214/aos/1176344123. JSTOR 2958876. MR 483118.
- Stigler, Stephen M. (1980). "Mathematical Statistics in the Early States". In Stephen M. Stigler. American Contributions to Mathematical Statistics in the Nineteenth Century, Volumes I & II. Vol. I. New York: Arno Press.
- Stigler, Stephen M. (1989). "Mathematical Statistics in the Early States". In Peter Duren. A Century of Mathematics in America III. Providence, RI: American Mathematical Society. pp. 537–564.
- Hawkins, D.M. (1980). Identification of outliers. Chapman and Hall, London. ISBN 0-412-21900-X
- Chauvenet, W. (1876). A Manual of Spherical and Practical Astronomy. J. B. Lippincott, Philadelphia. (Reprints of various editions: Dover, 1960; Peter Smith Pub, 2000, ISBN 0-8446-1845-4; Adamant Media Corporation (2 volumes), 2001, ISBN 1-4021-7283-4, ISBN 1-4212-7259-8; BiblioBazaar, 2009, ISBN 1-103-92942-9.)