False positive paradox
The false positive paradox is a statistical result in which false positive tests are more probable than true positive tests. It occurs when the overall population has a low incidence of a condition and that incidence is lower than the test's false positive rate. The probability of a positive test result is determined not only by the accuracy of the test but also by the characteristics of the sampled population (see Bayes' theorem).[1] When the incidence, the proportion of those who have a given condition, is lower than the test's false positive rate, even a test that has a very low chance of giving a false positive in an individual case will give more false than true positives overall.[2] So, in a society with very few infected people (proportionately fewer than the test gives false positives), more people will test positive for a disease incorrectly than will test positive correctly. The result has surprised many, hence the term paradox.[3]
The result is especially counter-intuitive when interpreting a positive result from a low-incidence population after having dealt with positive results drawn from a high-incidence population.[2] If the false positive rate of the test is higher than the proportion of the new population with the condition, then a test administrator whose experience comes from testing a high-incidence population may conclude that a positive test result usually indicates a positive subject, when in fact a false positive is far more likely to have occurred.
Failing to adjust to the scarcity of the condition in the new population, and concluding that a positive test result probably indicates a positive subject even though the population incidence is below the false positive rate, is a "base rate fallacy".
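The condition under which false positives outnumber true positives can be written out with Bayes' theorem. The following is a sketch rather than a general treatment: the symbols p (incidence) and f (false positive rate) are introduced here for illustration, and the test is assumed to have no false negatives, as in the example in this article.

```latex
\[
P(\text{condition} \mid \text{positive})
  = \frac{p}{p + (1 - p)\,f}
\]
% False positives outnumber true positives when
\[
(1 - p)\,f > p
\quad\Longleftrightarrow\quad
p < \frac{f}{1 + f}
\]
```

For a small false positive rate, f/(1 + f) is almost equal to f, which is why the condition is usually stated as "incidence lower than the false positive rate".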
Example
High-incidence population
Imagine running an HIV test on population A, in which 200 out of 10,000 (2%) are infected. The test has a false positive rate of 0.0004 (0.04%) and a false negative rate of zero. The expected outcome of a million tests on population A would be:
Unhealthy and test indicates disease (true positive)
1,000,000 × (200/10,000) = 20,000 people would receive a true positive.
Healthy and test indicates disease (false positive)
1,000,000 × (9,800/10,000) × 0.0004 = 392 people would receive a false positive. (The remaining 979,608 tests are correctly negative.)
So, in population A, a person receiving a positive test could be over 98% confident (20,000/20,392) that it correctly indicates infection.
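The arithmetic for population A can be reproduced with a short script. This is a minimal sketch: the function name `expected_outcomes` and its parameters are illustrative choices, not part of any standard library, and exact fractions are used only to avoid floating-point rounding in the counts.

```python
from fractions import Fraction

def expected_outcomes(n_tests, incidence, false_positive_rate,
                      false_negative_rate=Fraction(0)):
    """Return expected true positives, false positives, and the
    fraction of positive results that are true positives."""
    infected = n_tests * incidence
    healthy = n_tests - infected
    true_pos = infected * (1 - false_negative_rate)
    false_pos = healthy * false_positive_rate
    share_true = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, share_true

# Population A: 200 infected per 10,000; false positive rate 0.0004.
tp, fp, ppv = expected_outcomes(
    1_000_000, Fraction(200, 10_000), Fraction(4, 10_000)
)
print(f"true positives:  {float(tp):,.0f}")
print(f"false positives: {float(fp):,.0f}")
print(f"P(infected | positive) = {float(ppv):.4f}")
```

Calling the same function with an incidence of `Fraction(1, 10_000)` gives the figures for a low-incidence population.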
Low-incidence population
Now consider the same test applied to population B, in which only 1 person in 10,000 (0.01%) is infected. The expected outcome of a million tests on population B would be:
Unhealthy and test indicates disease (true positive)
1,000,000 × (1/10,000) = 100 people would receive a true positive.
Healthy and test indicates disease (false positive)
1,000,000 × (9,999/10,000) × 0.0004 ≈ 400 people would receive a false positive. (The remaining 999,500 tests are correctly negative.)
In population B, only 100 of the 500 total people with a positive test result are actually infected. So, the probability of actually being infected after receiving a positive result is only about 20% (100/500), for a test that otherwise appears to be "over 99.95% accurate".
A tester with experience of group A might find it a paradox that in group B, a result that had almost always indicated infection is now usually a false positive. The confusion of the posterior probability of infection with the prior probability of receiving a false positive is a natural error after receiving a life-threatening test result.
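The shift between the two populations can be seen by holding the test fixed and varying only the incidence. The sketch below assumes the same test as in the example (false positive rate 0.0004, no false negatives). The function name `positive_predictive_value` and the intermediate incidence values are illustrative assumptions, not figures from the sources.

```python
from fractions import Fraction

FALSE_POSITIVE_RATE = Fraction(4, 10_000)  # 0.04%, as in the example

def positive_predictive_value(incidence,
                              false_positive_rate=FALSE_POSITIVE_RATE):
    """P(infected | positive), assuming no false negatives."""
    true_pos = incidence
    false_pos = (1 - incidence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Sweep from population A (200 per 10,000) down to population B (1 per 10,000).
for infected_per_10k in (200, 50, 10, 4, 1):
    incidence = Fraction(infected_per_10k, 10_000)
    ppv = positive_predictive_value(incidence)
    print(f"{infected_per_10k:>3} per 10,000 -> "
          f"P(infected | positive) = {float(ppv):.1%}")
```

The printed probabilities fall from roughly 98% for population A to about 20% for population B, passing through about one half when the incidence is close to the false positive rate.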
See also
- Prosecutor's fallacy, a mistake in reasoning that involves ignoring a low prior probability.
- Simpson's paradox, another common error in statistical reasoning, which arises when comparing groups.
References
- [1] Rheinfurth, M. H.; Howell, L. W. (March 1998). Probability and statistics in aerospace engineering (PDF). NASA. p. 16. http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19980045313_1998119122.pdf. "MESSAGE: False positive tests are more probable than true positive tests when the overall population has a low incidence of the disease. This is called the false-positive paradox."
- [2] Vacher, H. L. (May 2003). "Quantitative literacy - drug testing, cancer screening, and the identification of igneous rocks". Journal of Geoscience Education: 2. http://findarticles.com/p/articles/mi_qa4089/is_200305/ai_n9252796/pg_2/. "At first glance, this seems perverse: the less the students as a whole use steroids, the more likely a student identified as a user will be a non-user. This has been called the False Positive Paradox". Citing: Smith, W. (1993). The cartoon guide to statistics. New York: Harper Collins. p. 49.
- [3] Madison, B. L. (August 2007). "Mathematical Proficiency for Citizenship". In Schoenfeld, A. H. Assessing Mathematical Proficiency. Mathematical Sciences Research Institute Publications (New ed.). Cambridge University Press. p. 122. ISBN 9780521697668. http://books.google.com/books?id=5gQz0akjYcwC&pg=113#v=onepage&q&f=false. "The correct [probability estimate...] is surprising to many; hence, the term paradox."