In statistics, the Bonferroni correction is a method to counteract the problem of multiple comparisons. It is the simplest such method. However, it is conservative: compared with other methods, it gives a greater chance of failing to reject a false null hypothesis. This is because it ignores potentially valuable information, such as the distribution of p-values across all comparisons (if the null hypothesis is true for all comparisons, that distribution is expected to be uniform).
Of note, the Bonferroni correction controls the family-wise error rate (FWER), the probability of rejecting at least one true null hypothesis. Its guarantee therefore concerns the family of comparisons as a whole rather than each individual test. Alternative approaches, such as the Benjamini–Hochberg procedure, instead control the false discovery rate (FDR), the expected proportion of false rejections among all rejections. They are designed for settings where individual "discoveries" are sought, and are thus more appropriate in many scenarios, such as the detection of differentially expressed genes in bioinformatics.
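The difference between the two error criteria is easy to see on a small example. The following is a minimal Python sketch of Bonferroni rejection next to the Benjamini–Hochberg step-up rule, assuming independent (or positively dependent) tests, which is the setting in which the Benjamini–Hochberg FDR guarantee is usually stated. The function names and the eight p-values are hypothetical, chosen only for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule (controls the FDR at level q).

    Find the largest rank k with p_(k) <= k * q / m and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values from m = 8 tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print("Bonferroni:", bonferroni_reject(pvals))
print("BH:        ", benjamini_hochberg(pvals))
```

With these values the Bonferroni threshold is 0.05/8 = 0.00625, so only the first hypothesis is rejected. The Benjamini–Hochberg rule compares the second-smallest p-value with 2 × 0.05/8 = 0.0125 and also rejects it.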
Statistical hypothesis testing is based on rejecting the null hypothesis when the likelihood of the observed data under the null hypothesis is low. If multiple hypotheses are tested, the chance of observing a rare event somewhere among them increases. Therefore, the likelihood of incorrectly rejecting at least one null hypothesis (i.e., making a Type I error) also increases.
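The growth of this risk can be made concrete with a short calculation. Assume m independent tests, all of true null hypotheses, each run at level α. The probability of at least one false rejection is then 1 − (1 − α)^m. The independence assumption is only for this illustration; the Bonferroni correction itself does not need it. The function name below is hypothetical.

```python
def prob_at_least_one_false_rejection(alpha, m):
    """Chance of at least one Type I error among m independent tests of
    true null hypotheses, each carried out at significance level alpha."""
    # Each test avoids a false rejection with probability 1 - alpha;
    # independence lets us multiply these m probabilities.
    return 1 - (1 - alpha) ** m


for m in (1, 5, 20, 100):
    unadjusted = prob_at_least_one_false_rejection(0.05, m)
    bonferroni = prob_at_least_one_false_rejection(0.05 / m, m)
    print(f"m={m:3d}  unadjusted={unadjusted:.4f}  bonferroni={bonferroni:.4f}")
```

For m = 20 tests at α = 0.05, the chance of at least one false rejection is about 0.64. Testing each hypothesis at 0.05/20 instead brings it to about 0.049.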
The Bonferroni correction compensates for that increase by testing each individual hypothesis at a significance level of $\alpha/m$, where $\alpha$ is the desired overall alpha level and $m$ is the number of hypotheses. For example, if a trial is testing $m = 20$ hypotheses with a desired overall $\alpha = 0.05$, then the Bonferroni correction would test each individual hypothesis at $\alpha/m = 0.05/20 = 0.0025$. The same phenomenon appears when constructing multiple confidence intervals.
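A minimal Python sketch of the rule follows. Besides the reject/retain decisions, it returns Bonferroni-adjusted p-values, min(m·p_i, 1), a common equivalent way of reporting the correction that can be compared directly with α. The function name and the example p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni correction for a family of m p-values.

    Returns (reject, adjusted):
      reject[i]:   True when p_i <= alpha / m
      adjusted[i]: the Bonferroni-adjusted p-value min(m * p_i, 1)
    """
    m = len(pvalues)
    threshold = alpha / m
    reject = [p <= threshold for p in pvalues]
    adjusted = [min(m * p, 1.0) for p in pvalues]
    return reject, adjusted


# Hypothetical p-values from m = 20 tests; the per-test level is 0.05 / 20 = 0.0025.
pvals = [0.001, 0.003] + [0.5] * 18
reject, adjusted = bonferroni(pvals)
print(reject[:3], adjusted[:3])
```

Here only the first hypothesis is rejected. Its adjusted p-value is 0.02, below 0.05, while the second one's is 0.06.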
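Let $H_1, \ldots, H_m$ be a family of null hypotheses and $p_1, \ldots, p_m$ their corresponding p-values. Let $m$ be the total number of null hypotheses, and let $m_0$ be the number of true null hypotheses (which is presumably unknown to the researcher). The family-wise error rate (FWER) is the probability of rejecting at least one true $H_i$, that is, of making at least one Type I error. The Bonferroni correction rejects the null hypothesis for each $p_i \le \alpha/m$, thereby controlling the FWER at $\le \alpha$. Proof of this control follows from Boole's inequality. Without loss of generality, index the true null hypotheses as $H_1, \ldots, H_{m_0}$, and use the fact that a valid p-value of a true null satisfies $P(p_i \le \alpha/m) \le \alpha/m$:

$$\mathrm{FWER} = P\left\{\bigcup_{i=1}^{m_0}\left(p_i \le \frac{\alpha}{m}\right)\right\} \le \sum_{i=1}^{m_0} P\left(p_i \le \frac{\alpha}{m}\right) \le m_0\,\frac{\alpha}{m} \le \alpha.$$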
This control does not require any assumptions about dependence among the p-values or about how many of the null hypotheses are true.
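The claim that the Bonferroni rule keeps the FWER at or below α whatever the dependence between p-values can be checked with a small Monte Carlo sketch. It assumes every null is true, so each p-value is Uniform(0, 1), and varies only how the p-values are coupled. The three dependence patterns and the function name are hypothetical illustrations.

```python
import random


def simulate_fwer(m=10, alpha=0.05, n_sim=20_000, dependence="independent", seed=0):
    """Monte Carlo estimate of the Bonferroni FWER when all m nulls are true."""
    rng = random.Random(seed)
    threshold = alpha / m
    false_rejections = 0
    for _ in range(n_sim):
        u = rng.random()
        if dependence == "independent":
            pvals = [rng.random() for _ in range(m)]
        elif dependence == "identical":
            # Perfect positive dependence: every test returns the same p-value.
            pvals = [u] * m
        elif dependence == "disjoint":
            # Shifted copies of u: each p_i is still Uniform(0, 1), but the
            # events {p_i <= alpha/m} cannot occur together, so Boole's
            # inequality holds with equality and the FWER equals alpha.
            pvals = [(u + i / m) % 1.0 for i in range(m)]
        else:
            raise ValueError(f"unknown dependence pattern: {dependence}")
        if min(pvals) <= threshold:  # at least one (false) rejection
            false_rejections += 1
    return false_rejections / n_sim


for dep in ("independent", "identical", "disjoint"):
    print(dep, simulate_fwer(dependence=dep))
```

The exact FWERs for these three patterns (m = 10, α = 0.05) are about 0.049, 0.005 and 0.05 respectively. The estimates should land near those values, and none of the true values exceed α. The "disjoint" case shows that the Bonferroni bound can be attained exactly.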
Rather than testing each hypothesis at the $\alpha/m$ level, the hypotheses may be tested at any other combination of levels that add up to $\alpha$, provided that the level of each test is decided before looking at the data. For example, for two hypothesis tests, an overall $\alpha$ of 0.05 could be maintained by conducting one test at 0.04 and the other at 0.01.
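A minimal Python sketch of this weighted variant follows. The same Boole's-inequality argument bounds the FWER by the sum of the per-test levels. The function name is hypothetical, and the code assumes the levels were fixed before the data were seen; it cannot check that, only that they sum to at most α.

```python
def weighted_bonferroni(pvalues, levels, alpha=0.05):
    """Reject H_i when p_i <= levels[i].

    The levels must be chosen in advance and sum to at most alpha,
    so that the FWER is bounded by sum(levels) <= alpha.
    """
    if len(pvalues) != len(levels):
        raise ValueError("need exactly one level per hypothesis")
    if sum(levels) > alpha + 1e-12:  # small tolerance for floating-point sums
        raise ValueError("levels must sum to at most alpha")
    return [p <= a for p, a in zip(pvalues, levels)]


# Two tests split as 0.04 / 0.01 versus the equal split 0.025 / 0.025.
print(weighted_bonferroni([0.03, 0.02], [0.04, 0.01]))
print(weighted_bonferroni([0.03, 0.02], [0.025, 0.025]))
```

With the same hypothetical p-values, the unequal split rejects the first hypothesis and the equal split rejects the second. This is why the allocation has to be committed to before looking at the data.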
The procedure proposed by Dunn can be used to adjust confidence intervals. If one establishes $m$ confidence intervals and wishes to have an overall confidence level of $1 - \alpha$, each individual confidence interval can be adjusted to the level of $1 - \alpha/m$.
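As a sketch of this adjustment, the Python function below builds m two-sided intervals, each at individual level 1 − α/m. By the Bonferroni argument, their joint coverage is at least 1 − α. It assumes each estimate is approximately normal with a known standard error, which is a simplification; Dunn's treatment of means is not reproduced here. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def bonferroni_intervals(estimates, std_errors, alpha=0.05):
    """Simultaneous two-sided intervals with joint coverage of at least 1 - alpha.

    Normal approximation: each of the m intervals is est +/- z * se, where z
    is the two-sided critical value for individual confidence 1 - alpha / m.
    """
    m = len(estimates)
    individual_alpha = alpha / m
    z = NormalDist().inv_cdf(1 - individual_alpha / 2)
    return [(est - z * se, est + z * se) for est, se in zip(estimates, std_errors)]


# Hypothetical estimates and standard errors for three means.
for lo, hi in bonferroni_intervals([1.2, 0.4, -0.3], [0.5, 0.2, 0.25]):
    print(f"({lo:.3f}, {hi:.3f})")
```

For m = 3 and α = 0.05, the critical value rises from about 1.96 to about 2.39, so each interval is correspondingly wider.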
When searching for a signal in a continuous parameter space, there can also be a problem of multiple comparisons, known as the look-elsewhere effect. For example, a physicist might look for a particle of unknown mass by considering a large range of masses; this was the case in the search that led to the detection of the Higgs boson. In such cases, one can apply a continuous generalization of the Bonferroni correction, using Bayesian reasoning to relate the effective number of trials to the prior-to-posterior volume ratio.
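There are alternative ways to control the family-wise error rate. For example, the Holm–Bonferroni method and the Šidák correction are uniformly more powerful procedures than the Bonferroni correction, meaning that they are always at least as powerful. However, the Šidák correction's FWER guarantee assumes independent tests (or certain forms of positive dependence), whereas the Bonferroni and Holm–Bonferroni methods are valid under arbitrary dependence. Unlike the Bonferroni procedure, these methods do not control the expected number of Type I errors per family (the per-family Type I error rate).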
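A minimal Python sketch compares the three rules on the same p-values. Holm–Bonferroni is a step-down procedure. It sorts the p-values and compares the k-th smallest with α/(m − k + 1), stopping at the first non-rejection. Šidák tests each hypothesis at 1 − (1 − α)^(1/m), which is slightly larger than α/m and gives an FWER of exactly α for independent tests. The function names and the p-values are hypothetical.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_reject(pvalues, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 1..m) with alpha / (m - k + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for k, i in enumerate(order, start=1):
        if pvalues[i] <= alpha / (m - k + 1):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


def sidak_level(alpha, m):
    """Per-test level 1 - (1 - alpha)**(1/m); exact FWER alpha for independent tests."""
    return 1 - (1 - alpha) ** (1 / m)


def sidak_reject(pvalues, alpha=0.05):
    level = sidak_level(alpha, len(pvalues))
    return [p <= level for p in pvalues]


# Hypothetical p-values for m = 4 tests.
pvals = [0.010, 0.015, 0.030, 0.200]
print("Bonferroni:", bonferroni_reject(pvals))
print("Holm:      ", holm_reject(pvals))
print("Sidak:     ", sidak_reject(pvals))
```

Here Bonferroni (threshold 0.0125) and Šidák (threshold about 0.0127) each reject one hypothesis. Holm rejects two, because after the first rejection it compares 0.015 with 0.05/3 ≈ 0.0167.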
The correction comes at the cost of increasing the probability of producing false negatives, i.e., of reducing statistical power. There is no definitive consensus on how to define a family in all cases, and adjusted test results may vary depending on the number of tests included in the family of hypotheses. Such criticisms apply to FWER control in general and are not specific to the Bonferroni correction.
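The size of the power loss can be illustrated with a simple model. Assume a one-sided z-test and a true effect of 3 standard errors, both hypothetical choices made only for this sketch. Its power at level a is 1 − Φ(z₁₋ₐ − effect). The function name is likewise hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect, level):
    """Power of a one-sided z-test at the given level when the true shift is
    `effect` standard errors."""
    z_crit = NormalDist().inv_cdf(1 - level)
    return 1 - NormalDist().cdf(z_crit - effect)


alpha, m, effect = 0.05, 20, 3.0
print("unadjusted:", round(power_one_sided_z(effect, alpha), 3))
print("Bonferroni:", round(power_one_sided_z(effect, alpha / m), 3))
```

Under these assumptions, the power of a single test drops from roughly 0.91 at level 0.05 to roughly 0.58 at the Bonferroni level 0.05/20.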
- Bonferroni, C. E. (1936). "Teoria statistica delle classi e calcolo delle probabilità". Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze.
- Dunn, Olive Jean (1961). "Multiple Comparisons Among Means". Journal of the American Statistical Association. 56 (293): 52–64. CiteSeerX 10.1.1.309.1277. doi:10.1080/01621459.1961.10482090.
- Mittelhammer, Ron C.; Judge, George G.; Miller, Douglas J. (2000). Econometric Foundations. Cambridge University Press. pp. 73–74. ISBN 978-0-521-62394-0.
- Miller, Rupert G. (1966). Simultaneous Statistical Inference. Springer. ISBN 9781461381228.
- Goeman, Jelle J.; Solari, Aldo (2014). "Multiple Hypothesis Testing in Genomics". Statistics in Medicine. 33 (11): 1946–1978. doi:10.1002/sim.6082. PMID 24399688.
- Neuwald, A. F.; Green, P. (1994). "Detecting patterns in protein sequences". Journal of Molecular Biology. 239 (5): 698–712. doi:10.1006/jmbi.1994.1407. PMID 8014990.
- Bayer, Adrian E.; Seljak, Uroš (2020). "The look-elsewhere effect from a unified Bayesian and frequentist perspective". Journal of Cosmology and Astroparticle Physics. 2020 (10): 009. arXiv:2007.13821. doi:10.1088/1475-7516/2020/10/009.
- Frane, Andrew (2015). "Are per-family Type I error rates relevant in social and behavioral science?". Journal of Modern Applied Statistical Methods. 14 (1): 12–23. doi:10.22237/jmasm/1430453040.
- Moran, Matthew (2003). "Arguments for rejecting the sequential Bonferroni in ecological studies". Oikos. 100 (2): 403–405. doi:10.1034/j.1600-0706.2003.12010.x.
- Nakagawa, Shinichi (2004). "A farewell to Bonferroni: the problems of low statistical power and publication bias". Behavioral Ecology. 15 (6): 1044–1045. doi:10.1093/beheco/arh107.