Bonferroni correction

In statistics, the Bonferroni correction is one of several methods used to counteract the problem of multiple comparisons. It is named after the Italian mathematician Carlo Emilio Bonferroni because it uses the Bonferroni inequalities,^[1] but modern usage is often credited to Olive Jean Dunn, who described the procedure in a pair of articles published in 1959 and 1961.^[2]^[3]

Introduction

Statistical hypothesis testing is based on rejecting the null hypothesis when the observed data would be unlikely if the null hypothesis were true. If multiple comparisons are made or multiple hypotheses are tested, the chance of observing at least one such rare event increases, and so does the probability of incorrectly rejecting a true null hypothesis (i.e., making a Type I error).^[4]^{[better source needed]}

The Bonferroni correction is based on the idea that if an experimenter is testing $m$ hypotheses, then one way of maintaining the familywise error rate (FWER) is to test each individual hypothesis at a statistical significance level of $1/m$ times the desired maximum overall level.^{[citation needed]}

If the desired significance level for the whole family of tests is $\alpha$, then the Bonferroni correction would test each individual hypothesis at a significance level of $\alpha /m$.^{[citation needed]} For example, if a trial is testing $m=8$ hypotheses with a desired overall $\alpha =0.05$, then the Bonferroni correction would test each individual hypothesis at $\alpha /m=0.05/8=0.00625$.^{[citation needed]}
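A minimal Python sketch of the rule, reproducing the arithmetic of the example above. The function name `bonferroni_reject` and the p-values are hypothetical, chosen only for illustration:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if it is rejected.

    Each p-value is compared with the Bonferroni per-test level alpha / m,
    where m is the number of hypotheses in the family.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Worked example from the text: m = 8 hypotheses, overall alpha = 0.05.
alpha, m = 0.05, 8
print(alpha / m)  # per-test level → 0.00625

# Hypothetical p-values, made up for illustration only.
p = [0.001, 0.004, 0.006, 0.0063, 0.01, 0.03, 0.2, 0.7]
print(bonferroni_reject(p, alpha))
# → [True, True, True, False, False, False, False, False]
```

Only the first three p-values fall at or below 0.00625; a p-value such as 0.01, which would be significant at 0.05 on its own, is not rejected once the family of eight tests is taken into account.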

Definition

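Let $H_{1},\ldots,H_{m}$ be a family of null hypotheses and $p_{1},\ldots,p_{m}$ their corresponding p-values. Let $m_{0}$ be the number of true null hypotheses (which is unknown to the researcher), and index the hypotheses so that $H_{1},\ldots,H_{m_{0}}$ are the true ones. The familywise error rate (FWER) is the probability of rejecting at least one true $H_{i}$, that is, of making at least one Type I error. The Bonferroni correction states that rejecting $H_{i}$ whenever $p_{i}\leq {\frac {\alpha }{m}}$ controls the FWER at level $\alpha$. The proof follows from Boole's inequality, together with the fact that a valid p-value for a true null hypothesis satisfies $P(p_{i}\leq u)\leq u$ for every $u\in [0,1]$: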

$$\text{FWER}=P\left\{\bigcup_{i=1}^{m_{0}}\left(p_{i}\leq \frac{\alpha}{m}\right)\right\}\leq \sum_{i=1}^{m_{0}}P\left(p_{i}\leq \frac{\alpha}{m}\right)\leq m_{0}\,\frac{\alpha}{m}\leq m\,\frac{\alpha}{m}=\alpha$$

This control does not require any assumptions about dependence among the p-values.^[5]
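The bound can be checked numerically. The following is a Monte Carlo sketch under stated assumptions, not part of the proof: it assumes `m0` true null hypotheses whose p-values are Uniform(0, 1), ignores the false nulls (they cannot produce a Type I error), and compares two extreme dependence structures, fully independent p-values and identical p-values. The helper `estimate_fwer` and its parameters are hypothetical names:

```python
import random


def estimate_fwer(m, m0, alpha, n_trials, dependent, seed=0):
    """Monte Carlo estimate of the FWER of the Bonferroni rule.

    Assumptions (illustration only): m hypotheses in the family, of which
    m0 are true nulls with Uniform(0, 1) p-values. If `dependent` is True,
    all true-null p-values are identical (perfect positive dependence);
    otherwise they are independent.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    threshold = alpha / m
    errors = 0
    for _ in range(n_trials):
        if dependent:
            u = rng.random()
            null_p = [u] * m0
        else:
            null_p = [rng.random() for _ in range(m0)]
        # A familywise error occurs if any true null is rejected.
        if any(p <= threshold for p in null_p):
            errors += 1
    return errors / n_trials


for dependent in (False, True):
    fwer = estimate_fwer(m=8, m0=8, alpha=0.05, n_trials=20000, dependent=dependent)
    print("identical p-values" if dependent else "independent", round(fwer, 4))
```

Under independence the estimate should land near $1-(1-\alpha/m)^{m}\approx 0.049$ for $m=8$, and with identical p-values near $\alpha/m=0.00625$. In both cases it stays at or below $\alpha$ up to simulation noise, consistent with the claim that no dependence assumption is needed.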

Extensions

Generalization

Rather than testing each hypothesis at the $\alpha /m$ level, the hypotheses may be tested at any combination of levels that add up to $\alpha$, provided that the level of each specific test is determined before looking at the data.^{[citation needed]} For example, for two hypothesis tests, an overall $\alpha$ of 0.05 could be maintained by conducting one test at 0.04 and the other at 0.01.
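A sketch of this weighted variant, assuming the per-test levels are supplied as a list fixed in advance. `weighted_bonferroni_reject` is a hypothetical name, and the check allows the levels to sum to at most $\alpha$ (spending less than $\alpha$ only makes the procedure more conservative):

```python
def weighted_bonferroni_reject(p_values, levels, alpha=0.05):
    """Reject H_i when p_i <= levels[i].

    `levels` must be chosen before looking at the data and must sum to at
    most alpha; by Boole's inequality the FWER is then at most alpha.
    """
    if len(levels) != len(p_values):
        raise ValueError("need one level per hypothesis")
    if sum(levels) > alpha + 1e-12:  # small tolerance for floating-point sums
        raise ValueError("per-test levels must sum to at most alpha")
    return [p <= a for p, a in zip(p_values, levels)]


# Example from the text: two tests, overall alpha = 0.05 split as 0.04 + 0.01.
print(weighted_bonferroni_reject([0.03, 0.03], levels=[0.04, 0.01]))
# → [True, False]
```

The same p-value of 0.03 is rejected for the test given the larger share of $\alpha$ and retained for the other, which is why the split has to be fixed in advance rather than chosen after seeing the results.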

Confidence intervals

The Bonferroni correction can be used to adjust confidence intervals. If one constructs $m$ confidence intervals and wants an overall (simultaneous) confidence level of $1-\alpha$, each individual confidence interval can be constructed at the level $1-{\frac {\alpha }{m}}$.^{[citation needed]}
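For intervals of the form estimate ± critical value × standard error, the adjustment amounts to using a larger critical value. The sketch below assumes approximately normal estimates with known standard errors (z-intervals); the function name and the numbers are hypothetical. It uses `statistics.NormalDist` from the Python standard library (3.8+):

```python
from statistics import NormalDist


def bonferroni_z_intervals(estimates, std_errors, alpha=0.05):
    """Two-sided normal-approximation intervals with Bonferroni adjustment.

    Assumes each estimate is approximately normal with a known standard
    error. Each of the m intervals is built at level 1 - alpha/m, so the
    critical value leaves alpha/(2m) in each tail.
    """
    m = len(estimates)
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    intervals = [(est - z * se, est + z * se) for est, se in zip(estimates, std_errors)]
    return z, intervals


# Hypothetical estimates and standard errors for three parameters.
z, intervals = bonferroni_z_intervals([1.2, 0.4, -0.3], [0.5, 0.2, 0.1])
print(round(z, 3))
for lo, hi in intervals:
    print(round(lo, 3), round(hi, 3))
```

For $m=3$ and $\alpha=0.05$ the critical value is about 2.39, compared with 1.96 for a single 95% interval, so each interval is wider in exchange for simultaneous coverage.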

Alternatives

There are other procedures that control the familywise error rate. For example, the Holm–Bonferroni method and the Šidák correction are uniformly more powerful procedures than the Bonferroni correction, meaning that they are always at least as powerful. Unlike the Bonferroni procedure, these methods do not control the expected number of Type I errors per family (the per-family Type I error rate).^[6]
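A sketch contrasting the two alternatives with plain Bonferroni on hypothetical p-values. The Holm step-down rule and the Šidák per-test level $1-(1-\alpha)^{1/m}$ are standard, but note that the Šidák level guarantees FWER control under independence of the p-values (and some forms of positive dependence), not under arbitrary dependence. The function names are hypothetical:

```python
def holm_reject(p_values, alpha=0.05):
    """Holm–Bonferroni step-down procedure.

    Sort p-values ascending; compare the k-th smallest (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first one that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject


def sidak_level(m, alpha=0.05):
    """Šidák per-test level: 1 - (1 - alpha)^(1/m)."""
    return 1 - (1 - alpha) ** (1 / m)


p = [0.001, 0.007, 0.02, 0.5]  # hypothetical p-values
bonf = [pi <= 0.05 / len(p) for pi in p]
print(bonf)            # → [True, True, False, False]
print(holm_reject(p))  # → [True, True, True, False]
print(sidak_level(8), 0.05 / 8)  # Šidák level is slightly above alpha/m
```

Here Holm rejects the third hypothesis, which Bonferroni retains, and by construction it never rejects fewer hypotheses than Bonferroni, because its smallest threshold is $\alpha/m$ and the later thresholds are larger.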

Criticism

The Bonferroni correction can be conservative when there is a large number of tests or when the test statistics are positively correlated. This control comes at the cost of a higher probability of false negatives (Type II errors) and, consequently, reduced statistical power.^{[citation needed]}
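A short calculation illustrating both sources of conservativeness, under the assumption that all $m$ null hypotheses are true with Uniform(0, 1) p-values and that each test is a two-sided z-test. It compares the actual FWER under independence with the FWER when all p-values are identical (the extreme case of positive correlation), and shows how the critical value each test must exceed grows with $m$. The function name is hypothetical:

```python
from statistics import NormalDist


def bonferroni_slack(m, alpha=0.05):
    """Illustrate how conservative the Bonferroni correction is for m tests.

    Assumes all m nulls are true with Uniform(0, 1) p-values. Returns
    (FWER if the p-values are independent,
     FWER if all p-values are identical,
     two-sided z critical value each test must exceed).
    """
    fwer_independent = 1 - (1 - alpha / m) ** m
    fwer_identical = alpha / m  # perfect positive correlation
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return fwer_independent, fwer_identical, z_crit


for m in (1, 10, 100, 1000):
    indep, identical, z = bonferroni_slack(m)
    print(f"m={m:5d}  FWER indep={indep:.4f}  FWER identical={identical:.5f}  z={z:.2f}")
```

Under independence the FWER stays close to $\alpha$ (about 0.049 for $m\geq 10$), but with identical p-values it falls to $\alpha/m$, far below the target. The critical value rises from about 1.96 at $m=1$ to about 4.06 at $m=1000$, which is where the loss of power comes from.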

Another criticism concerns the concept of a family of hypotheses. There is no definitive consensus on how to define a family in all cases. Because there is no standard definition, test results may change dramatically merely by changing how the hypotheses are grouped into families.^{[citation needed]}

These two criticisms apply to adjustments for multiple comparisons in general and are not specific to the Bonferroni correction.^{[citation needed]}
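The criticism about defining a family can be seen with a single hypothetical p-value: under the Bonferroni correction, the same result is significant or not depending only on how many hypotheses it is grouped with. The helper name is hypothetical:

```python
def rejected_in_family(p_value, family_size, alpha=0.05):
    """Bonferroni decision for one p-value in a family of `family_size` tests."""
    return p_value <= alpha / family_size


p_value = 0.01  # hypothetical p-value
for family_size in (1, 4, 10):
    decision = rejected_in_family(p_value, family_size)
    print(f"family of {family_size:2d}: threshold {0.05 / family_size:.4f}, rejected = {decision}")
```

The p-value of 0.01 is rejected when the family has 1 or 4 members (thresholds 0.05 and 0.0125) but not when it has 10 (threshold 0.005), although the data are unchanged.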

Kenneth Rothman has said that if statistical tests are only performed when there is a strong reason to expect the result to be true, multiple comparisons adjustments are not necessary.^[7]

It has also been argued that multiple testing corrections are an inefficient approach to empirical research, since they control false positives at the potential expense of many more false negatives.^{[citation needed]}

On the other hand, it has been argued that advances in measurement and information technology have made it far easier to generate large datasets for exploratory analysis, often leading to the testing of large numbers of hypotheses with no prior basis for expecting many of the hypotheses to be true.^[8] In this situation, very high false positive rates are expected unless multiple comparisons adjustments are made.^[8]
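A simulation of this exploratory setting, under the deliberately extreme assumption that every one of 10,000 tested hypotheses is a true null with a Uniform(0, 1) p-value, so that every rejection is a false positive. The seed and sizes are arbitrary choices for illustration:

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible
m, alpha = 10_000, 0.05

# Assumption for illustration: every hypothesis is a true null,
# so each p-value is Uniform(0, 1).
p_values = [rng.random() for _ in range(m)]

unadjusted = sum(p <= alpha for p in p_values)      # expected about m * alpha = 500
bonferroni = sum(p <= alpha / m for p in p_values)  # expected about alpha = 0.05
print(unadjusted, bonferroni)
```

Without adjustment, roughly $m\alpha=500$ false positives are expected; with the Bonferroni threshold $\alpha/m$ the expected number is only $\alpha=0.05$, so typically none appear.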

References

^ Bonferroni, C. E. (1936). "Teoria statistica delle classi e calcolo delle probabilità". Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze.
^ Dunn, Olive Jean (1959). "Estimation of the Medians for Dependent Variables". Annals of Mathematical Statistics. 30 (1): 192–197. doi:10.1214/aoms/1177706374. JSTOR 2237135.
^ Dunn, Olive Jean (1961). "Multiple Comparisons Among Means" (PDF). Journal of the American Statistical Association. 56 (293): 52–64. doi:10.1080/01621459.1961.10482090.
^ Mittelhammer, Ron C.; Judge, George G.; Miller, Douglas J. (2000). Econometric Foundations. Cambridge University Press. pp. 73–74. ISBN 0-521-62394-4.
^ Goeman, Jelle J.; Solari, Aldo (2014). "Multiple Hypothesis Testing in Genomics". Statistics in Medicine. 33 (11). doi:10.1002/sim.6082.
^ Frane, Andrew (2015). "Are per-family Type I error rates relevant in social and behavioral science?". Journal of Modern Applied Statistical Methods. 14 (1): 12–23.
^ Rothman, Kenneth J. (1990). "No Adjustments Are Needed for Multiple Comparisons". Epidemiology. 1 (1). Lippincott Williams & Wilkins: 43–46. doi:10.1097/00001648-199001000-00010. JSTOR 20065622. PMID 2081237.
^ Ioannidis, JPA (2005). "Why Most Published Research Findings Are False". PLoS Med. 2 (8): e124. doi:10.1371/journal.pmed.0020124. PMC 1182327. PMID 16060722.

External links

"Bonferroni". webstat.une.edu.au. School of Psychology, University of New England, New South Wales, Australia. 2000. Retrieved 2016-02-03.
Weisstein, Eric W. "Bonferroni correction". MathWorld.
Bonferroni, Sidak online calculator
Explanation of p-value correction methods under the context of differential gene expression analysis
