Bonferroni correction

From Wikipedia, the free encyclopedia

In statistics, the Bonferroni correction is a method used to counteract the problem of multiple comparisons. It is considered the simplest and most conservative method to control the familywise error rate.

It is named after Italian mathematician Carlo Emilio Bonferroni for the use of Bonferroni inequalities, but modern usage is credited to Olive Jean Dunn, who first used it in a pair of articles written in 1959 and 1961.

Informal introduction

Statistical inference is based on rejecting the null hypothesis when the observed data would be unlikely if the null hypothesis were true. The problem of multiplicity arises because, as the number of hypotheses tested increases, so does the likelihood of witnessing a rare event, and therefore the chance of rejecting at least one null hypothesis that is actually true (a type I error). The Bonferroni correction is the simplest way to address this issue. It is based on the idea that if an experimenter is testing n hypotheses (dependent or independent) on a set of data, then one way of maintaining the familywise error rate (FWER) is to test each individual hypothesis at a significance level of 1/n times what it would be if only one hypothesis were tested. So, if the significance level for the whole family of tests should be at most α, then the Bonferroni correction is to test each individual hypothesis at a significance level of α/n. Here, "statistically significant" means that a given result is unlikely to have occurred by chance, assuming the null hypothesis is actually correct (i.e., no difference among groups, no effect of treatment, no relation among variables).

Definition

Let H_{1},...,H_{m} be a family of hypotheses and p_{1},...,p_{m} the corresponding p-values. Let I_{0} be the subset of the (unknown) true null hypotheses, having m_{0} members.

The familywise error rate is the probability of rejecting at least one of the members of I_{0}, that is, of making one or more type I errors. The Bonferroni correction states that rejecting every H_{i} with p_{i}\leq\frac{\alpha}{m} controls the familywise error rate at level \alpha, i.e. FWER\leq\alpha. The proof follows from Boole's inequality: FWER=Pr\left\{ \bigcup_{i\in I_{0}}\left(p_{i}\leq\frac{\alpha}{m}\right)\right\} \leq\sum_{i\in I_{0}}Pr\left(p_{i}\leq\frac{\alpha}{m}\right)\leq m_{0}\frac{\alpha}{m}\leq m\frac{\alpha}{m}=\alpha

This result does not require that the tests be independent.
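The rejection rule in the definition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name bonferroni_reject and the example p-values are made up for this sketch.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where H_i is rejected.

    Each p-value is compared with alpha / m, where m is the number of
    hypotheses in the family, so that the FWER is at most alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-hypothesis significance level
    return [p <= threshold for p in p_values]


# Made-up p-values for illustration; with m = 4 the threshold is 0.0125.
p_values = [0.001, 0.012, 0.03, 0.2]
print(bonferroni_reject(p_values, alpha=0.05))  # [True, True, False, False]
```

Note that 0.03 would be significant at the uncorrected 0.05 level but is not rejected after the correction.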

Modifications

Generalization

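The definition above tests every hypothesis at the same level, using the fact that \sum_{i=1}^{n}\frac{\alpha}{n}=\alpha. The correction can be generalized: hypothesis i may be tested at its own level a_{i}, for any choice with \sum_{i=1}^{n}a_{i}=\alpha, as long as the levels a_{i} are fixed before the test is carried out.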
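A minimal Python sketch of this generalization follows. The function name, the split of \alpha into levels, and the p-values are all hypothetical. As a small relaxation of the text, the sketch also accepts levels that sum to less than \alpha, which is still valid by the same inequality.

```python
import math


def weighted_bonferroni_reject(p_values, levels, alpha=0.05):
    """Reject H_i when p_i <= a_i, where the levels a_i were fixed in advance.

    The levels must be non-negative and sum to at most alpha.
    """
    if len(p_values) != len(levels):
        raise ValueError("need exactly one level per p-value")
    if any(a < 0 for a in levels):
        raise ValueError("levels must be non-negative")
    total = sum(levels)
    if total > alpha and not math.isclose(total, alpha):
        raise ValueError("levels must sum to at most alpha")
    return [p <= a for p, a in zip(p_values, levels)]


# Hypothetical split: most of alpha = 0.05 is spent on the first hypothesis.
levels = [0.03, 0.01, 0.01]
p_values = [0.02, 0.02, 0.005]
print(weighted_bonferroni_reject(p_values, levels))  # [True, False, True]
```

The same p-value (0.02) is rejected for the first hypothesis and retained for the second, because the two were assigned different levels before looking at the data.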

Confidence intervals

The Bonferroni correction can also be used to adjust confidence intervals. If we form m confidence intervals and wish to have an overall confidence level of 1-\alpha, then the analogous correction is to construct each individual confidence interval at the level 1-\frac{\alpha}{m}.
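As an illustration, the following Python sketch builds Bonferroni-adjusted intervals under a strong simplifying assumption that is not part of the article: each estimate is approximately normal with a known standard error, so a two-sided interval is the estimate plus or minus z times the standard error. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def bonferroni_intervals(estimates, std_errors, alpha=0.05):
    """Two-sided normal intervals, each at level 1 - alpha / m.

    Assumes normally distributed estimates with known standard errors
    (an assumption of this sketch, not a general requirement).
    """
    m = len(estimates)
    per_interval_alpha = alpha / m
    # Critical value for a two-sided interval at level 1 - alpha / m.
    z = NormalDist().inv_cdf(1 - per_interval_alpha / 2)
    return [(est - z * se, est + z * se)
            for est, se in zip(estimates, std_errors)]


# Hypothetical estimates and standard errors.
for low, high in bonferroni_intervals([1.0, 2.5], [0.2, 0.5], alpha=0.05):
    print(f"({low:.3f}, {high:.3f})")
```

Each adjusted interval is wider than the corresponding unadjusted 95% interval, which is the price paid for the simultaneous coverage guarantee.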

Simultaneous inference and selective inference

The Bonferroni correction is the most basic form of simultaneous inference, which aims to control the familywise error rate. A substantial body of statistical research was carried out in this field from the early 1960s until the late 1990s, and many improvements were proposed. Most notable are the Holm–Bonferroni method, which offers a uniformly more powerful test procedure (i.e., more powerful regardless of the values of the unobservable parameters), and the Hochberg (1988) method, which is guaranteed to be no less powerful, and is in many cases more powerful, when the tests are independent (and also under some forms of positive dependence).

In 1995, Benjamini and Hochberg proposed controlling the false discovery rate instead of the familywise error rate and performing selective inference corrections. This approach addresses the technological improvements that occurred at the end of the century and provides the researcher with better tools for large-scale inference, which was considered one of the weak points of simultaneous inference methodologies.

Alternatives

This list includes only some of the alternatives that control the familywise error rate.

Holm–Bonferroni method

A uniformly more powerful test procedure (i.e., more powerful regardless of the values of the unobservable parameters) is the Holm–Bonferroni method.
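This article does not spell out the procedure, so the following Python sketch is based on the usual description of Holm's step-down method: sort the p-values in ascending order, compare the k-th smallest with \alpha/(m-k+1), and stop at the first one that fails. The function name and the p-values are made up for illustration.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one boolean per hypothesis."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank is 0 for the smallest p-value
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # retain this hypothesis and all with larger p-values
    return reject


# Made-up p-values for illustration.
p_values = [0.01, 0.04, 0.02, 0.005]
alpha = 0.05
bonferroni = [p <= alpha / len(p_values) for p in p_values]
print(bonferroni)                    # [True, False, False, True]
print(holm_reject(p_values, alpha))  # [True, True, True, True]
```

In this example Holm's method rejects every hypothesis that Bonferroni rejects, plus two more, which is the sense in which it is never less powerful.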

Šidák correction

A related and often-used correction, called the Šidák correction (or Dunn–Šidák correction), tests each individual hypothesis at the significance level 1 - (1 - \alpha)^{1/n}.

This correction is often confused with the Bonferroni correction. The Šidák correction is derived by assuming that the individual tests are independent. Let the significance threshold for each test be \beta; then the probability that at least one of the tests is significant under this threshold is 1 minus the probability that none of them is significant. Since the tests are assumed to be independent, the probability that none of them is significant is the product of the probabilities that each one is not significant, (1 - \beta)^n, so the probability that at least one is significant is 1 - (1 - \beta)^n. Our intention is for this probability to equal \alpha, the significance level for the entire series of tests. Solving for \beta, we obtain \beta = 1 - (1 - \alpha)^{1/n}.

For example, to test two independent hypotheses on the same data at the 0.05 significance level, instead of using a p-value threshold of 0.05, one would use the stricter threshold 1-\sqrt{0.95}\approx 0.0253. Notably, one can derive valid confidence intervals matching the test decision under the Šidák correction by using 100(1 - \alpha)^{1/n}% confidence intervals.
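The two-hypothesis example can be checked numerically. In this short Python sketch the function name sidak_threshold is a label chosen for illustration.

```python
def sidak_threshold(alpha, n):
    """Per-test significance level beta = 1 - (1 - alpha)**(1/n)."""
    return 1 - (1 - alpha) ** (1 / n)


alpha, n = 0.05, 2
beta = sidak_threshold(alpha, n)
print(round(beta, 4))  # 0.0253

# With n independent tests at level beta, the probability of at least one
# false positive is 1 - (1 - beta)**n, which recovers alpha.
print(round(1 - (1 - beta) ** n, 10))  # 0.05

# Matching per-interval confidence level, (1 - alpha)**(1/n).
print(round((1 - alpha) ** (1 / n), 4))  # 0.9747
```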

The Bonferroni correction is a safeguard against multiple tests of statistical significance on the same data falsely giving the appearance of significance: when the null hypotheses are true, 1 out of every 20 hypothesis tests is expected to be significant at the \alpha = 0.05 level purely due to chance. Furthermore, for n independent tests at this level, the probability of getting at least one significant result is 1 - 0.95^n (1 minus the probability of getting no significant result in n tests).
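How quickly this probability grows with the number of tests can be tabulated with a short Python sketch. It assumes independent tests with every null hypothesis true; the function name is chosen for illustration.

```python
def prob_any_significant(per_test_level, n):
    """Probability of at least one significant result among n independent
    tests, each at the given level, when every null hypothesis is true."""
    return 1 - (1 - per_test_level) ** n


alpha = 0.05
for n in (1, 5, 20, 100):
    uncorrected = prob_any_significant(alpha, n)
    corrected = prob_any_significant(alpha / n, n)  # Bonferroni level
    print(f"n={n:3d}  uncorrected={uncorrected:.3f}  corrected={corrected:.4f}")
```

With 20 uncorrected tests the probability of at least one false positive is already about 0.64, while testing each hypothesis at \alpha/n keeps it at or below 0.05.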

The Šidák correction gives a slightly less stringent per-test threshold than the Bonferroni correction, and hence a slightly more powerful test, because, for n \ge 1, \alpha / n \le 1 - (1 - \alpha)^{1/n}. However, the Šidák correction requires the additional condition of independence. Previously, because the Šidák correction requires fractional powers (i.e., roots), the computationally simpler Bonferroni correction was often preferred. Now that computing fractional powers is trivial, the preference for the Bonferroni method is due in part to tradition or unfamiliarity with the Šidák method. Additionally, the results of the two methods are very similar for conventional significance levels (between 0.01 and 0.10).
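The closeness of the two thresholds can be seen by computing both side by side. The Python sketch below uses illustrative function names and an arbitrary selection of \alpha and n values.

```python
def bonferroni_threshold(alpha, n):
    """Per-test level under the Bonferroni correction."""
    return alpha / n


def sidak_threshold(alpha, n):
    """Per-test level under the Šidák correction (independent tests)."""
    return 1 - (1 - alpha) ** (1 / n)


for alpha in (0.01, 0.05, 0.10):
    for n in (2, 10, 100):
        b = bonferroni_threshold(alpha, n)
        s = sidak_threshold(alpha, n)
        # The ratio shows how much larger the Šidák threshold is.
        print(f"alpha={alpha:.2f}  n={n:3d}  bonferroni={b:.6f}  "
              f"sidak={s:.6f}  ratio={s / b:.4f}")
```

Across these settings the Šidák threshold is never smaller than the Bonferroni one, and exceeds it by less than 6%.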

False discovery rate

A less restrictive criterion, which does not control the familywise error rate, is the approximate false discovery rate. This criterion does not require ordering the p-values and then applying a different criterion to each test.

Criticisms

The Bonferroni correction can be somewhat conservative if there are a large number of tests and/or the test statistics are positively correlated. It controls only the probability of false positives, and this ordinarily comes at the cost of increasing the probability of false negatives, and consequently of reducing statistical power. When testing a large number of hypotheses, this can result in large critical values.

Another criticism concerns the concept of a family of hypotheses. The statistical community has not yet reached a consensus on how to define such a family; currently it is defined subjectively, case by case. As there is no standard definition, test results may change dramatically merely by changing which hypotheses are grouped into a family.

In addition, in certain situations where one wants to retain, rather than reject, the null hypothesis, the Bonferroni correction is non-conservative.
