Barnard's test

In statistics, Barnard’s test is an exact test used in the analysis of 2 × 2 contingency tables with one margin fixed. Barnard’s tests are really a class of hypothesis tests, also known as unconditional exact tests for two independent binomials.[1][2][3] These tests examine the association of two categorical variables and are often a more powerful alternative to Fisher's exact test for 2 × 2 contingency tables. Although the test was first published in 1945 by G.A. Barnard,[4][5] it did not gain popularity, owing to the computational difficulty of calculating the p value and to Fisher’s specious disapproval. Nowadays, for small to moderate sample sizes (n ≲ 1000), computers can often carry out Barnard’s test in a few seconds.

Purpose and scope

Barnard’s test is used to test the independence of rows and columns in a 2 × 2 contingency table. The test assumes each response is independent. Under independence, there are three types of study designs that yield a 2 × 2 table, and Barnard's test applies to the second type.

To distinguish the different types of designs, suppose a researcher is interested in testing whether a treatment quickly heals an infection.

  1. One possible study design would be to sample 100 infected subjects, record for each subject whether they received the novel treatment or the old, standard medicine, and see if the infection is still present after a set time. This type of design is common in cross-sectional studies, or ‘field observations’, as in epidemiology.
  2. Another possible study design would be to give 50 infected subjects the treatment, 50 infected subjects the placebo, and see if the infection is still present after a set time. This type of design is common in clinical trials.
  3. The final possible study design would be to give 50 infected subjects the treatment, 50 infected subjects the placebo, and stop the experiment once a pre-determined number of subjects has healed from the infection. This type of design is rare, but has the same structure as the lady tasting tea study that led R.A. Fisher to create Fisher's exact test.

Although the results of each design of experiment can be laid out in nearly identical-appearing 2 × 2 tables, their statistics are different, and hence the criteria for a "significant" result are different for each:

  1. The probability of a 2 × 2 table under the first study design is given by the multinomial distribution, where the total number of samples taken is the only statistical constraint. This is a form of uncontrolled experiment, or "field observation", where the experimenter simply "takes the data as it comes".[a]
  2. The second study design is given by the product of two independent binomial distributions; the totals in one of the margins (either the row totals or the column totals) are constrained by the experimental design, but the totals in the other margin are free. This is by far the most common form of experimental design, where the experimenter constrains part of the experiment, say by assigning half of the subjects to be provided with a new medicine and the other half to receive an older, conventional medicine, but has no control over the numbers of individuals in each controlled category who either recover or succumb to the illness.
  3. The third design is given by the hypergeometric distribution, where the totals in both the rows and the columns are constrained. For example, an individual is allowed to taste 8 cups of soda, but must assign four to each category "brand X" and "brand Y", so that both the row totals and the column totals are constrained to four.[b] This kind of experiment is complicated to manage, and is almost unknown in practical experiments.
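As a rough illustration of how the three sampling models differ, the following Python sketch evaluates the probability of the same 2 × 2 table under each design, assuming independence of rows and columns. The function names, the example table (3, 1, 1, 3) and the choice of 0.5 for every unknown probability are assumptions made for this illustration only; they do not come from the cited sources.

```python
from math import comb, factorial

def multinomial_prob(a, b, c, d, row_prob, col_prob):
    """Design 1: only the grand total is fixed.

    Under independence each cell probability is the product of a
    row probability and a column probability (both are nuisance parameters).
    """
    n = a + b + c + d
    coef = factorial(n) // (factorial(a) * factorial(b) * factorial(c) * factorial(d))
    p11 = row_prob * col_prob
    p12 = row_prob * (1 - col_prob)
    p21 = (1 - row_prob) * col_prob
    p22 = (1 - row_prob) * (1 - col_prob)
    return coef * p11**a * p12**b * p21**c * p22**d

def double_binomial_prob(a, b, c, d, p):
    """Design 2: row totals fixed; each row is binomial with common success probability p."""
    n1, n2 = a + b, c + d
    return comb(n1, a) * p**a * (1 - p)**b * comb(n2, c) * p**c * (1 - p)**d

def hypergeometric_prob(a, b, c, d):
    """Design 3: row and column totals both fixed; no nuisance parameter remains."""
    n1, n2 = a + b, c + d
    return comb(n1, a) * comb(n2, c) / comb(n1 + n2, a + c)

# Hypothetical table: rows (a, b) and (c, d), four observations per row.
table = (3, 1, 1, 3)
print(multinomial_prob(*table, row_prob=0.5, col_prob=0.5))  # 1120/65536, about 0.0171
print(double_binomial_prob(*table, p=0.5))                   # 16/256 = 0.0625
print(hypergeometric_prob(*table))                           # 16/70, about 0.2286
```

The same table is assigned three different probabilities, which is why the threshold for a "significant" result depends on the design. Only the hypergeometric probability is free of unknown parameters.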

The operational difference between Barnard’s exact test and Fisher’s ‘exact’ test is how they handle the nuisance parameter, the common success probability, when calculating the p value. Fisher's exact test avoids estimating the nuisance parameter by conditioning on both margins, an approximately ancillary statistic; when only one margin is fixed by the design, this conditioning unrealistically constrains the possible outcomes. Barnard’s test considers all legitimate possible values of the nuisance parameter and chooses the value that maximizes the p value. The theoretical difference between the tests is that Barnard’s test is genuinely exact for the double-binomially distributed tables for which it is used, whereas Fisher’s test is exact for hypergeometrically distributed taste-test tables, but not exact for the binomial tables to which it is most often applied.
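The maximization over the nuisance parameter can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a pooled-variance z statistic for the difference of proportions (one of several statistics used with Barnard's test), a two-sided alternative, and a simple grid search over the common success probability. The function names and the grid size are choices made for this example.

```python
from math import comb, sqrt

def wald_statistic(x1, n1, x2, n2):
    """Pooled-variance z statistic for the difference of two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        return 0.0  # all successes or all failures: no evidence of a difference
    return (x1 / n1 - x2 / n2) / sqrt(variance)

def barnard_pvalue(x1, n1, x2, n2, grid_points=1001):
    """Two-sided unconditional exact p value, maximized over a grid of p.

    x1 successes out of n1 in the first group, x2 out of n2 in the second;
    n1 and n2 are the margin fixed by the design.
    """
    t_obs = abs(wald_statistic(x1, n1, x2, n2))
    # Every table (i, j) that is at least as extreme as the observed one;
    # the small tolerance keeps exact ties despite floating-point rounding.
    extreme = [
        (i, j)
        for i in range(n1 + 1)
        for j in range(n2 + 1)
        if abs(wald_statistic(i, n1, j, n2)) >= t_obs - 1e-12
    ]
    best = 0.0
    for k in range(grid_points):
        p = k / (grid_points - 1)  # candidate value of the nuisance parameter
        total = sum(
            comb(n1, i) * p**i * (1 - p)**(n1 - i)
            * comb(n2, j) * p**j * (1 - p)**(n2 - j)
            for i, j in extreme
        )
        best = max(best, total)
    return min(best, 1.0)  # guard against rounding slightly above 1

print(barnard_pvalue(3, 3, 0, 3))  # 0.03125
```

For 3 successes out of 3 against 0 out of 3, only the observed table and its mirror image are as extreme, so the tail probability is 2p³(1 − p)³, which peaks at p = 0.5 with value 1/32. A grid search can miss the true maximum slightly on larger tables; production implementations typically refine the search.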

Both tests have sizes less than or equal to the nominal significance level, so both control the type I error rate. However, Barnard’s test can be more powerful than Fisher’s test because it considers more ‘as or more extreme’ tables: it does not condition on the second margin, whose variability the procedure for Fisher’s test ignores. In fact, one variant of Barnard’s test, called Boschloo's test, is uniformly more powerful than Fisher’s test.[6] A more detailed description of Barnard’s test is given by Mehta and Senchaudhuri (2003).[7] Barnard’s test has been used alongside Fisher's exact test in project management research.[8]
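For comparison, the conditional calculation behind Fisher's test can be sketched as follows. This is an illustrative implementation that assumes the common two-sided convention of summing the hypergeometric probabilities of all tables no more probable than the observed one; the function name and argument order are choices made for this example.

```python
from math import comb

def fisher_pvalue(x1, n1, x2, n2):
    """Two-sided Fisher p value, conditioning on both margins.

    x1 successes out of n1 in the first group, x2 out of n2 in the second.
    Only tables with the same total number of successes are considered.
    """
    successes = x1 + x2
    denom = comb(n1 + n2, successes)

    def prob(i):
        # Hypergeometric probability of i successes in the first group.
        return comb(n1, i) * comb(n2, successes - i) / denom

    p_obs = prob(x1)
    lowest = max(0, successes - n2)
    highest = min(n1, successes)
    # Relative tolerance keeps tables that tie with the observed probability.
    return sum(
        prob(i) for i in range(lowest, highest + 1)
        if prob(i) <= p_obs * (1 + 1e-9)
    )

print(fisher_pvalue(3, 3, 0, 3))  # 2/20, about 0.1
```

For 3 successes out of 3 against 0 out of 3, conditioning on three total successes leaves only four possible tables, with probabilities 1/20, 9/20, 9/20 and 1/20, so the smallest attainable two-sided p value is 0.1. This coarseness of the conditional reference set is what makes the test conservative for tables with a single fixed margin.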

Criticisms

Under specious pressure from Fisher, Barnard retracted his test in a published paper;[9] however, many researchers prefer Barnard’s exact test over Fisher's ‘exact’ test for analyzing 2 × 2 contingency tables, since its statistics are genuinely ‘exact’ for the vast majority of experimental designs, whereas Fisher’s ‘exact’ test statistics are wrong for those designs: its p values are too large, leading the experimenter to dismiss as insignificant results that are statistically significant under the correct double-binomial statistics rather than Fisher’s (usually) incorrect hypergeometric statistics. The only exception is the rare experimental design that constrains both marginal totals (e.g. ‘taste tests’): experimentally imposed constraints on both marginal totals make the true sampling distribution of the table hypergeometric and render the p value from Fisher’s test statistically valid.

Barnard's test can be applied to larger tables, but the computation time increases and the power advantage quickly decreases.[10] It remains unclear which test statistic is preferred when implementing Barnard's test; however, most test statistics yield uniformly more powerful tests than Fisher's exact test.[11]

Footnotes

  1. ^ For "field observations" of multinomially distributed data, the chi-squared test is the most commonly used method of analysis; it produces "statistically correct" results, but is based on a normal approximation rather than exact statistics. Other methods also apply, and are discussed in the article on Pearson's chi-squared test.
  2. ^ The experimental result is only revealed in the interior of the table, with the count of the number of cups either correctly or incorrectly identified.

References

  1. ^ Mehrotra, D.V.; Chan, I.S.F.; Berger, R.L. (2003). "A cautionary note on exact unconditional inference for a difference between two independent binomial proportions". Biometrics. 59: 441–450.
  2. ^ Ripamonti, E.; Lloyd, C.; Quatto, P. (2017). "Contemporary frequentist views of the 2 × 2 binomial trial". Statistical Science. 32: 600–615. doi:10.1214/17-STS627.
  3. ^ Fay, M.P.; Hunsberger, S.A. (2021). "Practical valid inferences for the two-sample binomial problem". Statistics Surveys. 15. doi:10.1214/21-SS131.
  4. ^ Barnard, G.A. (1945). "A new test for 2 × 2 tables". Nature. 156 (3954): 177. doi:10.1038/156177a0. S2CID 186244479.
  5. ^ Barnard, G.A. (1947). "Significance tests for 2 × 2 tables". Biometrika. 34 (1–2): 123–138. doi:10.1093/biomet/34.1-2.123. PMID 20287826.
  6. ^ Boschloo, R.D. (1970). "Raised conditional level of significance for the 2 × 2 table when testing the equality of two probabilities". Statistica Neerlandica. 24: 1–35. doi:10.1111/j.1467-9574.1970.tb00104.x.
  7. ^ Mehta, C.R.; Senchaudhuri, P. (2003). "Conditional versus unconditional exact tests for comparing two binomials".
  8. ^ Invernizzi, Diletta Colette; Locatelli, Giorgio; Brookes, Naomi J. (1 January 2019). "An exploration of the relationship between nuclear decommissioning projects characteristics and cost performance" (PDF). Progress in Nuclear Energy. 110: 129–141. doi:10.1016/j.pnucene.2018.09.011. ISSN 0149-1970.
  9. ^ Barnard, G.A. (1949). "Statistical Inference". Journal of the Royal Statistical Society. Series B. 11 (2): 115–149.
  10. ^ Mehta, C.R.; Hilton, J.F. (1993). "Exact power of conditional and unconditional tests: Going beyond the 2 × 2 contingency table". The American Statistician. 47 (2): 91–98. doi:10.1080/00031305.1993.10475946.
  11. ^ Berger, R.L. (1994). "Power comparison of exact unconditional tests for comparing two binomial proportions". Institute of Statistics. Mimeo Series. No. 2266: 1–19.
