Binomial test: Difference between revisions

Revision as of 00:20, 8 May 2012

In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories.

Common use

The most common use of the binomial test is in the case where the null hypothesis is that two categories are equally likely to occur (such as a coin toss). Tables are widely available that give the significance of the observed numbers of observations in the categories for this case. However, as the example below shows, the binomial test is not restricted to this case.
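For the equally likely case, the values found in such tables can be reproduced directly from binomial coefficients. The following is a minimal sketch; the function name and the data (8 heads in 10 tosses) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def two_sided_p_equal_odds(successes: int, trials: int) -> Fraction:
    """Exact two-sided p-value when both categories have probability 1/2."""
    # With p = 1/2 the distribution is symmetric, so the two tails are
    # mirror images and the upper tail can simply be doubled.
    k = max(successes, trials - successes)
    upper_tail = sum(comb(trials, i) for i in range(k, trials + 1))
    return min(Fraction(2 * upper_tail, 2 ** trials), Fraction(1))


# Hypothetical data: 8 heads in 10 tosses of a coin assumed fair under the null.
p_value = two_sided_p_equal_odds(8, 10)
print(p_value, float(p_value))
```

Exact fractions are used so that no rounding enters the tail sum; the doubling step is valid only because the null probability is 1/2.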

Where there are more than two categories, and an exact test is required, the multinomial test, based on the multinomial distribution, must be used instead of the binomial test.^[1]

Large samples

For large samples such as the example below, the binomial distribution is well approximated by convenient continuous distributions, and these are used as the basis for alternative tests that are much quicker to compute: Pearson's chi-squared test and the G-test. However, for small samples these approximations break down, and there is no alternative to the binomial test.
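As a rough illustration of these approximate tests, the sketch below computes the Pearson chi-squared and G statistics for the die example in this article (51 sixes in 235 rolls, null probability 1/6). The variable names and the helper chi2_sf_1df are illustrative assumptions; the p-values use the chi-squared distribution with one degree of freedom, whose upper tail can be written with the complementary error function.

```python
from math import erfc, log, sqrt

n, sixes, p0 = 235, 51, 1 / 6

observed = [sixes, n - sixes]          # sixes, non-sixes
expected = [n * p0, n * (1 - p0)]      # counts expected from a fair die

# Pearson's chi-squared statistic: sum of (O - E)^2 / E over both categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# G statistic: 2 * sum of O * ln(O / E) over both categories
g_stat = 2 * sum(o * log(o / e) for o, e in zip(observed, expected))


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


p_chi2 = chi2_sf_1df(chi2)
p_g = chi2_sf_1df(g_stat)
print(chi2, p_chi2)
print(g_stat, p_g)
```

Both statistics ignore the direction of the deviation, so the resulting p-values are comparable to a two-tailed rather than a one-tailed binomial test.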

Example binomial test

Suppose we have a board game that depends on the roll of a die and attaches special importance to rolling a 6. In a particular game, the die is rolled 235 times, and 6 comes up 51 times. If the die is fair, we would expect 6 to come up 235/6 = 39.17 times. Is the proportion of 6s significantly higher than would be expected by chance, on the null hypothesis of a fair die?

To find an answer to this question using the binomial test, we consult the binomial distribution B(235,1/6) to determine the probability of finding exactly 51 sixes in a sample of 235 if the true probability of rolling a 6 on each trial is 1/6. We then find the probability of finding exactly 52, exactly 53, and so on up to 235, and add all these probabilities together. In this way, we calculate the probability of obtaining the observed result (51 6s) or a more extreme result (>51 6s) assuming that the die is fair. In this example, the result is 0.0265443, which indicates that observing 51 6s is unlikely to come from a die that is not loaded to give many 6s (significant at the 5% level, one-tailed test).
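The summation described above can be sketched in a few lines using only the standard library. The function names binom_pmf and upper_tail are illustrative assumptions, not part of any statistical package.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: Fraction) -> Fraction:
    """Probability of k or more successes: the sum of the pmf from k up to n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# 51 or more sixes in 235 rolls of a die assumed fair under the null hypothesis
p_one_tailed = float(upper_tail(51, 235, Fraction(1, 6)))
print(round(p_one_tailed, 7))
```

Exact rational arithmetic is used here to keep rounding out of the sum; for this sample size plain floating point would also be adequate.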

Clearly a die could roll too few sixes as easily as too many and we would be just as suspicious, so we should use the two-tailed test, which (for example) splits the 5% significance level across the two tails.
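There is more than one convention for turning the exact binomial probabilities into a two-tailed p-value, and the text above does not fix one. The sketch below shows two common choices for the die example, neither of which is presented as the definitive method: doubling the one-tailed value (equivalent to comparing the one-tailed value with 2.5%), and summing the probabilities of all outcomes that are no more probable than the observed one. Variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

n, observed, p0 = 235, 51, Fraction(1, 6)

# Exact probability of every possible number of sixes, 0..n, under the null
pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]

# One-tailed p-value: the observed count or anything larger
upper = sum(pmf[observed:])

# Convention 1: double the one-tailed p-value, capped at 1
p_doubled = min(Fraction(1), 2 * upper)

# Convention 2: add up every outcome whose probability does not exceed
# that of the observed outcome (this picks up the low-count tail as well)
p_small = sum(q for q in pmf if q <= pmf[observed])

print(float(upper), float(p_doubled), float(p_small))
```

For these figures the doubled value lies above 5% while the second lies below it, so the choice of convention matters for results close to the threshold.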

In statistical software packages

Binomial tests are available in most software used for statistical purposes, for example:

In R the above example could be calculated with the following code:
- binom.test(51,235,(1/6),alternative="greater") (one-tailed test)
- binom.test(51,235,(1/6),alternative="two.sided") (two-tailed test)
In SAS the test is available in the Frequency procedure:

PROC FREQ DATA=DiceRoll ;
	TABLES Roll / BINOMIAL (P=0.166667) ALPHA=0.05 ;
	EXACT  BINOMIAL ;
	WEIGHT Freq ;
RUN;

In SPSS the test can be run through the menu Analyze > Nonparametric Tests > Binomial
In Python, use SciPy:
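- scipy.stats.binom.sf(51-1, 235, 1.0/6) (one-tailed test; sf(k) gives the probability of more than k successes, so 51-1 is passed to include 51 itself)
- scipy.stats.binom_test(51, 235, 1.0/6) (two-tailed test; in recent SciPy releases this function is replaced by scipy.stats.binomtest(51, 235, 1.0/6).pvalue)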
In MATLAB, use binofit:
- [phat,pci]=binofit(51, 235,0.05) (generally two-tailed; one-tailed for the extreme cases "0 out of n" and "n out of n"). This returns the estimated probability of the die rolling a six (phat) and the confidence interval (pci) at the 95% = (1-0.05) confidence level, which corresponds to a significance level of 5%.

References

^ Howell, D. C. (2007). Statistical Methods for Psychology (6th ed.). Belmont, CA: Thomson Higher Education.

Binomial significance testing Retrieved 03-07-2009


@@ Line 14: / Line 14: @@
 To find an answer to this question using the binomial test, we consult the [[binomial distribution]] ''B''(235,1/6) to determine the probability of finding exactly 51 sixes in a sample of 235 if the true probability of rolling a 6 on each trial is 1/6.  We then find the probability of finding exactly 52, exactly 53, and so on up to 235, and add all these probabilities together. In this way, we calculate the probability of obtaining the observed result (51 6s) or a more extreme result (>51 6s) assuming that the die is fair.  In this example, the result is 0.0265443, which indicates that observing 51 6s is unlikely (significant at the 5% level) to come from a die that is not loaded to give many 6s ([[one-tailed test]]).
+Clearly a die could roll too few sixes as easily as too many and we would be just as suspicious, so we should use the [[two-tailed test]] which (for example) splits the 5% probability across the two tails.
-Clearly a die could roll too few sixes as easily as too many and we would be just as suspicious, so we should use the [[two-tailed test]] which considers the probability of having a particular effect size either above or below expectation.  Here the effect size is 11.83, since that is how many more sixes there were than expected, with 51 found vs. 39.17 expected.  So now we have to find the probability that the die would roll a six 27 times or fewer (39.17 expected - 11.83 equal effect size) [arguable, see discussion].  Summing over all the probabilities (< 28 6s) yields .0172037.  When we add this to the first result, we get .0437480, which is significant at the 5% [[significance level]].  If the cost of a false accusation was too high, we might have a more stringent requirement, like 1% significance level, in which case we could not reject the null hypothesis of a fair die with sufficient certainty.
 ==See also==