The experiment asked whether a taster could tell if the milk was added before the brewed tea, when preparing a cup of tea
Ronald Fisher in 1913

In the design of experiments in statistics, the lady tasting tea is a randomized experiment devised by Ronald Fisher and reported in his book The Design of Experiments (1935).[1] The experiment is the original exposition of Fisher's notion of a null hypothesis, which is "never proved or established, but is possibly disproved, in the course of experimentation".[2][3]

The lady in question (Muriel Bristol) claimed to be able to tell whether the tea or the milk was added first to a cup. Fisher proposed to give her eight cups, four of each variety, in random order. One could then ask how probable it would be for her to identify as many cups correctly as she did by chance alone.

Fisher's description is less than 10 pages in length and is notable for its simplicity and completeness regarding terminology, calculations and design of the experiment.[4] The example is loosely based on an event in Fisher's life. The test used was Fisher's exact test.

The experiment

The experiment provided the lady with 8 randomly ordered cups of tea—4 prepared by first adding milk, 4 prepared by first adding the tea. She was to select the 4 cups prepared by one method. This offered the lady the advantage of judging cups by comparison. She was fully informed of the experimental method.

The null hypothesis was that the lady had no ability to distinguish the teas. In Fisher's approach, there is no alternative hypothesis,[2] unlike in the Neyman–Pearson approach.

The test statistic was a simple count of the number of successes in selecting the 4 cups (the number of cups of the given type successfully selected). The distribution of possible numbers of successes, assuming the null hypothesis was true, was computed by counting the equally likely ways of choosing 4 cups out of 8. Using the combination formula, with $n=8$ total cups and $k=4$ cups chosen, there are $\binom{8}{4}=\frac{8!}{4!\,(8-4)!}=70$ possible combinations.
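The count of 70 can be checked directly. The following is a minimal sketch, not part of Fisher's exposition: the names `N_CUPS`, `N_CHOSEN`, `n_selections` and `all_selections` are illustrative, and the cups are labelled 0 to 7 purely for convenience.

```python
from itertools import combinations
from math import comb

N_CUPS = 8    # total cups presented (illustrative name)
N_CHOSEN = 4  # cups the taster must pick (illustrative name)

# Closed-form count: 8! / (4! * (8 - 4)!)
n_selections = comb(N_CUPS, N_CHOSEN)

# Brute-force check: list every distinct set of 4 cup labels out of 0..7.
all_selections = list(combinations(range(N_CUPS), N_CHOSEN))

print(n_selections, len(all_selections))  # both counts are 70
```

Both the formula and the explicit enumeration give the same number of selections, each of which is equally likely if the taster is only guessing.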

Tea-Tasting Distribution Assuming the Null Hypothesis
| Success count | Permutations of selection | Number of permutations |
|---|---|---|
| 0 | oooo | 1 × 1 = 1 |
| 1 | ooox, ooxo, oxoo, xooo | 4 × 4 = 16 |
| 2 | ooxx, oxox, oxxo, xoxo, xxoo, xoox | 6 × 6 = 36 |
| 3 | oxxx, xoxx, xxox, xxxo | 4 × 4 = 16 |
| 4 | xxxx | 1 × 1 = 1 |
| Total | | 70 |

The frequencies of the possible numbers of successes, given in the final column of this table, are derived as follows. For 0 successes, there is clearly only one set of four choices (namely, choosing all four incorrect cups) giving this result. For one success and three failures, there are four correct cups of which one is selected, which by the combination formula can occur in $\binom{4}{1}=4$ different ways (as shown in column 2, with x denoting a correct cup that is chosen and o denoting a correct cup that is not chosen); and independently of that, there are four incorrect cups of which three are selected, which can occur in $\binom{4}{3}=4$ ways (as shown in the second column, this time with x interpreted as an incorrect cup which is not chosen, and o indicating an incorrect cup which is chosen). Thus a selection of any one correct cup and any three incorrect cups can occur in any of 4 × 4 = 16 ways. The frequencies of the other possible numbers of successes are calculated correspondingly. Thus the number of successes is distributed according to the hypergeometric distribution.
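The same counting argument can be written out for every success count at once. This is a sketch under the assumptions of the design described here (4 correct cups, 4 incorrect cups, 4 chosen); the function name `success_distribution` and its parameters are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from math import comb

def success_distribution(n_correct=4, n_incorrect=4, n_chosen=4):
    """Map each success count k to (number of selections, probability under the null)."""
    total = comb(n_correct + n_incorrect, n_chosen)
    dist = {}
    for k in range(n_chosen + 1):
        # Choose k of the correct cups and the remaining n_chosen - k from the incorrect ones.
        count = comb(n_correct, k) * comb(n_incorrect, n_chosen - k)
        dist[k] = (count, Fraction(count, total))
    return dist

distribution = success_distribution()
counts = [distribution[k][0] for k in sorted(distribution)]
print(counts)  # [1, 16, 36, 16, 1]
```

The counts reproduce the final column of the table, and dividing each by 70 gives the hypergeometric probabilities.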

The critical region for rejecting the null hypothesis of no ability to distinguish was the single case of 4 successes out of 4 possible, based on the conventional significance criterion of a probability below 5%. This is the critical region because, under the null hypothesis, 4 successes has 1 chance in 70 (≈ 1.4% < 5%) of occurring, whereas at least 3 successes out of 4 has a probability of (16+1)/70 (≈ 24.3% > 5%).
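The comparison against the 5% criterion can be sketched as a tail-probability calculation. The helper `tail_probability` and the constant `ALPHA` are illustrative names, not taken from Fisher's text; exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction
from math import comb

ALPHA = Fraction(5, 100)  # conventional 5% criterion (illustrative name)

def tail_probability(k_min, n_correct=4, n_incorrect=4, n_chosen=4):
    """Probability of at least k_min successes if the taster is only guessing."""
    total = comb(n_correct + n_incorrect, n_chosen)
    favourable = sum(
        comb(n_correct, k) * comb(n_incorrect, n_chosen - k)
        for k in range(k_min, n_chosen + 1)
    )
    return Fraction(favourable, total)

p_all_four = tail_probability(4)    # 1/70
p_three_plus = tail_probability(3)  # 17/70

print(p_all_four, p_all_four < ALPHA)      # 1/70 True
print(p_three_plus, p_three_plus < ALPHA)  # 17/70 False
```

Only the outcome of 4 successes falls below the 5% threshold, which is why it alone forms the critical region.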

Thus, if and only if the lady properly categorized all 8 cups was Fisher willing to reject the null hypothesis – effectively acknowledging the lady's ability at a 1.4% significance level (but without quantifying her ability). Fisher later discussed the benefits of more trials and repeated tests.

David Salsburg reports that a colleague of Fisher, H. Fairfield Smith, revealed that in the test, the woman got all eight cups correct.[5][6] The chance of someone who merely guesses getting all of them correct, assuming she guesses that four had the tea put in first and four the milk, would be only 1 in 70 (the number of combinations of 8 taken 4 at a time).
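The 1 in 70 figure can also be approximated by simulating a taster who guesses at random. This is only an illustrative sketch: the function `simulate_guessing`, the trial count, the seed, and the labelling of cups 0 to 3 as the milk-first cups are all assumptions made for the example, not part of the historical account.

```python
import random

def simulate_guessing(n_trials=100_000, seed=0):
    """Estimate the chance that a pure guesser picks exactly the 4 milk-first cups."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    milk_first = {0, 1, 2, 3}       # arbitrary labelling of the true milk-first cups
    hits = 0
    for _ in range(n_trials):
        guess = set(rng.sample(range(8), 4))  # guesser picks 4 of the 8 cups at random
        if guess == milk_first:
            hits += 1
    return hits / n_trials

estimate = simulate_guessing()
print(f"simulated: {estimate:.4f}, exact: {1 / 70:.4f}")
```

With enough trials the simulated frequency should settle close to 1/70 ≈ 0.0143, in line with the exact count.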

In popular science, Salsburg published a book entitled The Lady Tasting Tea,[5] which describes Fisher's experiment and ideas on randomization. Deb Basu wrote that “the famous case of the ‘lady tasting tea’” was “one of the two supporting pillars … of the randomization analysis of experimental data.”[7]