Tukey's range test

Tukey's test, also known as the Tukey range test, Tukey method, Tukey's honest significance test, Tukey's HSD (Honestly Significant Difference) test^[1], or the Tukey–Kramer method, is a single-step multiple comparison procedure and statistical test generally used in conjunction with an ANOVA to find which means are significantly different from one another. Named after John Tukey, it compares all possible pairs of means, and is based on a studentized range distribution q (this distribution is similar to the distribution of t from the t-test).^[2]

The test compares the means of every treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise comparisons

\mu _{i}-\mu _{j}\,

and identifies any difference between two means that is greater than the expected standard error. The confidence coefficient for the set, when all sample sizes are equal, is exactly 1 − α. For unequal sample sizes, the confidence coefficient is greater than 1 − α; in other words, the Tukey method is conservative when sample sizes are unequal.

Assumptions of Tukey's test

The observations being tested are independent.
The means are from normally distributed populations.
The within-group variance is equal across the groups being compared (homoscedasticity).

The test statistic

Tukey's test is based on a formula very similar to that of the t-test. In fact, Tukey's test is essentially a t-test, except that it corrects for the experiment-wise error rate. When multiple comparisons are made, the probability of making a type I error increases; Tukey's test corrects for that and is thus more suitable for multiple comparisons than a series of t-tests would be.^[2]
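The inflation of the type I error rate can be illustrated with a small calculation. The following is a minimal sketch, assuming (for simplicity only) that the pairwise tests are independent, which is not strictly true for comparisons sharing a group; the function name familywise_error_rate is hypothetical.

```python
from math import comb


def familywise_error_rate(num_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all pairwise tests.

    Simplifying assumption: the tests are treated as independent.
    """
    num_pairs = comb(num_groups, 2)  # r(r - 1)/2 pairwise comparisons
    return 1.0 - (1.0 - alpha) ** num_pairs


# Show how the uncorrected error rate grows with the number of groups.
for r in (2, 3, 4, 5):
    print(r, comb(r, 2), round(familywise_error_rate(r), 4))
```

Under this assumption, four groups already give six comparisons and an uncorrected error rate well above the nominal α.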

The formula for Tukey's test is:

q_{s}={\frac {Y_{A}-Y_{B}}{SE}},

where Y_A is the larger of the two means being compared, Y_B is the smaller of the two means being compared, and SE is the standard error of a group mean, estimated from the pooled variability of the whole design.

This q_s value can then be compared to a q value from the studentized range distribution. If the q_s value is larger than the q_critical value obtained from the distribution, the two means are said to be significantly different.^[2]
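As an illustration, q_s can be computed from raw group data. The sketch below assumes equal group sizes n and takes SE = sqrt(MS_within / n), a common convention that agrees with the equal-sample-size confidence limits given in this article; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pooled_variance(groups):
    """Within-group mean square (MS_within) of a one-way layout."""
    total_df = sum(len(g) for g in groups) - len(groups)  # N - r
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return ss_within / total_df


def q_statistic(groups, a, b):
    """q_s for groups a and b (indices into groups).

    Assumes equal group sizes n and SE = sqrt(MS_within / n).
    """
    n = len(groups[a])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    se = sqrt(pooled_variance(groups) / n)
    # Y_A is the larger mean and Y_B the smaller, so q_s is non-negative.
    larger, smaller = sorted((mean(groups[a]), mean(groups[b])), reverse=True)
    return (larger - smaller) / se


# Made-up data, for illustration only.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(q_statistic(groups, 0, 2))
```

The resulting value would then be compared with the critical value of the studentized range distribution for r = 3 groups and N − r = 6 degrees of freedom.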

Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e., μ₁ = μ₂ = μ₃ = ... = μ_n), the means should be normally distributed (according to the central limit theorem). This gives rise to the normality assumption of Tukey's test.

Confidence limits

The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1 − α are

{\bar {y}}_{i\bullet }-{\bar {y}}_{j\bullet }\pm {\frac {q_{\alpha ;r;N-r}}{\sqrt {2}}}{\widehat {\sigma }}_{\varepsilon }{\sqrt {\frac {2}{n}}}\qquad i,j=1,\ldots ,r\quad i\neq j.

Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.
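
To make the comparison concrete, the two sets of limits can be written side by side. This is a sketch for the equal-sample-size case; the symbol t_{α/2;N−r}, the upper α/2 point of Student's t distribution with N − r degrees of freedom, is introduced here for illustration. The limits for a single pairwise comparison are

{\bar {y}}_{i\bullet }-{\bar {y}}_{j\bullet }\pm t_{\alpha /2;N-r}\,{\widehat {\sigma }}_{\varepsilon }{\sqrt {\frac {2}{n}}},

so the simultaneous limits differ only in replacing the multiplier t_{α/2;N−r} by q_{α;r;N−r}/√2. The Tukey half-width also simplifies to

{\frac {q_{\alpha ;r;N-r}}{\sqrt {2}}}{\widehat {\sigma }}_{\varepsilon }{\sqrt {\frac {2}{n}}}=q_{\alpha ;r;N-r}{\frac {{\widehat {\sigma }}_{\varepsilon }}{\sqrt {n}}}.

With only two groups (r = 2) the two multipliers coincide, since q_{α;2;ν}/√2 = t_{α/2;ν}.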

Also note that the sample sizes must be equal when using the studentized range approach. σ̂_ε is the standard deviation of the entire design, not just that of the two groups being compared. The Tukey–Kramer method for unequal sample sizes is as follows:

{\bar {y}}_{i\bullet }-{\bar {y}}_{j\bullet }\pm {\frac {q_{\alpha ;r;N-r}}{\sqrt {2}}}{\widehat {\sigma }}_{\varepsilon }{\sqrt {{\frac {1}{n_{i}}}+{\frac {1}{n_{j}}}}},

where n_i and n_j are the sizes of groups i and j respectively. The degrees of freedom for the whole design are also applied.
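
The Tukey–Kramer limits can be sketched in code as follows. The function name and the numerical inputs are hypothetical, and the critical value q_crit is assumed to be supplied by the caller from a table or from software rather than computed here.

```python
from math import sqrt


def tukey_kramer_interval(mean_i, mean_j, n_i, n_j, sigma_hat, q_crit):
    """Simultaneous confidence limits for mu_i - mu_j.

    sigma_hat: pooled standard deviation of the whole design.
    q_crit:    studentized range critical value q_{alpha; r; N-r},
               supplied by the caller.
    """
    half_width = (q_crit / sqrt(2)) * sigma_hat * sqrt(1.0 / n_i + 1.0 / n_j)
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width


# Hypothetical design: r = 3 groups of sizes 4, 5 and 6, so N - r = 12.
# q_crit = 3.77 is approximately the tabulated value for alpha = 0.05.
lower, upper = tukey_kramer_interval(10.0, 7.0, 4, 6, sigma_hat=2.0, q_crit=3.77)
print(lower, upper)
```

When n_i = n_j = n, the half-width reduces to the equal-sample-size expression, which gives a quick consistency check on the formula.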

The studentized range (q) distribution

The Tukey method uses the studentized range distribution. Suppose we have r independent observations y₁, ..., y_r from a normal distribution with mean μ and variance σ². Let w be the range for this set; i.e., the maximum minus the minimum. Now suppose that we have an estimate s² of the variance σ² which is based on ν degrees of freedom and is independent of the y_i (i = 1,...,r). The studentized range is defined as

q_{r,\nu }=w/s.\,

Tukey's test is based on the comparison of two samples from the same population. From the first sample, the range (calculated by subtracting the smallest observation from the largest, or range = max_i(Y_i) − min_i(Y_i), where Y_i represents all of the observations) is calculated, and from the second sample, the standard deviation is calculated. The studentized range ratio is then calculated:

q={\frac {\text{range}}{s}},

where q = studentized range, and s = standard deviation of the second sample.

The critical value of q depends on three factors:

α (the Type I error rate, or the probability of rejecting a true null hypothesis)
n (the number of observations in the first sample, the one from which the range was calculated)
ν (the number of degrees of freedom in the second sample, the one from which s was calculated)

The distribution of q has been tabulated and appears in many textbooks on statistics. In addition, R offers a cumulative distribution function (ptukey) and a quantile function (qtukey) for q.
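
Where no table or library is at hand, the definition q = w/s can also be explored by simulation. The following is a rough Monte Carlo sketch, not a substitute for the tabulated values; the function names, the number of draws and the seed are arbitrary choices made for illustration.

```python
import random


def simulate_studentized_range(r, nu, num_draws=20000, seed=0):
    """Simulate draws of q = w / s and return them sorted.

    w: range of r independent standard normal observations.
    s: independent estimate of sigma with nu degrees of freedom,
       built here as sqrt(chi-square_nu / nu).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        ys = [rng.gauss(0.0, 1.0) for _ in range(r)]
        w = max(ys) - min(ys)
        chi_square = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        s = (chi_square / nu) ** 0.5
        draws.append(w / s)
    draws.sort()
    return draws


def empirical_quantile(sorted_draws, prob):
    """Simple order-statistic estimate of the prob-quantile."""
    index = min(int(prob * len(sorted_draws)), len(sorted_draws) - 1)
    return sorted_draws[index]


draws = simulate_studentized_range(r=3, nu=12)
# Rough estimate of the upper 5% point of q for r = 3, nu = 12.
print(empirical_quantile(draws, 0.95))
```

The estimate carries simulation error, so it should only be expected to agree with tabulated critical values to about one decimal place.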

Order of comparisons

If there is a set of means (A, B, C, D), which can be ranked in the order A > B > C > D, not all possible comparisons need be tested using Tukey's test. To avoid redundancy, one starts by comparing the largest mean (A) with the smallest mean (D). If the q_s value for the comparison of means A and D is less than the q value from the distribution, the null hypothesis is not rejected, and the means are said to have no statistically significant difference between them. Since there is no significant difference between the two means that are furthest apart, comparing any two means that have a smaller difference is assured to yield the same conclusion. As a result, no other comparisons need to be made.^[2]

Overall, it is important when performing Tukey's test to always start by comparing the largest mean with the smallest mean, then the largest mean with the next smallest, and so on, until the largest mean has been compared with all other means (or until no difference is found). After this, compare the second largest mean with the smallest mean, then with the next smallest, and so on. Once again, if two means are found to have no statistically significant difference, do not compare any of the means between them.^[2]
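
The ordering procedure described above can be sketched as follows. This is an illustrative implementation under simplifying assumptions: a single standard error se shared by all comparisons (equal sample sizes) and a critical value q_crit supplied by the caller. The function name and the example numbers are hypothetical.

```python
def stepwise_tukey(means, se, q_crit):
    """Run the ordered comparisons, skipping redundant ones.

    means: dict mapping a label to its group mean.
    Returns (significant_pairs, tests_run).
    """
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    k = len(ranked)
    nonsig_spans = []  # index ranges (i, j) found not to differ
    significant = []
    tests_run = 0
    for i in range(k - 1):  # larger mean of the pair
        for j in range(k - 1, i, -1):  # smaller mean, smallest first
            # A pair inside a non-significant span is not tested, and
            # neither is any closer pair that starts from the same mean.
            if any(lo <= i and j <= hi for lo, hi in nonsig_spans):
                break
            tests_run += 1
            q_s = (ranked[i][1] - ranked[j][1]) / se
            if q_s > q_crit:
                significant.append((ranked[i][0], ranked[j][0]))
            else:
                nonsig_spans.append((i, j))
                break
    return significant, tests_run


# Made-up means with se = 1 and q_crit = 3, for illustration only.
example = {"A": 10.0, "B": 9.5, "C": 9.0, "D": 4.0}
pairs, tests = stepwise_tukey(example, se=1.0, q_crit=3.0)
print(pairs, tests)
```

In this example A and C do not differ significantly, so the pairs A–B and B–C are never tested and only four of the six possible comparisons are carried out.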

Unequal sample sizes

It is possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison as formalized by Clyde Kramer in 1956, so the procedure for unequal sample sizes is sometimes referred to as the Tukey–Kramer method.

Comparison with Scheffé's method

If only pairwise comparisons are to be made, the Tukey–Kramer method will result in narrower confidence limits, which is preferable. In the general case when many or all contrasts might be of interest, Scheffé's method tends to give narrower confidence limits and is therefore the preferred method.

Notes

[1] Lowry, Richard. One Way ANOVA – Independent Samples. Vassar.edu. Retrieved on December 4th, 2008.
[2] Linton, L.R., Harder, L.D. (2007) Biology 315 – Quantitative Biology Lecture Notes. University of Calgary, Calgary, AB.

This article incorporates public domain material from the National Institute of Standards and Technology

External links

NIST/SEMATECH e-Handbook of Statistical Methods: Tukey's method
