Studentized range

From Wikipedia, the free encyclopedia

In statistics, the studentized range is the difference between the largest and smallest values in a sample, measured in units of the sample standard deviation.

The studentized range, q, was presented by Newman (1939), by Keuls (1952), and by John Tukey in some unpublished notes. It is the base statistic for the studentized range distribution, which is used in multiple comparison procedures, such as the single-step Tukey's range test and Duncan's step-down procedure, and in establishing confidence intervals that remain valid after data snooping has occurred.[1]

Description

The value of the studentized range is most often represented by the variable q.

The studentized range computed from a list x1, ..., xn of numbers is given by the formulas

 q_{n,\nu} = \frac{\max\{x_1, \dots, x_n\} - \min\{x_1, \dots, x_n\}}{s} = \max_{i,j=1,\dots,n}\left\{\frac{x_i - x_j}{s}\right\}

where

 s^2 = \frac{1}{n - 1}\sum_{i=1}^n (x_i - \overline{x})^2,

is the sample variance, the square of the sample standard deviation s, and

 \overline{x} = \frac{x_1 + \cdots + x_n}{n}

is the sample mean.
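
The definition above translates directly into a few lines of code. The following is a minimal sketch using only the Python standard library; the function name `studentized_range` and the sample data are illustrative and not part of the original text.

```python
import statistics


def studentized_range(xs):
    """Return (max - min) / s, where s is the sample standard deviation."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    # statistics.stdev uses the n - 1 divisor, matching the formula for s^2.
    s = statistics.stdev(xs)
    if s == 0:
        raise ValueError("all observations are equal, so s is zero")
    return (max(xs) - min(xs)) / s


sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data
q = studentized_range(sample)
print(round(q, 4))  # 2.5298
```

Here the range is 4 and the sample variance is 2.5, so q = 4 / sqrt(2.5).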

The critical value of q depends on three factors:

  1. α (the probability of rejecting a true null hypothesis)
  2. n (the number of observations or groups)
  3. ν (the degrees of freedom of the estimate s in the denominator)
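
In practice the critical value is often computed numerically rather than read from a table. The sketch below assumes SciPy 1.7 or later, which provides `scipy.stats.studentized_range`; the helper name `critical_q` and the chosen values of α, n and the degrees of freedom are illustrative.

```python
from scipy.stats import studentized_range  # assumes SciPy >= 1.7


def critical_q(alpha, n, nu):
    """Upper-alpha critical value of q for n groups and nu degrees of freedom."""
    # ppf is the inverse CDF, so the upper-alpha point is ppf(1 - alpha).
    return float(studentized_range.ppf(1.0 - alpha, n, nu))


# Critical values at alpha = 0.05 with 10 degrees of freedom,
# for an increasing number of groups.
for n in (2, 3, 4, 5):
    print(n, round(critical_q(0.05, n, 10), 2))
```

The printed values can be checked against a published table of the studentized range distribution.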

If X1, ..., Xn are independent, identically distributed normal random variables, the probability distribution of their studentized range is what is usually called the studentized range distribution. This distribution does not depend on the expected value or the standard deviation of the normal distribution from which the sample is drawn, and tables of it are available.[2] It has applications to hypothesis testing and multiple comparisons. For example, Tukey's range test and Duncan's new multiple range test (MRT), both of which use q statistics, can serve as post-hoc analyses that identify which pairs of groups differ significantly after an analysis of variance has rejected the null hypothesis.[3]
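
The independence from the mean and standard deviation follows because shifting every observation leaves both the range and s unchanged, while rescaling by a positive constant multiplies both by the same factor. A small sketch of this check, using only the standard library (the seed, sample size, and the values of `mu` and `sigma` are arbitrary choices made for illustration):

```python
import math
import random
import statistics


def studentized_range(xs):
    """Return (max - min) / s with s the n - 1 sample standard deviation."""
    return (max(xs) - min(xs)) / statistics.stdev(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
z = [rng.gauss(0.0, 1.0) for _ in range(8)]  # standard normal draws

mu, sigma = 50.0, 12.0  # arbitrary location and scale
x = [mu + sigma * zi for zi in z]  # same draws, shifted and rescaled

q_z = studentized_range(z)
q_x = studentized_range(x)

# The two values agree up to floating-point rounding.
print(math.isclose(q_z, q_x, rel_tol=1e-9))  # True
```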

The studentized range distribution is similar to Student's t distribution, which applies when only two groups need to be compared; it differs in that it takes into account the number of means under consideration. The more means under consideration, the larger the critical value. This makes sense: the more means there are, the greater the likelihood that at least some differences between pairs of means will be large due to chance alone.
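
The connection to Student's t distribution in the two-group case can be written out explicitly. As a sketch, assume X_1 and X_2 are independent normal variables with common variance σ², and that s is an estimate of σ with ν degrees of freedom that is independent of both (the symbol T and the critical-value subscripts below are notation introduced here for illustration). Since X_1 − X_2 has variance 2σ²,

 T = \frac{X_1 - X_2}{s\sqrt{2}} \sim t_\nu, \qquad q_{2,\nu} = \frac{|X_1 - X_2|}{s} = \sqrt{2}\,|T|

so the upper-α critical value of q for two groups is √2 times the two-sided critical value of t:

 q_{\alpha;\,2,\nu} = \sqrt{2}\; t_{\alpha/2;\,\nu}

For example, with α = 0.05 and ν = 10 the two-sided t critical value is about 2.228, giving a q critical value of about 3.15.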

Studentized data

Generally, the term studentized means that the variable's scale was adjusted by dividing by an estimate of a population standard deviation (see also studentized residual).

The concept is named after William Sealy Gosset, who wrote under the pseudonym "Student". The fact that the standard deviation is a sample standard deviation rather than the population standard deviation, and thus something that differs from one random sample to the next, is essential to the definition.

The variability in the value of the sample standard deviation introduces additional uncertainty into the values calculated. This complicates the problem of finding the probability distribution of any statistic that is studentized.

Notes

  1. John A. Rafter (2002). "Multiple Comparison Methods for Means". SIAM Review 44 (2): 259–278. doi:10.1137/s0036144501357233.
  2. Pearson & Hartley (1970, Section 14, Table 29)
  3. Pearson & Hartley (1970, Section 14.2)

References

  • Pearson, E.S.; Hartley, H.O. (1970) Biometrika Tables for Statisticians, Volume 1, 3rd Edition, Cambridge University Press. ISBN 0-521-05920-8

Further reading

  • John Neter, Michael H. Kutner, Christopher J. Nachtsheim, William Wasserman (1996) Applied Linear Statistical Models, fourth edition, McGraw-Hill, page 726.
  • John A. Rice (1995) Mathematical Statistics and Data Analysis, second edition, Duxbury Press, pages 451–452.