Range (statistics)

From Wikipedia, the free encyclopedia
Not to be confused with Mid-range.
This article is about the range in statistics. For the range as it pertains to functions, see range (mathematics).

In arithmetic, the range of a set of data is the difference between the largest and smallest values.[1]

In descriptive statistics, the range is defined as the size of the smallest interval that contains all the data, and it provides an indication of statistical dispersion. It is measured in the same units as the data. Because it depends on only two of the observations, the largest and the smallest, it is most useful for representing the dispersion of small data sets.[2]
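The definition can be illustrated with a short Python sketch. The function name data_range and the sample values are made up for this illustration; the second call shows how a single extreme observation changes the result, which is why the range is a fragile summary for larger data sets.

```python
def data_range(values):
    """Return the difference between the largest and smallest value."""
    data = list(values)
    if not data:
        raise ValueError("the range is undefined for an empty data set")
    return max(data) - min(data)


sample = [4, 7, 1, 9, 3]           # hypothetical observations
print(data_range(sample))          # 9 - 1 = 8
print(data_range(sample + [100]))  # one outlier moves the range to 100 - 1 = 99
```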

Independent identically distributed continuous random variables

For n independent and identically distributed continuous random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability density function g(x) the range of the Xi is the range of a sample of size n from a population with distribution function G(x).

Distribution

The range has cumulative distribution function[3][4]

F(t)= n \int_{-\infty}^{\infty} g(x)[G(x+t)-G(x)]^{n-1}\text{d}x.

Gumbel notes that the "beauty of this formula is completely marred by the facts that, in general, we cannot express G(x + t) by G(x), and that the numerical integration is lengthy and tiresome."[3]
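For a specific parent distribution the integral can be approximated numerically with little effort on a computer. The following is a minimal sketch rather than a reference implementation: the function name range_cdf, the use of the midpoint rule, and the truncation of the integration interval to [lower, upper] are assumptions made for illustration. The example uses the unit-rate exponential distribution, for which the integral evaluates in closed form to (1 − e^(−t))^(n−1), so the approximation can be compared with an exact value.

```python
import math


def range_cdf(t, n, pdf, cdf, lower, upper, steps=20000):
    """Approximate F(t) = n * integral of g(x) [G(x+t) - G(x)]^(n-1) dx.

    The integral is truncated to [lower, upper] and evaluated with the
    midpoint rule; both choices are illustrative assumptions.
    """
    if t < 0:
        return 0.0
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += pdf(x) * (cdf(x + t) - cdf(x)) ** (n - 1)
    return n * total * h


def exp_pdf(x):
    """Density of the unit-rate exponential distribution."""
    return math.exp(-x)


def exp_cdf(x):
    """Distribution function of the unit-rate exponential distribution."""
    return 1.0 - math.exp(-x)


n, t = 5, 1.0
approx = range_cdf(t, n, exp_pdf, exp_cdf, lower=0.0, upper=40.0)
exact = (1.0 - math.exp(-t)) ** (n - 1)
print(approx, exact)  # the two values should agree to several decimal places
```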

If the distribution of each Xi is limited to the right (or left) then the asymptotic distribution of the range is equal to the asymptotic distribution of the largest (smallest) value. For more general distributions the asymptotic distribution can be expressed as a Bessel function.[3]

Moments

The mean range is given by[5]

n \int_0^1 x(G)[G^{n-1}-(1-G)^{n-1}] \text{d}G

where x(G) is the inverse function of G(x), that is, the quantile function of the distribution. In the case where each of the Xi has a standard normal distribution, the mean range is given by[6]

\int_{-\infty}^\infty (1-(1-\Phi(x))^n-\Phi(x)^n ) \text{d}x.
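The normal-case integral has no elementary antiderivative for general n, but it is easy to approximate. The sketch below is illustrative only: the names std_normal_cdf and normal_mean_range, the midpoint rule, and the truncation of the real line to [−10, 10] are assumptions, not part of the cited sources. For n = 2 the range equals |X1 − X2|, whose mean is 2/√π, which gives an exact value to compare against.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, written via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_mean_range(n, half_width=10.0, steps=20000):
    """Approximate the integral of 1 - (1 - Phi(x))^n - Phi(x)^n over the real line.

    The integrand is negligible outside [-half_width, half_width], so the
    integral is truncated there and evaluated with the midpoint rule.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        p = std_normal_cdf(x)
        total += 1.0 - (1.0 - p) ** n - p ** n
    return total * h


print(normal_mean_range(2), 2.0 / math.sqrt(math.pi))  # should agree closely
```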

Independent nonidentically distributed continuous random variables

For n nonidentically distributed independent continuous random variables X1, X2, ..., Xn with cumulative distribution functions G1(x), G2(x), ..., Gn(x) and probability density functions g1(x), g2(x), ..., gn(x), the range has cumulative distribution function [4]

F(t) = \sum_{i=1}^n \int_{-\infty}^\infty g_i(x) \prod_{j=1, j \neq i}^n [G_j(x+t)-G_j(x)]\text{d}x.
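The sum-of-integrals form can be evaluated numerically in the same way as the identically distributed case. The following is a sketch under stated assumptions: the function names, the midpoint rule, and the truncated interval are all illustrative choices. The example takes two independent exponential variables with rates a and b, for which the range is |X1 − X2| and the distribution function has the closed form 1 − (b e^(−at) + a e^(−bt))/(a + b).

```python
import math


def range_cdf_nonidentical(t, pdfs, cdfs, lower, upper, steps=20000):
    """Approximate F(t) = sum_i integral of g_i(x) prod_{j != i} [G_j(x+t) - G_j(x)] dx.

    pdfs and cdfs are equal-length lists of callables. The integral is
    truncated to [lower, upper] and evaluated with the midpoint rule.
    """
    if t < 0:
        return 0.0
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        increments = [cdf(x + t) - cdf(x) for cdf in cdfs]
        for i, pdf in enumerate(pdfs):
            product = 1.0
            for j, inc in enumerate(increments):
                if j != i:
                    product *= inc
            total += pdf(x) * product
    return total * h


def exponential_pdf(rate):
    """Return the density of an exponential distribution with the given rate."""
    return lambda x: rate * math.exp(-rate * x)


def exponential_cdf(rate):
    """Return the distribution function of an exponential distribution."""
    return lambda x: 1.0 - math.exp(-rate * x)


a, b, t = 1.0, 2.0, 0.5  # hypothetical rates and evaluation point
approx = range_cdf_nonidentical(
    t,
    [exponential_pdf(a), exponential_pdf(b)],
    [exponential_cdf(a), exponential_cdf(b)],
    lower=0.0,
    upper=40.0,
)
exact = 1.0 - (b * math.exp(-a * t) + a * math.exp(-b * t)) / (a + b)
print(approx, exact)  # the two values should agree to several decimal places
```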

Independent identically distributed discrete random variables

For n independent and identically distributed discrete random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability mass function g(x) the range of the Xi is the range of a sample of size n from a population with distribution function G(x). We can assume without loss of generality that the support of each Xi is {1,2,3,...,N} where N is a positive integer or infinity.[7][8]

Distribution

The range has probability mass function[7][9][10]

f(t)=\begin{cases}
\sum_{x=1}^N[g(x)]^n & t=0 \\
\sum_{x=1}^{N-t}\left(\begin{aligned} &[G(x+t)-G(x-1)]^n\\
&-[G(x+t)-G(x)]^n\\
&-[G(x+t-1)-G(x-1)]^n\\
&+[G(x+t-1)-G(x)]^n
\end{aligned} \right)& t=1,2,3,\ldots,N-1.
\end{cases}
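The probability mass function translates directly into code. The sketch below is one possible transcription, assuming support {1, ..., N} with finite N; the function name discrete_range_pmf and the example probabilities are hypothetical. Exact rational arithmetic from Python's fractions module is used so that the result can be checked by hand.

```python
from fractions import Fraction


def discrete_range_pmf(pmf, n):
    """Return [f(0), f(1), ..., f(N-1)] for the range of n i.i.d. draws.

    pmf[k - 1] is g(k) for k = 1..N. G[k] holds the cumulative
    distribution function at k, with G[0] = 0.
    """
    N = len(pmf)
    G = [Fraction(0)]
    for p in pmf:
        G.append(G[-1] + p)

    f = [sum(p ** n for p in pmf)]  # t = 0: all n draws are equal
    for t in range(1, N):
        total = Fraction(0)
        for x in range(1, N - t + 1):  # x is the sample minimum
            total += (
                (G[x + t] - G[x - 1]) ** n
                - (G[x + t] - G[x]) ** n
                - (G[x + t - 1] - G[x - 1]) ** n
                + (G[x + t - 1] - G[x]) ** n
            )
        f.append(total)
    return f


# Hypothetical three-point distribution, two draws.
g = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
f = discrete_range_pmf(g, n=2)
print(f)       # f(0) = 7/18, f(1) = 4/9, f(2) = 1/6
print(sum(f))  # the probabilities sum to 1
```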

Example

If we suppose that g(x)=1/N, the discrete uniform distribution for all x, then we find[9][11]

f(t)=\left\{\begin{array}{ll}
\frac{1}{N^{n-1}} & t=0 \\
\sum_{x=1}^{N-t}\left(\left[\frac{t+1}{N}\right]^n -2\left[\frac{t}{N}\right]^n +\left[\frac{t-1}{N}\right]^n
\right)& t=1,2,3,\ldots,N-1.
\end{array}\right.
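Because the summand in the second case does not depend on x, the sum consists of N − t identical terms, so for t = 1, 2, ..., N − 1 it can be written more compactly as

f(t)=(N-t)\,\frac{(t+1)^n-2t^n+(t-1)^n}{N^n}.

As a quick check, for n = 2 the numerator is (t+1)^2 − 2t^2 + (t−1)^2 = 2, so f(t) = 2(N − t)/N^2 for t ≥ 1 and f(0) = 1/N. For two fair dice (N = 6) this gives a probability of 1/6 that both show the same face and 10/36 that they differ by exactly one, and the probabilities over t = 0, ..., 5 sum to 1.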

Related quantities

The range is a simple function of the sample maximum and minimum and these are specific examples of order statistics. In particular, the range is a linear function of order statistics, which brings it into the scope of L-estimation.
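The L-estimator view can be made concrete: sort the sample, then take a weighted sum of the order statistics with weight −1 on the minimum, +1 on the maximum, and 0 elsewhere. The function names l_estimate and range_weights below are hypothetical and serve only to illustrate this.

```python
def l_estimate(sample, weights):
    """Return the weighted sum of the order statistics of the sample."""
    ordered = sorted(sample)
    if len(ordered) != len(weights):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, ordered))


def range_weights(n):
    """Weights that turn l_estimate into the sample range."""
    if n < 1:
        raise ValueError("sample size must be positive")
    if n == 1:
        return [0]  # a single observation has range 0
    return [-1] + [0] * (n - 2) + [1]


sample = [4, 7, 1, 9, 3]  # hypothetical observations
print(l_estimate(sample, range_weights(len(sample))))  # same as max - min, i.e. 8
```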

References

  1. ^ George Woodbury (2001). An Introduction to Statistics. Cengage Learning. p. 74. ISBN 0534377556.
  2. ^ Carin Viljoen (2000). Elementary Statistics: Vol 2. Pearson South Africa. pp. 7–27. ISBN 186891075X.
  3. ^ a b c E. J. Gumbel (1947). "The Distribution of the Range". The Annals of Mathematical Statistics 18 (3): 384–412. JSTOR 2235736.
  4. ^ a b Tsimashenka, I.; Knottenbelt, W.; Harrison, P. (2012). "Controlling Variability in Split-Merge Systems". Analytical and Stochastic Modeling Techniques and Applications. Lecture Notes in Computer Science 7314. p. 165. doi:10.1007/978-3-642-30782-9_12. ISBN 978-3-642-30781-2.
  5. ^ H. O. Hartley; H. A. David (1954). "Universal Bounds for Mean Range and Extreme Observation". The Annals of Mathematical Statistics 25 (1): 85–99. JSTOR 2236514.
  6. ^ L. H. C. Tippett (1925). "On the Extreme Individuals and the Range of Samples Taken from a Normal Population". Biometrika 17 (3/4): 364–387. JSTOR 2332087.
  7. ^ a b Evans, D. L.; Leemis, L. M.; Drew, J. H. (2006). "The Distribution of Order Statistics for Discrete Random Variables with Applications to Bootstrapping". INFORMS Journal on Computing 18: 19. doi:10.1287/ijoc.1040.0105.
  8. ^ Irving W. Burr (1955). "Calculation of Exact Sampling Distribution of Ranges from a Discrete Population". The Annals of Mathematical Statistics 26 (3): 530–532. JSTOR 2236482.
  9. ^ a b Abdel-Aty, S. H. (1954). "Ordered variables in discontinuous distributions". Statistica Neerlandica 8 (2): 61–82. doi:10.1111/j.1467-9574.1954.tb00442.x.
  10. ^ Siotani, M. (1956). "Order statistics for discrete case with a numerical application to the binomial distribution". Annals of the Institute of Statistical Mathematics 8: 95–96. doi:10.1007/BF02863574.
  11. ^ Paul R. Rider (1951). "The Distribution of the Range in Samples from a Discrete Rectangular Population". Journal of the American Statistical Association 46 (255): 375–378. doi:10.1080/01621459.1951.10500796. JSTOR 2280515.

External links

  • APPL, a Maple script for computing the distribution of the range of independent identically distributed discrete random variables