Chi-square distribution

From Wikipedia, the free encyclopedia

[Figure: probability density function of the chi-square distribution]
[Figure: cumulative distribution function of the chi-square distribution]
notation: \chi^2(k)\,
parameters: k ∈ ℕ₁ (degrees of freedom)
support: x ∈ [0, +∞)
pdf: \frac{1}{2^{k/2}\Gamma(k/2)}\; x^{k/2-1} e^{-x/2}\,
cdf: \frac{1}{\Gamma(k/2)}\;\gamma(k/2,\,x/2)
mean: k
median: \approx k\bigg(1-\frac{2}{9k}\bigg)^3
mode: max{ k − 2, 0 }
variance: 2k
skewness: \scriptstyle\sqrt{8/k}\,
excess kurtosis: 12 / k
entropy: \frac{k}{2}\!+\!\ln(2\Gamma(k/2))\!+\!(1\!-\!k/2)\psi(k/2)
mgf: (1-2\,t)^{-k/2}   for t < ½
cf: (1-2\,i\,t)^{-k/2}       [1]

In probability theory, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing or in construction of confidence intervals.[2][3][4][5]

The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and for the independence of two criteria of classification of qualitative data. Many other statistical tests also lead to this distribution, such as Friedman's analysis of variance by ranks.

Definition

If X1, …, Xk are independent, normally distributed random variables with mean 0 and variance 1, then the random variable


    Q = \sum_{i=1}^k X_i^2

is distributed according to the chi-square distribution with k degrees of freedom. This is usually written


    Q\ \sim\ \chi^2(k). \,

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of variables X_i).
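
The definition can be illustrated by simulation. The following is a minimal sketch using only the Python standard library; the function names, sample size and seed are illustrative choices, not part of the definition. It draws Q as a sum of k squared standard normal variates and returns the sample mean and variance, which should come out close to the theoretical values k and 2k.

```python
import random
import statistics


def sample_chi_square(k, rng):
    """Draw one chi-square(k) variate as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def simulate(k, n_samples=100_000, seed=0):
    """Return the sample mean and sample variance of n_samples draws of Q."""
    rng = random.Random(seed)
    draws = [sample_chi_square(k, rng) for _ in range(n_samples)]
    return statistics.fmean(draws), statistics.variance(draws)
```

Calling simulate(5), for example, should give a mean near 5 and a variance near 10, up to sampling error.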

Characteristics

Further properties of the chi-square distribution can be found in the summary box at the top of this article.

Probability density function

The probability density function (pdf) of the chi-square distribution is


    f(x;\,k) = \frac{1}{2^{k/2}\Gamma(k/2)}\,x^{k/2 - 1} e^{-x/2}\, \mathbf{1}_{\{x\geq0\}},

where Γ(k/2) denotes the Gamma function, which has closed-form values at the half-integers.

For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-square distribution.
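
As a rough numerical sketch of the density formula (standard library only; the function names and the integration settings are illustrative assumptions), the pdf can be evaluated on the log scale with math.lgamma and checked by integrating it with the midpoint rule.

```python
import math


def chi2_pdf(x, k):
    """Density of chi-square(k) at x, computed on the log scale for stability."""
    if x < 0:
        return 0.0
    if x == 0:
        # Limit at the origin: infinite for k < 2, 1/2 for k == 2, zero for k > 2.
        if k < 2:
            return math.inf
        return 0.5 if k == 2 else 0.0
    log_f = ((k / 2 - 1) * math.log(x) - x / 2
             - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_f)


def integrate_pdf(k, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the integral of the pdf over [0, upper]."""
    h = upper / steps
    return h * sum(chi2_pdf((i + 0.5) * h, k) for i in range(steps))
```

For k = 2 the density reduces to exp(−x/2)/2, and for k = 1 it uses Γ(1/2) = √π, which gives two simple spot checks. The integral should be close to 1 for any k whose density is bounded at the origin.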

Cumulative distribution function

Its cumulative distribution function is:


    F(x;\,k) = \frac{\gamma(k/2,\,x/2)}{\Gamma(k/2)} = P(k/2,\,x/2),

where γ(s,z) is the lower incomplete Gamma function and P(s,z) is the regularized lower incomplete Gamma function.

Tables of this distribution, usually in its cumulative form, are widely available, and the function is included in many spreadsheets and most statistical packages.
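
Where no table or statistical package is at hand, the CDF can be approximated directly. The sketch below is one possible approach, not the method used by any particular package: it evaluates the regularized lower incomplete gamma function through its standard power series, which is adequate for moderate x. The function names and tolerances are illustrative assumptions.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-15, max_terms=10_000):
    """P(a, x) = gamma(a, x) / Gamma(a), from the power series of gamma(a, x)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Multiply by x^a e^{-x} / Gamma(a), evaluated on the log scale.
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def chi2_cdf(x, k):
    """CDF of chi-square(k): F(x; k) = P(k/2, x/2)."""
    return regularized_lower_gamma(k / 2, x / 2)
```

Two closed forms are convenient for checking: for k = 2 the CDF is 1 − exp(−x/2), and for k = 1 it equals erf(√(x/2)).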

Additivity

It follows from the definition of the chi-square distribution that the sum of independent chi-square variables is also chi-square distributed. Specifically, if \{X_i\}_{i=1}^n are independent chi-square variables with \{k_i\}_{i=1}^n degrees of freedom, respectively, then Y = X_1 + \cdots + X_n is chi-square distributed with k_1 + \cdots + k_n degrees of freedom.
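
The same result can also be sketched with moment-generating functions, using the mgf (1 − 2t)^{−k/2} listed in the summary box and the fact that the mgf of a sum of independent variables is the product of their mgfs:

```latex
M_Y(t) = \prod_{i=1}^n M_{X_i}(t)
       = \prod_{i=1}^n (1-2t)^{-k_i/2}
       = (1-2t)^{-(k_1+\cdots+k_n)/2}, \qquad t < \tfrac{1}{2},
```

which is the mgf of a chi-square distribution with k_1 + ⋯ + k_n degrees of freedom.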

Information entropy

The information entropy is given by


    H = -\int_{-\infty}^\infty f(x;\,k)\ln f(x;\,k) \, dx
      = \frac{k}{2} + \ln\big( 2\Gamma(k/2) \big) + \big(1 - k/2\big) \psi(k/2),

where ψ(x) is the Digamma function.
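
The closed form can be checked numerically, taking the entropy to be minus the integral of f ln f. The sketch below uses only the standard library. Since the standard library has no digamma function, ψ is approximated here by a central difference of math.lgamma; that approximation, the function names and the integration settings are assumptions of this example. It is intended for k ≥ 2, where the density is bounded.

```python
import math


def chi2_pdf(x, k):
    """Density of chi-square(k) for x > 0."""
    return math.exp((k / 2 - 1) * math.log(x) - x / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))


def digamma(x, h=1e-5):
    """Central-difference approximation to the derivative of log Gamma."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


def entropy_closed_form(k):
    """k/2 + ln(2 Gamma(k/2)) + (1 - k/2) psi(k/2)."""
    return k / 2 + math.log(2 * math.gamma(k / 2)) + (1 - k / 2) * digamma(k / 2)


def entropy_numeric(k, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of -integral f ln f over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        f = chi2_pdf((i + 0.5) * h, k)
        if f > 0.0:
            total -= f * math.log(f)
    return h * total
```

For k = 2 the closed form reduces to 1 + ln 2, which is the entropy of an exponential distribution with mean 2.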

Noncentral moments

The moments about zero of a chi-square distribution with k degrees of freedom are given by[6][7]


    \operatorname{E}(X^m) = k (k+2) (k+4) \cdots (k+2m-2) = 2^m \frac{\Gamma(m+k/2)}{\Gamma(k/2)}.
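
The two expressions for the m-th moment can be compared numerically. This is a small sketch with illustrative function names; the gamma-function form is evaluated through math.lgamma to avoid overflow for large arguments.

```python
import math


def raw_moment_product(k, m):
    """E(X^m) as the product k (k+2) (k+4) ... (k+2m-2)."""
    result = 1
    for j in range(m):
        result *= k + 2 * j
    return result


def raw_moment_gamma(k, m):
    """E(X^m) as 2^m Gamma(m + k/2) / Gamma(k/2)."""
    return 2 ** m * math.exp(math.lgamma(m + k / 2) - math.lgamma(k / 2))
```

With m = 1 this gives the mean k, and with m = 2 it gives k(k + 2) = 2k + k², consistent with a variance of 2k.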

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:


    \kappa_n = 2^{n-1}(n-1)!\,k
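
The expansion can be sketched as follows, starting from the characteristic function (1 − 2it)^{−k/2} given in the summary box and using the series −ln(1 − z) = Σ zⁿ/n:

```latex
\ln \varphi(t) = -\frac{k}{2}\,\ln(1 - 2it)
               = \frac{k}{2} \sum_{n=1}^{\infty} \frac{(2it)^n}{n}
               = \sum_{n=1}^{\infty} 2^{n-1}(n-1)!\,k \; \frac{(it)^n}{n!}.
```

Matching the coefficient of (it)ⁿ/n! gives κ_n. In particular, κ_1 = k and κ_2 = 2k recover the mean and the variance.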

Asymptotic properties

By the central limit theorem, because a chi-square random variable with k degrees of freedom is the sum of k independent, identically distributed random variables with finite mean and variance, its distribution converges to a normal distribution for large k (k > 50 is “approximately normal” according to [8]). Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of (X-k)/\sqrt{2k} tends to a standard normal distribution. However, convergence is slow, as the skewness is \sqrt{8/k} and the excess kurtosis is 12 / k.

Other functions of the chi-square distribution converge more rapidly to a normal distribution. Some examples are:

  • If X ~ χ²(k) then \scriptstyle\sqrt{2X} is approximately normally distributed with mean \scriptstyle\sqrt{2k-1} and unit variance (result credited to R. A. Fisher).
  • If X ~ χ²(k) then \scriptstyle\sqrt[3]{X/k} is approximately normally distributed with mean \scriptstyle 1-2/(9k) and variance \scriptstyle 2/(9k) (Wilson and Hilferty, 1931).
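
The three normal approximations above (direct standardization, Fisher's square-root transform and the Wilson and Hilferty cube-root transform) can be compared against a reference CDF. The following sketch uses only the standard library. The reference CDF is a power-series evaluation of the regularized lower incomplete gamma function, and the function names and the evaluation grid are assumptions of this example.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def chi2_cdf(x, k, tol=1e-15, max_terms=10_000):
    """Reference CDF via the power series of the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def clt_cdf(x, k):
    """Direct normal approximation: (X - k) / sqrt(2k) ~ N(0, 1)."""
    return PHI((x - k) / math.sqrt(2 * k))


def fisher_cdf(x, k):
    """Fisher: sqrt(2X) ~ N(sqrt(2k - 1), 1)."""
    return PHI(math.sqrt(2 * x) - math.sqrt(2 * k - 1))


def wilson_hilferty_cdf(x, k):
    """Wilson and Hilferty: (X/k)^(1/3) ~ N(1 - 2/(9k), 2/(9k))."""
    mean = 1 - 2 / (9 * k)
    sd = math.sqrt(2 / (9 * k))
    return PHI(((x / k) ** (1 / 3) - mean) / sd)


def max_abs_error(approx, k, upper_factor=4, points=400):
    """Largest |approx - reference| over a grid on (0, upper_factor * k]."""
    grid = [upper_factor * k * (i + 1) / points for i in range(points)]
    return max(abs(approx(x, k) - chi2_cdf(x, k)) for x in grid)
```

Comparing max_abs_error(wilson_hilferty_cdf, k) with max_abs_error(clt_cdf, k) for a moderate k such as 10 gives a sense of how much faster the transformed variables approach normality.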

Related distributions

A chi-square variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal random variables.

More generally, the chi-square distribution is related to any Gaussian random vector of length k as follows. If Y is a Gaussian random vector with mean vector μ and nonsingular covariance matrix C, then X = (Y − μ)′ C^{-1} (Y − μ) is chi-square distributed with k degrees of freedom. This is because subtracting μ and multiplying by C^{-1/2} transforms Y into a vector of k i.i.d. standard normal components.
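
A two-dimensional sketch of this relation, using only the standard library, is given below. The mean vector and covariance matrix are hypothetical values chosen for illustration. A correlated Gaussian vector Y is generated from i.i.d. standard normals z through the Cholesky factor of C, and the quadratic form (Y − μ)′ C^{-1} (Y − μ) is compared with z₁² + z₂², with which it should agree up to rounding.

```python
import math
import random


def mahalanobis_demo(n_samples=20_000, seed=0):
    """Return (quadratic form, z1^2 + z2^2) pairs for a correlated 2-D Gaussian."""
    mu = (1.0, -2.0)              # hypothetical mean vector
    a, b, d = 2.0, 0.6, 1.0       # hypothetical covariance C = [[a, b], [b, d]]
    det = a * d - b * b
    # Cholesky factor L of C, so that Y = mu + L z has covariance C.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = mu[0] + l11 * z1
        y2 = mu[1] + l21 * z1 + l22 * z2
        u1, u2 = y1 - mu[0], y2 - mu[1]
        # (Y - mu)' C^{-1} (Y - mu), with C^{-1} = [[d, -b], [-b, a]] / det
        x = (d * u1 * u1 - 2.0 * b * u1 * u2 + a * u2 * u2) / det
        pairs.append((x, z1 * z1 + z2 * z2))
    return pairs
```

The sample mean of the quadratic form should be close to 2, the number of degrees of freedom.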

The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a generalization of the chi-square distribution called the noncentral chi-square distribution.

If Y is a vector of k i.i.d. standard normal random variables and A is a k×k symmetric idempotent matrix with rank k−n, then the quadratic form Y′AY is chi-square distributed with k−n degrees of freedom.

The chi-square distribution is also naturally related to other distributions arising from the Gaussian. In particular,

  • Y is F-distributed, Y ∼ F(k1,k2) if \scriptstyle Y = \frac{X_1 / k_1}{X_2 / k_2} where X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent.

Generalizations

The chi-square distribution is obtained from the sum of the squares of k independent, zero-mean, unit-variance Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian random variables. Several such distributions are described below.

Noncentral chi-square distribution

The noncentral chi-square distribution is obtained from the sum of the squares of independent Gaussian random variables having unit variance and nonzero means.

Generalized chi-square distribution

The generalized chi-square distribution is obtained from the quadratic form z′Az where z is a zero-mean Gaussian vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

A chi-square variable X ~ χ²(k) is a special case of the gamma distribution: X ~ Γ(k/2, 2), with shape parameter k/2 and scale parameter 2.

Because the exponential distribution is also a special case of the Gamma distribution, if X ~ χ²(2), then X ~ Exp(1/2), an exponential distribution with rate parameter 1/2.

The Erlang distribution is also a special case of the Gamma distribution, so if X ~ χ²(k) with even k, then X is Erlang distributed with shape parameter k/2 and rate parameter 1/2 (equivalently, scale parameter 2).
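
These identities can be checked pointwise on the densities. The sketch below assumes the shape and scale parameterization of the gamma distribution (shape k/2, scale 2) and the rate parameterization of the exponential distribution (rate 1/2). The function names are illustrative, and the densities are evaluated for x > 0 only.

```python
import math


def chi2_pdf(x, k):
    """Density of chi-square(k) for x > 0."""
    return math.exp((k / 2 - 1) * math.log(x) - x / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))


def gamma_pdf(x, shape, scale):
    """Density of the gamma distribution (shape/scale form) for x > 0."""
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def exponential_pdf(x, rate):
    """Density of the exponential distribution (rate form) for x >= 0."""
    return rate * math.exp(-rate * x)
```

For any x > 0, chi2_pdf(x, k) should match gamma_pdf(x, k / 2, 2.0), and chi2_pdf(x, 2) should match exponential_pdf(x, 0.5). For even k, the gamma density with integer shape k/2 is the Erlang density.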

Applications

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student’s t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables divided by their respective degrees of freedom.
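
As an illustration of a chi-square test, the sketch below computes Pearson's goodness-of-fit statistic, the sum of (observed − expected)²/expected over the categories, and an approximate p-value. It assumes the usual large-sample result that, for fully specified category probabilities, this statistic is approximately chi-square distributed with (number of categories − 1) degrees of freedom. The function names and the example counts are hypothetical, and the CDF is evaluated by a power series rather than by any particular library.

```python
import math


def chi2_cdf(x, k, tol=1e-15, max_terms=10_000):
    """CDF of chi-square(k) via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def goodness_of_fit(observed, probabilities):
    """Return (Pearson statistic, degrees of freedom, approximate p-value)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return statistic, dof, 1.0 - chi2_cdf(statistic, dof)


# Hypothetical counts from 88 rolls of a die, tested against a fair die.
statistic, dof, p_value = goodness_of_fit([16, 18, 16, 14, 12, 12], [1 / 6] * 6)
```

A large p-value indicates that the observed counts are compatible with the hypothesized probabilities. A small one suggests a poor fit.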

Following are some of the most common situations in which the chi-square distribution arises from a Gaussian-distributed sample.

  • The table below lists the distributions whose names start with "chi" for some statistics based on independent random variables X_i\sim \mathrm{Normal}(\mu_i,\sigma^2_i),\ i=1,\ldots,k:

    Name                                  Statistic
    chi-square distribution               \sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2
    noncentral chi-square distribution    \sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2
    chi distribution                      \sqrt{\sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2}
    noncentral chi distribution           \sqrt{\sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2}

References

  1. M.A. Sanders. "Characteristic function of the central chi-square distribution". http://www.planetmathematics.com/CentralChiDistr.pdf. Retrieved 2009-03-06.
  2. Abramowitz, Milton; Stegun, Irene A., eds. (1965). "Chapter 26". Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover. ISBN 0-486-61272-4. http://www.math.sfu.ca/~cbm/aands/page_940.htm.
  3. NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution.
  4. Johnson, N.L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Wiley and Sons. ISBN 0-471-58495-9.
  5. Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (Third Edition, pp. 241-246). McGraw-Hill. ISBN 0-07-042864-6.
  6. Chi-square distribution, from MathWorld. Retrieved 2009-02-11.
  7. Simon, M. K. (2002). Probability Distributions Involving Gaussian Random Variables. New York: Springer. Eq. (2.35). ISBN 978-0-387-34657-1.
  8. Box, Hunter and Hunter. Statistics for Experimenters. Wiley. p. 46.
  • Wilson, E.B.; Hilferty, M.M. (1931). "The distribution of chi-square". Proceedings of the National Academy of Sciences, Washington, 17, 684–688.
