Chi-square distribution

From Wikipedia, the free encyclopedia

chi-square
[Figure: probability density function]
[Figure: cumulative distribution function]
Parameters k > 0\, degrees of freedom
Support x \in [0; +\infty)\,
Probability density function (pdf) \frac{(1/2)^{k/2}}{\Gamma(k/2)} x^{k/2 - 1} e^{-x/2}\,
Cumulative distribution function (cdf) \frac{\gamma(k/2,x/2)}{\Gamma(k/2)}\,
Mean k\,
Median approximately k-2/3\,
Mode k-2\, if k\geq 2\,
Variance 2\,k\,
Skewness \sqrt{8/k}\,
Excess kurtosis 12/k\,
Entropy \frac{k}{2}\!+\!\ln(2\Gamma(k/2))\!+\!(1\!-\!k/2)\psi(k/2)
Moment-generating function (mgf) (1-2\,t)^{-k/2} for 2\,t<1\,
Characteristic function (1-2\,i\,t)^{-k/2}\,

In probability theory and statistics, the chi-square distribution (also chi-squared or χ² distribution) is one of the most widely used theoretical probability distributions in inferential statistics, e.g., in statistical significance tests.[1][2][3][4] It is useful because, under reasonable assumptions, easily calculated quantities can be proven to have distributions that approximate the chi-square distribution if the null hypothesis is true.

The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and for the independence of two criteria of classification of qualitative data. Many other statistical tests also lead to a use of this distribution, such as Friedman's analysis of variance by ranks.


Definition

If X_1, \dots, X_k are k independent, normally distributed random variables with mean 0 and variance 1, then the random variable

Q = \sum_{i=1}^k X_i^2

is distributed according to the chi-square distribution with k degrees of freedom. This is usually written

Q\sim\chi^2_k.\,

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e., the number of X_i).

The chi-square distribution is a special case of the gamma distribution.
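The definition can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `simulate_chi_square`, the seed, and the sample size are illustrative choices. It draws sums of k squared standard normal variables and compares the sample mean and variance with the values k and 2k listed for the distribution.

```python
import random
import statistics


def simulate_chi_square(k, n_samples, seed=0):
    """Draw n_samples values of Q = X_1^2 + ... + X_k^2 with X_i ~ N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_samples)
    ]


samples = simulate_chi_square(k=5, n_samples=50_000)
sample_mean = statistics.fmean(samples)      # should be close to k = 5
sample_var = statistics.pvariance(samples)   # should be close to 2k = 10
print(sample_mean, sample_var)
```

The agreement is only statistical, so the printed values vary slightly with the seed and the sample size.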

Characteristics

Probability density function

The probability density function of the chi-square distribution is


f(x;k)=
\begin{cases}\displaystyle
\frac{1}{2^{k/2}\Gamma(k/2)}\,x^{(k/2) - 1} e^{-x/2}&\text{for }x>0,\\
0&\text{for }x\le0,
\end{cases}

where Γ denotes the Gamma function, which has closed-form values at the half-integers.
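As an illustration, the density can be evaluated with the Python standard library alone. This is a sketch under the assumption that working with logarithms (via `math.lgamma`) is acceptable for numerical stability; the name `chi2_pdf` is hypothetical.

```python
import math


def chi2_pdf(x, k):
    """Density f(x; k) of the chi-square distribution with k degrees of freedom."""
    if x <= 0:
        return 0.0
    # log f = (k/2 - 1) ln x - x/2 - (k/2) ln 2 - ln Gamma(k/2)
    log_density = (
        (k / 2 - 1) * math.log(x)
        - x / 2
        - (k / 2) * math.log(2)
        - math.lgamma(k / 2)
    )
    return math.exp(log_density)


# Gamma(1/2) = sqrt(pi) is one of the closed-form half-integer values.
print(math.gamma(0.5), math.sqrt(math.pi))
# For k = 2 the density reduces to exp(-x/2) / 2.
print(chi2_pdf(2.0, 2), 0.5 * math.exp(-1.0))
```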

Cumulative distribution function

Its cumulative distribution function is:

F(x;k)=\frac{\gamma(k/2,x/2)}{\Gamma(k/2)} = P(k/2, x/2)

where γ(s,z) is the lower incomplete Gamma function and P(s,z) = γ(s,z)/Γ(s) is the regularized Gamma function.

Tables of this distribution, usually in its cumulative form, are widely available, and the function is included in many spreadsheets and statistical packages.
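Where no statistical package is at hand, the cumulative distribution function can be approximated directly from the formula above. The sketch below assumes the standard power series for the lower incomplete Gamma function, γ(s,z) = z^s e^{-z} Σ_{n≥0} z^n / (s(s+1)⋯(s+n)); the function names, tolerance, and term limit are illustrative, and the series is only practical for moderate x.

```python
import math


def regularized_lower_gamma(s, z, tol=1e-15, max_terms=10_000):
    """P(s, z) = gamma(s, z) / Gamma(s), summed from the power series."""
    if z <= 0:
        return 0.0
    term = 1.0 / s          # n = 0 term: 1 / s
    total = term
    for n in range(1, max_terms):
        term *= z / (s + n)  # next term: z^n / (s (s+1) ... (s+n))
        total += term
        if term < tol * total:
            break
    # Multiply by z^s e^{-z} / Gamma(s), computed in log space.
    return total * math.exp(s * math.log(z) - z - math.lgamma(s))


def chi2_cdf(x, k):
    """F(x; k) = P(k/2, x/2)."""
    return regularized_lower_gamma(k / 2, x / 2)


# For k = 2 the CDF has the closed form 1 - exp(-x/2).
print(chi2_cdf(3.0, 2), 1.0 - math.exp(-1.5))
```

For k = 1 the result can also be compared with `math.erf(math.sqrt(x / 2))`, since a chi-square variable with one degree of freedom is the square of a standard normal variable.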

Characteristic function

The characteristic function of the chi-square distribution is

\chi(t;k)=(1-2it)^{-k/2}.\,
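This form follows from the definition as a sum of k independent squared standard normal variables. A brief sketch of the reasoning, where the symbol \varphi_{X_i^2} for the characteristic function of a single squared term is notation introduced here for illustration:

```latex
\varphi_{X_i^2}(t) = E\left[e^{itX_i^2}\right]
  = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1-2it)x^2/2}\, dx
  = (1-2it)^{-1/2},
\qquad
\chi(t;k) = \prod_{i=1}^{k} \varphi_{X_i^2}(t) = (1-2it)^{-k/2}.
```

The product step uses the independence of the X_i.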

Expected value and variance

If X\sim\chi^2_k then

E(X) = k
Var(X) = 2k
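Both values can be derived from the definition X = \sum_{i=1}^k X_i^2 with independent standard normal X_i. The short derivation below uses the standard fact that the fourth moment of a standard normal variable is 3:

```latex
E(X) = \sum_{i=1}^{k} E(X_i^2) = \sum_{i=1}^{k} \mathrm{Var}(X_i) = k,
\qquad
\mathrm{Var}(X) = \sum_{i=1}^{k} \mathrm{Var}(X_i^2)
  = \sum_{i=1}^{k} \left( E(X_i^4) - \left(E(X_i^2)\right)^2 \right)
  = k\,(3 - 1) = 2k.
```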

Median

The median of X\sim\chi^2_k is given approximately by

k-\frac{2}{3}+\frac{4}{27k}-\frac{8}{729k^2}.
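To get a feel for the accuracy of this approximation, it can be compared with a case where the median is known exactly. The sketch below assumes k = 2, where the chi-square distribution coincides with an exponential distribution with rate 1/2 and the median is therefore 2 ln 2; the function name `chi2_median_approx` is illustrative.

```python
import math


def chi2_median_approx(k):
    """Series approximation k - 2/3 + 4/(27k) - 8/(729k^2) to the median."""
    return k - 2 / 3 + 4 / (27 * k) - 8 / (729 * k ** 2)


exact_k2 = 2 * math.log(2)        # exact median for k = 2
approx_k2 = chi2_median_approx(2)
print(exact_k2, approx_k2, approx_k2 - exact_k2)
```

The approximation is an expansion in powers of 1/k, so the error for k = 2 is larger than for higher degrees of freedom.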

Information entropy

The information entropy is given by


H = -\int_{0}^\infty f(x;k)\ln f(x;k)\, dx = \frac{k}{2} + \ln\left(2\,\Gamma\left(\frac{k}{2}\right)\right) + \left(1 - \frac{k}{2}\right)\psi\left(\frac{k}{2}\right),

where ψ(x) is the Digamma function.
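As a quick check of the closed form, take k = 2, where the Digamma term vanishes and Γ(1) = 1:

```latex
H\big|_{k=2} = \frac{2}{2} + \ln\left(2\,\Gamma(1)\right) + \left(1 - \frac{2}{2}\right)\psi(1)
  = 1 + \ln 2 \approx 1.693.
```

This agrees with the entropy 1 - \ln\lambda of an exponential distribution with rate \lambda = 1/2, which is the chi-square distribution with 2 degrees of freedom.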

Related distributions and properties

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-square random variables, each divided by its respective degrees of freedom.

  • If X\sim\chi^2_k, then as k tends to infinity, the distribution of (X-k)/\sqrt{2k} tends to a standard normal distribution, so for large k, X is approximately normal with mean k and variance 2k. Convergence is slow, as the skewness is \sqrt{8/k} and the excess kurtosis is 12/k.
  • If X\sim\chi^2_k, then \sqrt{2X} is approximately normally distributed with mean \sqrt{2k-1} and unit variance (result credited to R. A. Fisher).
  • If X\sim\chi^2_k, then \sqrt[3]{X/k} is approximately normally distributed with mean 1 - 2/(9k) and variance 2/(9k) (Wilson and Hilferty, 1931).
  • If X \sim \chi_2^2 (2 degrees of freedom), then X \sim \mathrm{Exponential}(\lambda = \tfrac{1}{2}), an exponential distribution with rate 1/2.
  • If X_1, \dots, X_\nu are independent with X_m \sim N(0,1), then Y = \sum_{m=1}^{\nu} X_m^2 \sim \chi_{\nu}^2.
  • If \boldsymbol{z}'=[Z_1,Z_2,\cdots,Z_n], where the Z_i are independent Normal(0,σ²) random variables, i.e. \boldsymbol{z}\sim N_n(\boldsymbol{0},\sigma^2 \mathrm{I}), and \boldsymbol{A} is an n\times n symmetric idempotent matrix with rank n-k, then the quadratic form \frac{\boldsymbol{z}'\boldsymbol{A}\boldsymbol{z}}{\sigma^2}\sim \chi^2_{n-k}.
  • If the X_m\sim N(\mu_m,1) are independent with nonzero means, then Y = \sum_{m=1}^k X_m^2 follows a noncentral chi-square distribution.
  • The chi-square distribution X\sim\chi^2_\nu is a special case of the gamma distribution, in that X \sim {\Gamma}(\frac{\nu}{2}, \theta=2), with shape ν/2 and scale θ = 2.
  • If X_1 \sim \chi_{\nu_1}^2 and X_2 \sim \chi_{\nu_2}^2 are independent, then Y = \frac{X_1 / \nu_1}{X_2 / \nu_2} \sim \mathrm{F}(\nu_1, \nu_2), an F-distribution.
  • If X_m \sim \chi^2(\nu_m), m = 1, \dots, N, are independent, then Y = \sum_{m=1}^N X_m \sim \chi^2(\bar{\nu}) with \bar{\nu} = \sum_{m=1}^N \nu_m.
  • If X is chi-square distributed, then \sqrt{X} is chi distributed.
  • In particular, if X \sim \chi_2^2 (chi-square with 2 degrees of freedom), then \sqrt{X} is Rayleigh distributed.
  • If X_1, \dots, X_n are i.i.d. N(μ,σ²) random variables, then \sum_{i=1}^n(X_i - \bar X)^2 \sim \sigma^2 \chi^2_{n-1}, where \bar X = \frac{1}{n} \sum_{i=1}^n X_i.
  • If X \sim \mathrm{SkewLogistic}(\tfrac{1}{2}), then \mathrm{log}(1 + e^{-X}) \sim \chi_2^2.
  • The table below lists probability distributions whose names start with "chi", for statistics based on independent random variables X_i\sim \mathrm{Normal}(\mu_i,\sigma^2_i), i=1,\cdots,k:
Name | Statistic
chi-square distribution | \sum_{i=1}^k \frac{\left(X_i-\mu_i\right)^2}{\sigma_i^2}
noncentral chi-square distribution | \sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2
chi distribution | \sqrt{\sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2}
noncentral chi distribution | \sqrt{\sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2}
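The Wilson and Hilferty cube-root approximation listed above can be turned into a quick approximation of the cumulative distribution function. The sketch below compares it with an exact value; it assumes the known closed form for even degrees of freedom k = 2m, F(x;k) = 1 - e^{-x/2} \sum_{j=0}^{m-1} (x/2)^j / j!, which is not stated in this article, and all function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_cdf_wilson_hilferty(x, k):
    """Approximate F(x; k) by treating (X/k)^(1/3) as normal."""
    mean = 1.0 - 2.0 / (9.0 * k)
    std = math.sqrt(2.0 / (9.0 * k))
    return normal_cdf(((x / k) ** (1.0 / 3.0) - mean) / std)


def chi2_cdf_even_k(x, k):
    """Exact F(x; k) for even k, used here only as a reference value."""
    if k % 2 != 0:
        raise ValueError("closed form used here requires even k")
    half = x / 2.0
    partial = sum(half ** j / math.factorial(j) for j in range(k // 2))
    return 1.0 - math.exp(-half) * partial


approx = chi2_cdf_wilson_hilferty(10.0, 10)
exact = chi2_cdf_even_k(10.0, 10)
print(approx, exact, abs(approx - exact))
```

Even for k = 10 the two values agree closely, which is consistent with the cube-root transform removing much of the skewness of the distribution.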

References

  1. Abramowitz, Milton; Stegun, Irene A., eds. (1965). "Chapter 26". Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover. ISBN 0-486-61272-4.
  2. NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution.
  3. Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Wiley and Sons. ISBN 0-471-58495-9.
  4. Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (Third Edition, pp. 241-246). McGraw-Hill. ISBN 0-07-042864-6.


