Computational formula for the variance

From Wikipedia, the free encyclopedia

In probability theory and statistics, the computational formula for the variance Var(X) of a random variable X is the formula

\operatorname{Var}(X) = \operatorname{E}(X^2) - [\operatorname{E}(X)]^2\,

where E(X) is the expected value of X. The result is called the König-Huygens theorem in French-language literature.
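
As a small worked example, the sketch below checks the formula exactly for a fair six-sided die. The die and the variable names are illustrative assumptions, not part of the original statement; exact rational arithmetic is used so that rounding plays no role.

```python
from fractions import Fraction

# Illustrative assumption: a fair die, each face 1..6 with probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in faces)         # E(X)
mean_sq = sum(p * x * x for x in faces)  # E(X^2)

# Variance from the definition E[(X - E(X))^2].
var_definition = sum(p * (x - mean) ** 2 for x in faces)
# Variance from the computational formula E(X^2) - [E(X)]^2.
var_computational = mean_sq - mean ** 2

print(var_definition, var_computational)
```

Both expressions evaluate to the same rational number, 35/12.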

A closely related identity can be used to calculate the sample variance, which is often used as an unbiased estimate of the population variance:


\hat{\sigma}^2 := \frac{1}{N-1}\sum_{i=1}^N(x_i-\bar{x})^2 = \frac{N}{N-1}\left(\frac{1}{N}\left(\sum_{i=1}^N x_i^2\right) - \bar{x}^2\right)

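The second expression is sometimes used in practice to calculate the variance, but this is unwise in floating-point arithmetic. When the mean is large relative to the standard deviation, the two terms being subtracted are nearly equal, and the subtraction can lead to catastrophic cancellation.[1]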
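
The cancellation can be seen in a minimal sketch, assuming IEEE 754 double precision (ordinary Python floats). The function names and the sample data are hypothetical and chosen only for illustration. The single-pass update in `variance_welford` is the recurrence commonly attributed to Welford, shown here as one well-known stable alternative rather than as the only remedy.

```python
def variance_naive(xs):
    """Sample variance via the sum-of-squares identity (prone to cancellation)."""
    n = len(xs)
    mean = sum(xs) / n
    return n / (n - 1) * (sum(x * x for x in xs) / n - mean * mean)


def variance_two_pass(xs):
    """Sample variance from the definition: compute the mean, then the deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_welford(xs):
    """Single-pass sample variance using a running mean and sum of squared deviations."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Illustrative data: the same four values, with and without a large offset.
small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]

for name, f in [("naive", variance_naive),
                ("two-pass", variance_two_pass),
                ("welford", variance_welford)]:
    print(name, f(small), f(shifted))
```

The exact sample variance of both data sets is 30, since shifting the data does not change the variance. On the shifted data the naive formula loses all significant digits and can even return a negative value, while the other two functions stay close to 30.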

Proof

The computational formula for the population variance follows in a straightforward manner from the linearity of expected values and the definition of variance:


\begin{array}{ccl}
\operatorname{Var}(X)&=&\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]\\
                     &=&\operatorname{E}\left[X^2 - 2X\operatorname{E}(X) + [\operatorname{E}(X)]^2\right]\\
                     &=&\operatorname{E}(X^2) - \operatorname{E}[2X\operatorname{E}(X)] + [\operatorname{E}(X)]^2\\
                     &=&\operatorname{E}(X^2) - 2\operatorname{E}(X)\operatorname{E}(X) + [\operatorname{E}(X)]^2\\
                     &=&\operatorname{E}(X^2) - 2[\operatorname{E}(X)]^2 + [\operatorname{E}(X)]^2\\
                     &=&\operatorname{E}(X^2) - [\operatorname{E}(X)]^2
\end{array}
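
The identity for the sample variance can be derived by the same expansion, with sums in place of expectations. The following is a sketch that uses only the definition of the sample mean, \sum_{i=1}^N x_i = N\bar{x}:

\begin{array}{ccl}
\sum_{i=1}^N (x_i - \bar{x})^2 &=& \sum_{i=1}^N x_i^2 - 2\bar{x}\sum_{i=1}^N x_i + N\bar{x}^2\\
                               &=& \sum_{i=1}^N x_i^2 - 2N\bar{x}^2 + N\bar{x}^2\\
                               &=& \sum_{i=1}^N x_i^2 - N\bar{x}^2\\
                               &=& N\left(\frac{1}{N}\left(\sum_{i=1}^N x_i^2\right) - \bar{x}^2\right)
\end{array}

Dividing both sides by N-1 gives the stated identity.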

Generalization to covariance

This formula generalizes to the covariance of two random variables X_i and X_j:

\operatorname{Cov}(X_i, X_j) = \operatorname{E}(X_iX_j) -\operatorname{E}(X_i)\operatorname{E}(X_j)

as well as for the n by n covariance matrix of a random vector of length n:

 \operatorname{Var}(\mathbf{X}) = \operatorname{E}(\mathbf{X X^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{X})^\top

and for the n by m cross-covariance matrix between two random vectors of length n and m:


\operatorname{Cov}(\mathbf{X},\mathbf{Y})=
\operatorname{E}(\mathbf{X Y^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{Y})^\top

where expectations are taken element-wise and \mathbf{X}=\{X_1,X_2,\ldots,X_n\} and \mathbf{Y}=\{Y_1,Y_2,\ldots,Y_m\} are random vectors of respective lengths n and m.
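
The matrix identities above can be checked entry by entry on a small example. The sketch below assumes a joint distribution given by four equally likely outcomes of (X, Y), with n = 2 and m = 3. The data and the function names are hypothetical, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def expectation(values):
    """Mean of equally likely outcomes, computed exactly."""
    return sum(values, Fraction(0)) / len(values)


def cross_cov_definition(xs, ys):
    """Entry (i, j) is E[(X_i - E(X_i)) (Y_j - E(Y_j))]."""
    n, m = len(xs[0]), len(ys[0])
    ex = [expectation([x[i] for x in xs]) for i in range(n)]
    ey = [expectation([y[j] for y in ys]) for j in range(m)]
    return [[expectation([(x[i] - ex[i]) * (y[j] - ey[j])
                          for x, y in zip(xs, ys)])
             for j in range(m)] for i in range(n)]


def cross_cov_computational(xs, ys):
    """Entry (i, j) is E(X_i Y_j) - E(X_i) E(Y_j)."""
    n, m = len(xs[0]), len(ys[0])
    ex = [expectation([x[i] for x in xs]) for i in range(n)]
    ey = [expectation([y[j] for y in ys]) for j in range(m)]
    return [[expectation([x[i] * y[j] for x, y in zip(xs, ys)]) - ex[i] * ey[j]
             for j in range(m)] for i in range(n)]


# Illustrative assumption: four equally likely joint outcomes of (X, Y).
xs = [(1, 2), (3, 0), (2, 5), (4, 1)]
ys = [(0, 1, 2), (1, 1, 0), (3, 2, 1), (2, 0, 5)]

print(cross_cov_definition(xs, ys) == cross_cov_computational(xs, ys))

# The covariance matrix Var(X) is the special case Y = X.
var_x = cross_cov_computational(xs, xs)
print(var_x)
```

Passing the same outcomes for both arguments gives the n by n covariance matrix, whose diagonal entries are the variances from the scalar formula.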

Applications

Its applications in systolic geometry include Loewner's torus inequality.

References

  1. Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn., p. 232. Boston: Addison-Wesley.