Computational formula for the variance

From Wikipedia, the free encyclopedia


In probability theory and statistics, the computational formula for the variance Var(X) of a random variable X is the formula

\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2\,

where E(X) is the expected value of X. This formula can be generalized for covariance:

\operatorname{Cov}(X_i, X_j) = \operatorname{E}(X_iX_j) -\operatorname{E}(X_i)\operatorname{E}(X_j)

as well as for the n by n covariance matrix of a random vector of length n:

 \operatorname{Var}(\mathbf{X}) = \operatorname{E}(\mathbf{X X^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{X})^\top

and for the n by m cross-covariance matrix between two random vectors of length n and m:


\operatorname{Cov}(\textbf{X},\textbf{Y})=
\operatorname{E}(\mathbf{X Y^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{Y})^\top

where expectations are taken element-wise and \mathbf{X}=(X_1,X_2,\ldots,X_n)^\top and \mathbf{Y}=(Y_1,Y_2,\ldots,Y_m)^\top are random column vectors of lengths n and m, respectively.
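The identities above can be checked exactly on a small finite distribution. The following Python sketch assumes a random vector that takes each of a finite list of outcomes with equal probability; this setup, the data, and the function names `mean_vector`, `cross_cov_computational` and `cross_cov_definition` are illustrative assumptions. It uses exact rational arithmetic from the standard `fractions` module so that the computational form and the centered definition can be compared with `==`. The scalar variance is the 1 × 1 special case Cov(X, X), and the n × n covariance matrix of a vector X is likewise Cov(X, X).

```python
from fractions import Fraction
from typing import List, Sequence

# Hypothetical helpers: each row of `xs` / `ys` is one equally likely outcome.
Vector = Sequence[Fraction]
Matrix = List[List[Fraction]]


def mean_vector(samples: Sequence[Vector]) -> List[Fraction]:
    """E(X), taken element-wise, for X uniform over the given outcomes."""
    n = len(samples)
    return [sum((s[k] for s in samples), Fraction(0)) / n
            for k in range(len(samples[0]))]


def cross_cov_computational(xs: Sequence[Vector], ys: Sequence[Vector]) -> Matrix:
    """Cov(X, Y) = E(X Y^T) - E(X) E(Y)^T; row i of xs is paired with row i of ys."""
    n = len(xs)
    mx, my = mean_vector(xs), mean_vector(ys)
    return [[sum((x[i] * y[j] for x, y in zip(xs, ys)), Fraction(0)) / n - mx[i] * my[j]
             for j in range(len(my))]
            for i in range(len(mx))]


def cross_cov_definition(xs: Sequence[Vector], ys: Sequence[Vector]) -> Matrix:
    """Cov(X, Y) = E[(X - E(X))(Y - E(Y))^T], the centered definition."""
    n = len(xs)
    mx, my = mean_vector(xs), mean_vector(ys)
    return [[sum(((x[i] - mx[i]) * (y[j] - my[j]) for x, y in zip(xs, ys)),
                 Fraction(0)) / n
             for j in range(len(my))]
            for i in range(len(mx))]


def as_fractions(rows):
    """Convert nested lists of numbers to nested lists of Fractions."""
    return [[Fraction(v) for v in row] for row in rows]


# Illustrative data: four equally likely joint outcomes of X (n = 2) and Y (m = 3).
xs = as_fractions([[1, 2], [3, 0], [0, 1], [4, 5]])
ys = as_fractions([[1, 0, 2], [2, 1, 0], [0, 0, 1], [3, 2, 2]])
die = as_fractions([[k] for k in range(1, 7)])  # fair six-sided die, as a 1-vector

assert cross_cov_computational(xs, ys) == cross_cov_definition(xs, ys)  # 2 x 3
assert cross_cov_computational(xs, xs) == cross_cov_definition(xs, xs)  # 2 x 2
print(cross_cov_computational(die, die))  # [[Fraction(35, 12)]]
```

For the fair die the result is E(X^2) - E(X)^2 = 91/6 - 49/4 = 35/12, returned as a 1 × 1 matrix.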

A closely related identity can be used to calculate the sample variance, which is often used as an unbiased estimate of the population variance:


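\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^N(X_i-\bar{X})^2 = \frac{N}{N-1}\left(\frac{1}{N}\sum_{i=1}^N X_i^2 - \bar{X}^2\right)

where \bar{X} = \frac{1}{N}\sum_{i=1}^N X_i is the sample mean.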
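The sample-variance identity can be checked the same way. In the Python sketch below, which uses illustrative function names and data, `sample_variance_definition` centers the data on the sample mean first, while `sample_variance_computational` multiplies N/(N-1) by the mean of the squares minus the square of the mean. With exact `Fraction` inputs both agree with the standard library's `statistics.variance`, which also divides by N - 1.

```python
import statistics
from fractions import Fraction
from typing import Sequence


def sample_variance_definition(xs: Sequence[Fraction]) -> Fraction:
    """1/(N-1) * sum of (X_i - Xbar)^2: center first, then square."""
    n = len(xs)
    xbar = sum(xs, Fraction(0)) / n
    return sum(((x - xbar) ** 2 for x in xs), Fraction(0)) / (n - 1)


def sample_variance_computational(xs: Sequence[Fraction]) -> Fraction:
    """N/(N-1) * (mean of X_i^2 - Xbar^2): no centering step."""
    n = len(xs)
    xbar = sum(xs, Fraction(0)) / n
    mean_of_squares = sum((x * x for x in xs), Fraction(0)) / n
    return Fraction(n, n - 1) * (mean_of_squares - xbar ** 2)


data = [Fraction(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]  # illustrative data
print(sample_variance_definition(data))     # 32/7
print(sample_variance_computational(data))  # 32/7
print(statistics.variance(data))            # 32/7
```

Here \bar{X} = 5, the mean of the squares is 29, and (8/7)(29 - 25) = 32/7.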

These results are often used in practice to calculate the variance when it is inconvenient to center a random variable by subtracting its expected value, or to center a set of data by subtracting the sample mean. However, in some cases it is easier to carry out the centering first and then apply the definition of the variance directly.
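With exact arithmetic the two forms always agree, but in floating-point arithmetic the subtraction in the computational formula can cancel most of the significant digits when the mean is large compared with the spread of the data (catastrophic cancellation). The Python sketch below, whose function names and data are illustrative, shifts the values 4, 7, 13, 16 (sample variance 30) by 10^9. In IEEE double precision the centered two-pass computation still returns 30.0, while the one-pass computational formula returns a badly wrong value.

```python
from typing import Sequence


def one_pass_variance(xs: Sequence[float]) -> float:
    """Computational formula: (sum x^2 - (sum x)^2 / N) / (N - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(xs: Sequence[float]) -> float:
    """Center on the sample mean first, then apply the definition."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]        # sample variance is exactly 30
shifted = [1e9 + x for x in small]    # same spread, much larger mean

print(one_pass_variance(small), two_pass_variance(small))  # 30.0 30.0
print(two_pass_variance(shifted))                          # 30.0
print(one_pass_variance(shifted))   # far from 30 because of cancellation
```

Because the variance does not change when every observation is shifted by the same constant, subtracting any value close to the mean before applying the computational formula also avoids most of this loss.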

Proof

The computational formula for the population variance follows in a straightforward manner from the definition of variance, the linearity of expectation, and the fact that E(X) is a constant:


\begin{array}{ccl}
\operatorname{Var}(X)&=&\operatorname{E}\left[\left(X - \operatorname{E}(X)\right)^2\right]\\
                     &=&\operatorname{E}\left[X^2 - 2X\operatorname{E}(X) + \operatorname{E}(X)^2\right]\\
                     &=&\operatorname{E}(X^2) - 2\operatorname{E}(X)\operatorname{E}(X) + \operatorname{E}(X)^2\\
                     &=&\operatorname{E}(X^2) - \operatorname{E}(X)^2.
\end{array}
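As a small worked example of this derivation (the choice of distribution is illustrative), let X be a Bernoulli random variable with success probability p. Because X takes only the values 0 and 1, X^2 = X, so

\operatorname{Var}(X) = \operatorname{E}(X^2) - \operatorname{E}(X)^2 = p - p^2 = p(1-p),

which agrees with the definition applied directly:

\operatorname{E}\left[(X - p)^2\right] = (1-p)(0-p)^2 + p(1-p)^2 = p(1-p)\left[p + (1-p)\right] = p(1-p).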

To prove the result for the sample variance


\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^N(X_i-\bar{X})^2,

note that the sample variance can be expressed as


\hat{\sigma}^2 = \frac{N}{N-1}\operatorname{Var}(X^*)

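where X^* is a random variable drawn uniformly at random from the observed data X_1, ..., X_N, so that each observation has probability 1/N, and the variance on the right-hand side is a population variance. Since \operatorname{E}(X^*) = \bar{X} and \operatorname{E}((X^*)^2) = \frac{1}{N}\sum_{i=1}^N X_i^2, the computational formula for the sample variance follows directly from the computational formula for the population variance.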
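The construction of X^* can be made concrete. In this Python sketch, with illustrative helper names and data, `empirical_pmf` uses `collections.Counter` to assign each observed value probability count/N, `variance_from_pmf` applies the computational formula to that finite distribution, and multiplying by N/(N-1) recovers the sample variance. The standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities and serve as a cross-check.

```python
import statistics
from collections import Counter
from fractions import Fraction


def empirical_pmf(data):
    """Distribution of X*: each observation has probability 1/N (ties add up)."""
    n = len(data)
    return {value: Fraction(count, n) for value, count in Counter(data).items()}


def variance_from_pmf(pmf):
    """Population variance of a finite distribution via E(X^2) - E(X)^2."""
    e_x = sum(p * x for x, p in pmf.items())
    e_x2 = sum(p * x * x for x, p in pmf.items())
    return e_x2 - e_x ** 2


data = [1, 3, 3, 6, 12]                             # illustrative observations
n = len(data)
var_star = variance_from_pmf(empirical_pmf(data))   # Var(X*) = 74/5
sample_var = Fraction(n, n - 1) * var_star          # N/(N-1) * Var(X*) = 37/2

# Cross-check against the standard library, using exact Fraction inputs.
exact = [Fraction(v) for v in data]
assert var_star == statistics.pvariance(exact)
assert sample_var == statistics.variance(exact)
```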

Applications

Applications of the formula in systolic geometry include Loewner's torus inequality.

See also