Law of total variance
In probability theory, the law of total variance,[1] also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or Eve's law,[2] states that if $X$ and $Y$ are random variables on the same probability space, and the variance of $Y$ is finite, then

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$
In language perhaps better known to statisticians than to probability theorists, the two terms are the "unexplained" and the "explained" components of the variance respectively (cf. fraction of variance unexplained, explained variation). In actuarial science, specifically credibility theory, the first component is called the expected value of the process variance (EVPV) and the second is called the variance of the hypothetical means (VHM).[3] These two components are also the source of the term "Eve's law", from the initials EV VE for "expectation of variance" and "variance of expectation".
Example
Suppose $X$ is a coin flip with the probability of heads being $h$. Suppose that when $X = \text{heads}$, $Y$ is drawn from a normal distribution with mean $\mu_h$ and standard deviation $\sigma_h$, and that when $X = \text{tails}$, $Y$ is drawn from a normal distribution with mean $\mu_t$ and standard deviation $\sigma_t$. Then the first, "unexplained" term on the right-hand side of the above formula is the weighted average of the conditional variances, $h\sigma_h^2 + (1 - h)\sigma_t^2$, and the second, "explained" term is the variance of the distribution that gives $\mu_h$ with probability $h$ and gives $\mu_t$ with probability $1 - h$.
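This decomposition is easy to check by simulation. The sketch below is a minimal Monte Carlo check with illustrative parameter values (the specific numbers are assumptions, not taken from the example): it samples the mixture and compares the sample variance of $Y$ with the sum of the "unexplained" and "explained" terms.

```python
# Monte Carlo check of Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) for the
# coin-flip mixture; all parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
h, mu_h, sigma_h, mu_t, sigma_t = 0.3, 1.0, 2.0, -1.0, 0.5
n = 1_000_000

heads = rng.random(n) < h                        # X: the coin flip
y = np.where(heads,
             rng.normal(mu_h, sigma_h, n),       # Y | X = heads
             rng.normal(mu_t, sigma_t, n))       # Y | X = tails

unexplained = h * sigma_h**2 + (1 - h) * sigma_t**2                # E[Var(Y|X)]
mu_bar = h * mu_h + (1 - h) * mu_t                                 # E[Y]
explained = h * (mu_h - mu_bar)**2 + (1 - h) * (mu_t - mu_bar)**2  # Var(E[Y|X])

print(y.var())                  # sample variance of Y
print(unexplained + explained)  # agrees up to Monte Carlo error
```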
Formulation
There is a general variance decomposition formula for $c \geq 2$ components (see below).[4] For example, with two conditioning random variables:

$$\operatorname{Var}[Y] = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2)] + \operatorname{E}[\operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1)] + \operatorname{Var}(\operatorname{E}[Y \mid X_1]),$$

which follows from the law of total conditional variance:[4]

$$\operatorname{Var}(Y \mid X_1) = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2) \mid X_1] + \operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1).$$
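As a concrete sanity check of the two-variable formula, the following sketch computes every term in closed form for a hypothetical discrete model (all parameter values are illustrative assumptions): $X_1$ and $X_2$ are independent Bernoulli variables, and $Y$ given $(X_1, X_2)$ is normal with mean $m[x_1, x_2]$ and variance $v[x_1, x_2]$.

```python
# Exact check of Var(Y) = E[Var(Y|X1,X2)] + E[Var(E[Y|X1,X2]|X1)] + Var(E[Y|X1])
# for a hypothetical Bernoulli/normal model; parameter values are assumptions.
import numpy as np

p1, p2 = 0.3, 0.6                       # P(X1 = 1), P(X2 = 1)
m = np.array([[0.0, 2.0], [1.0, 5.0]])  # m[x1, x2] = E[Y | X1 = x1, X2 = x2]
v = np.array([[1.0, 4.0], [2.0, 3.0]])  # v[x1, x2] = Var(Y | X1 = x1, X2 = x2)

w1 = np.array([1 - p1, p1])
w2 = np.array([1 - p2, p2])
w = np.outer(w1, w2)                    # joint weights P(X1 = x1, X2 = x2)

# Total variance via E[Y^2] - E[Y]^2
EY = np.sum(w * m)
var_Y = np.sum(w * (v + m**2)) - EY**2

t1 = np.sum(w * v)                      # E[Var(Y | X1, X2)]

EY_given_X1 = m @ w2                    # E[Y | X1 = x1] (X2 independent of X1)
t2 = np.sum(w1 * ((m - EY_given_X1[:, None])**2 @ w2))          # E[Var(E[Y|X1,X2] | X1)]
t3 = np.sum(w1 * EY_given_X1**2) - np.sum(w1 * EY_given_X1)**2  # Var(E[Y | X1])

print(var_Y, t1 + t2 + t3)              # the two numbers coincide
```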
Note that the conditional expected value $\operatorname{E}(Y \mid X)$ is a random variable in its own right, whose value depends on the value of $X$. Notice that the conditional expected value of $Y$ given the event $X = x$ is a function of $x$ (this is where adherence to the conventional and rigidly case-sensitive notation of probability theory becomes important!). If we write $\operatorname{E}(Y \mid X = x) = g(x)$, then the random variable $\operatorname{E}(Y \mid X)$ is just $g(X)$. Similar comments apply to the conditional variance.
One special case (similar to the law of total expectation) states that if $A_1, \ldots, A_n$ is a partition of the whole outcome space, that is, these events are mutually exclusive and exhaustive, then

$$\operatorname{Var}(Y) = \sum_{i=1}^{n} \operatorname{Var}(Y \mid A_i) \Pr(A_i) + \sum_{i=1}^{n} \operatorname{E}[Y \mid A_i]^2 \Pr(A_i) - \left[ \sum_{i=1}^{n} \operatorname{E}[Y \mid A_i] \Pr(A_i) \right]^2.$$
In this formula, the first component is the expectation of the conditional variance; the other two components are the variance of the conditional expectation.
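For example, let $Y$ be the result of rolling a fair six-sided die, partitioned into $A_1 = \{1, 2, 3\}$ and $A_2 = \{4, 5, 6\}$. Then $\operatorname{E}[Y \mid A_1] = 2$, $\operatorname{E}[Y \mid A_2] = 5$, and $\operatorname{Var}(Y \mid A_1) = \operatorname{Var}(Y \mid A_2) = 2/3$, so the formula gives

$$\operatorname{Var}(Y) = \frac{2}{3} + \frac{2^2 + 5^2}{2} - \left( \frac{2 + 5}{2} \right)^2 = \frac{2}{3} + \frac{29}{2} - \frac{49}{4} = \frac{35}{12},$$

which is indeed the variance of a fair die roll.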
Proof
The law of total variance can be proved using the law of total expectation.[5] First,

$$\operatorname{Var}[Y] = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2$$

from the definition of variance. Again, from the definition of variance, and applying the law of total expectation, we have

$$\operatorname{E}[Y^2] = \operatorname{E}[\operatorname{E}[Y^2 \mid X]].$$

Now we rewrite the conditional second moment of $Y$ in terms of its variance and first moment, and apply the law of total expectation on the right-hand side:

$$\operatorname{E}[Y^2] - \operatorname{E}[Y]^2 = \operatorname{E}[\operatorname{Var}[Y \mid X] + \operatorname{E}[Y \mid X]^2] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2.$$

Since the expectation of a sum is the sum of expectations, the terms can now be regrouped:

$$\operatorname{Var}[Y] = \operatorname{E}[\operatorname{Var}[Y \mid X]] + \left( \operatorname{E}[\operatorname{E}[Y \mid X]^2] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2 \right).$$

Finally, we recognize the terms in the second set of parentheses as the variance of the conditional expectation $\operatorname{E}[Y \mid X]$:

$$\operatorname{Var}[Y] = \operatorname{E}[\operatorname{Var}[Y \mid X]] + \operatorname{Var}[\operatorname{E}[Y \mid X]].$$
General variance decomposition applicable to dynamic systems
The following formula shows how to apply the general, measure-theoretic variance decomposition formula[4] to stochastic dynamic systems. Let $Y_t$ be the value of a system variable at time $t$. Suppose we have the internal histories (natural filtrations) $H_{1t}, H_{2t}, \ldots, H_{c-1,t}$, each one corresponding to the history (trajectory) of a different collection of system variables. The collections need not be disjoint. The variance of $Y_t$ can be decomposed, for all times $t$, into $c \geq 2$ components as follows:

$$\operatorname{Var}[Y_t] = \operatorname{E}[\operatorname{Var}[Y_t \mid H_{1t}, H_{2t}, \ldots, H_{c-1,t}]] + \sum_{j=2}^{c-1} \operatorname{E}\big[\operatorname{Var}[\operatorname{E}[Y_t \mid H_{1t}, \ldots, H_{jt}] \mid H_{1t}, \ldots, H_{j-1,t}]\big] + \operatorname{Var}[\operatorname{E}[Y_t \mid H_{1t}]].$$
The decomposition is not unique. It depends on the order of the conditioning in the sequential decomposition.
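The sketch below makes the order dependence concrete, reusing the same hypothetical Bernoulli/normal model as in the formulation section (all values are illustrative assumptions). Conditioning on $X_1$ first and on $X_2$ first gives the same total and the same first ("unexplained") term, but different remaining components.

```python
# Order dependence of the sequential decomposition for a hypothetical
# Bernoulli/normal model; parameter values are illustrative assumptions.
import numpy as np

p1, p2 = 0.3, 0.6
m = np.array([[0.0, 2.0], [1.0, 5.0]])  # E[Y | X1, X2]
v = np.array([[1.0, 4.0], [2.0, 3.0]])  # Var(Y | X1, X2)

def components(m, v, w_first, w_second):
    """Terms of the decomposition when conditioning on X_first before X_second."""
    w = np.outer(w_first, w_second)
    t1 = np.sum(w * v)                           # E[Var(Y | both)]
    e_first = m @ w_second                       # E[Y | X_first]
    t2 = np.sum(w_first * ((m - e_first[:, None])**2 @ w_second))
    t3 = np.sum(w_first * e_first**2) - np.sum(w_first * e_first)**2
    return t1, t2, t3

w1 = np.array([1 - p1, p1]); w2 = np.array([1 - p2, p2])
print(components(m, v, w1, w2))      # condition on X1 first
print(components(m.T, v.T, w2, w1))  # condition on X2 first: same t1 and sum,
                                     # different t2 and t3
```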
The square of the correlation and explained (or informational) variation
In cases where $(Y, X)$ are such that the conditional expected value is linear, that is, in cases where

$$\operatorname{E}[Y \mid X] = aX + b,$$

it follows from the bilinearity of covariance that

$$a = \frac{\operatorname{Cov}[Y, X]}{\operatorname{Var}[X]} \quad \text{and} \quad b = \operatorname{E}[Y] - \frac{\operatorname{Cov}[Y, X]}{\operatorname{Var}[X]} \operatorname{E}[X],$$

and the explained component of the variance divided by the total variance is just the square of the correlation between $Y$ and $X$; that is, in such cases,

$$\frac{\operatorname{Var}[\operatorname{E}[Y \mid X]]}{\operatorname{Var}[Y]} = \operatorname{Corr}[X, Y]^2.$$
One example of this situation is when $(X, Y)$ have a bivariate normal (Gaussian) distribution.
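The sketch below illustrates this numerically for a simulated bivariate normal pair (the correlation value is an assumption): for standardized $X$ and $Y$ with correlation $\rho$, the conditional mean is $\operatorname{E}[Y \mid X] = \rho X$, and the explained share of the variance matches the squared sample correlation.

```python
# Explained share of variance vs. squared correlation for a simulated
# bivariate normal pair; rho is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 1_000_000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # Corr(X, Y) = rho

e_y_given_x = rho * x                 # E[Y | X] for this standardized pair
print(e_y_given_x.var() / y.var())    # explained share of the variance
print(np.corrcoef(x, y)[0, 1]**2)     # squared sample correlation; both ~ rho**2
```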
More generally, when the conditional expectation $\operatorname{E}[Y \mid X]$ is a non-linear function of $X$,[4]

$$\iota_{Y \mid X} = \frac{\operatorname{Var}[\operatorname{E}[Y \mid X]]}{\operatorname{Var}[Y]} = \operatorname{Corr}[\operatorname{E}[Y \mid X], Y]^2,$$

which can be estimated as the squared $R^2$ from a non-linear regression of $Y$ on $X$, using data drawn from the joint distribution of $(X, Y)$. When $\operatorname{E}[Y \mid X]$ has a Gaussian distribution (and is an invertible function of $X$), or $Y$ itself has a (marginal) Gaussian distribution, this explained component of variation sets a lower bound on the mutual information:[4]

$$\operatorname{I}(Y; X) \geq \ln\!\left( [1 - \iota_{Y \mid X}]^{-1/2} \right).$$
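In the bivariate normal case this bound is tight: there $\operatorname{I}(Y; X) = -\tfrac{1}{2}\ln(1 - \rho^2)$ and $\iota_{Y \mid X} = \rho^2$, so the two sides coincide, as the small sketch below illustrates (the value of $\rho$ is an assumption).

```python
# For a bivariate normal pair the information bound holds with equality;
# rho is an illustrative assumption.
import numpy as np

rho = 0.6
iota = rho**2                        # explained share for the Gaussian case
mi = -0.5 * np.log(1 - rho**2)       # exact Gaussian mutual information (nats)
bound = np.log((1 - iota) ** -0.5)   # the lower bound from the text
print(mi, bound)                     # identical in this case
```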
Higher moments
A similar law for the third central moment $\mu_3$ says

$$\mu_3(Y) = \operatorname{E}[\mu_3(Y \mid X)] + \mu_3(\operatorname{E}[Y \mid X]) + 3 \operatorname{Cov}[\operatorname{E}[Y \mid X], \operatorname{Var}(Y \mid X)].$$
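Because normal distributions have zero third central moment, the coin-flip mixture from the example above makes each side of this law computable in closed form. The sketch below checks the identity with the same illustrative parameter values used earlier (all assumptions).

```python
# Exact check of the third-central-moment law for a two-component normal
# mixture; E[mu3(Y|X)] = 0 because the components are normal.
import numpy as np

h, mu_h, s_h, mu_t, s_t = 0.3, 1.0, 2.0, -1.0, 0.5
p = np.array([h, 1 - h])
mu = np.array([mu_h, mu_t])            # values of E[Y | X]
var = np.array([s_h**2, s_t**2])       # values of Var(Y | X)
m = p @ mu                             # overall mean E[Y]

# Left-hand side: third central moment of the mixture (component mu3's vanish)
lhs = p @ (3 * var * (mu - m) + (mu - m)**3)

# Right-hand side: E[mu3(Y|X)] + mu3(E[Y|X]) + 3 Cov(E[Y|X], Var(Y|X))
mu3_cond_mean = p @ (mu - m)**3
cov = p @ ((mu - m) * (var - p @ var))
print(lhs, mu3_cond_mean + 3 * cov)    # the two numbers coincide
```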
For higher cumulants, a generalization exists. See law of total cumulance.
See also
- Law of total covariance – a generalization of the law of total variance
- Law of propagation of errors – Effect of variables' uncertainties on the uncertainty of a function based on them
References
- ^ Neil A. Weiss, A Course in Probability, Addison–Wesley, 2005, pages 385–386.
- ^ Joseph K. Blitzstein and Jessica Hwang: "Introduction to Probability"
- ^ Mahler, Howard C.; Dean, Curtis Gary (2001). "Chapter 8: Credibility" (PDF). In Casualty Actuarial Society (ed.). Foundations of Casualty Actuarial Science (4th ed.). Casualty Actuarial Society. pp. 525–526. ISBN 978-0-96247-622-8. Retrieved June 25, 2015.
- ^ a b c d e Bowsher, C. G.; Swain, P. S. (2012). "Identifying sources of variation and the flow of information in biochemical networks". PNAS. 109 (20): E1320–E1328.
- ^ Neil A. Weiss, A Course in Probability, Addison–Wesley, 2005, pages 380–383.
- Blitzstein, Joe. "Stat 110 Final Review (Eve's Law)" (PDF). stat110.net. Harvard University, Department of Statistics. Retrieved 9 July 2014.
- Billingsley, Patrick (1995). Probability and Measure. New York, NY: John Wiley & Sons, Inc. ISBN 0-471-00710-2. (Problem 34.10(b))