# Law of total expectation

The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations, the tower rule, the smoothing theorem, and Adam's Law among other names, states that if X is an integrable random variable (i.e., a random variable satisfying E( |X| ) < ∞) and Y is any random variable, not necessarily integrable, on the same probability space, then

${\displaystyle \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)),}$

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.

The conditional expected value E( X | Y ) is a random variable in its own right, whose value depends on the value of Y. Notice that the conditional expected value of X given the event Y = y is a function of y. If we write E( X | Y = y) = g(y) then the random variable E( X | Y ) is just g(Y).

One special case states that if ${\displaystyle A_{1},A_{2},\ldots ,A_{n}}$ is a partition of the whole outcome space, i.e. these events are mutually exclusive and exhaustive, then

${\displaystyle \operatorname {E} (X)=\sum _{i=1}^{n}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.}$
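The partition formula can be checked by hand on a small outcome space. The sketch below is only an illustration: the fair die, the choice of partition, and the helper names `prob`, `cond_expectation` and `total_expectation` are all assumptions, not part of the statement above.

```python
from fractions import Fraction

# Hypothetical setup: a fair six-sided die, with X equal to the face value.
OUTCOMES = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in OUTCOMES}

# An assumed partition of the outcome space: mutually exclusive and exhaustive.
PARTITION = [{1, 2}, {3, 4, 5, 6}]


def face(w):
    """The random variable X: the face shown by outcome w."""
    return w


def prob(event):
    """P(A) for an event A given as a set of outcomes."""
    return sum(P[w] for w in event)


def cond_expectation(x, event):
    """E(X | A) = sum over w in A of X(w) * P(w), divided by P(A)."""
    return sum(x(w) * P[w] for w in event) / prob(event)


def total_expectation(x, partition):
    """Right-hand side of the formula: sum of E(X | A_i) * P(A_i)."""
    return sum(cond_expectation(x, a) * prob(a) for a in partition)


direct = sum(face(w) * P[w] for w in OUTCOMES)   # E(X) computed directly
via_partition = total_expectation(face, PARTITION)
print(direct, via_partition)
```

Exact fractions are used so that the two sides can be compared for equality without rounding error.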

## Example

Suppose that two factories supply light bulbs to the market. Factory X's bulbs work for an average of 5000 hours, whereas factory Y's bulbs work for an average of 4000 hours. It is known that factory X supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation, we have:

${\displaystyle \operatorname {E} (L)=\operatorname {E} (L\mid X)\Pr(X)+\operatorname {E} (L\mid Y)\Pr(Y)=5000(.6)+4000(.4)=4600}$

where

• ${\displaystyle \operatorname {E} (L)}$ is the expected life of the bulb;
• ${\displaystyle \Pr(X)={6 \over 10}}$ is the probability that the purchased bulb was manufactured by factory X;
• ${\displaystyle \Pr(Y)={4 \over 10}}$ is the probability that the purchased bulb was manufactured by factory Y;
• ${\displaystyle \operatorname {E} (L\mid X)=5000}$ is the expected lifetime of a bulb manufactured by X;
• ${\displaystyle \operatorname {E} (L\mid Y)=4000}$ is the expected lifetime of a bulb manufactured by Y.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
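The same calculation can be written as a short script, together with a simulation as a sanity check. The market shares and mean lifetimes come from the example; the exponential lifetime distribution, the sample size, the seed and the function names are assumptions made only for illustration, since the example specifies nothing beyond the means.

```python
import random

# Figures from the example: market share and mean lifetime (hours) per factory.
FACTORIES = {
    "X": {"share": 0.6, "mean_life": 5000.0},
    "Y": {"share": 0.4, "mean_life": 4000.0},
}


def expected_life(factories):
    """E(L) = sum over factories of E(L | factory) * Pr(factory)."""
    return sum(f["mean_life"] * f["share"] for f in factories.values())


def simulated_life(factories, n_samples, seed=0):
    """Monte Carlo estimate of E(L).

    Assumption: lifetimes are exponentially distributed with the stated
    means. The example itself does not specify a distribution.
    """
    rng = random.Random(seed)
    names = list(factories)
    weights = [factories[name]["share"] for name in names]
    total = 0.0
    for _ in range(n_samples):
        # First draw the factory, then a lifetime from that factory.
        name = rng.choices(names, weights=weights)[0]
        total += rng.expovariate(1.0 / factories[name]["mean_life"])
    return total / n_samples


print(expected_life(FACTORIES))
print(simulated_life(FACTORIES, 100_000))
```

The simulated average should land close to the exact value, with a gap that shrinks as the number of samples grows.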

## Proof in the discrete case

${\displaystyle {\begin{aligned}\operatorname {E} _{Y}\left(\operatorname {E} _{X\mid Y}(X\mid Y)\right)&{}=\operatorname {E} _{Y}{\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y){\Bigg ]}\\[6pt]&{}=\sum _{y}{\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y=y){\Bigg ]}\cdot \operatorname {P} (Y=y)\\[6pt]&{}=\sum _{y}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y=y)\cdot \operatorname {P} (Y=y)\\[6pt]&{}=\sum _{x}x\sum _{y}\operatorname {P} (X=x\mid Y=y)\cdot \operatorname {P} (Y=y)\\[6pt]&{}=\sum _{x}x\sum _{y}\operatorname {P} (X=x,Y=y)\\[6pt]&{}=\sum _{x}x\cdot \operatorname {P} (X=x)\\[6pt]&{}=\operatorname {E} (X).\end{aligned}}}$
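The chain of equalities can be traced numerically on a small joint distribution. The following is a minimal sketch: the joint probability mass function and the helper names are hypothetical, chosen only so that every step of the proof has a concrete counterpart.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y), chosen only for illustration.
joint = {
    (0, "a"): F(1, 8), (1, "a"): F(3, 8),
    (0, "b"): F(1, 4), (2, "b"): F(1, 4),
}


def marginal_y(joint):
    """P(Y = y), obtained by summing the joint pmf over x."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    return p_y


def conditional_mean(joint, p_y):
    """g(y) = E(X | Y = y) = sum over x of x * P(X = x, Y = y) / P(Y = y)."""
    g = {y: F(0) for y in p_y}
    for (x, y), p in joint.items():
        g[y] += x * p / p_y[y]
    return g


p_y = marginal_y(joint)
g = conditional_mean(joint, p_y)

lhs = sum(g[y] * p_y[y] for y in p_y)            # E(E(X | Y))
rhs = sum(x * p for (x, _), p in joint.items())  # E(X)
print(g, lhs, rhs)
```

Here `g` plays the role of the inner sum over x in the proof, and `lhs` is the outer sum over y weighted by P(Y = y).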

## Proof in the general case

The general statement of the result makes reference to a probability space ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ on which two sub σ-algebras ${\displaystyle {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}\subseteq {\mathcal {F}}}$ are defined. For an integrable random variable ${\displaystyle X}$ on such a space, the smoothing law states that

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]=\operatorname {E} [X\mid {\mathcal {G}}_{1}].}$

Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

• ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]{\mbox{ is }}{\mathcal {G}}_{1}}$-measurable
• ${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]dP=\int _{G_{1}}XdP{\mbox{ holds for all }}G_{1}\in {\mathcal {G}}_{1}}$

The first of these properties holds by the definition of the conditional expectation, and the second holds since ${\displaystyle G_{1}\in {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}}$ implies

${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]dP=\int _{G_{1}}\operatorname {E} [X\mid {\mathcal {G}}_{2}]dP=\int _{G_{1}}XdP.}$

In the special case that ${\displaystyle {\mathcal {G}}_{1}=\{\emptyset ,\Omega \}}$ and ${\displaystyle {\mathcal {G}}_{2}=\sigma (Y)}$, the smoothing law reduces to the statement

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X].}$
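The reduction can be spelled out in one line. This is a brief sketch that relies on two standard facts: conditioning on the trivial σ-algebra returns the unconditional expectation, and ${\displaystyle \operatorname {E} [X\mid Y]}$ is shorthand for ${\displaystyle \operatorname {E} [X\mid \sigma (Y)]}$:

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [\operatorname {E} [X\mid \sigma (Y)]\mid \{\emptyset ,\Omega \}]=\operatorname {E} [X\mid \{\emptyset ,\Omega \}]=\operatorname {E} [X].}$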

## Notation without indices

When using the expectation operator ${\displaystyle \operatorname {E} }$, adding indices to the operator may lead to cumbersome notations and these indices are often omitted. In the case of iterated expectations ${\displaystyle \operatorname {E} \left(\operatorname {E} (X\mid Y)\right)}$ stands for ${\displaystyle \operatorname {E} _{Y}\left(\operatorname {E} _{X\mid Y}(X\mid Y)\right)}$. The innermost expectation is the conditional expectation of ${\displaystyle X}$ given ${\displaystyle Y}$, and the outermost expectation is taken with respect to the conditioning variable ${\displaystyle Y}$.

## Iterated expectations with nested conditioning sets

The following formulation of the law of iterated expectations plays an important role in many economic and finance models:

${\displaystyle \operatorname {E} (X\mid I_{1})=\operatorname {E} (\operatorname {E} (X\mid I_{2})\mid I_{1}),}$

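where the value of ${\displaystyle I_{1}}$ is determined by that of ${\displaystyle I_{2}}$, so that ${\displaystyle I_{2}}$ carries at least as much information as ${\displaystyle I_{1}}$. To build intuition, imagine an investor who forecasts a random stock price ${\displaystyle X}$ based on the limited information set ${\displaystyle I_{1}}$. The law of iterated expectations says that the investor can never gain a more precise forecast of ${\displaystyle X}$ by conditioning on more specific information (${\displaystyle I_{2}}$), if the more specific forecast must itself be forecast with the original information (${\displaystyle I_{1}}$).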

This formulation is often applied in a time series context, where ${\displaystyle \operatorname {E} _{t}}$ denotes expectations conditional on only the information observed up to and including time period ${\displaystyle t}$. In typical models the information set at time ${\displaystyle t+1}$ contains all information available through time ${\displaystyle t}$, plus additional information revealed at time ${\displaystyle t+1}$. One can then write:[2]

${\displaystyle \operatorname {E} _{t}(X)=\operatorname {E} _{t}(\operatorname {E} _{t+1}(X)).}$
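The time series form can be verified exactly on a small model. The sketch below assumes a three-step random walk with independent up or down moves. The up-probability, the horizon and all function names are hypothetical. The information available at time t is taken to be the first t moves.

```python
from fractions import Fraction
from itertools import product

P_UP = Fraction(3, 5)   # hypothetical probability of an up-move (+1)
STEPS = 3               # hypothetical horizon


def prob(path):
    """Probability of a sequence of independent +1 / -1 moves."""
    p = Fraction(1)
    for move in path:
        p *= P_UP if move == 1 else 1 - P_UP
    return p


def cond_exp(f, history):
    """E_t(f) given the first t moves: average f over all continuations."""
    remaining = STEPS - len(history)
    return sum(
        prob(future) * f(history + future)
        for future in product((1, -1), repeat=remaining)
    )


def terminal_value(path):
    """X: the position of the walk after STEPS moves."""
    return sum(path)


def check_tower(history):
    """Return (E_t(X), E_t(E_{t+1}(X))) for one observed history (a tuple)."""
    direct = cond_exp(terminal_value, history)
    t_plus_1 = len(history) + 1
    # The inner forecast uses one more observed move than the outer one.
    iterated = cond_exp(
        lambda path: cond_exp(terminal_value, path[:t_plus_1]), history
    )
    return direct, iterated


print(check_tower(()))      # t = 0: nothing observed yet
print(check_tower((1,)))    # t = 1: one up-move observed
```

In each case the two returned values agree, which is the identity above evaluated at one particular history.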