# Law of total expectation

The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if ${\displaystyle X}$ is a random variable whose expected value ${\displaystyle \operatorname {E} (X)}$ is defined, and ${\displaystyle Y}$ is any random variable on the same probability space, then

${\displaystyle \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)),}$

i.e., the expected value of the conditional expected value of ${\displaystyle X}$ given ${\displaystyle Y}$ is the same as the expected value of ${\displaystyle X}$.

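Note: The conditional expected value ${\displaystyle \operatorname {E} (X\mid Y)}$, with ${\displaystyle Y}$ a random variable, is not a number; it is a random variable whose value depends on the value of ${\displaystyle Y}$. For each fixed ${\displaystyle y}$, the conditional expected value of ${\displaystyle X}$ given the event ${\displaystyle Y=y}$ is a number, and it is a function of ${\displaystyle y}$. If we write ${\displaystyle g(y)}$ for the value of ${\displaystyle \operatorname {E} (X\mid Y=y)}$, then the random variable ${\displaystyle \operatorname {E} (X\mid Y)}$ is ${\displaystyle g(Y)}$.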
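The distinction between the number g(y) and the random variable g(Y) can be made concrete with a small discrete example. The following is a minimal sketch in Python; the joint distribution `joint` and the helper names `marginal_y` and `g` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {
    (1, 0): Fraction(1, 8),
    (2, 0): Fraction(3, 8),
    (1, 1): Fraction(1, 4),
    (4, 1): Fraction(1, 4),
}

def marginal_y(joint):
    """Return a dict mapping each y to P(Y = y)."""
    p = {}
    for (x, y), pr in joint.items():
        p[y] = p.get(y, Fraction(0)) + pr
    return p

def g(joint, y):
    """g(y) = E(X | Y = y): an ordinary number for each fixed y."""
    p_y = marginal_y(joint)[y]
    return sum(x * pr for (x, yy), pr in joint.items() if yy == y) / p_y

p_y = marginal_y(joint)

# E(X | Y) is the random variable g(Y); its expectation averages g over
# the distribution of Y.
e_of_cond = sum(g(joint, y) * p for y, p in p_y.items())

# E(X) computed directly from the joint pmf.
e_x = sum(x * pr for (x, y), pr in joint.items())

print(g(joint, 0), g(joint, 1), e_of_cond, e_x)
```

Exact rational arithmetic is used so that the two expectations can be compared for equality rather than up to rounding error.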

One special case states that if ${\displaystyle {\left\{A_{i}\right\}}}$ is a finite or countable partition of the sample space, then

${\displaystyle \operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.}$
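As an illustration of the partition form, consider a fair six-sided die whose sample space is split into odd and even faces. The sketch below is an assumed toy example, not part of the statement itself; the names `partition`, `p_event` and `cond_expectation` are hypothetical.

```python
from fractions import Fraction

# Sample space of a fair six-sided die; X is the face value.
outcomes = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in outcomes}

# A partition of the sample space into two events.
partition = {"odd": [1, 3, 5], "even": [2, 4, 6]}

def p_event(cell):
    """P(A) for an event A given as a list of outcomes."""
    return sum(prob[w] for w in cell)

def cond_expectation(cell):
    """E(X | A) = (1 / P(A)) * sum of x * P({x}) over x in A."""
    return sum(w * prob[w] for w in cell) / p_event(cell)

# Right-hand side of the partition formula: sum of E(X | A_i) P(A_i).
total = sum(cond_expectation(c) * p_event(c) for c in partition.values())

# Left-hand side: E(X) computed directly.
direct = sum(w * prob[w] for w in outcomes)

print(total, direct)
```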

## Example

Suppose that only two factories supply light bulbs to the market. Factory ${\displaystyle X}$'s bulbs work for an average of 5000 hours, whereas factory ${\displaystyle Y}$'s bulbs work for an average of 4000 hours. It is known that factory ${\displaystyle X}$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation, we have:

${\displaystyle {\begin{aligned}\operatorname {E} (L)&=\operatorname {E} (L\mid X)\operatorname {P} (X)+\operatorname {E} (L\mid Y)\operatorname {P} (Y)\\[3pt]&=5000(0.6)+4000(0.4)\\[2pt]&=4600\end{aligned}}}$

where

• ${\displaystyle \operatorname {E} (L)}$ is the expected life of the bulb;
• ${\displaystyle \operatorname {P} (X)={6 \over 10}}$ is the probability that the purchased bulb was manufactured by factory ${\displaystyle X}$;
• ${\displaystyle \operatorname {P} (Y)={4 \over 10}}$ is the probability that the purchased bulb was manufactured by factory ${\displaystyle Y}$;
• ${\displaystyle \operatorname {E} (L\mid X)=5000}$ is the expected lifetime of a bulb manufactured by ${\displaystyle X}$;
• ${\displaystyle \operatorname {E} (L\mid Y)=4000}$ is the expected lifetime of a bulb manufactured by ${\displaystyle Y}$.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
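The result can also be checked by simulation. The example specifies only the mean lifetime for each factory, so the sketch below assumes, purely for illustration, that lifetimes are exponentially distributed with those means; any other distribution with the same means would give the same expected value. The function name `simulate_mean_lifetime` is hypothetical.

```python
import random

def simulate_mean_lifetime(n_bulbs, seed=0):
    """Estimate the expected bulb lifetime by Monte Carlo simulation.

    Assumption (not from the example): lifetimes are exponentially
    distributed with mean 5000 h (factory X) or 4000 h (factory Y).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bulbs):
        # A purchased bulb comes from factory X with probability 0.6.
        mean = 5000.0 if rng.random() < 0.6 else 4000.0
        # expovariate takes the rate, which is 1 / mean.
        total += rng.expovariate(1.0 / mean)
    return total / n_bulbs

estimate = simulate_mean_lifetime(200_000)
print(estimate)  # should be close to 4600
```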

## Informal proof

When a joint probability density function ${\displaystyle f_{X,Y}}$ is well defined and the expectations are integrable, write ${\displaystyle f_{X}}$ and ${\displaystyle f_{Y}}$ for the marginal densities and ${\displaystyle f_{X\mid Y}(x\mid y)=f_{X,Y}(x,y)/f_{Y}(y)}$ for the conditional density. Then, for the general case,

${\displaystyle {\begin{aligned}\operatorname {E} (X)&=\int xf_{X}(x)~dx\\\operatorname {E} (X\mid Y=y)&=\int xf_{X\mid Y}(x\mid y)~dx\\\operatorname {E} (\operatorname {E} (X\mid Y))&=\int \left(\int xf_{X\mid Y}(x\mid y)~dx\right)f_{Y}(y)~dy\\&=\int \int xf_{X,Y}(x,y)~dx~dy\\&=\int x\left(\int f_{X,Y}(x,y)~dy\right)~dx\\&=\int xf_{X}(x)~dx\\&=\operatorname {E} (X)\,.\end{aligned}}}$
A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function of the sample space that assigns a cell's label to each point in that cell.
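
For reference, the discrete version of the derivation can be sketched as follows, assuming that ${\displaystyle X}$ and ${\displaystyle Y}$ take countably many values, that the outer sum runs over those ${\displaystyle y}$ with ${\displaystyle \Pr[Y=y]>0}$, and that ${\displaystyle \operatorname {E} |X|<\infty }$ so that the order of summation may be exchanged:

${\displaystyle {\begin{aligned}\operatorname {E} (\operatorname {E} (X\mid Y))&=\sum _{y}\left(\sum _{x}x\Pr[X=x\mid Y=y]\right)\Pr[Y=y]\\&=\sum _{y}\sum _{x}x\Pr[X=x,Y=y]\\&=\sum _{x}x\sum _{y}\Pr[X=x,Y=y]\\&=\sum _{x}x\Pr[X=x]\\&=\operatorname {E} (X)\,.\end{aligned}}}$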

## Proof in the general case

Let ${\displaystyle (\Omega ,{\mathcal {F}},\operatorname {P} )}$ be a probability space on which two sub σ-algebras ${\displaystyle {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}\subseteq {\mathcal {F}}}$ are defined. For a random variable ${\displaystyle X}$ on such a space, the smoothing law states that if ${\displaystyle \operatorname {E} [X]}$ is defined, i.e. ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty }$, then

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]=\operatorname {E} [X\mid {\mathcal {G}}_{1}]\quad {\text{(a.s.)}}.}$

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

• ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]{\mbox{ is }}{\mathcal {G}}_{1}}$-measurable
• ${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]\,d\operatorname {P} =\int _{G_{1}}X\,d\operatorname {P} ,}$ for all ${\displaystyle G_{1}\in {\mathcal {G}}_{1}.}$

The first of these properties holds by definition of the conditional expectation. To prove the second one, first note that

${\displaystyle {\begin{aligned}\min \left(\int _{G_{1}}X_{+}\,d\operatorname {P} ,\int _{G_{1}}X_{-}\,d\operatorname {P} \right)&\leq \min \left(\int _{\Omega }X_{+}\,d\operatorname {P} ,\int _{\Omega }X_{-}\,d\operatorname {P} \right)\\[4pt]&=\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty ,\end{aligned}}}$

so the integral ${\displaystyle \textstyle \int _{G_{1}}X\,d\operatorname {P} }$ is defined (that is, it is not of the form ${\displaystyle \infty -\infty }$).

The second property thus holds since ${\displaystyle G_{1}\in {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}}$ implies

${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]\,d\operatorname {P} =\int _{G_{1}}\operatorname {E} [X\mid {\mathcal {G}}_{2}]\,d\operatorname {P} =\int _{G_{1}}X\,d\operatorname {P} .}$

Corollary. In the special case when ${\displaystyle {\mathcal {G}}_{1}=\{\emptyset ,\Omega \}}$ and ${\displaystyle {\mathcal {G}}_{2}=\sigma (Y)}$, the smoothing law reduces to

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X].}$

Alternative proof for ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X].}$

This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, ${\displaystyle \operatorname {E} [X\mid Y]:=\operatorname {E} [X\mid \sigma (Y)]}$ is a ${\displaystyle \sigma (Y)}$-measurable random variable that satisfies

${\displaystyle \int _{A}\operatorname {E} [X\mid Y]\,d\operatorname {P} =\int _{A}X\,d\operatorname {P} ,}$

for every measurable set ${\displaystyle A\in \sigma (Y)}$. Taking ${\displaystyle A=\Omega }$ proves the claim.
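
On a finite probability space, a sub σ-algebra corresponds to a partition of the sample space, and conditioning on it amounts to taking probability-weighted averages over each cell. Under that assumption, the smoothing law and its corollary can be checked numerically. The following sketch uses a hypothetical six-point space; the values of `prob` and `X`, the partitions `fine` and `coarse`, and the helper `cond_exp` are all illustrative.

```python
from fractions import Fraction

# Hypothetical finite probability space with six points.
omega = [0, 1, 2, 3, 4, 5]
prob = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
X = [3, 1, 4, 1, 5, 9]

# Partitions generating G2 (finer) and G1 (coarser). Every cell of
# `coarse` is a union of cells of `fine`, so G1 is contained in G2.
fine = [[0, 1], [2, 3], [4, 5]]
coarse = [[0, 1, 2, 3], [4, 5]]
trivial = [omega]  # generates the trivial sigma-algebra

def cond_exp(values, partition):
    """Conditional expectation given the sigma-algebra generated by
    `partition`, returned as a list of values indexed by sample point."""
    out = [None] * len(omega)
    for cell in partition:
        p_cell = sum(prob[w] for w in cell)
        avg = sum(values[w] * prob[w] for w in cell) / p_cell
        for w in cell:
            out[w] = avg  # constant on each cell
    return out

lhs = cond_exp(cond_exp(X, fine), coarse)  # E[E[X | G2] | G1]
rhs = cond_exp(X, coarse)                  # E[X | G1]
print(lhs == rhs)

# Corollary: with the trivial sigma-algebra as G1, both sides equal E[X].
e_x = sum(x * p for x, p in zip(X, prob))
print(cond_exp(cond_exp(X, fine), trivial)[0] == e_x)
```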