# Law of total expectation


The proposition in probability theory known as the law of total expectation, the law of iterated expectations, the tower rule, Adam's law, and the smoothing theorem, among other names, states that if $X$ is a random variable whose expected value $\operatorname {E} (X)$ is defined, and $Y$ is any random variable on the same probability space, then

$\operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)),$ i.e., the expected value of the conditional expected value of $X$ given $Y$ is the same as the expected value of $X$ .

One special case states that if ${\left\{A_{i}\right\}}_{i}$ is a finite or countable partition of the sample space, then

$\operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.$

## Example

Suppose that two factories supply light bulbs to the market. Factory $X$ 's bulbs work for an average of 5000 hours, whereas factory $Y$ 's bulbs work for an average of 4000 hours. It is known that factory $X$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation, we have:

$\operatorname {E} (L)=\operatorname {E} (L\mid X)\operatorname {P} (X)+\operatorname {E} (L\mid Y)\operatorname {P} (Y)=5000(0.6)+4000(0.4)=4600,$

where

• $\operatorname {E} (L)$ is the expected life of the bulb;
• $\operatorname {P} (X)={6 \over 10}$ is the probability that the purchased bulb was manufactured by factory $X$ ;
• $\operatorname {P} (Y)={4 \over 10}$ is the probability that the purchased bulb was manufactured by factory $Y$ ;
• $\operatorname {E} (L\mid X)=5000$ is the expected lifetime of a bulb manufactured by $X$ ;
• $\operatorname {E} (L\mid Y)=4000$ is the expected lifetime of a bulb manufactured by $Y$ .

Thus each purchased light bulb has an expected lifetime of 4600 hours.
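The calculation above can be checked with a minimal script; the variable names are illustrative only:

```python
# Numeric check of the light-bulb example:
# E(L) = E(L|X)P(X) + E(L|Y)P(Y).
p_x, p_y = 0.6, 0.4          # market shares of factories X and Y
mean_x, mean_y = 5000, 4000  # expected lifetimes in hours

expected_life = mean_x * p_x + mean_y * p_y
print(expected_life)  # 4600.0
```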

## Proof in the finite and countable cases

Let the random variables $X$ and $Y$ , defined on the same probability space, assume a finite or countably infinite set of finite values. Assume that $\operatorname {E} [X]$ is defined, i.e. $\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty$ . If $\{A_{i}\}$ is a partition of the probability space $\Omega$ , then

$\operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.$

Proof.

${\begin{aligned}\operatorname {E} \left(\operatorname {E} (X\mid Y)\right)&=\operatorname {E} {\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y){\Bigg ]}\\[6pt]&=\sum _{y}{\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y=y){\Bigg ]}\cdot \operatorname {P} (Y=y)\\[6pt]&=\sum _{y}\sum _{x}x\cdot \operatorname {P} (X=x,Y=y).\end{aligned}}$

If the series is finite, then we can switch the order of summation, and the previous expression becomes

${\begin{aligned}\sum _{x}\sum _{y}x\cdot \operatorname {P} (X=x,Y=y)&=\sum _{x}x\sum _{y}\operatorname {P} (X=x,Y=y)\\[6pt]&=\sum _{x}x\cdot \operatorname {P} (X=x)\\[6pt]&=\operatorname {E} (X).\end{aligned}}$

If, on the other hand, the series is infinite, then its convergence cannot be conditional, because of the assumption that $\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty .$ The series converges absolutely if both $\operatorname {E} [X_{+}]$ and $\operatorname {E} [X_{-}]$ are finite, and diverges to infinity if either $\operatorname {E} [X_{+}]$ or $\operatorname {E} [X_{-}]$ is infinite. In both cases, the summations above may be exchanged without affecting the sum.
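The finite case of this identity can be verified directly on a small joint distribution; the joint pmf below is an arbitrary illustration, not taken from the text:

```python
# Verify E[E[X|Y]] = E[X] for a small joint pmf P(X=x, Y=y).
joint = {
    (0, 'a'): 0.1, (1, 'a'): 0.2, (2, 'a'): 0.1,
    (0, 'b'): 0.3, (1, 'b'): 0.1, (2, 'b'): 0.2,
}

# Marginal P(Y=y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

# Conditional expectation E[X | Y=y] = sum_x x * P(X=x, Y=y) / P(Y=y)
e_x_given_y = {
    y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]
    for y in p_y
}

# Both sides of the tower rule
lhs = sum(e_x_given_y[y] * p_y[y] for y in p_y)  # E[E[X|Y]]
rhs = sum(x * p for (x, _), p in joint.items())  # E[X]
print(abs(lhs - rhs) < 1e-12)  # True
```

The double sum over $x$ and $y$ in the proof is exactly what the two one-line comprehensions compute, in either order.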

## Proof in the general case

Let $(\Omega ,{\mathcal {F}},\operatorname {P} )$ be a probability space on which two sub σ-algebras ${\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}\subseteq {\mathcal {F}}$ are defined. For a random variable $X$ on such a space, the smoothing law states that if $\operatorname {E} [X]$ is defined, i.e. $\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty$ , then

$\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]=\operatorname {E} [X\mid {\mathcal {G}}_{1}]\quad {\text{(a.s.)}}.$

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

• $\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]$ is ${\mathcal {G}}_{1}$-measurable;
• $\int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]\,d\operatorname {P} =\int _{G_{1}}X\,d\operatorname {P}$ for all $G_{1}\in {\mathcal {G}}_{1}.$

The first of these properties holds by definition of the conditional expectation. To prove the second one, note that

${\begin{aligned}\min \left(\int _{G_{1}}X_{+}\,d\operatorname {P} ,\int _{G_{1}}X_{-}\,d\operatorname {P} \right)&\leq \min \left(\int _{\Omega }X_{+}\,d\operatorname {P} ,\int _{\Omega }X_{-}\,d\operatorname {P} \right)\\[4pt]&=\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty ,\end{aligned}}$

so the integral $\textstyle \int _{G_{1}}X\,d\operatorname {P}$ is defined (i.e., it is not of the indeterminate form $\infty -\infty$).

The second property thus holds since $G_{1}\in {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}$ implies

$\int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]\,d\operatorname {P} =\int _{G_{1}}\operatorname {E} [X\mid {\mathcal {G}}_{2}]\,d\operatorname {P} =\int _{G_{1}}X\,d\operatorname {P} .$

Corollary. In the special case when ${\mathcal {G}}_{1}=\{\emptyset ,\Omega \}$ and ${\mathcal {G}}_{2}=\sigma (Y)$, the smoothing law reduces to

$\operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X].$

## Proof of partition formula

${\begin{aligned}\sum \limits _{i}\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )\operatorname {P} (d\omega \mid A_{i})\cdot \operatorname {P} (A_{i})\\&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )\operatorname {P} (d\omega \cap A_{i})\\&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )I_{A_{i}}(\omega )\operatorname {P} (d\omega )\\&=\sum \limits _{i}\operatorname {E} (XI_{A_{i}}),\end{aligned}}$

where $I_{A_{i}}$ is the indicator function of the set $A_{i}$.

If the partition ${\{A_{i}\}}_{i=0}^{n}$ is finite, then, by linearity, the previous expression becomes

$\operatorname {E} \left(\sum \limits _{i=0}^{n}XI_{A_{i}}\right)=\operatorname {E} (X),$

and we are done.

If, however, the partition ${\{A_{i}\}}_{i=0}^{\infty }$ is infinite, then we use the dominated convergence theorem to show that

$\operatorname {E} \left(\sum \limits _{i=0}^{n}XI_{A_{i}}\right)\to \operatorname {E} (X).$

Indeed, for every $n\geq 0$,

$\left|\sum _{i=0}^{n}XI_{A_{i}}\right|\leq |X|I_{\mathop {\bigcup } \limits _{i=0}^{n}A_{i}}\leq |X|.$

Since every element of $\Omega$ falls into exactly one set $A_{i}$ of the partition, it is straightforward to verify that the sequence ${\left\{\sum _{i=0}^{n}XI_{A_{i}}\right\}}_{n=0}^{\infty }$ converges pointwise to $X$. By the initial assumption, $\operatorname {E} |X|<\infty$. Applying the dominated convergence theorem yields the desired result.
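The partition formula can also be checked numerically on a small discrete sample space; the outcomes, values of $X$, and partition below are arbitrary illustrations:

```python
# Check sum_i E(X | A_i) P(A_i) = E(X) for a finite partition {A_i}.
omega = [0, 1, 2, 3, 4, 5]     # six equally likely outcomes
prob = {w: 1 / 6 for w in omega}
X = {0: 2.0, 1: 3.0, 2: 5.0, 3: 7.0, 4: 11.0, 5: 13.0}  # X(omega)

partition = [[0, 1], [2, 3, 4], [5]]  # a partition {A_i} of omega

def p(event):
    """P(A) for an event given as a list of outcomes."""
    return sum(prob[w] for w in event)

def e_x_given(event):
    """E(X | A) = E(X * I_A) / P(A)."""
    return sum(X[w] * prob[w] for w in event) / p(event)

total = sum(e_x_given(A) * p(A) for A in partition)  # sum_i E(X|A_i)P(A_i)
e_x = sum(X[w] * prob[w] for w in omega)             # E(X)
print(abs(total - e_x) < 1e-12)  # True
```

Each term $\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})$ collapses to $\operatorname {E} (XI_{A_{i}})$, exactly as in the derivation above, so the terms sum to $\operatorname {E} (X)$.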