# Law of total expectation

The proposition in probability theory known as the law of total expectation[1] (also called the law of iterated expectations, the tower rule, or the smoothing theorem, among other names) states that if X is an integrable random variable (i.e., a random variable satisfying E(|X|) < ∞) and Y is any random variable, not necessarily integrable, on the same probability space, then

$\operatorname{E} (X) = \operatorname{E}_Y ( \operatorname{E}_{X \mid Y} ( X \mid Y)),$

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.

The nomenclature used here parallels the phrase law of total probability. See also law of total variance.

The conditional expected value E(X | Y) is a random variable in its own right, whose value depends on the value of Y. Notice that the conditional expected value of X given the event Y = y is a function of y (this is where the conventional, case-sensitive notation of probability theory matters: uppercase Y denotes the random variable, lowercase y a particular value it may take). If we write E(X | Y = y) = g(y), then the random variable E(X | Y) is just g(Y).
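As an illustration, here is a minimal Monte Carlo sketch of the identity, assuming a hypothetical toy model in which Y is a fair die roll and X given Y = y is uniform on {1, ..., y}; the sample mean of g(Y) = E(X | Y) should match the sample mean of X:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: Y is a fair die roll, and X | Y = y is uniform
# on {1, ..., y}, so g(y) = E(X | Y = y) = (y + 1) / 2.
n = 1_000_000
y = rng.integers(1, 7, size=n)     # die roll in {1, ..., 6}
x = rng.integers(1, y + 1)         # uniform on {1, ..., y}, elementwise

g_of_y = (y + 1) / 2               # the random variable E(X | Y) = g(Y)

print(x.mean())       # ~ E(X)      (about 2.25)
print(g_of_y.mean())  # ~ E(g(Y)) = E(E(X | Y)), matching E(X)
```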

One special case states that if $A_1, A_2, \ldots, A_n$ is a partition of the whole outcome space, i.e., these events are mutually exclusive and exhaustive, then

$\operatorname{E} (X) = \sum_{i=1}^{n}{\operatorname{E}(X \mid A_i) \operatorname{P}(A_i)}.$
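For instance, a short sketch of the partition formula using the standard light-bulb example (the supply shares and mean lifetimes below are illustrative numbers, not data):

```python
# Illustrative numbers: factory A1 supplies 60% of bulbs (mean life 5000 h),
# factory A2 supplies the remaining 40% (mean life 4000 h).
p = {"A1": 0.6, "A2": 0.4}            # P(A_i); the A_i partition the outcome space
cond_mean = {"A1": 5000, "A2": 4000}  # E(X | A_i), in hours

# E(X) = sum_i E(X | A_i) * P(A_i)
expected_life = sum(cond_mean[a] * p[a] for a in p)
print(expected_life)  # 4600.0
```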

## Proof in the discrete case

Suppose X and Y are discrete random variables and X is integrable. Then

\begin{align}
\operatorname{E} \left( \operatorname{E} (X \mid Y) \right) &= \operatorname{E} \Bigg[ \sum_x x \cdot \operatorname{P}(X=x \mid Y) \Bigg] \\[6pt]
&= \sum_y \Bigg[ \sum_x x \cdot \operatorname{P}(X=x \mid Y=y) \Bigg] \cdot \operatorname{P}(Y=y) \\[6pt]
&= \sum_y \sum_x x \cdot \operatorname{P}(X=x \mid Y=y) \cdot \operatorname{P}(Y=y) \\[6pt]
&= \sum_x x \sum_y \operatorname{P}(X=x \mid Y=y) \cdot \operatorname{P}(Y=y) \\[6pt]
&= \sum_x x \sum_y \operatorname{P}(X=x, Y=y) \\[6pt]
&= \sum_x x \cdot \operatorname{P}(X=x) \\[6pt]
&= \operatorname{E}(X).
\end{align}

Interchanging the order of summation in the fourth equality is justified because the double sum converges absolutely, X being integrable.
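The same chain of equalities can be replayed exactly (no floating point) on a small joint pmf; the probabilities below are hypothetical, chosen only to sum to 1:

```python
from fractions import Fraction as F

# A small hypothetical joint pmf p(x, y) on {0, 1} x {0, 1}.
pmf = {(0, 0): F(1, 8), (0, 1): F(1, 4),
       (1, 0): F(1, 4), (1, 1): F(3, 8)}

xs = {x for x, _ in pmf}
ys = {y for _, y in pmf}

# Marginal P(Y = y).
p_y = {y: sum(pmf[(x, y)] for x in xs) for y in ys}

# Conditional expectation E(X | Y = y) = sum_x x * p(x, y) / p(y).
e_x_given_y = {y: sum(x * pmf[(x, y)] for x in xs) / p_y[y] for y in ys}

# Outer expectation over Y reproduces E(X) exactly.
lhs = sum(e_x_given_y[y] * p_y[y] for y in ys)
rhs = sum(x * pmf[(x, y)] for (x, y) in pmf)
assert lhs == rhs
print(lhs, rhs)  # 5/8 5/8
```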

## Iterated expectations with nested conditioning sets

The following formulation of the law of iterated expectations plays an important role in many economic and finance models:

$\operatorname{E} (X \mid I_1) = \operatorname{E} ( \operatorname{E} ( X \mid I_2) \mid I_1),$

where the information represented by I1 is contained in that of I2 (the value of I1 is determined by that of I2). To build intuition, imagine an investor who forecasts a random stock price X based on the limited information set I1. The law of iterated expectations says that the investor can never gain a more precise forecast of X by conditioning on the more specific information I2, if the more specific forecast must itself be forecast using only the original information I1.
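A minimal simulation sketch of this statement, assuming hypothetical signals in which the fine information set I2 is a variable Y itself and the coarse set I1 is sign(Y), whose value is determined by Y:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical signals: I2 = Y (fine), I1 = sign(Y) (coarse, determined by Y).
n = 1_000_000
y = rng.integers(-2, 3, size=n)        # Y uniform on {-2, ..., 2}
x = y + rng.standard_normal(n)         # stock price X = Y + noise

coarse = np.sign(y)                    # the coarse signal I1

# Fine forecast E(X | I2): the average of x within each y-bin.
inner = np.empty(n)
for v in np.unique(y):
    mask = y == v
    inner[mask] = x[mask].mean()

# Averaging the fine forecast over each coarse bin reproduces the forecast
# made directly from the coarse information, as the law asserts.
for s in (-1, 0, 1):
    mask = coarse == s
    print(s, inner[mask].mean(), x[mask].mean())  # the two columns agree
```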

This formulation is often applied in a time series context, where Et denotes expectations conditional only on the information observed up to and including time period t. In typical models the information set at time t + 1 contains all information available through time t, plus additional information revealed at time t + 1. One can then write:[2]

$\operatorname{E}_t(X) = \operatorname{E}_t ( \operatorname{E}_{t+1} ( X )).$
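A sketch of this time series version under a hypothetical two-period model: shocks Z1 and Z2 are independent coin flips, Zt is revealed at time t, and the price is X = Z1 + Z2; the forecast of next period's forecast equals the direct forecast:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: Zt in {0, 1} is revealed at time t, and X = Z1 + Z2,
# so the time-t information set is (Z1, ..., Zt).
n = 1_000_000
z1 = rng.integers(0, 2, size=n)
z2 = rng.integers(0, 2, size=n)
x = z1 + z2

# E_{t+1}(X) with t = 1: average x within each (z1, z2) cell.
e2 = np.empty(n, dtype=float)
for a in (0, 1):
    for b in (0, 1):
        m = (z1 == a) & (z2 == b)
        e2[m] = x[m].mean()

# E_t(E_{t+1}(X)) versus E_t(X): average each within the z1 cells alone.
for a in (0, 1):
    m = z1 == a
    print(a, e2[m].mean(), x[m].mean())  # the two columns agree
```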