Conditional expectation


In probability theory, a conditional expectation is the expected value of a real random variable with respect to a conditional probability distribution. It is also known as conditional expected value or conditional mean.

The concept of conditional expectation is important in Kolmogorov's measure-theoretic definition of probability theory. The concept of conditional probability is defined in terms of conditional expectation.

Introduction

Let X and Y be discrete random variables. Then the conditional expectation of X given the event Y=y is a function of y over the range of Y:

 \operatorname{E} (X | Y=y ) = \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x|Y=y) = \sum_{x \in \mathcal{X}} x \ \frac{\operatorname{P}(X=x,Y=y)}{\operatorname{P}(Y=y)},

where \mathcal{X} is the range of X.
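
To make the definition concrete, here is a minimal Python sketch that computes E(X | Y=y) directly from a joint probability mass function; the pmf values are illustrative assumptions, not taken from the article.

# E(X | Y = y) for discrete X and Y, computed straight from the definition.
# The joint pmf below is an assumed example chosen only for illustration.
joint = {  # (x, y) -> P(X = x, Y = y)
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def cond_expectation(joint, y):
    """E(X | Y = y) = sum_x x * P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)   # marginal P(Y = y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

print(cond_expectation(joint, 0))   # 0.3 / 0.4 = 0.75
print(cond_expectation(joint, 1))   # 0.4 / 0.6 ≈ 0.667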

If X is a continuous random variable while Y remains discrete, the conditional expectation is:

 \operatorname{E} (X | Y=y) = \int_{\mathcal{X}} x \, f_X (x | Y=y) \ \operatorname{d}x

where  f_X (\,\cdot\, |Y=y) is the conditional density of X given Y=y.
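
For the mixed case, the integral can be evaluated numerically. The sketch below assumes, purely for illustration, that the conditional density of X given Y=y is exponential with rate y + 1:

import math
from scipy.integrate import quad

def f_cond(x, y):
    """Assumed conditional density of X given Y = y: exponential with rate y + 1."""
    rate = y + 1
    return rate * math.exp(-rate * x)

def cond_expectation(y):
    value, _err = quad(lambda x: x * f_cond(x, y), 0, math.inf)
    return value

print(cond_expectation(0))   # ≈ 1.0, the mean of Exp(1)
print(cond_expectation(1))   # ≈ 0.5, the mean of Exp(2)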

A problem arises when Y is continuous. In this case, the probability P(Y=y) = 0, and the Borel–Kolmogorov paradox demonstrates the ambiguity of attempting to define conditional probability along these lines.

However, the above expression may be rearranged:

 \operatorname{E} (X | Y=y) \operatorname{P}(Y=y) = \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y=y),

and although this is trivial for individual values of y (since both sides are zero), it should hold for any measurable subset B of the domain of Y that:

 \int_B \operatorname{E} (X | Y=y) \operatorname{P}(Y=y) \ \operatorname{d}y = \int_B \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y=y) \ \operatorname{d}y.

In fact, this is a sufficient condition to define both conditional expectation and conditional probability.

Formal definition

Let (\Omega, \mathcal{F}, \operatorname{P}) be a probability space, with a random variable X:\Omega \to \mathbb{R}^n and a sub-σ-algebra \mathcal{H} \subseteq \mathcal{F}.

Then a conditional expectation of X given \mathcal{H} (denoted as \operatorname{E}\left[X|\mathcal{H}\right]) is any \mathcal{H}-measurable function (\Omega \to \mathbb{R}^n) which satisfies:

 \int_H \operatorname{E}\left[X|\mathcal {H} \right] (\omega) \ \operatorname{d} \operatorname{P}(\omega) = \int_H X(\omega) \ \operatorname{d} \operatorname{P}(\omega)  \qquad \text{for each} \quad H \in \mathcal {H} .[1]

Note that \operatorname{E}\left[X|\mathcal{H}\right] is simply the name of the conditional expectation function.
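
On a finite probability space the defining property can be checked exhaustively. In the following Python sketch (all numbers are assumptions chosen for illustration), \mathcal{H} is generated by a partition of Ω, the conditional expectation is the block-wise average of X, and the integral identity is verified for every H \in \mathcal{H}:

from itertools import combinations

# Ω = {0,...,5}; H is generated by the partition {{0,1}, {2,3,4}, {5}}.
P = {0: 0.1, 1: 0.2, 2: 0.15, 3: 0.15, 4: 0.2, 5: 0.2}   # assumed pmf
X = {0: 1.0, 1: 3.0, 2: 2.0, 3: 4.0, 4: 6.0, 5: 5.0}     # assumed random variable
partition = [{0, 1}, {2, 3, 4}, {5}]

# E[X | H] is constant on each block, equal to the P-weighted block average.
condE = {}
for block in partition:
    pb = sum(P[w] for w in block)
    avg = sum(X[w] * P[w] for w in block) / pb
    condE.update({w: avg for w in block})

# Every H in the σ-algebra is a union of blocks; check the defining identity.
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        H = set().union(*blocks) if blocks else set()
        lhs = sum(condE[w] * P[w] for w in H)   # ∫_H E[X|H] dP
        rhs = sum(X[w] * P[w] for w in H)       # ∫_H X dP
        assert abs(lhs - rhs) < 1e-12
print("defining identity holds for every H in the σ-algebra")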

Discussion

A few points are worth noting about the definition:

  • This is not a constructive definition; we are merely given the required property that a conditional expectation must satisfy.
    • The required property has the same form as the last expression in the Introduction section.
    • Existence of a conditional expectation function is guaranteed by the Radon–Nikodym theorem; a sufficient condition is that the (unconditional) expected value of X exists.
    • Uniqueness can be shown to be almost sure: that is, versions of the same conditional expectation will only differ on a set of probability zero.
  • The σ-algebra \mathcal{H} controls the "granularity" of the conditioning. A conditional expectation \operatorname{E}\left[X|\mathcal{H}\right] over a finer-grained σ-algebra \mathcal{H} allows us to condition on a wider variety of events.
    • To condition freely on values of a random variable Y with state space (\mathcal{Y}, \Sigma), it suffices to define the conditional expectation using the pre-image of Σ with respect to Y, so that \operatorname{E}\left[X|Y\right] is defined to be \operatorname{E}\left[X|\mathcal{H}\right], where
 \mathcal {H} = \sigma(Y):= Y^{-1}\left(\Sigma\right):= \{Y^{-1}(S) : S \in \Sigma \}
This suffices to ensure that the conditional expectation is σ(Y)-measurable. Although conditional expectation is defined to condition on events in the underlying probability space Ω, the requirement that it be σ(Y)-measurable allows us to condition on Y as in the introduction.

Definition of conditional probability

Let \mathcal{B} \subseteq \mathcal{A} be σ-algebras. For any event A \in \mathcal{A}, define the indicator function:

\mathbf{1}_A (\omega) = \begin{cases} 1 \; &\text{if } \omega \in A, \\ 0 \; &\text{if } \omega \notin A, \end{cases}

which is a random variable with respect to the Borel σ-algebra on [0,1]. Note that the expectation of this random variable is equal to the probability of A itself:

\operatorname{E}(\mathbf{1}_A) = \operatorname{P}(A). \;

Then the conditional probability given \mathcal{B} is a function \operatorname{P}(\cdot|\mathcal{B}):\mathcal{A} \times \Omega \to [0,1] such that \operatorname{P}(A|\mathcal{B}) is the conditional expectation of the indicator function for A:

\operatorname{P}(A|\mathcal{B}) = \operatorname{E}(\mathbf{1}_A|\mathcal{B}) \;

In other words, \operatorname{P}(A|\mathcal{B}) is a \mathcal{B}-measurable function satisfying

\int_B \operatorname{P}(A|\mathcal{B}) (\omega) \, \operatorname{d} \operatorname{P}(\omega) = \operatorname{P} (A \cap B) \qquad \text{for all} \quad A \in \mathcal{A}, B \in  \mathcal{B}.

A conditional probability is regular if \scriptstyle \operatorname{P}(\cdot|\mathcal{B})(\omega) is also a probability measure for all ω ∈ Ω. An expectation of a random variable with respect to a regular conditional probability is equal to its conditional expectation.

  • For the trivial sigma algebra \mathcal B= \{\emptyset,\Omega\} the conditional probability is a constant function, \operatorname{P}\!\left( A| \{\emptyset,\Omega\} \right) \equiv\operatorname{P}(A).
  • For A\in \mathcal{B}, as outlined above, \operatorname{P}(A|\mathcal{B})=\mathbf{1}_A.
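
As a sanity check of this definition, the following sketch computes P(A|\mathcal{B}) as E(\mathbf{1}_A|\mathcal{B}) on a finite space with \mathcal{B} generated by a partition; all numbers are illustrative assumptions:

# P(A | B) as the conditional expectation of the indicator of A.
P = {0: 0.25, 1: 0.25, 2: 0.3, 3: 0.2}       # assumed pmf on Ω = {0,1,2,3}
partition = [{0, 1}, {2, 3}]                  # generates the σ-algebra B
A = {1, 2}                                    # an event
indicator = {w: 1.0 if w in A else 0.0 for w in P}

# E[1_A | B] on each block is P(A ∩ block) / P(block), a B-measurable function.
condP = {}
for block in partition:
    pb = sum(P[w] for w in block)
    avg = sum(indicator[w] * P[w] for w in block) / pb
    condP.update({w: avg for w in block})

# Defining identity: ∫_B P(A|B)(ω) dP(ω) = P(A ∩ B) for each block B.
for block in partition:
    lhs = sum(condP[w] * P[w] for w in block)
    assert abs(lhs - sum(P[w] for w in block & A)) < 1e-12
print(condP)   # {0: 0.5, 1: 0.5, 2: 0.6, 3: 0.6}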

See also conditional probability distribution.

Conditioning as factorization

In the definition of conditional expectation given above, the fact that Y is a real random variable is irrelevant. Let U be a measurable space, that is, a set equipped with a σ-algebra \Sigma of subsets. A U-valued random variable is a function Y\colon (\Omega,\mathcal A) \to (U,\Sigma) such that Y^{-1}(B)\in \mathcal A for any measurable subset B\in \Sigma of U.

Consider the pushforward measure Q on U defined by Q(B) = \operatorname{P}(Y^{-1}(B)) for every measurable subset B of U. Then Q is a probability measure on the measurable space (U, \Sigma).

Theorem. If X is an integrable random variable on Ω then there is one and, up to equivalence a.e. relative to Q, only one integrable function g on U, which is written g= \operatorname{E}(X \mid Y) or g(u)= \operatorname{E}(X \mid Y=u), such that for any measurable subset B of U:

 \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} g(u) \ d \operatorname{Q} (u).

There are a number of ways of proving this; one, as suggested above, is to note that the expression on the left-hand side defines, as a function of the set B, a countably additive signed measure μ on the measurable subsets of U. Moreover, this measure μ is absolutely continuous relative to Q. Indeed, Q(B) = 0 means exactly that Y^{-1}(B) has probability 0. Since the integral of an integrable function over a set of probability 0 is itself 0, this proves absolute continuity. The Radon–Nikodym theorem then provides the function g, equal to the density of μ with respect to Q.
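
On finite spaces the Radon–Nikodym density is simply a ratio of sums, so the whole construction can be carried out explicitly. In the sketch below, the map Y and the weights are illustrative assumptions:

# The factorization g(u) = E(X | Y = u) = dμ/dQ(u) on finite spaces.
P = {0: 0.2, 1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2}             # assumed pmf on Ω
X = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}             # assumed random variable
Y = {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'b'}             # assumed map Y: Ω → U

g = {}
for u in set(Y.values()):
    fiber = [w for w in P if Y[w] == u]                   # Y^{-1}({u})
    Q_u = sum(P[w] for w in fiber)                        # Q({u}) = P(Y^{-1}({u}))
    mu_u = sum(X[w] * P[w] for w in fiber)                # μ({u}) = ∫_{Y^{-1}({u})} X dP
    g[u] = mu_u / Q_u                                     # Radon–Nikodym density dμ/dQ

print(g)   # {'a': 1.5, 'b': 4.0}

# Defining identity ∫_{Y^{-1}(B)} X dP = ∫_B g dQ, here for B = {'b'}:
B = {'b'}
lhs = sum(X[w] * P[w] for w in P if Y[w] in B)
rhs = sum(g[u] * sum(P[w] for w in P if Y[w] == u) for u in B)
assert abs(lhs - rhs) < 1e-12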

The defining condition of conditional expectation then is the equation

 \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} \operatorname{E}(X \mid Y=u) \ d \operatorname{Q} (u),

and it holds that

\operatorname{E}(X \mid Y) \circ Y= \operatorname{E}\left(X \mid Y^{-1} \left(\Sigma\right)\right).

We can further interpret this equality by considering the abstract change of variables formula to transport the integral on the right hand side to an integral over Ω:

 \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{Y^{-1}(B)} (\operatorname{E}(X \mid Y) \circ Y)(\omega) \ d \operatorname{P} (\omega).

The equation means that the integrals of X and the composition \operatorname{E}(X \mid Y=\ \cdot\,)\circ Y over sets of the form Y^{-1}(B), for B a measurable subset of U, are identical.

This equation can be interpreted to say that the following diagram is commutative in the average.

[Figure: a diagram, commutative in an average sense.]


Conditioning relative to a subalgebra

There is another viewpoint for conditioning involving σ-subalgebras N of the σ-algebra M. This version is a trivial specialization of the preceding: we simply take U to be the space Ω with the σ-algebra N and Y the identity map. We state the result:

Theorem. If X is an integrable real random variable on Ω then there is one and, up to equivalence a.e. relative to P, only one integrable function g such that for any set B belonging to the subalgebra N

 \int_{B} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} g(\omega) \ d \operatorname{P} (\omega)

where g is measurable with respect to N (a stricter condition than the measurability with respect to M required of X). This form of conditional expectation is usually written E(X | N). This version is preferred by probabilists. One reason is that on the Hilbert space of square-integrable real random variables (in other words, real random variables with finite second moment) the mapping X → E(X | N) is self-adjoint:

\operatorname E(X\cdot\operatorname E(Y\mid N)) = \operatorname E\left(\operatorname E(X\mid N)\cdot \operatorname E(Y\mid N)\right) = \operatorname E(\operatorname E(X\mid N)\cdot Y)

and a projection (i.e., idempotent):

 L^2_{\operatorname{P}}(\Omega;M) \rightarrow L^2_{\operatorname{P}}(\Omega;N).
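
Both claims are easy to verify numerically on a finite space. The sketch below (pmf, random variables, and partition all assumed for illustration) checks the self-adjointness identity; idempotence follows because conditioning an already N-measurable function changes nothing:

# Self-adjointness: E(X · E(Y|N)) = E(E(X|N) · E(Y|N)) = E(E(X|N) · Y).
P = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}          # assumed pmf
X = {0: 2.0, 1: -1.0, 2: 0.5, 3: 3.0}         # assumed random variables
Y = {0: 1.0, 1: 0.0, 2: 4.0, 3: -2.0}
partition = [{0, 1}, {2, 3}]                   # generates N

def cond(Z):
    """E[Z | N]: block-wise P-weighted average, constant on each block."""
    out = {}
    for block in partition:
        pb = sum(P[w] for w in block)
        avg = sum(Z[w] * P[w] for w in block) / pb
        out.update({w: avg for w in block})
    return out

def E(Z):
    return sum(Z[w] * P[w] for w in P)

cX, cY = cond(X), cond(Y)
lhs = E({w: X[w] * cY[w] for w in P})
mid = E({w: cX[w] * cY[w] for w in P})
rhs = E({w: cX[w] * Y[w] for w in P})
assert abs(lhs - mid) < 1e-12 and abs(mid - rhs) < 1e-12
# Idempotence: E(E(X|N)|N) = E(X|N).
assert all(abs(cond(cX)[w] - cX[w]) < 1e-12 for w in P)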

Basic properties

Let (Ω, M, P) be a probability space, and let N be a σ-subalgebra of M.

  • Conditioning with respect to N  is linear on the space of integrable real random variables.
  • \operatorname{E}(1\mid N) = 1. More generally, \operatorname{E} (Y\mid N)= Y for every integrable N-measurable random variable Y on Ω.
  • \operatorname{E}(1_B \,\operatorname{E} (X\mid N))= \operatorname{E}(1_B \, X)   for all B ∈ N and every integrable random variable X on Ω.
  • Conditional Jensen's inequality: if f is a convex function, then
 f(\operatorname{E}(X \mid N) ) \leq  \operatorname{E}(f \circ X \mid N).
  • Conditioning is a contractive projection
 L^s_P(\Omega; M) \rightarrow L^s_P(\Omega; N), \text{ i.e. } \operatorname{E}|\operatorname{E}(X\mid N)|^s \le \operatorname{E}|X|^s
for any s ≥ 1.
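
As a quick numerical check of the last property, the sketch below (assumed numbers) verifies the contraction for a few values of s:

# Contraction: E|E(X|N)|^s ≤ E|X|^s for s ≥ 1.
P = {0: 0.3, 1: 0.2, 2: 0.1, 3: 0.4}           # assumed pmf
X = {0: -2.0, 1: 5.0, 2: 1.0, 3: -0.5}         # assumed random variable
partition = [{0, 1}, {2, 3}]                    # generates N

condE = {}
for block in partition:
    pb = sum(P[w] for w in block)
    avg = sum(X[w] * P[w] for w in block) / pb
    condE.update({w: avg for w in block})

for s in (1.0, 1.5, 2.0, 3.0):
    lhs = sum(abs(condE[w]) ** s * P[w] for w in P)
    rhs = sum(abs(X[w]) ** s * P[w] for w in P)
    assert lhs <= rhs + 1e-12
print("contraction verified for s in {1, 1.5, 2, 3}")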
