# Continuous mapping theorem

In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine’s definition, is a function that maps convergent sequences into convergent sequences: if xn → x then g(xn) → g(x). The continuous mapping theorem states that this remains true if we replace the deterministic sequence {xn} with a sequence of random variables {Xn}, and replace the standard notion of convergence of real numbers “→” with one of the types of convergence of random variables.

This theorem was first proved by Mann & Wald (1943), and it is therefore sometimes called the Mann–Wald theorem.[1]

## Statement

Let {Xn}, X be random elements defined on a metric space S. Suppose a function g: S → S′ (where S′ is another metric space) has the set of discontinuity points Dg such that Pr[X ∈ Dg] = 0. Then[2][3][4]

1. ${\displaystyle X_{n}\ {\xrightarrow {d}}\ X\quad \Rightarrow \quad g(X_{n})\ {\xrightarrow {d}}\ g(X);}$
2. ${\displaystyle X_{n}\ {\xrightarrow {p}}\ X\quad \Rightarrow \quad g(X_{n})\ {\xrightarrow {p}}\ g(X);}$
3. ${\displaystyle X_{n}\ {\xrightarrow {\!\!as\!\!}}\ X\quad \Rightarrow \quad g(X_{n})\ {\xrightarrow {\!\!as\!\!}}\ g(X).}$
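Statement 2 can be checked numerically. The following is a minimal Monte Carlo sketch, assuming X ~ N(0, 1), Xn = X + Z/n with Z ~ N(0, 1) independent, and the continuous map g(x) = x²; these concrete choices, and all names below, are illustrative and not part of the theorem.

```python
# Monte Carlo sketch of statement 2: if X_n -> X in probability and g is
# continuous, then g(X_n) -> g(X) in probability. Here X ~ N(0,1),
# X_n = X + Z/n with Z ~ N(0,1), and g(x) = x**2 (illustrative choices).
import random

random.seed(0)
EPS = 0.1          # the epsilon in Pr(|g(X_n) - g(X)| > eps)
TRIALS = 20_000    # Monte Carlo sample size

def exceed_prob(n: int) -> float:
    """Estimate Pr(|g(X_n) - g(X)| > EPS) for g(x) = x**2."""
    count = 0
    for _ in range(TRIALS):
        x = random.gauss(0.0, 1.0)            # a draw of X
        x_n = x + random.gauss(0.0, 1.0) / n  # the coupled draw of X_n
        if abs(x_n**2 - x**2) > EPS:
            count += 1
    return count / TRIALS

probs = [exceed_prob(n) for n in (1, 10, 100)]
print(probs)  # the estimates shrink toward 0 as n grows
```

The estimated exceedance probabilities decrease toward zero as n grows, as statement 2 predicts.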

## Proof

This proof has been adapted from (van der Vaart 1998, Theorem 2.3).

Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x−y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.

### Convergence in distribution

We will need a particular statement from the portmanteau theorem: that convergence in distribution ${\displaystyle X_{n}{\xrightarrow {d}}X}$ is equivalent to

${\displaystyle \limsup _{n\to \infty }\operatorname {Pr} (X_{n}\in F)\leq \operatorname {Pr} (X\in F){\text{ for every closed set }}F.}$

Fix an arbitrary closed set F ⊂ S′. Denote by g−1(F) the pre-image of F under the mapping g: the set of all points x ∈ S such that g(x) ∈ F. Consider a sequence {xk} such that g(xk) ∈ F and xk → x. Then this sequence lies in g−1(F), and its limit point x belongs to the closure of this set, ${\displaystyle {\overline {g^{-1}(F)}}}$ (by definition of the closure). The point x may be either:

• a continuity point of g, in which case g(xk) → g(x), and hence g(x)∈F because F is a closed set, and therefore in this case x belongs to the pre-image of F, or
• a discontinuity point of g, so that x ∈ Dg.

Thus the following relationship holds:

${\displaystyle {\overline {g^{-1}(F)}}\ \subset \ g^{-1}(F)\cup D_{g}\ .}$

Consider the event {g(Xn)∈F}. The probability of this event can be estimated as

${\displaystyle \operatorname {Pr} {\big (}g(X_{n})\in F{\big )}=\operatorname {Pr} {\big (}X_{n}\in g^{-1}(F){\big )}\leq \operatorname {Pr} {\big (}X_{n}\in {\overline {g^{-1}(F)}}{\big )},}$

and by the portmanteau theorem the limsup of the last expression is less than or equal to ${\displaystyle \operatorname {Pr} {\big (}X\in {\overline {g^{-1}(F)}}{\big )}}$, since the closure ${\displaystyle {\overline {g^{-1}(F)}}}$ is a closed set. Using the inclusion derived in the previous paragraph, this can be bounded as

${\displaystyle {\begin{aligned}&\operatorname {Pr} {\big (}X\in {\overline {g^{-1}(F)}}{\big )}\leq \operatorname {Pr} {\big (}X\in g^{-1}(F)\cup D_{g}{\big )}\leq \\&\operatorname {Pr} {\big (}X\in g^{-1}(F){\big )}+\operatorname {Pr} (X\in D_{g})=\operatorname {Pr} {\big (}g(X)\in F{\big )}+0.\end{aligned}}}$

On plugging this back into the original expression, it can be seen that

${\displaystyle \limsup _{n\to \infty }\Pr {\big (}g(X_{n})\in F{\big )}\leq \Pr {\big (}g(X)\in F{\big )},}$

which, by the portmanteau theorem, implies that g(Xn) converges to g(X) in distribution.
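As a hedged numerical companion to this argument: by the central limit theorem, the standardized mean of n Uniform(0, 1) draws converges in distribution to N(0, 1), so by part 1 its image under the continuous map g(x) = |x| should approach the half-normal law. The sample sizes, the map g, and all function names below are illustrative choices, not part of the proof.

```python
# Numerical check of part 1 (convergence in distribution): the
# standardized mean of n uniforms is approximately N(0,1) by the CLT,
# so g(x) = |x| maps it to an approximately half-normal variable.
import math
import random

random.seed(1)

def standardized_mean(n: int) -> float:
    """One draw of X_n = sqrt(12/n) * (sum of n uniforms - n/2)."""
    s = sum(random.random() for _ in range(n))
    return (s - n / 2) * math.sqrt(12.0 / n)

def half_normal_cdf(t: float) -> float:
    """CDF of |Z| for Z ~ N(0,1): Pr(|Z| <= t) = erf(t / sqrt(2))."""
    return math.erf(t / math.sqrt(2.0))

TRIALS = 20_000
n, t = 50, 1.0
empirical = sum(abs(standardized_mean(n)) <= t for _ in range(TRIALS)) / TRIALS
print(abs(empirical - half_normal_cdf(t)))  # small discrepancy expected
```

The empirical CDF of g(Xn) at t = 1 lands close to the half-normal value erf(1/√2) ≈ 0.683, consistent with convergence in distribution of g(Xn) to g(X).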

### Convergence in probability

Fix an arbitrary ε > 0. Then for any δ > 0 consider the set Bδ defined as

${\displaystyle B_{\delta }={\big \{}x\in S\mid x\notin D_{g}:\ \exists y\in S:\ |x-y|<\delta ,\,|g(x)-g(y)|>\varepsilon {\big \}}.}$

This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point which maps outside the ε-neighborhood of g(x). By definition of continuity, this set shrinks as δ goes to zero, so that ${\displaystyle \lim _{\delta \to 0}B_{\delta }=\emptyset }$.

Now suppose that |g(X) − g(Xn)| > ε. This implies that at least one of the following is true: either |Xn − X| ≥ δ, or X ∈ Dg, or X ∈ Bδ. In terms of probabilities this can be written as

${\displaystyle \Pr {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}\leq \Pr {\big (}|X_{n}-X|\geq \delta {\big )}+\Pr(X\in B_{\delta })+\Pr(X\in D_{g}).}$

On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {Xn}. The second term converges to zero as δ → 0, since the set Bδ shrinks to an empty set. And the last term is identically equal to zero by assumption of the theorem. Therefore the conclusion is that

${\displaystyle \lim _{n\to \infty }\Pr {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}=0,}$

which means that g(Xn) converges to g(X) in probability.
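The assumption Pr[X ∈ Dg] = 0 is what lets g have discontinuities at all. A small sketch under illustrative assumptions: the step function g(x) = 1{x > 0} is discontinuous only on Dg = {0}, and for X ~ N(0, 1) we have Pr(X ∈ Dg) = 0, so the theorem still gives g(Xn) → g(X) in probability. The coupling Xn = X + Z/n and all names below are hypothetical choices for the demonstration.

```python
# Part 2 with a discontinuous g: the step function is discontinuous
# only at D_g = {0}, which X ~ N(0,1) hits with probability 0, so
# Pr(g(X_n) != g(X)) should still vanish as n grows.
import random

random.seed(2)
TRIALS = 20_000

def step(x: float) -> float:
    """g(x) = 1 if x > 0 else 0; discontinuous only at x = 0."""
    return 1.0 if x > 0 else 0.0

def flip_prob(n: int) -> float:
    """Estimate Pr(g(X_n) != g(X)) with X_n = X + Z/n."""
    flips = 0
    for _ in range(TRIALS):
        x = random.gauss(0.0, 1.0)
        x_n = x + random.gauss(0.0, 1.0) / n
        if step(x_n) != step(x):
            flips += 1
    return flips / TRIALS

probs = [flip_prob(n) for n in (1, 10, 100)]
print(probs)  # decreasing toward 0
```

A sign flip requires X to fall within roughly |Z|/n of the single discontinuity point 0, an event whose probability vanishes as n → ∞, which is why the estimates decrease.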

### Convergence almost surely

By definition of the continuity of the function g(·),

${\displaystyle \lim _{n\to \infty }X_{n}(\omega )=X(\omega )\quad \Rightarrow \quad \lim _{n\to \infty }g(X_{n}(\omega ))=g(X(\omega ))}$

at each point X(ω) where g(·) is continuous. Therefore

${\displaystyle {\begin{aligned}\operatorname {Pr} {\Big (}\lim _{n\to \infty }g(X_{n})=g(X){\Big )}&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }g(X_{n})=g(X),\ X\notin D_{g}{\Big )}\\&\geq \operatorname {Pr} {\Big (}\lim _{n\to \infty }X_{n}=X,\ X\notin D_{g}{\Big )}=1,\end{aligned}}}$

because the intersection of two almost sure events is almost sure.

By definition, we conclude that g(Xn) converges to g(X) almost surely.