# Continuous mapping theorem

In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine’s definition, is a function that maps convergent sequences into convergent sequences: if xn → x then g(xn) → g(x). The continuous mapping theorem states that this remains true if we replace the deterministic sequence {xn} with a sequence of random variables {Xn}, and replace the standard notion of convergence of real numbers “→” with one of the types of convergence of random variables.

This theorem was first proved by Henry Mann and Abraham Wald in 1943,[1] and it is therefore sometimes called the Mann–Wald theorem.[2] Meanwhile, Denis Sargan refers to it as the general transformation theorem.[3]

## Statement

Let {Xn}, X be random elements taking values in a metric space S. Suppose a function g: S → S′ (where S′ is another metric space) has the set of discontinuity points Dg such that Pr[X ∈ Dg] = 0. Then[4][5]

${\displaystyle {\begin{aligned}X_{n}\ {\xrightarrow {\text{d}}}\ X\quad &\Rightarrow \quad g(X_{n})\ {\xrightarrow {\text{d}}}\ g(X);\\[6pt]X_{n}\ {\xrightarrow {\text{p}}}\ X\quad &\Rightarrow \quad g(X_{n})\ {\xrightarrow {\text{p}}}\ g(X);\\[6pt]X_{n}\ {\xrightarrow {\!\!{\text{a.s.}}\!\!}}\ X\quad &\Rightarrow \quad g(X_{n})\ {\xrightarrow {\!\!{\text{a.s.}}\!\!}}\ g(X).\end{aligned}}}$

where the superscripts, "d", "p", and "a.s." denote convergence in distribution, convergence in probability, and almost sure convergence respectively.
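The condition Pr[X ∈ Dg] = 0 cannot be dropped. The following minimal sketch illustrates this with a deterministic sequence, which is a degenerate case of all three modes of convergence. The step function `g` and the sequence `x_n` are illustrative choices, not part of the theorem's statement.

```python
from fractions import Fraction

def g(x):
    """Step function: discontinuous at 0, so D_g = {0}."""
    return 1 if x > 0 else 0

def x_n(n):
    """Deterministic sequence X_n = 1/n, which converges to X = 0."""
    return Fraction(1, n)

limit_x = Fraction(0)  # X = 0 lies in D_g with probability 1

images = [g(x_n(n)) for n in (1, 10, 100, 10_000)]
print(images)      # g(X_n) along the sequence
print(g(limit_x))  # g(X) at the limit
```

Here g(Xn) equals 1 for every n while g(X) equals 0, so g(Xn) does not converge to g(X) in any of the three senses, because the limit X sits on the discontinuity of g.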

## Proof

This proof has been adapted from van der Vaart (1998, Theorem 2.3).

Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x − y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.

### Convergence in distribution

We will need a particular statement from the portmanteau theorem: that convergence in distribution ${\displaystyle X_{n}{\xrightarrow {d}}X}$ is equivalent to

${\displaystyle \mathbb {E} f(X_{n})\to \mathbb {E} f(X)}$ for every bounded continuous functional f.

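So it suffices to prove that ${\displaystyle \mathbb {E} f(g(X_{n}))\to \mathbb {E} f(g(X))}$ for every bounded continuous functional f. If g is continuous on all of S, then ${\displaystyle F=f\circ g}$ is itself a bounded continuous functional, and the claim follows from the statement above. When g is continuous only outside Dg, the composition need not be continuous, and the argument instead relies on the closed-set form of the portmanteau theorem together with the assumption Pr[X ∈ Dg] = 0, as in van der Vaart (1998, Theorem 2.3).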
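As a numerical illustration, the sketch below combines the central limit theorem with the continuous map g(z) = z². The standardized mean Zn of n Uniform(0, 1) draws converges in distribution to a standard normal Z, so g(Zn) should converge in distribution to Z², and Pr(g(Zn) ≤ 1) should approach Pr(|Z| ≤ 1) ≈ 0.683. The choice of distribution, the sample sizes, the seed, and the helper names are all assumptions made for this example.

```python
import math
import random

def standardized_mean(rng, n):
    """Return sqrt(n) * (sample mean - 1/2) / sd for n Uniform(0, 1) draws."""
    mean = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(12 * n) * (mean - 0.5)  # Var(Uniform(0, 1)) = 1/12

def g(z):
    """Continuous map applied to the converging sequence."""
    return z * z

def estimate_prob(n, reps=5000, seed=0):
    """Monte Carlo estimate of Pr(g(Z_n) <= 1)."""
    rng = random.Random(seed)
    hits = sum(g(standardized_mean(rng, n)) <= 1.0 for _ in range(reps))
    return hits / reps

target = math.erf(1 / math.sqrt(2))  # Pr(Z^2 <= 1) for Z ~ N(0, 1)
estimate = estimate_prob(n=30)
print(f"estimate={estimate:.3f}, limit={target:.3f}")
```

The estimate carries Monte Carlo error, so it only needs to land near the limiting value rather than match it exactly.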

### Convergence in probability

Fix an arbitrary ε > 0. Then for any δ > 0 consider the set Bδ defined as

${\displaystyle B_{\delta }={\big \{}x\in S\mid x\notin D_{g}:\ \exists y\in S:\ |x-y|<\delta ,\,|g(x)-g(y)|>\varepsilon {\big \}}.}$

This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point which maps outside the ε-neighborhood of g(x). By definition of continuity, this set shrinks as δ goes to zero, so that ${\displaystyle \lim _{\delta \to 0}B_{\delta }=\varnothing }$.

Now suppose that |g(X) − g(Xn)| > ε. This implies that at least one of the following is true: either |X − Xn| ≥ δ, or X ∈ Dg, or X ∈ Bδ. In terms of probabilities this can be written as

${\displaystyle \Pr {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}\leq \Pr {\big (}|X_{n}-X|\geq \delta {\big )}+\Pr(X\in B_{\delta })+\Pr(X\in D_{g}).}$

On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {Xn}. The second term converges to zero as δ → 0, since the set Bδ shrinks to an empty set. And the last term is identically equal to zero by assumption of the theorem. Therefore, letting n → ∞ first and then δ → 0, we conclude that

${\displaystyle \lim _{n\to \infty }\Pr {\big (}{\big |}g(X_{n})-g(X){\big |}>\varepsilon {\big )}=0,}$

which means that g(Xn) converges to g(X) in probability.
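The bound used in this proof can be checked on a simple case. The sketch below assumes Xn is the mean of n Uniform(0, 1) draws, the limit is the constant X = 1/2, and g(x) = x², so Dg is empty. With ε = 0.05 and δ = 0.04, any x with |x − 1/2| < δ satisfies |x² − 1/4| < 0.04 · 1.04 = 0.0416 < ε, so X is not in Bδ and the bound reduces to Pr(|g(Xn) − g(X)| > ε) ≤ Pr(|Xn − X| ≥ δ). All names, constants, and the seed are illustrative assumptions.

```python
import random

EPS = 0.05    # epsilon from the proof
DELTA = 0.04  # small enough that |x - 1/2| < DELTA forces |x^2 - 1/4| < EPS
X = 0.5       # degenerate limit; g is continuous everywhere, so D_g is empty

def g(x):
    return x * x

def both_sides(n, reps=1000, seed=0):
    """Empirical frequencies of {|g(X_n) - g(X)| > EPS} and {|X_n - X| >= DELTA}."""
    rng = random.Random(seed)
    lhs = rhs = 0
    for _ in range(reps):
        x_n = sum(rng.random() for _ in range(n)) / n  # sample mean of n draws
        lhs += abs(g(x_n) - g(X)) > EPS
        rhs += abs(x_n - X) >= DELTA
    return lhs / reps, rhs / reps

results = {n: both_sides(n) for n in (10, 100, 1000)}
for n, (lhs, rhs) in results.items():
    print(f"n={n:5d}  lhs ~ {lhs:.3f}  <=  rhs ~ {rhs:.3f}")
```

Because the first event implies the second on every simulated draw, the left frequency never exceeds the right one, and both shrink toward zero as n grows.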

### Almost sure convergence

By definition of the continuity of the function g(·),

${\displaystyle \lim _{n\to \infty }X_{n}(\omega )=X(\omega )\quad \Rightarrow \quad \lim _{n\to \infty }g(X_{n}(\omega ))=g(X(\omega ))}$

at each point X(ω) where g(·) is continuous. Therefore,

${\displaystyle {\begin{aligned}\Pr \left(\lim _{n\to \infty }g(X_{n})=g(X)\right)&\geq \Pr \left(\lim _{n\to \infty }g(X_{n})=g(X),\ X\notin D_{g}\right)\\&\geq \Pr \left(\lim _{n\to \infty }X_{n}=X,\ X\notin D_{g}\right)=1,\end{aligned}}}$

because the intersection of two almost sure events is almost sure.

By definition, we conclude that g(Xn) converges to g(X) almost surely.
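The pathwise nature of this argument can be illustrated by following a single simulated sample path. The sketch below assumes Xn is the running mean of Uniform(0, 1) draws, which converges almost surely to X = 1/2 by the strong law of large numbers, and takes g(x) = 1/x (set to 0 at x = 0), whose only discontinuity is at 0, so Pr[X ∈ Dg] = 0. The function names, checkpoints, and seed are illustrative assumptions.

```python
import random

def g(x):
    """g(x) = 1/x, set to 0 at x = 0; its only discontinuity is D_g = {0}."""
    return 1.0 / x if x != 0 else 0.0

def path_errors(checkpoints, seed=0):
    """Follow one sample path and record |g(X_n) - g(X)| at the checkpoints."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    total, errors = 0.0, {}
    for n in range(1, max(checkpoints) + 1):
        total += rng.random()
        if n in wanted:
            errors[n] = abs(g(total / n) - g(0.5))  # X_n is the running mean
    return errors

errors = path_errors([10, 1_000, 100_000])
for n, err in errors.items():
    print(f"n={n:6d}  |g(X_n) - g(X)| = {err:.5f}")
```

A single path cannot prove almost sure convergence, but it shows the mechanism: once Xn(ω) is close to a continuity point X(ω), g(Xn(ω)) is close to g(X(ω)).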

5. Van der Vaart, A. W. (1998). Asymptotic Statistics. New York: Cambridge University Press. p. 7 (Theorem 2.3). ISBN 0-521-49603-9.