= Proofs of convergence of random variables =

This article is supplemental for “Convergence of random variables” and provides proofs for selected results.

Several results will be established using the portmanteau lemma: A sequence {X_{n}} converges in distribution to X if and only if any of the following conditions are met:

  <li> (A) $\mathbb{E}[f(X_n)]\to\mathbb{E}[f(X)]$ for all bounded, continuous functions $f$;
  <li> (B) $\mathbb{E}[f(X_n)]\to\mathbb{E}[f(X)]$ for all bounded, Lipschitz functions $f$;
  <li> (C) $\limsup\operatorname{Pr}(X_n\in C)\leq\operatorname{Pr}(X\in C)$ for all closed sets $C$.

== Convergence almost surely implies convergence in probability==
 $X_n\ \xrightarrow{\mathrm{a.s.}}\ X \quad\Rightarrow\quad X_n\ \xrightarrow{p}\ X$
Proof: If $\{X_n\}$ converges to $X$ almost surely, it means that the set of points $O=\{\omega\mid\lim X_n(\omega)\neq X(\omega)\}$ has measure zero. Now fix $\varepsilon > 0$ and consider a sequence of sets
 $A_n = \bigcup_{m\geq n} \left \{ \left |X_m-X \right |>\varepsilon \right\}$

This sequence of sets is decreasing ($A_n\supseteq A_{n+1}\supseteq\ldots$) towards the set

$A_{\infty} = \bigcap_{n \geq 1} A_n.$

The probabilities $\operatorname{Pr}(A_n)$ are therefore also decreasing, and since probability measures are continuous along decreasing sequences of sets, $\lim\operatorname{Pr}(A_n)=\operatorname{Pr}(A_\infty)$; we now show that this number is equal to zero. For any point $\omega$ outside of $O$ we have $\lim X_n(\omega)=X(\omega)$, so there is some $N$ such that $\left| X_n(\omega) - X(\omega) \right| < \varepsilon$ for all $n \geq N$. In particular, for such $N$ the point $\omega$ does not lie in $A_N$, and hence does not lie in $A_\infty$. Therefore, $A_\infty\subseteq O$ and so $\operatorname{Pr}(A_\infty)=0$.

Finally, by continuity from above,
 $\operatorname{Pr}\left(|X_n-X|>\varepsilon\right) \leq \operatorname{Pr}(A_n) \ \underset{n\to\infty}{\rightarrow} 0,$
which by definition means that $X_n$ converges in probability to $X$.

== Convergence in probability does not imply almost sure convergence in the discrete case==
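If X_{n} are independent random variables taking the value one with probability 1/n and zero otherwise, then X_{n} converges to zero in probability but not almost surely. Convergence in probability holds because Pr(|X_{n}| > ε) ≤ 1/n → 0 for every ε > 0. Almost sure convergence fails because the probabilities 1/n have a divergent sum, so by the second Borel–Cantelli lemma X_{n} = 1 for infinitely many n with probability one.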
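
In this particular example the failure of almost sure convergence can also be checked by direct computation. By independence, the probability that none of X_{n}, ..., X_{N} equals one is the telescoping product of the factors (1 − 1/m), which equals (n − 1)/N and tends to zero as N → ∞. The following Python sketch carries out both computations exactly; the function names are illustrative and not part of the original argument.

```python
from fractions import Fraction

def prob_far_from_zero(n):
    """Pr(|X_n| > eps) for X_n ~ Bernoulli(1/n) and any 0 < eps < 1."""
    return Fraction(1, n)

def prob_no_ones(n, N):
    """Pr(X_m = 0 for all n <= m <= N), using independence of the X_m."""
    p = Fraction(1)
    for m in range(n, N + 1):
        p *= 1 - Fraction(1, m)
    return p

# Convergence in probability: Pr(|X_n| > eps) = 1/n shrinks to zero.
print([prob_far_from_zero(n) for n in (10, 100, 1000)])

# No almost sure convergence: the chance of seeing no further ones from
# index n = 10 up to index N equals (n - 1)/N, which vanishes as N grows.
print([prob_no_ones(10, N) for N in (100, 1000, 10000)])
```

Since the probability of seeing no further ones vanishes for every starting index n, the sequence returns to the value one infinitely often with probability one.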

== Convergence in probability implies convergence in distribution==
 $X_n\ \xrightarrow{p}\ X \quad\Rightarrow\quad X_n\ \xrightarrow{d}\ X$

===Proof for the case of scalar random variables===
Lemma. Let X, Y be random variables, let a be a real number and ε > 0. Then
 $\operatorname{Pr}(Y \leq a) \leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(|Y - X| > \varepsilon).$

Proof of lemma:
 $\begin{align}
\operatorname{Pr}(Y\leq a) &= \operatorname{Pr}(Y\leq a,\ X\leq a+\varepsilon) + \operatorname{Pr}(Y\leq a,\ X>a+\varepsilon) \\
      &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(Y-X\leq a-X,\ a-X<-\varepsilon) \\
      &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(Y-X<-\varepsilon) \\
      &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(Y-X<-\varepsilon) + \operatorname{Pr}(Y-X>\varepsilon)\\
      &= \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(|Y-X|>\varepsilon)
  \end{align}$

Shorter proof of the lemma:

We have
 $\begin{align}
\{Y \leq a\}\subset\{X \leq a + \varepsilon\}\cup \{|Y-X|>\varepsilon\}
\end{align}$

for if $Y\leq a$ and $|Y-X|\leq \varepsilon$, then $X\leq a+\varepsilon$. Hence by the union bound,
 $\begin{align}
\operatorname{Pr}(Y\leq a) \leq \operatorname{Pr}(X \leq a + \varepsilon) + \operatorname{Pr}(|Y-X|>\varepsilon).
\end{align}$
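
The lemma can be sanity-checked by brute force on a small discrete example. The sketch below assumes a hypothetical joint distribution of (X, Y) on five points (the numbers are made up for illustration) and compares both sides of the inequality exactly for several values of a and ε.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y) on a small grid; any pmf would do.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 2): Fraction(1, 8),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 4),
    (3, 1): Fraction(1, 4),
}

def prob(event):
    """Probability of the set of outcomes (x, y) on which event(x, y) holds."""
    return sum((p for (x, y), p in joint.items() if event(x, y)), Fraction(0))

def lemma_sides(a, eps):
    """Return (Pr(Y <= a), Pr(X <= a + eps) + Pr(|Y - X| > eps))."""
    lhs = prob(lambda x, y: y <= a)
    rhs = prob(lambda x, y: x <= a + eps) + prob(lambda x, y: abs(y - x) > eps)
    return lhs, rhs

for a in (0, 1, 2):
    for eps in (Fraction(1, 2), 1, 2):
        lhs, rhs = lemma_sides(a, eps)
        assert lhs <= rhs  # the inequality stated in the lemma
        print(a, eps, lhs, rhs)
```

A check of this kind does not replace the proof; it only confirms the inequality on one example.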

Proof of the theorem: Recall that in order to prove convergence in distribution, one must show that the sequence of cumulative distribution functions converges to F_{X} at every point where F_{X} is continuous. Let a be such a point. For every ε > 0, applying the preceding lemma twice (once as stated, and once with the roles of the two variables exchanged and a replaced by a − ε), we have:
 $\begin{align}
\operatorname{Pr}(X_n\leq a) &\leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr}(|X_n-X|>\varepsilon) \\
\operatorname{Pr}(X\leq a-\varepsilon)&\leq \operatorname{Pr}(X_n\leq a) + \operatorname{Pr}(|X_n-X|>\varepsilon)
\end{align}$

So, we have
 $\operatorname{Pr}(X\leq a-\varepsilon) - \operatorname{Pr} \left (\left |X_n-X \right |>\varepsilon \right ) \leq \operatorname{Pr}(X_n\leq a) \leq \operatorname{Pr}(X\leq a+\varepsilon) + \operatorname{Pr} \left (\left |X_n-X \right |>\varepsilon \right ).$

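Letting n → ∞, and using that Pr(|X_{n} − X| > ε) → 0 by convergence in probability, we obtain:
 $F_X(a-\varepsilon) \leq \liminf_{n\to\infty} \operatorname{Pr}(X_n\leq a) \leq \limsup_{n\to\infty} \operatorname{Pr}(X_n\leq a) \leq F_X(a+\varepsilon),$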
where F_{X}(a) = Pr(X ≤ a) is the cumulative distribution function of X. This function is continuous at a by assumption, and therefore both F_{X}(a−ε) and F_{X}(a+ε) converge to F_{X}(a) as ε → 0^{+}. Taking this limit, we obtain
 $\lim_{n\to\infty} \operatorname{Pr}(X_n \leq a) = \operatorname{Pr}(X \leq a),$
which means that {X_{n}} converges to X in distribution.
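
The restriction to continuity points of F_{X} matters. As a minimal illustration, assume X takes the values 0 and 1 with probability 1/2 each and set X_{n} = X + 1/n (a hypothetical sequence chosen for this sketch). Then |X_{n} − X| = 1/n, so X_{n} converges to X in probability, and the distribution functions converge at a = 1/2 but not at the jump a = 0.

```python
from fractions import Fraction

def cdf_X(a):
    """F_X(a) = Pr(X <= a) for X equal to 0 or 1 with probability 1/2 each."""
    if a < 0:
        return Fraction(0)
    if a < 1:
        return Fraction(1, 2)
    return Fraction(1)

def cdf_Xn(a, n):
    """Pr(X_n <= a) for the hypothetical sequence X_n = X + 1/n."""
    return cdf_X(a - Fraction(1, n))

half = Fraction(1, 2)
# a = 1/2 is a continuity point of F_X: the values settle at F_X(1/2) = 1/2.
print([cdf_Xn(half, n) for n in (1, 2, 10, 100)])
# a = 0 is a jump of F_X: Pr(X_n <= 0) stays 0 although F_X(0) = 1/2.
print([cdf_Xn(0, n) for n in (1, 2, 10, 100)])
```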

===Proof for the generic case===
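When X_{n} is a random vector, the implication follows from the property proved later on this page (convergence in probability to a sequence converging in distribution implies convergence to the same distribution): in the statement of that property, take the sequence converging in distribution to be the constant sequence X, and take the sequence Y_{n} to be the present X_{n}.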

== Convergence in distribution to a constant implies convergence in probability==
 $X_n\ \xrightarrow{d}\ c \quad\Rightarrow\quad X_n\ \xrightarrow{p}\ c,$ provided c is a constant.

Proof: Fix ε > 0. Let B_{ε}(c) be the open ball of radius ε around point c, and B_{ε}(c)^{c} its complement. Then
 $\operatorname{Pr}\left(|X_n-c|\geq\varepsilon\right) = \operatorname{Pr}\left(X_n\in B_\varepsilon(c)^c\right).$
The set B_{ε}(c)^{c} is closed, being the complement of an open ball. By the portmanteau lemma (part C), if X_{n} converges in distribution to c, then the limsup of the latter probability must be less than or equal to Pr(c ∈ B_{ε}(c)^{c}), which is equal to zero because c lies in B_{ε}(c). Therefore,

 $\begin{align}
\lim_{n\to\infty}\operatorname{Pr}\left( \left |X_n-c \right |\geq\varepsilon\right) &\leq \limsup_{n\to\infty}\operatorname{Pr}\left( \left |X_n-c \right | \geq \varepsilon \right) \\
&= \limsup_{n\to\infty}\operatorname{Pr}\left(X_n\in B_\varepsilon(c)^c\right) \\
&\leq \operatorname{Pr}\left(c\in B_\varepsilon(c)^c\right) = 0
\end{align}$

which by definition means that X_{n} converges to c in probability.
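
As a numerical illustration, assume X_{n} is normally distributed with mean c and variance 1/n (an example chosen for this sketch, not part of the proof). This sequence converges in distribution to c, and the probability Pr(|X_{n} − c| ≥ ε) can be written in closed form with the complementary error function.

```python
import math

def tail_prob(n, eps):
    """Pr(|X_n - c| >= eps) when X_n ~ Normal(c, 1/n); it does not depend on c."""
    # |X_n - c| >= eps  <=>  |Z| >= eps * sqrt(n) for a standard normal Z,
    # and Pr(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(eps * math.sqrt(n) / math.sqrt(2))

for n in (1, 10, 100, 1000):
    print(n, tail_prob(n, eps=0.1))
```

The printed probabilities decrease towards zero as n grows, in line with convergence in probability to c.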

== Convergence in probability to a sequence converging in distribution implies convergence to the same distribution==
 $|Y_n-X_n|\ \xrightarrow{p}\ 0,\ \ X_n\ \xrightarrow{d}\ X\ \quad\Rightarrow\quad Y_n\ \xrightarrow{d}\ X$

Proof: We will prove this theorem using the portmanteau lemma, part B. As required in that lemma, consider any bounded function f (i.e. |f(x)| ≤ M) which is also Lipschitz:

 $\exists K >0, \forall x,y: \quad |f(x)-f(y)|\leq K|x-y|.$

Take some ε > 0 and majorize the expression |E[f(Y_{n})] − E[f(X_{n})]| as

 $\begin{align}
\left|\operatorname{E}\left[f(Y_n)\right] - \operatorname{E}\left [f(X_n) \right] \right| &\leq \operatorname{E} \left [\left |f(Y_n) - f(X_n) \right | \right ]\\
&= \operatorname{E}\left[ \left |f(Y_n) - f(X_n) \right |\mathbf{1}_{\left \{|Y_n-X_n|<\varepsilon \right \}} \right] + \operatorname{E}\left[ \left |f(Y_n) - f(X_n) \right |\mathbf{1}_{\left \{|Y_n-X_n|\geq\varepsilon \right \}} \right] \\
&\leq \operatorname{E}\left[K \left |Y_n - X_n \right |\mathbf{1}_{\left \{|Y_n-X_n|<\varepsilon \right \}}\right] + \operatorname{E}\left[2M\mathbf{1}_{\left \{|Y_n-X_n|\geq\varepsilon \right \}}\right] \\
&\leq K \varepsilon \operatorname{Pr} \left (\left |Y_n-X_n \right |<\varepsilon\right) + 2M \operatorname{Pr} \left( \left |Y_n-X_n \right |\geq\varepsilon\right )\\
&\leq K \varepsilon + 2M \operatorname{Pr} \left (\left |Y_n-X_n \right |\geq\varepsilon \right )
\end{align}$

(here 1_{{...}} denotes the indicator function; the expectation of the indicator function is equal to the probability of corresponding event). Therefore,
 $\begin{align}
\left |\operatorname{E}\left [f(Y_n)\right ] - \operatorname{E}\left [f(X) \right ]\right | &\leq \left|\operatorname{E}\left[ f(Y_n) \right ]-\operatorname{E} \left [f(X_n) \right ] \right| + \left|\operatorname{E}\left [f(X_n) \right ]-\operatorname{E}\left [f(X) \right] \right| \\
    &\leq K\varepsilon + 2M \operatorname{Pr}\left (|Y_n-X_n|\geq\varepsilon\right )+ \left |\operatorname{E}\left[ f(X_n) \right]-\operatorname{E} \left [f(X) \right ]\right|.
  \end{align}$
If we take the limit in this expression as n → ∞, the second term will go to zero since {Y_{n}−X_{n}} converges to zero in probability; and the third term will also converge to zero, by the portmanteau lemma and the fact that X_{n} converges to X in distribution. Thus
 $\limsup_{n\to\infty} \left|\operatorname{E}\left [f(Y_n) \right] - \operatorname{E}\left [f(X) \right ] \right| \leq K\varepsilon.$
Since ε was arbitrary, we conclude that the limit must in fact be equal to zero, and therefore E[f(Y_{n})] → E[f(X)], which again by the portmanteau lemma implies that {Y_{n}} converges to X in distribution. QED.
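
The key bound |E[f(Y_{n})] − E[f(X_{n})]| ≤ Kε + 2M Pr(|Y_{n} − X_{n}| ≥ ε) can be checked exactly on a small example. The sketch below assumes f(x) = min(|x|, 1), so that K = M = 1, and a hypothetical pair of sequences in which Y_{n} − X_{n} is usually tiny but occasionally large; none of these choices come from the original proof.

```python
from fractions import Fraction

K, M = 1, 1  # Lipschitz constant and bound of f below

def f(x):
    """A bounded Lipschitz test function: |f| <= M and |f(x) - f(y)| <= K|x - y|."""
    return min(abs(x), Fraction(1))

# Hypothetical example: X_n = X uniform on {0, 1/2, 1} for every n, and
# Y_n = X_n + D_n, where D_n is independent of X_n, equal to 5 with
# probability 1/n and to 1/n otherwise.
X_VALUES = [Fraction(0), Fraction(1, 2), Fraction(1)]

def gap_distribution(n):
    """pmf of D_n = Y_n - X_n as a dict {value: probability}."""
    return {Fraction(5): Fraction(1, n), Fraction(1, n): 1 - Fraction(1, n)}

def both_sides(n, eps):
    """Return (|E f(Y_n) - E f(X_n)|, K*eps + 2*M*Pr(|Y_n - X_n| >= eps))."""
    px = Fraction(1, len(X_VALUES))
    e_fx = sum(px * f(x) for x in X_VALUES)
    e_fy = sum(px * pd * f(x + d)
               for x in X_VALUES
               for d, pd in gap_distribution(n).items())
    p_far = sum(pd for d, pd in gap_distribution(n).items() if abs(d) >= eps)
    return abs(e_fy - e_fx), K * eps + 2 * M * p_far

for n in (2, 20, 200, 2000):
    lhs, rhs = both_sides(n, Fraction(1, 10))
    assert lhs <= rhs  # the bound derived in the proof
    print(n, float(lhs), float(rhs))
```

For large n the right-hand side approaches Kε, which is the quantity that remains after taking the limit in the proof.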

== Convergence of one sequence in distribution and another to a constant implies joint convergence in distribution==

 $X_n\ \xrightarrow{d}\ X,\ \ Y_n\ \xrightarrow{p}\ c\ \quad\Rightarrow\quad (X_n,Y_n)\ \xrightarrow{d}\ (X,c)$ provided c is a constant.

Proof: We will prove this statement using the portmanteau lemma, part A.

First we want to show that (X_{n}, c) converges in distribution to (X, c). By the portmanteau lemma this will be true if we can show that E[f(X_{n}, c)] → E[f(X, c)] for any bounded continuous function f(x, y). So let f be an arbitrary bounded continuous function, and consider the function of a single variable g(x) := f(x, c). This function is also bounded and continuous, and therefore by the portmanteau lemma applied to the sequence {X_{n}} converging in distribution to X, we have E[g(X_{n})] → E[g(X)]. The latter statement is equivalent to “E[f(X_{n}, c)] → E[f(X, c)]”, and therefore (X_{n}, c) converges in distribution to (X, c).

Secondly, consider |(X_{n}, Y_{n}) − (X_{n}, c)| = |Y_{n} − c|. This expression converges in probability to zero because Y_{n} converges in probability to c. Thus we have demonstrated two facts:
 $\begin{cases}
    \left| (X_n, Y_n) - (X_n,c) \right|\ \xrightarrow{p}\ 0, \\
    (X_n,c)\ \xrightarrow{d}\ (X,c).
  \end{cases}$
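By the property proved earlier (convergence in probability to a sequence converging in distribution implies convergence to the same distribution), these two facts imply that (X_{n}, Y_{n}) converges in distribution to (X, c).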

== Convergence of two sequences in probability implies joint convergence in probability==
 $X_n\ \xrightarrow{p}\ X,\ \ Y_n\ \xrightarrow{p}\ Y\ \quad\Rightarrow\quad (X_n,Y_n)\ \xrightarrow{p}\ (X,Y)$

Proof: Fix ε > 0. Then
 $\begin{align}
\operatorname{Pr}\left(\left|(X_n,Y_n)-(X,Y)\right|\geq\varepsilon\right) &\leq \operatorname{Pr}\left(|X_n-X| + |Y_n-Y|\geq\varepsilon\right) \\
&\leq\operatorname{Pr}\left(|X_n-X|\geq\varepsilon/2\right) + \operatorname{Pr}\left(|Y_n-Y|\geq\varepsilon/2\right)
\end{align}$
where the first step uses the triangle inequality, and the last step follows by the pigeonhole principle (if the sum of two terms is at least ε, then at least one of them is at least ε/2) and the sub-additivity of the probability measure. Each of the probabilities on the right-hand side converges to zero as n → ∞ by definition of the convergence of {X_{n}} and {Y_{n}} in probability to X and Y respectively. Taking the limit we conclude that the left-hand side also converges to zero, and therefore the sequence {(X_{n}, Y_{n})} converges in probability to (X, Y).
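
The two-step bound can be verified exactly on a small example. The sketch below assumes a hypothetical joint distribution of the errors (U, V) = (X_{n} − X, Y_{n} − Y) for one fixed n and uses the Euclidean norm for |(U, V)|; both choices are made for illustration only.

```python
from fractions import Fraction

# Hypothetical joint pmf of the errors (U, V) = (X_n - X, Y_n - Y).
errors = {
    (Fraction(0), Fraction(0)): Fraction(1, 2),
    (Fraction(3, 10), Fraction(4, 10)): Fraction(1, 4),
    (Fraction(1, 10), Fraction(-6, 10)): Fraction(1, 8),
    (Fraction(-1, 5), Fraction(1, 5)): Fraction(1, 8),
}

def prob(event):
    """Probability of the set of outcomes (u, v) on which event(u, v) holds."""
    return sum((p for (u, v), p in errors.items() if event(u, v)), Fraction(0))

def union_bound_sides(eps):
    """Return (Pr(|(U, V)| >= eps), Pr(|U| >= eps/2) + Pr(|V| >= eps/2))."""
    # Euclidean norm, compared in squared form to keep the arithmetic exact.
    lhs = prob(lambda u, v: u * u + v * v >= eps * eps)
    rhs = prob(lambda u, v: abs(u) >= eps / 2) + prob(lambda u, v: abs(v) >= eps / 2)
    return lhs, rhs

for eps in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    lhs, rhs = union_bound_sides(eps)
    assert lhs <= rhs  # the inequality used in the proof
    print(eps, lhs, rhs)
```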

==See also==
- Convergence of random variables
