# Hoeffding's lemma

Jump to navigation Jump to search

In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable. It is named after the FinnishAmerican mathematical statistician Wassily Hoeffding.

The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality. Hoeffding's lemma is itself used in the proof of McDiarmid's inequality.

## Statement of the lemma

Let X be any real-valued random variable with expected value $\mathbb {E} [X]=0$ and such that $a\leq X\leq b$ almost surely. Then, for all $\lambda \in \mathbb {R}$ ,

$\mathbb {E} \left[e^{\lambda X}\right]\leq \exp \left({\frac {\lambda ^{2}(b-a)^{2}}{8}}\right).$ Note that because of the assumption that the random variable $X$ has zero expectation, the $a$ and $b$ in the lemma must satisfy $a\leq 0\leq b$ .

## A brief proof of the lemma

Since $e^{\lambda x}$ is a convex function of $x$ , we have

$e^{\lambda x}\leq {\frac {b-x}{b-a}}e^{\lambda a}+{\frac {x-a}{b-a}}e^{\lambda b}\qquad \forall a\leq x\leq b$ So, $\mathbb {E} \left[e^{\lambda X}\right]\leq {\frac {b-\mathbb {E} [X]}{b-a}}e^{\lambda a}+{\frac {\mathbb {E} [X]-a}{b-a}}e^{\lambda b}.$ Let $h=\lambda (b-a)$ , $p={\frac {-a}{b-a}}$ and $L(h)=-hp+\ln(1-p+pe^{h})$ Then, ${\frac {b-\mathbb {E} [X]}{b-a}}e^{\lambda a}+{\frac {\mathbb {E} [X]-a}{b-a}}e^{\lambda b}=e^{L(h)}$ since $\mathbb {E} [X]=0$ Taking derivative of $L(h)$ ,

$L(0)=L^{'}(0)=0{\text{ and }}L^{''}(h)\leq {\frac {1}{4}}$ for all h.

By Taylor's expansion,

$L(h)\leq {\frac {1}{8}}h^{2}={\frac {1}{8}}\lambda ^{2}(b-a)^{2}$ Hence, $\mathbb {E} \left[e^{\lambda X}\right]\leq e^{{\frac {1}{8}}\lambda ^{2}(b-a)^{2}}$ (The proof below is the same proof with more explanation.)

## More detailed proof

First note that if one of $a$ or $b$ is zero, then $\textstyle \mathbb {P} \left(X=0\right)=1$ and the inequality follows. If both are nonzero, then $a$ must be negative and $b$ must be positive.

Next, recall that $e^{sx}$ is a convex function on the real line:

$\forall x\in [a,b]:\qquad e^{sx}\leq {\frac {b-x}{b-a}}e^{sa}+{\frac {x-a}{b-a}}e^{sb}.$ Applying $\mathbb {E}$ to both sides of the above inequality gives us:

{\begin{aligned}\mathbb {E} \left[e^{sX}\right]&\leq {\frac {b-\mathbb {E} [X]}{b-a}}e^{sa}+{\frac {\mathbb {E} [X]-a}{b-a}}e^{sb}\\&={\frac {b}{b-a}}e^{sa}+{\frac {-a}{b-a}}e^{sb}&&\mathbb {E} (X)=0\\&=(1-\theta )e^{sa}+\theta e^{sb}&&\theta =-{\frac {a}{b-a}}>0\\&=e^{sa}\left(1-\theta +\theta e^{s(b-a)}\right)\\&=\left(1-\theta +\theta e^{s(b-a)}\right)e^{-s\theta (b-a)}\\\end{aligned}} Let $u=s(b-a)$ and define:

${\begin{cases}\varphi :\mathbb {R} \to \mathbb {R} \\\varphi (u)=-\theta u+\log \left(1-\theta +\theta e^{u}\right)\end{cases}}$ $\varphi$ is well defined on $\mathbb {R}$ , to see this we calculate:

{\begin{aligned}1-\theta +\theta e^{u}&=\theta \left({\frac {1}{\theta }}-1+e^{u}\right)\\&=\theta \left(-{\frac {b}{a}}+e^{u}\right)\\&>0&&\theta >0,\quad {\frac {b}{a}}<0\end{aligned}} The definition of $\varphi$ implies

$\mathbb {E} \left[e^{sX}\right]\leq e^{\varphi (u)}.$ By Taylor's theorem, for every real $u$ there exists a $v$ between $0$ and $u$ such that

$\varphi (u)=\varphi (0)+u\varphi '(0)+{\tfrac {1}{2}}u^{2}\varphi ''(v).$ Note that:

{\begin{aligned}\varphi (0)&=0\\\varphi '(0)&=-\theta +\left.{\frac {\theta e^{u}}{1-\theta +\theta e^{u}}}\right|_{u=0}\\&=0\\[6pt]\varphi ''(v)&={\frac {\theta e^{v}\left(1-\theta +\theta e^{v}\right)-\theta ^{2}e^{2v}}{\left(1-\theta +\theta e^{v}\right)^{2}}}\\[6pt]&={\frac {\theta e^{v}}{1-\theta +\theta e^{v}}}\left(1-{\frac {\theta e^{v}}{1-\theta +\theta e^{v}}}\right)\\[6pt]&=t(1-t)&&t={\frac {\theta e^{v}}{1-\theta +\theta e^{v}}}\\&\leq {\tfrac {1}{4}}&&t>0\end{aligned}} Therefore,

$\varphi (u)\leq 0+u\cdot 0+{\tfrac {1}{2}}u^{2}\cdot {\tfrac {1}{4}}={\tfrac {1}{8}}u^{2}={\tfrac {1}{8}}s^{2}(b-a)^{2}.$ This implies

$\mathbb {E} \left[e^{sX}\right]\leq \exp \left({\tfrac {1}{8}}s^{2}(b-a)^{2}\right).$ 