# Hoeffding's lemma

In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable.[1] It is named after the Finnish-American mathematical statistician Wassily Hoeffding.

The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality. Hoeffding's lemma is itself used in the proof of McDiarmid's inequality.

## Statement of the lemma

Let X be any real-valued random variable with expected value ${\displaystyle \mathbb {E} [X]=0}$ and such that ${\displaystyle a\leq X\leq b}$ almost surely. Then, for all ${\displaystyle \lambda \in \mathbb {R} }$,

${\displaystyle \mathbb {E} \left[e^{\lambda X}\right]\leq \exp \left({\frac {\lambda ^{2}(b-a)^{2}}{8}}\right).}$

Note that because of the assumption that the random variable ${\displaystyle X}$ has zero expectation, the ${\displaystyle a}$ and ${\displaystyle b}$ in the lemma must satisfy ${\displaystyle a\leq 0\leq b}$.
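Since the hypotheses are easy to realize with a two-point distribution, the bound can be sanity-checked numerically. The sketch below uses an illustrative zero-mean distribution on ${\displaystyle \{a,b\}}$; the particular values of ${\displaystyle a}$, ${\displaystyle b}$, and ${\displaystyle \lambda }$ and the function names are arbitrary choices, not part of the lemma:

```python
import math

def mgf_two_point(lam, a, b):
    # E[exp(lam * X)] for X taking value a with probability b/(b-a)
    # and b with probability -a/(b-a); these weights make E[X] = 0.
    return (b / (b - a)) * math.exp(lam * a) + (-a / (b - a)) * math.exp(lam * b)

def hoeffding_bound(lam, a, b):
    # Right-hand side of Hoeffding's lemma: exp(lam^2 (b-a)^2 / 8)
    return math.exp(lam ** 2 * (b - a) ** 2 / 8)

a, b = -1.0, 2.0  # a zero-mean bounded variable requires a <= 0 <= b
for lam in [-2.0, -0.5, 0.0, 0.1, 1.0, 3.0]:
    assert mgf_two_point(lam, a, b) <= hoeffding_bound(lam, a, b)
```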

## A brief proof of the lemma

Since ${\displaystyle e^{\lambda x}}$ is a convex function of ${\displaystyle x}$, we have

${\displaystyle e^{\lambda x}\leq {\frac {b-x}{b-a}}e^{\lambda a}+{\frac {x-a}{b-a}}e^{\lambda b}\qquad \forall a\leq x\leq b}$

So, ${\displaystyle \mathbb {E} \left[e^{\lambda X}\right]\leq {\frac {b-\mathbb {E} [X]}{b-a}}e^{\lambda a}+{\frac {\mathbb {E} [X]-a}{b-a}}e^{\lambda b}.}$

Let ${\displaystyle h=\lambda (b-a)}$, ${\displaystyle p={\frac {-a}{b-a}}}$, and ${\displaystyle L(h)=-hp+\ln(1-p+pe^{h})}$.

Then ${\displaystyle {\frac {b-\mathbb {E} [X]}{b-a}}e^{\lambda a}+{\frac {\mathbb {E} [X]-a}{b-a}}e^{\lambda b}=e^{L(h)}}$, since ${\displaystyle \mathbb {E} [X]=0}$.

Computing the first two derivatives of ${\displaystyle L(h)}$ shows that

${\displaystyle L(0)=L'(0)=0{\text{ and }}L''(h)\leq {\frac {1}{4}}}$ for all ${\displaystyle h}$.

By Taylor's theorem,

${\displaystyle L(h)\leq {\frac {1}{8}}h^{2}={\frac {1}{8}}\lambda ^{2}(b-a)^{2}}$

Hence, ${\displaystyle \mathbb {E} \left[e^{\lambda X}\right]\leq e^{{\frac {1}{8}}\lambda ^{2}(b-a)^{2}}}$.
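The inequality ${\displaystyle L(h)\leq h^{2}/8}$ used in this argument can itself be probed numerically; a minimal sketch, where the grid of ${\displaystyle p}$ and ${\displaystyle h}$ values is an arbitrary choice:

```python
import math

def L(h, p):
    # L(h) = -h p + ln(1 - p + p e^h), where p = -a/(b-a) lies in [0, 1]
    return -h * p + math.log(1 - p + p * math.exp(h))

for p in [0.1, 0.3, 0.5, 0.9]:
    for h in [k * 0.5 for k in range(-10, 11)]:
        # Bound used in the proof: L(h) <= h^2 / 8
        assert L(h, p) <= h ** 2 / 8 + 1e-12
```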

(The proof below is the same proof with more explanation.)

## More detailed proof

First note that if one of ${\displaystyle a}$ or ${\displaystyle b}$ is zero, then the zero-mean assumption forces ${\displaystyle \textstyle \mathbb {P} \left(X=0\right)=1}$, so ${\displaystyle \mathbb {E} \left[e^{sX}\right]=1}$ and the inequality holds trivially. If both are nonzero, then ${\displaystyle a}$ must be negative and ${\displaystyle b}$ must be positive.

Next, recall that ${\displaystyle e^{sx}}$ is a convex function of ${\displaystyle x}$, so on ${\displaystyle [a,b]}$ it lies below the chord joining its endpoint values:

${\displaystyle \forall x\in [a,b]:\qquad e^{sx}\leq {\frac {b-x}{b-a}}e^{sa}+{\frac {x-a}{b-a}}e^{sb}.}$

Applying ${\displaystyle \mathbb {E} }$ to both sides of the above inequality gives us:

{\displaystyle {\begin{aligned}\mathbb {E} \left[e^{sX}\right]&\leq {\frac {b-\mathbb {E} [X]}{b-a}}e^{sa}+{\frac {\mathbb {E} [X]-a}{b-a}}e^{sb}\\&={\frac {b}{b-a}}e^{sa}+{\frac {-a}{b-a}}e^{sb}&&\mathbb {E} (X)=0\\&=(1-\theta )e^{sa}+\theta e^{sb}&&\theta =-{\frac {a}{b-a}}>0\\&=e^{sa}\left(1-\theta +\theta e^{s(b-a)}\right)\\&=\left(1-\theta +\theta e^{s(b-a)}\right)e^{-s\theta (b-a)}\\\end{aligned}}}
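The chain of identities above is pure algebra and can be spot-checked; a small sketch with arbitrary illustrative values (any ${\displaystyle a<0<b}$ and real ${\displaystyle s}$ should behave the same):

```python
import math

# Arbitrary illustrative values with a < 0 < b
a, b, s = -1.0, 2.0, 0.7
theta = -a / (b - a)  # theta lies in (0, 1)

# Two-point upper bound on E[exp(sX)] obtained from the convexity step
two_point = (b / (b - a)) * math.exp(s * a) + (-a / (b - a)) * math.exp(s * b)

# Rewritten form from the last line of the derivation
rewritten = (1 - theta + theta * math.exp(s * (b - a))) * math.exp(-s * theta * (b - a))

assert abs(two_point - rewritten) < 1e-12
```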

Let ${\displaystyle u=s(b-a)}$ and define:

${\displaystyle {\begin{cases}\varphi :\mathbb {R} \to \mathbb {R} \\\varphi (u)=-\theta u+\log \left(1-\theta +\theta e^{u}\right)\end{cases}}}$

${\displaystyle \varphi }$ is well defined on ${\displaystyle \mathbb {R} }$; to see this, we check that the argument of the logarithm is positive:

{\displaystyle {\begin{aligned}1-\theta +\theta e^{u}&=\theta \left({\frac {1}{\theta }}-1+e^{u}\right)\\&=\theta \left(-{\frac {b}{a}}+e^{u}\right)\\&>0&&\theta >0,\quad {\frac {b}{a}}<0\end{aligned}}}

The definition of ${\displaystyle \varphi }$ implies

${\displaystyle \mathbb {E} \left[e^{sX}\right]\leq e^{\varphi (u)}.}$

By Taylor's theorem, for every real ${\displaystyle u}$ there exists a ${\displaystyle v}$ between ${\displaystyle 0}$ and ${\displaystyle u}$ such that

${\displaystyle \varphi (u)=\varphi (0)+u\varphi '(0)+{\tfrac {1}{2}}u^{2}\varphi ''(v).}$

Note that:

{\displaystyle {\begin{aligned}\varphi (0)&=0\\\varphi '(0)&=-\theta +\left.{\frac {\theta e^{u}}{1-\theta +\theta e^{u}}}\right|_{u=0}\\&=0\\[6pt]\varphi ''(v)&={\frac {\theta e^{v}\left(1-\theta +\theta e^{v}\right)-\theta ^{2}e^{2v}}{\left(1-\theta +\theta e^{v}\right)^{2}}}\\[6pt]&={\frac {\theta e^{v}}{1-\theta +\theta e^{v}}}\left(1-{\frac {\theta e^{v}}{1-\theta +\theta e^{v}}}\right)\\[6pt]&=t(1-t)&&t={\frac {\theta e^{v}}{1-\theta +\theta e^{v}}}\\&\leq {\tfrac {1}{4}}&&t>0\end{aligned}}}
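The last step relies on the elementary fact that ${\displaystyle t(1-t)\leq {\tfrac {1}{4}}}$ for every real ${\displaystyle t}$, the quadratic being maximized at ${\displaystyle t={\tfrac {1}{2}}}$. A quick numerical check of ${\displaystyle \varphi ''}$ over an arbitrary grid of ${\displaystyle \theta }$ and ${\displaystyle v}$ values:

```python
import math

def phi_second(v, theta):
    # phi''(v) = t (1 - t), with t = theta e^v / (1 - theta + theta e^v) in (0, 1)
    t = theta * math.exp(v) / (1 - theta + theta * math.exp(v))
    return t * (1 - t)

for theta in [0.1, 0.5, 0.9]:
    for v in [k * 0.5 for k in range(-8, 9)]:
        assert phi_second(v, theta) <= 0.25  # t(1 - t) <= 1/4
```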

Therefore,

${\displaystyle \varphi (u)\leq 0+u\cdot 0+{\tfrac {1}{2}}u^{2}\cdot {\tfrac {1}{4}}={\tfrac {1}{8}}u^{2}={\tfrac {1}{8}}s^{2}(b-a)^{2}.}$

This implies

${\displaystyle \mathbb {E} \left[e^{sX}\right]\leq \exp \left({\tfrac {1}{8}}s^{2}(b-a)^{2}\right).}$