# Stein's lemma

Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice theory.[1] The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence. This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.

## Statement of the lemma

Suppose X is a normally distributed random variable with expectation μ and variance σ². Further suppose g is a differentiable function for which the two expectations E(g(X) (X − μ)) and E(g′(X)) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then

${\displaystyle E{\bigl (}g(X)(X-\mu ){\bigr )}=\sigma ^{2}E{\bigl (}g'(X){\bigr )}.}$
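As a sanity check, the identity can be verified numerically; the following is a minimal Monte Carlo sketch in Python (NumPy assumed), with the test function g(x) = x³ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# Test function g(x) = x**3, so g'(x) = 3*x**2
lhs = np.mean(x**3 * (x - mu))        # estimates E[g(X)(X - mu)]
rhs = sigma**2 * np.mean(3 * x**2)    # estimates sigma^2 * E[g'(X)]
```

With μ = 1 and σ = 2 the exact common value is E(X⁴) − E(X³) = 73 − 13 = 60, and both estimates agree with it up to Monte Carlo error.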

More generally, suppose X and Y are jointly normally distributed. Then

${\displaystyle \operatorname {Cov} (g(X),Y)=\operatorname {Cov} (X,Y)E(g'(X)).}$

For a general multivariate Gaussian random vector ${\displaystyle (X_{1},\ldots ,X_{n})\sim N(\mu ,\Sigma )}$ and a differentiable function ${\displaystyle g:\mathbb {R} ^{n}\to \mathbb {R} }$, it follows that

${\displaystyle E{\bigl (}g(X)(X-\mu ){\bigr )}=\Sigma \cdot E{\bigl (}\nabla g(X){\bigr )}.}$
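The multivariate identity can be checked the same way; here is a hedged Monte Carlo sketch in Python (NumPy assumed), where the bivariate test function g(x₁, x₂) = sin(x₁) + x₁x₂ and the particular μ and Σ are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)

# Scalar test function g(x1, x2) = sin(x1) + x1*x2 and its gradient
g_val = np.sin(X[:, 0]) + X[:, 0] * X[:, 1]
g_grad = np.stack([np.cos(X[:, 0]) + X[:, 1],   # dg/dx1
                   X[:, 0]],                    # dg/dx2
                  axis=1)

lhs = np.mean(g_val[:, None] * (X - mu), axis=0)  # estimates E[g(X)(X - mu)]
rhs = Sigma @ np.mean(g_grad, axis=0)             # estimates Sigma . E[grad g(X)]
```

The two vectors agree componentwise up to Monte Carlo error.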

## Proof

The probability density function of the standard normal distribution, with expectation 0 and variance 1, is

${\displaystyle \varphi (x)={1 \over {\sqrt {2\pi }}}e^{-x^{2}/2}}$

Since ${\displaystyle \int x\exp(-x^{2}/2)\,dx=-\exp(-x^{2}/2)}$ we get from integration by parts:

${\displaystyle E[g(X)X]={\frac {1}{\sqrt {2\pi }}}\int g(x)x\exp(-x^{2}/2)\,dx={\frac {1}{\sqrt {2\pi }}}\int g'(x)\exp(-x^{2}/2)\,dx=E[g'(X)],}$

where the boundary term ${\displaystyle -g(x)\exp(-x^{2}/2)}$ vanishes at ${\displaystyle \pm \infty }$ because ${\displaystyle E|g'(X)|}$ is assumed finite.

The case of general variance ${\displaystyle \sigma ^{2}}$ follows by substitution.
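The integration-by-parts step can also be checked symbolically for a concrete choice of g; the following is a minimal sketch with SymPy (assumed available), again using g(x) = x³:

```python
import sympy as sp

x = sp.symbols('x', real=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)  # standard normal density
g = x**3

lhs = sp.integrate(g * x * phi, (x, -sp.oo, sp.oo))          # E[g(X) X]
rhs = sp.integrate(sp.diff(g, x) * phi, (x, -sp.oo, sp.oo))  # E[g'(X)]
```

Both integrals evaluate to 3, the fourth moment of the standard normal distribution.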

## More general statement

Isserlis' theorem is equivalently stated as

${\displaystyle \operatorname {E} (X_{1}f(X_{1},\ldots ,X_{n}))=\sum _{i=1}^{n}\operatorname {Cov} (X_{1},X_{i})\operatorname {E} (\partial _{X_{i}}f(X_{1},\ldots ,X_{n})),}$
where ${\displaystyle (X_{1},\dots ,X_{n})}$ is a zero-mean multivariate normal random vector.

Suppose X is in an exponential family, that is, X has the density

${\displaystyle f_{\eta }(x)=\exp(\eta 'T(x)-\Psi (\eta ))h(x).}$

Suppose this density has support ${\displaystyle (a,b)}$, where ${\displaystyle a}$ and ${\displaystyle b}$ may be ${\displaystyle -\infty }$ and ${\displaystyle \infty }$ respectively. Suppose further that ${\displaystyle \exp(\eta 'T(x))h(x)g(x)\rightarrow 0}$ as ${\displaystyle x\rightarrow a{\text{ or }}b}$, where ${\displaystyle g}$ is any differentiable function with ${\displaystyle E|g'(X)|<\infty }$ (if ${\displaystyle a}$ and ${\displaystyle b}$ are finite, it suffices that ${\displaystyle \exp(\eta 'T(x))h(x)\rightarrow 0}$ at the endpoints). Then

${\displaystyle E\left[\left({\frac {h'(X)}{h(X)}}+\sum \eta _{i}T_{i}'(X)\right)\cdot g(X)\right]=-E[g'(X)].}$

The derivation is the same as in the special case above, namely, integration by parts.
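As an illustration, for the exponential distribution with rate λ — an exponential family with ${\displaystyle h(x)=1}$, ${\displaystyle T(x)=x}$, ${\displaystyle \eta =-\lambda }$ and support ${\displaystyle (0,\infty )}$ — the identity reduces to ${\displaystyle \lambda E[g(X)]=E[g'(X)]}$. A Monte Carlo sketch in Python (NumPy assumed), using g(x) = x² so that g vanishes at the boundary a = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 1.5
x = rng.exponential(scale=1 / lam, size=1_000_000)

# h'/h = 0 and eta * T'(x) = -lam; g(x) = x**2 gives g(0) = 0,
# so the boundary term at a = 0 vanishes
lhs = np.mean(-lam * x**2)  # estimates E[(h'/h + eta T'(X)) g(X)]
rhs = -np.mean(2 * x)       # estimates -E[g'(X)]
```

Both sides estimate the exact value −2/λ.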

If we only know that ${\displaystyle X}$ has support ${\displaystyle \mathbb {R} }$, then it can happen that ${\displaystyle E|g(X)|<\infty {\text{ and }}E|g'(X)|<\infty }$ but ${\displaystyle \lim _{x\rightarrow \infty }f_{\eta }(x)g(x)\not =0}$. To see this, take ${\displaystyle g(x)=1}$ and let ${\displaystyle f_{\eta }(x)}$ have infinitely many spikes towards infinity while remaining integrable. One such example can be adapted from ${\displaystyle f(x)={\begin{cases}1&x\in [n,n+2^{-n})\\0&{\text{otherwise}}\end{cases}}}$ by smoothing ${\displaystyle f}$.

Extensions to elliptically-contoured distributions also exist.[4][5][6]