Stein's lemma

From Wikipedia, the free encyclopedia

Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference (in particular, to James–Stein estimation and empirical Bayes methods) and to portfolio choice theory.[1] The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence. This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.

Statement of the lemma

Suppose X is a normally distributed random variable with expectation μ and variance σ². Further suppose g is a differentiable function for which the two expectations E(g(X)(X − μ)) and E(g′(X)) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then

$$E\bigl(g(X)(X - \mu)\bigr) = \sigma^2 E\bigl(g'(X)\bigr).$$

In general, suppose X and Y are jointly normally distributed. Then

$$\operatorname{Cov}\bigl(g(X), Y\bigr) = \operatorname{Cov}(X, Y)\, E\bigl(g'(X)\bigr).$$
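The univariate identity E(g(X)(X − μ)) = σ² E(g′(X)) can be checked numerically. The following is a minimal sketch using only the Python standard library; the parameter values, the test function g = sin, and the helper names `normal_pdf` and `expect` are illustrative assumptions, and the expectations are approximated with a simple trapezoidal rule rather than computed exactly.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expect(f, mu, sigma, half_width=12.0, n=20001):
    """Approximate E[f(X)] for X ~ N(mu, sigma^2) with the trapezoidal
    rule on [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * h
        w = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += w * f(x) * normal_pdf(x, mu, sigma)
    return total * h


mu, sigma = 1.0, 2.0   # illustrative parameters (assumption)
g = math.sin           # illustrative test function (assumption)
g_prime = math.cos     # its derivative

lhs = expect(lambda x: g(x) * (x - mu), mu, sigma)  # E(g(X)(X - mu))
rhs = sigma ** 2 * expect(g_prime, mu, sigma)       # sigma^2 E(g'(X))
print(lhs, rhs)
```

The two printed values should agree up to quadrature error. Other smooth test functions can be substituted, provided both expectations exist.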

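For a general multivariate Gaussian random vector $X = (X_1, \dots, X_n) \sim N(\mu, \Sigma)$ and a differentiable function $g : \mathbb{R}^n \to \mathbb{R}$, it follows that

$$E\bigl(g(X)(X - \mu)\bigr) = \Sigma\, E\bigl(\nabla g(X)\bigr).$$

Proof

The probability density function of the normal distribution with expectation 0 and variance 1 is

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$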

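Since $\int x\, e^{-x^2/2}\, dx = -e^{-x^2/2}$, we get from integration by parts:

$$E\bigl(g(X)\, X\bigr) = \frac{1}{\sqrt{2\pi}} \int g(x)\, x\, e^{-x^2/2}\, dx = \frac{1}{\sqrt{2\pi}} \int g'(x)\, e^{-x^2/2}\, dx = E\bigl(g'(X)\bigr).$$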


The case of general expectation μ and variance σ² follows by substitution.
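One way to spell out the substitution is sketched below. The standard normal variable $Z$ and the auxiliary function $\tilde g$ are names introduced here only for illustration.

$$\begin{aligned}
X &= \mu + \sigma Z, \qquad Z \sim N(0, 1), \qquad \tilde g(z) = g(\mu + \sigma z), \qquad \tilde g'(z) = \sigma\, g'(\mu + \sigma z), \\
E\bigl(g(X)(X - \mu)\bigr) &= \sigma\, E\bigl(\tilde g(Z)\, Z\bigr) = \sigma\, E\bigl(\tilde g'(Z)\bigr) = \sigma^2 E\bigl(g'(X)\bigr).
\end{aligned}$$

The middle equality is the standard normal case applied to $\tilde g$.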

More general statement

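Isserlis' theorem is equivalently stated as

$$E\bigl(X_1 f(X_1, \ldots, X_n)\bigr) = \sum_{i=1}^{n} \operatorname{Cov}(X_1, X_i)\, E\bigl(\partial_{X_i} f(X_1, \ldots, X_n)\bigr),$$

where $(X_1, \dots, X_n)$ is a zero-mean multivariate normal random vector.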
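The identity E(X₁ f(X₁, …, Xₙ)) = Σᵢ Cov(X₁, Xᵢ) E(∂f/∂Xᵢ) for a zero-mean multivariate normal vector can be checked numerically in two dimensions. The sketch below is illustrative only: the coefficients `a`, `b`, `c`, the test function `f`, and the helper `expect2` are assumptions made for the example. The pair (X₁, X₂) is built from independent standard normals Z₁, Z₂ as X₁ = a·Z₁ and X₂ = b·Z₁ + c·Z₂, so that Cov(X₁, X₁) = a² and Cov(X₁, X₂) = a·b.

```python
import math


def std_normal_pdf(z):
    """Density of N(0, 1) evaluated at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expect2(f, a, b, c, half_width=8.0, n=129):
    """Approximate E[f(X1, X2)] with X1 = a*Z1, X2 = b*Z1 + c*Z2 and
    Z1, Z2 independent standard normals (2-D trapezoidal rule)."""
    h = 2.0 * half_width / (n - 1)
    grid = [-half_width + i * h for i in range(n)]
    # Quadrature weight times density at each grid point.
    weights = [(0.5 if i in (0, n - 1) else 1.0) * h * std_normal_pdf(z)
               for i, z in enumerate(grid)]
    total = 0.0
    for z1, w1 in zip(grid, weights):
        for z2, w2 in zip(grid, weights):
            total += w1 * w2 * f(a * z1, b * z1 + c * z2)
    return total


a, b, c = 1.5, 0.8, 0.6   # illustrative coefficients (assumption)
cov11 = a * a             # Cov(X1, X1)
cov12 = a * b             # Cov(X1, X2)


def f(x1, x2):
    """Illustrative test function (assumption)."""
    return x1 * x2 ** 2 + math.sin(x1)


def df_dx1(x1, x2):
    """Partial derivative of f with respect to x1."""
    return x2 ** 2 + math.cos(x1)


def df_dx2(x1, x2):
    """Partial derivative of f with respect to x2."""
    return 2.0 * x1 * x2


lhs = expect2(lambda x1, x2: x1 * f(x1, x2), a, b, c)
rhs = cov11 * expect2(df_dx1, a, b, c) + cov12 * expect2(df_dx2, a, b, c)
print(lhs, rhs)
```

The two printed values should agree up to quadrature error.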

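Suppose X is in an exponential family, that is, X has the density

$$f_\eta(x) = \exp\bigl(\eta' T(x) - \Psi(\eta)\bigr)\, h(x).$$

Suppose this density has support $(a, b)$, where $a, b$ could be $-\infty, \infty$, and that as $x \to a$ or $x \to b$, $\exp(\eta' T(x))\, h(x)\, g(x) \to 0$, where $g$ is any differentiable function such that $E|g'(X)| < \infty$, or $\exp(\eta' T(x))\, h(x) \to 0$ if $a, b$ are finite. Then

$$E\left[\left(\frac{h'(X)}{h(X)} + \sum_i \eta_i T_i'(X)\right) g(X)\right] = -E\bigl[g'(X)\bigr].$$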

The derivation is the same as in the special case, namely integration by parts.
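As a consistency check, the normal case can be recovered from the exponential-family identity, taken here in the form $E\bigl[\bigl(h'(X)/h(X) + \sum_i \eta_i T_i'(X)\bigr)\, g(X)\bigr] = -E[g'(X)]$. The parametrization of $N(\mu, \sigma^2)$ used below is the usual one and is an illustrative choice, not part of the statement above.

$$\begin{aligned}
\eta &= \left(\frac{\mu}{\sigma^2},\; -\frac{1}{2\sigma^2}\right), \qquad T(x) = (x,\; x^2), \qquad h(x) = \frac{1}{\sqrt{2\pi}}, \\
\frac{h'(x)}{h(x)} + \sum_i \eta_i T_i'(x) &= 0 + \frac{\mu}{\sigma^2} - \frac{x}{\sigma^2} = -\frac{x - \mu}{\sigma^2}, \\
E\left[-\frac{X - \mu}{\sigma^2}\, g(X)\right] = -E\bigl[g'(X)\bigr] \;&\Longrightarrow\; E\bigl(g(X)(X - \mu)\bigr) = \sigma^2 E\bigl(g'(X)\bigr).
\end{aligned}$$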

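If we only know that $X$ has support $\mathbb{R}$, then it could be the case that $E|g(X)| < \infty$ and $E|g'(X)| < \infty$ but $\lim_{x \to \infty} f_\eta(x)\, g(x) \neq 0$. To see this, simply put $g(x) = 1$ and take $f_\eta(x)$ with infinitely many spikes towards infinity but still integrable. One such example could be adapted from

$$f(x) = \begin{cases} 1 & x \in [n, n + 2^{-n}) \\ 0 & \text{otherwise} \end{cases}$$

so that $f$ is smooth.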

Extensions to elliptically contoured distributions also exist.[4][5][6]

References


  1. Ingersoll, J. (1987). Theory of Financial Decision Making. Rowman and Littlefield. pp. 13–14.
  2. Csiszár, Imre; Körner, János (2011). Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press. p. 14. ISBN 9781139499989.
  3. Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory. New York: John Wiley & Sons. ISBN 9781118585771.
  4. Cellier, Dominique; Fourdrinier, Dominique; Robert, Christian (1989). "Robust shrinkage estimators of the location parameter for elliptically symmetric distributions". Journal of Multivariate Analysis. 29 (1): 39–52. doi:10.1016/0047-259X(89)90075-4.
  5. Hamada, Mahmoud; Valdez, Emiliano A. (2008). "CAPM and option pricing with elliptically contoured distributions". The Journal of Risk & Insurance. 75 (2): 387–409. doi:10.1111/j.1539-6975.2008.00265.x.
  6. Landsman, Zinoviy; Nešlehová, Johanna (2008). "Stein's Lemma for elliptical random vectors". Journal of Multivariate Analysis. 99 (5): 912–927. doi:10.1016/j.jmva.2007.05.006.