Stein's unbiased risk estimate

In statistics, Stein's unbiased risk estimate (SURE) is an unbiased estimator of the mean-squared error of "a nearly arbitrary, nonlinear biased estimator."^[1] In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly.

The technique is named after its discoverer, Charles Stein.^[2]

Formal statement

Let $\mu \in {\mathbb {R} }^{d}$ be an unknown parameter and let $x\in {\mathbb {R} }^{d}$ be a measurement vector whose components $x_{i}$ are independent and normally distributed with means $\mu _{i}$ and known common variance $\sigma ^{2}$, that is, $x\sim N(\mu ,\sigma ^{2}I_{d})$. Suppose $h(x)$ is an estimator of $\mu$ from $x$ that can be written as $h(x)=x+g(x)$, where $g$ is weakly differentiable. Then Stein's unbiased risk estimate is given by^[3]

\mathrm {SURE} (h)=d\sigma ^{2}+\|g(x)\|^{2}+2\sigma ^{2}\sum _{i=1}^{d}{\frac {\partial }{\partial x_{i}}}g_{i}(x),

where $g_{i}(x)$ is the $i$-th component of the function $g(x)$, $\|\cdot \|$ is the Euclidean norm, and the sum is the divergence of $g$ evaluated at $x$.
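As an illustration, the formula can be evaluated numerically. The Python sketch below is a minimal example, not a reference implementation: the function names `sure` and `g_linear` and the step size `eps` are hypothetical, and it assumes the divergence $\sum _{i}\partial g_{i}/\partial x_{i}$ may be approximated by central finite differences, which is reasonable for a smooth $g$ but only an approximation in general. For the linear shrinkage estimator $h(x)=cx$, where $g(x)=(c-1)x$ has divergence $d(c-1)$, the numerical value is compared with the closed form.

```python
from typing import Callable, List

Vector = List[float]


def sure(x: Vector, g: Callable[[Vector], Vector], sigma: float,
         eps: float = 1e-5) -> float:
    """SURE for h(x) = x + g(x), with x ~ N(mu, sigma^2 I).

    The divergence sum_i dg_i/dx_i is approximated by central finite
    differences (an assumption of this sketch).
    """
    d = len(x)
    gx = g(x)
    norm_sq = sum(v * v for v in gx)
    divergence = 0.0
    for i in range(d):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        divergence += (g(x_plus)[i] - g(x_minus)[i]) / (2 * eps)
    return d * sigma**2 + norm_sq + 2 * sigma**2 * divergence


# Example: linear shrinkage h(x) = c * x, so g(x) = (c - 1) * x.
c = 0.5
sigma = 1.0
x = [1.0, -2.0, 0.5, 3.0]


def g_linear(v: Vector) -> Vector:
    """Hypothetical correction term for h(x) = c * x."""
    return [(c - 1) * vi for vi in v]


numeric = sure(x, g_linear, sigma)
closed_form = (len(x) * sigma**2
               + (c - 1) ** 2 * sum(xi * xi for xi in x)
               + 2 * sigma**2 * len(x) * (c - 1))
print(numeric, closed_form)  # both approximately 3.5625
```

Note that neither value uses $\mu$: SURE is computed from the observation $x$ and the noise level $\sigma$ alone.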

The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared-error risk) of $h(x)$, i.e.

E_{\mu }\{\mathrm {SURE} (h)\}=\mathrm {MSE} (h),

with

\mathrm {MSE} (h)=E_{\mu }\|h(x)-\mu \|^{2}.
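The unbiasedness property can be checked by simulation. The sketch below assumes the linear shrinkage estimator $h(x)=cx$ (chosen only for illustration), for which $\mathrm {MSE} (h)=c^{2}d\sigma ^{2}+(c-1)^{2}\|\mu \|^{2}$; the names `sure_linear` and `monte_carlo`, the fixed $\mu$, and the number of trials are hypothetical. Averaged over many draws of $x$, SURE (computed without $\mu$) and the realized loss $\|h(x)-\mu \|^{2}$ (which needs $\mu$) should both approach the exact MSE.

```python
import random


def sure_linear(x, c, sigma):
    """SURE for h(x) = c * x: g(x) = (c - 1) x has divergence d (c - 1)."""
    d = len(x)
    norm_sq = sum(xi * xi for xi in x)
    return d * sigma**2 + (c - 1) ** 2 * norm_sq + 2 * sigma**2 * d * (c - 1)


def monte_carlo(mu, c, sigma, trials, seed=0):
    """Average SURE and average realized loss over simulated x ~ N(mu, sigma^2 I)."""
    rng = random.Random(seed)
    total_sure = 0.0
    total_loss = 0.0
    for _ in range(trials):
        x = [rng.gauss(m, sigma) for m in mu]
        total_sure += sure_linear(x, c, sigma)  # needs only x and sigma
        total_loss += sum((c * xi - m) ** 2 for xi, m in zip(x, mu))  # needs mu
    return total_sure / trials, total_loss / trials


mu = [1.0] * 10
c, sigma = 0.5, 1.0
# Exact risk of h(x) = c x: c^2 d sigma^2 + (c - 1)^2 ||mu||^2 (here 5.0).
exact_mse = c**2 * len(mu) * sigma**2 + (c - 1) ** 2 * sum(m * m for m in mu)
avg_sure, avg_loss = monte_carlo(mu, c, sigma, trials=20000)
print(exact_mse, avg_sure, avg_loss)
```

Individual SURE values fluctuate considerably around the MSE; only their expectation is guaranteed to match.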

Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that the expression for SURE above does not depend on the unknown parameter $\mu$. It can therefore be manipulated (e.g., to determine optimal estimation settings) without knowledge of $\mu$.
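For instance, SURE can be used to pick the threshold $t$ of the soft-thresholding estimator $h_{i}(x)=\operatorname {sign} (x_{i})\max(|x_{i}|-t,0)$. Here $g_{i}(x)=-x_{i}$ when $|x_{i}|\leq t$ and $-t\operatorname {sign} (x_{i})$ otherwise, so, for almost every $x$, $\|g(x)\|^{2}=\sum _{i}\min(x_{i}^{2},t^{2})$ and the divergence equals $-\#\{i:|x_{i}|\leq t\}$. The Python sketch below (function names, the sparse test signal, and the random seed are hypothetical) minimizes the resulting SURE over $t$ using only $x$ and $\sigma$; $\mu$ appears only to report the realized loss afterward.

```python
import math
import random


def soft_threshold(x, t):
    """Componentwise soft thresholding: sign(x_i) * max(|x_i| - t, 0)."""
    return [math.copysign(max(abs(xi) - t, 0.0), xi) for xi in x]


def sure_soft(x, t, sigma):
    """SURE for soft thresholding at level t (valid for almost every x)."""
    d = len(x)
    norm_g_sq = sum(min(xi * xi, t * t) for xi in x)
    n_zeroed = sum(1 for xi in x if abs(xi) <= t)  # divergence is -n_zeroed
    return d * sigma**2 + norm_g_sq - 2 * sigma**2 * n_zeroed


def choose_threshold(x, sigma):
    """Minimize SURE over t; the minimizer lies in {0} or {|x_i|}."""
    candidates = [0.0] + sorted(abs(xi) for xi in x)
    return min(candidates, key=lambda t: sure_soft(x, t, sigma))


# Hypothetical sparse signal: 5 large entries, 95 zeros, unit noise.
rng = random.Random(42)
sigma = 1.0
mu = [4.0] * 5 + [0.0] * 95
x = [rng.gauss(m, sigma) for m in mu]

t_hat = choose_threshold(x, sigma)  # uses x and sigma only


def loss(t):
    """Realized squared error; requires the true mu."""
    return sum((hi - m) ** 2 for hi, m in zip(soft_threshold(x, t), mu))


print(t_hat, sure_soft(x, t_hat, sigma), loss(t_hat), loss(0.0))
```

Because SURE increases in $t$ between consecutive values of $|x_{i}|$ and drops at each of them, searching over $0$ and the $|x_{i}|$ is enough to find the minimizer.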

Proof

We wish to show that

E_{\mu }\|h(x)-\mu \|^{2}=E_{\mu }\{\mathrm {SURE} (h)\}.

We start by expanding the MSE as

{\begin{aligned}E_{\mu }\|h(x)-\mu \|^{2}&=E_{\mu }\|g(x)+x-\mu \|^{2}\\&=E_{\mu }\|g(x)\|^{2}+E_{\mu }\|x-\mu \|^{2}+2E_{\mu }g(x)^{T}(x-\mu )\\&=E_{\mu }\|g(x)\|^{2}+d\sigma ^{2}+2E_{\mu }g(x)^{T}(x-\mu ).\end{aligned}}

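Now we use integration by parts to rewrite the last term. Writing $\varphi (x)$ for the $N(\mu ,\sigma ^{2}I_{d})$ density, we have $(x_{i}-\mu _{i})\varphi (x)=-\sigma ^{2}\,\partial \varphi (x)/\partial x_{i}$, so integrating by parts in each coordinate $x_{i}$ moves the derivative onto $g_{i}$; the boundary terms vanish provided $E_{\mu }|\partial g_{i}/\partial x_{i}|<\infty$ for each $i$: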

{\begin{aligned}E_{\mu }g(x)^{T}(x-\mu )&=\int _{\mathbb {R} ^{d}}{\frac {1}{(2\pi \sigma ^{2})^{d/2}}}\exp \left(-{\frac {\|x-\mu \|^{2}}{2\sigma ^{2}}}\right)\sum _{i=1}^{d}g_{i}(x)(x_{i}-\mu _{i})\,d^{d}x\\&=\sigma ^{2}\sum _{i=1}^{d}\int _{\mathbb {R} ^{d}}{\frac {1}{(2\pi \sigma ^{2})^{d/2}}}\exp \left(-{\frac {\|x-\mu \|^{2}}{2\sigma ^{2}}}\right){\frac {\partial g_{i}}{\partial x_{i}}}\,d^{d}x\\&=\sigma ^{2}\sum _{i=1}^{d}E_{\mu }{\frac {\partial g_{i}}{\partial x_{i}}}.\end{aligned}}
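The integration-by-parts step is Stein's identity, $E_{\mu }\{g_{i}(x)(x_{i}-\mu _{i})\}=\sigma ^{2}E_{\mu }\{\partial g_{i}/\partial x_{i}\}$. A quick Monte Carlo check in one dimension, with the illustrative (assumed) choice $g(x)=\sin x$, compares both sides with the exact value $\sigma ^{2}\cos(\mu )e^{-\sigma ^{2}/2}$, which follows from $E_{\mu }\cos x=\cos(\mu )e^{-\sigma ^{2}/2}$ for $x\sim N(\mu ,\sigma ^{2})$. The function name and parameter values below are hypothetical.

```python
import math
import random


def stein_identity_check(mu, sigma, trials, seed=1):
    """Monte Carlo estimates of E[g(x)(x - mu)] and sigma^2 E[g'(x)] for g = sin."""
    rng = random.Random(seed)
    lhs = 0.0
    rhs = 0.0
    for _ in range(trials):
        x = rng.gauss(mu, sigma)
        lhs += math.sin(x) * (x - mu)  # g(x) (x - mu)
        rhs += sigma**2 * math.cos(x)  # sigma^2 g'(x)
    return lhs / trials, rhs / trials


mu, sigma = 0.3, 0.8
lhs, rhs = stein_identity_check(mu, sigma, trials=100000)
exact = sigma**2 * math.cos(mu) * math.exp(-sigma**2 / 2)
print(lhs, rhs, exact)  # lhs and rhs should both be close to exact (about 0.444)
```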

Substituting this into the expression for the MSE, we arrive at

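E_{\mu }\|h(x)-\mu \|^{2}=E_{\mu }\left(d\sigma ^{2}+\|g(x)\|^{2}+2\sigma ^{2}\sum _{i=1}^{d}{\frac {\partial g_{i}}{\partial x_{i}}}\right).

The expression inside the expectation is exactly $\mathrm {SURE} (h)$, which completes the proof.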

Applications

A standard application of SURE is to choose a parametric form for an estimator and then optimize its parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the James–Stein estimator can be derived by finding the optimal shrinkage estimator.^[2] Donoho and Johnstone also used the technique to choose the threshold of a soft-thresholding (wavelet shrinkage) estimator for wavelet denoising, a procedure known as SureShrink.^[1]
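The shrinkage derivation can be sketched for two assumed parametric families (an illustration; it is not claimed to be the exact derivation in the cited sources). For $h(x)=cx$, SURE equals $d\sigma ^{2}+(c-1)^{2}\|x\|^{2}+2\sigma ^{2}d(c-1)$, which is minimized at $c=1-d\sigma ^{2}/\|x\|^{2}$, a variant of the James–Stein factor $1-(d-2)\sigma ^{2}/\|x\|^{2}$. For $h(x)=(1-a/\|x\|^{2})x$, the divergence of $g(x)=-ax/\|x\|^{2}$ is $-a(d-2)/\|x\|^{2}$, so SURE equals $d\sigma ^{2}+(a^{2}-2\sigma ^{2}(d-2)a)/\|x\|^{2}$, which is minimized at $a=(d-2)\sigma ^{2}$, exactly the James–Stein estimator (for $d\geq 3$). The Python code below checks both minimizers by grid search; all names and the data vector are hypothetical.

```python
def sure_linear(x, c, sigma):
    """SURE for h(x) = c x."""
    d = len(x)
    norm_sq = sum(xi * xi for xi in x)
    return d * sigma**2 + (c - 1) ** 2 * norm_sq + 2 * sigma**2 * d * (c - 1)


def sure_js_family(x, a, sigma):
    """SURE for h(x) = (1 - a / ||x||^2) x, using div g = -a (d - 2) / ||x||^2."""
    d = len(x)
    norm_sq = sum(xi * xi for xi in x)
    return d * sigma**2 + (a * a - 2 * sigma**2 * (d - 2) * a) / norm_sq


x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]  # d = 6, ||x||^2 = 16.5
sigma = 1.0
d = len(x)
norm_sq = sum(xi * xi for xi in x)

# Family 1: minimize over c by grid search; closed form 1 - d sigma^2 / ||x||^2.
c_grid = min((i / 1000 for i in range(-1000, 1001)),
             key=lambda c: sure_linear(x, c, sigma))
c_star = 1 - d * sigma**2 / norm_sq

# Family 2: minimize over a; closed form (d - 2) sigma^2 gives James-Stein.
a_grid = min((i / 100 for i in range(0, 1001)),
             key=lambda a: sure_js_family(x, a, sigma))
a_star = (d - 2) * sigma**2

print(c_grid, c_star, a_grid, a_star)
```

In the first family the SURE-optimal $c$ is itself data-dependent, which is why the result is an estimator of the form $(1-d\sigma ^{2}/\|x\|^{2})x$ rather than a fixed linear rule.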

References

[1] Donoho, David L.; Johnstone, Iain M. (December 1995). "Adapting to Unknown Smoothness via Wavelet Shrinkage". Journal of the American Statistical Association. 90 (432): 1200–1244. doi:10.2307/2291512. JSTOR 2291512.
[2] Stein, Charles M. (November 1981). "Estimation of the Mean of a Multivariate Normal Distribution". The Annals of Statistics. 9 (6): 1135–1151. doi:10.1214/aos/1176345632. JSTOR 2240405.
[3] Wasserman, Larry (2005). All of Nonparametric Statistics.
