In statistics, a shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naïve or raw estimate is improved by combining it with other information. The term relates to the notion that the improved estimate is made closer to the value supplied by the 'other information' than the raw estimate. In this sense, shrinkage is used to regularize ill-posed inference problems.
One general result is that many standard estimators can be improved, in terms of mean squared error (MSE), by shrinking them towards zero (or any other fixed constant value). Assume that the expected value of the raw estimate is not zero and consider other estimators obtained by multiplying the raw estimate by a certain parameter. The value of this parameter can be chosen to minimise the MSE of the new estimate. For this value, the new estimate has a smaller MSE than the raw one, and so it has been improved. One effect may be to convert an unbiased raw estimate into an improved biased one. A well-known example arises in the estimation of the population variance from a simple random sample of normally distributed data: for a sample size of n, the use of a divisor n − 1 in the usual formula gives an unbiased estimator, while a divisor of n + 1 gives the one with minimum mean squared error.
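The variance example can be checked with a short calculation. The sketch below is illustrative only and assumes independent, normally distributed observations, so that the sum of squared deviations divided by the true variance follows a chi-squared distribution with n − 1 degrees of freedom. The function names (optimal_scale, scaled_mse, variance_estimator_mse) are hypothetical and chosen for this example.

```python
def optimal_scale(theta, variance):
    """Multiplier c that minimises the MSE of c*T, where T is an
    unbiased estimator with mean theta and the given variance."""
    return theta**2 / (theta**2 + variance)


def scaled_mse(c, theta, variance):
    """MSE of c*T: variance term c^2 * Var(T) plus squared bias."""
    return c**2 * variance + (c - 1) ** 2 * theta**2


def variance_estimator_mse(n, divisor, sigma2=1.0):
    """MSE of SS/divisor as an estimator of sigma2, where SS is the sum
    of squared deviations from the sample mean.
    Assumption: the data are i.i.d. normal, so Var(SS/(n-1)) = 2*sigma2^2/(n-1)."""
    df = n - 1
    unbiased_var = 2 * sigma2**2 / df
    c = df / divisor  # SS/divisor = c * SS/(n-1)
    return scaled_mse(c, sigma2, unbiased_var)


n = 10
for divisor in (n - 1, n, n + 1):
    print(divisor, variance_estimator_mse(n, divisor))

# The MSE-optimal multiplier of the unbiased estimator is (n-1)/(n+1),
# which corresponds to using the divisor n + 1.
print(optimal_scale(1.0, 2.0 / (n - 1)), (n - 1) / (n + 1))
```

Under these assumptions the multiplier that minimises the MSE is (n − 1)/(n + 1), which turns the divisor n − 1 into n + 1. For other distributions the optimal divisor generally differs.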
Shrinkage is implicit in Bayesian inference and penalized likelihood inference, and explicit in James–Stein-type inference. In contrast, simple types of maximum-likelihood and least-squares estimation procedures do not include shrinkage effects, although they can be used within shrinkage estimation schemes.
The use of shrinkage estimators in the context of regression analysis, where there may be a large number of explanatory variables, has been described by Copas. Here the values of the estimated regression coefficients are shrunken towards zero with the effect of reducing the mean square error of predicted values from the model when applied to new data. A later paper by Copas applies shrinkage in a context where the problem is to predict a binary response on the basis of binary explanatory variables.
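The idea of shrinking regression coefficients towards zero can be illustrated with a deliberately small case. The sketch below does not reproduce Copas's method; it substitutes a ridge-type penalty on a single predictor with no intercept, where the shrinkage factor can be written down exactly. The data, the penalty value and the function names are assumptions made for this example.

```python
def ols_slope(x, y):
    """Least-squares slope for a regression through the origin."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def ridge_slope(x, y, penalty):
    """Slope minimising sum((y - b*x)^2) + penalty * b^2.
    Equal to the least-squares slope times sxx / (sxx + penalty)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / (sxx + penalty)


def shrinkage_factor(x, penalty):
    """Factor in (0, 1] by which the penalty shrinks the raw slope."""
    sxx = sum(xi * xi for xi in x)
    return sxx / (sxx + penalty)


# Hypothetical data, for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.1, 0.2, 1.8, 4.1]

raw = ols_slope(x, y)
shrunk = ridge_slope(x, y, penalty=2.5)
print(raw, shrunk, shrinkage_factor(x, penalty=2.5))
```

A larger penalty gives a smaller factor and therefore stronger shrinkage. Whether a given amount of shrinkage actually reduces the prediction error on new data depends on the noise level and the sample size, so in practice the penalty is usually chosen from the data, for example by cross-validation.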
Hausser and Strimmer "develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, ...it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. ...method is fully analytic and hence computationally inexpensive. Moreover, ...procedure simultaneously provides estimates of the entropy and of the cell frequencies. ...The proposed shrinkage estimators of entropy and mutual information, as well as all other investigated entropy estimators, have been implemented in R (R Development Core Team, 2008). A corresponding R package “entropy” was deposited in the R archive CRAN and is accessible at the URL http://cran.r-project.org/web/packages/entropy/ under the GNU General Public License."
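A James–Stein-type shrinkage estimate of cell frequencies, and the plug-in entropy computed from it, can be sketched as follows. The code is an illustration rather than the authors' implementation. It assumes a uniform shrinkage target and a shrinkage intensity of the form (1 − Σ f_k²) / ((n − 1) Σ (t_k − f_k)²), clipped to the interval [0, 1], where f_k are the observed relative frequencies and t_k the target frequencies. This form should be checked against the paper and the R package before being relied on. The function names are hypothetical.

```python
import math


def shrink_frequencies(counts):
    """Shrink observed cell frequencies towards the uniform distribution.
    Returns (shrunken frequencies, shrinkage intensity)."""
    n = sum(counts)
    p = len(counts)
    ml = [c / n for c in counts]   # raw (maximum-likelihood) frequencies
    target = 1.0 / p               # assumed target: uniform over the p cells
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0:
        lam = 1.0                  # raw estimate equals the target, or n == 1
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom
        lam = min(1.0, max(0.0, lam))  # keep the intensity within [0, 1]
    freqs = [lam * target + (1.0 - lam) * f for f in ml]
    return freqs, lam


def entropy(freqs):
    """Plug-in Shannon entropy in nats; zero cells contribute nothing."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


# Hypothetical counts for five cells, one of them unobserved.
counts = [4, 2, 3, 0, 1]
shrunk, lam = shrink_frequencies(counts)
raw = [c / sum(counts) for c in counts]
print(lam, shrunk)
print(entropy(raw), entropy(shrunk))
```

Because the estimate is pulled towards the uniform distribution, unobserved cells receive a positive frequency whenever the intensity is positive, and the resulting entropy estimate is never smaller than the one computed from the raw frequencies.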
See also 
- Stein's example
- Shrinkage estimation in Estimation of covariance matrices
- Regularization (mathematics)
- Tikhonov regularization
References
- Copas, J.B. (1983). "Regression, Prediction and Shrinkage". Journal of the Royal Statistical Society, Series B 45 (3): 311–354. JSTOR 2345402. MR 737642.
- Copas, J.B. (1993). "The shrinkage of point scoring methods". Journal of the Royal Statistical Society, Series C 42 (2): 315–331. JSTOR 2986235.
- Hausser, Jean; Strimmer, Korbinian (2009). "Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks". Journal of Machine Learning Research 10: 1469–1484. Retrieved 2013-03-23.
Statistical Software 
Hausser, Jean. "entropy". entropy package for R. Retrieved 2013-03-23.