# Minimax estimator

In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) $\theta \in \Theta$ from observations $x \in \mathcal{X},$ an estimator (estimation rule) $\delta^M \,\!$ is called minimax if its maximal risk is minimal among all estimators of $\theta \,\!$. In a sense this means that $\delta^M \,\!$ is an estimator which performs best in the worst possible case allowed in the problem.

## Problem setup

Consider the problem of estimating a deterministic (not Bayesian) parameter $\theta \in \Theta$ from noisy or corrupted data $x \in \mathcal{X}$ related through the conditional probability distribution $P(x|\theta)\,\!$. Our goal is to find a "good" estimator $\delta(x) \,\!$ of the parameter $\theta \,\!$, one which minimizes some given risk function $R(\theta,\delta) \,\!$. Here the risk function is the expectation of some loss function $L(\theta,\delta) \,\!$ with respect to $P(x|\theta)\,\!$. A popular example of a loss function is the squared error loss $L(\theta,\delta)= \|\theta-\delta\|^2 \,\!$, for which the risk function is the mean squared error (MSE).

Unfortunately, in general the risk cannot be minimized directly, since it depends on the unknown parameter $\theta \,\!$ itself (if we knew the actual value of $\theta \,\!$, we would not need to estimate it). Additional criteria for finding an estimator that is optimal in some sense are therefore required. One such criterion is the minimax criterion.
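To make the dependence on $\theta$ concrete, the sketch below (illustrative only; the binomial model and the naive estimator $\delta(x)=x/n$ are chosen for simplicity) estimates the MSE of the sample proportion by Monte Carlo and shows that it varies with the unknown $\theta$:

```python
import random

def empirical_risk(theta, n, trials=20000, seed=0):
    """Monte Carlo estimate of the MSE of the sample proportion x/n
    when x ~ Binomial(n, theta) and the loss is squared error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = sum(rng.random() < theta for _ in range(n))  # one binomial draw
        delta = x / n                                    # the estimator delta(x)
        total += (delta - theta) ** 2                    # squared error loss
    return total / trials

# The true risk is theta*(1-theta)/n, which depends on the unknown theta:
for theta in (0.1, 0.5, 0.9):
    print(theta, empirical_risk(theta, n=20))
```

Because the risk is largest near $\theta = 1/2$ and smallest near the endpoints, no single estimator minimizes it uniformly, which is exactly why criteria such as minimaxity are needed.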

## Definition

Definition: An estimator $\delta^M:\mathcal{X} \rightarrow \Theta \,\!$ is called minimax with respect to a risk function $R(\theta,\delta) \,\!$ if it achieves the smallest maximum risk among all estimators, meaning that it satisfies

$\sup_{\theta \in \Theta} R(\theta,\delta^M) = \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta). \,$
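For a toy problem with finitely many parameter values and finitely many candidate rules, the definition can be applied by brute force: take the supremum of the risk over $\theta$ for each rule, then pick the rule with the smallest worst case. The risk numbers below are hypothetical:

```python
# Hypothetical risk table R(theta, delta) for two parameter values
# and three candidate estimators.
risks = {
    "delta1": {"theta_a": 1.0, "theta_b": 4.0},
    "delta2": {"theta_a": 3.0, "theta_b": 3.0},
    "delta3": {"theta_a": 2.0, "theta_b": 5.0},
}

# sup over theta for each rule, then inf over the rules:
worst_case = {d: max(r.values()) for d, r in risks.items()}
minimax_rule = min(worst_case, key=worst_case.get)
print(minimax_rule, worst_case[minimax_rule])  # -> delta2 3.0
```

Note that delta3 beats the minimax rule delta2 at theta_a; minimaxity only controls the worst case, not pointwise performance.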

## Least favorable distribution

Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a Bayes estimator with respect to a least favorable prior distribution of $\theta \,\!$. To demonstrate this notion, denote the average risk of the Bayes estimator $\delta_{\pi} \,\!$ with respect to a prior distribution $\pi \,\!$ as

$r_\pi = \int R(\theta,\delta_{\pi}) \, d\pi(\theta) \,$

Definition: A prior distribution $\pi \,\!$ is called least favorable if for any other distribution $\pi ' \,\!$ the average risk satisfies $r_\pi \geq r_{\pi '} \,$.

Theorem 1: If $r_\pi = \sup_\theta R(\theta,\delta_\pi), \,$ then:

1. $\delta_{\pi}\,\!$ is minimax.
2. If $\delta_{\pi}\,\!$ is a unique Bayes estimator, it is also the unique minimax estimator.
3. $\pi\,\!$ is least favorable.

Corollary: If a Bayes estimator has constant risk, it is minimax. Note that this is not a necessary condition.

Example 1, Unfair coin: Consider the problem of estimating the "success" rate of a binomial variable, $x \sim B(n,\theta)\,\!$. This may be viewed as estimating the rate at which an unfair coin lands on "heads" or "tails". In this case the Bayes estimator with respect to a Beta-distributed prior, $\theta \sim \text{Beta}(\sqrt{n}/2,\sqrt{n}/2) \,$, is

$\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}}, \,$

with constant Bayes risk

$r=\frac{1}{4(1+\sqrt{n})^2} \,$

and, according to the Corollary, is minimax.
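The constant risk can be verified directly from the bias–variance decomposition: under a $\text{Beta}(a,a)$ prior the Bayes estimator is $(x+a)/(n+2a)$, and with $a=\sqrt{n}/2$ the $\theta$-dependent terms cancel. A short check:

```python
import math

def mse_minimax_coin(theta, n):
    """Exact MSE (variance plus squared bias) of the estimator
    delta(x) = (x + sqrt(n)/2) / (n + sqrt(n)) for x ~ Binomial(n, theta)."""
    a = math.sqrt(n) / 2                          # Beta(a, a) prior parameter
    denom = (n + 2 * a) ** 2
    var = n * theta * (1 - theta) / denom         # variance term
    bias_sq = (a * (1 - 2 * theta)) ** 2 / denom  # squared bias term
    return var + bias_sq

n = 25
r = 1 / (4 * (1 + math.sqrt(n)) ** 2)             # the claimed constant risk
for theta in (0.0, 0.2, 0.5, 0.9, 1.0):
    assert abs(mse_minimax_coin(theta, n) - r) < 1e-12
print("constant risk:", r)
```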

Definition: A sequence of prior distributions ${\pi}_n\,\!$ is called least favorable if for any other distribution $\pi '\,\!$,

$\lim_{n \rightarrow \infty} r_{\pi_n} \geq r_{\pi '}. \,$

Theorem 2: If there is a sequence of priors $\pi_n\,\!$ and an estimator $\delta\,\!$ such that $\sup_{\theta} R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n} \,\!$, then:

1. $\delta\,\!$ is minimax.
2. The sequence ${\pi}_n\,\!$ is least favorable.

Notice that no uniqueness is guaranteed here. For example, the ML estimator of a normal mean (Example 2 below) may be attained as the limit of Bayes estimators with respect to a uniform prior $\pi_n \sim U[-n,n]\,\!$ with increasing support, and also with respect to a zero-mean normal prior $\pi_n \sim N(0,n \sigma^2) \,\!$ with increasing variance. Thus neither the resulting minimax estimator nor the least favorable sequence of priors is unique.
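In the scalar case this limit is easy to see explicitly (a sketch for $p=1$, using the standard normal–normal conjugate posterior mean): under the prior $\theta \sim N(0, n\sigma^2)$ and a single observation $x \sim N(\theta,\sigma^2)$, the Bayes estimator shrinks $x$ by the factor $n/(n+1)$, which tends to 1:

```python
def bayes_estimate(x, n):
    """Posterior mean of theta given x ~ N(theta, sigma^2) under the prior
    theta ~ N(0, n * sigma^2); the shrinkage factor n/(n+1) does not
    depend on sigma."""
    return n / (n + 1) * x  # shrinks x toward the prior mean 0

x = 2.0
for n in (1, 10, 1000):
    print(n, bayes_estimate(x, n))
# As n grows, the Bayes estimators approach the ML estimate delta(x) = x.
```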

Example 2: Consider the problem of estimating the mean of a $p\,\!$-dimensional Gaussian random vector, $x \sim N(\theta,I_p \sigma^2)\,\!$. The maximum likelihood (ML) estimator of $\theta\,\!$ in this case is simply $\delta_{ML}=x\,\!$, and its risk is

$R(\theta,\delta_{ML})=E\left[\|\delta_{ML}-\theta\|^2\right]=\sum_{i=1}^p E\left[(x_i-\theta_i)^2\right]=p \sigma^2. \,$
*Figure: MSE of the maximum likelihood estimator versus the James–Stein estimator.*

The risk is constant, but the ML estimator is not actually a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence $\pi_n \sim N(0,n \sigma^2) \,\!$, and hence it is indeed minimax according to Theorem 2. Nonetheless, minimaxity does not always imply admissibility. In fact, in this example the ML estimator is known to be inadmissible (not admissible) whenever $p >2\,\!$: the famous James–Stein estimator dominates the ML estimator whenever $p >2\,\!$. Although the risks of both estimators approach $p \sigma^2\,\!$ as $\|\theta\| \rightarrow \infty\,\!$, and both are minimax, the James–Stein estimator has smaller risk for any finite $\|\theta\|\,\!$. This fact is illustrated in the figure above.
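The dominance can be checked by simulation. The sketch below (a minimal Monte Carlo comparison, using the basic, non-positive-part James–Stein estimator $\left(1-\tfrac{(p-2)\sigma^2}{\|x\|^2}\right)x$) compares the empirical risks of the two estimators:

```python
import random

def mc_risks(theta, sigma=1.0, trials=5000, seed=1):
    """Monte Carlo MSE of the ML estimator x and of the James-Stein
    estimator (1 - (p - 2) * sigma^2 / ||x||^2) * x
    for x ~ N(theta, sigma^2 * I_p)."""
    rng = random.Random(seed)
    p = len(theta)
    ml_err = js_err = 0.0
    for _ in range(trials):
        x = [t + rng.gauss(0.0, sigma) for t in theta]
        norm_sq = sum(xi * xi for xi in x)
        shrink = 1.0 - (p - 2) * sigma ** 2 / norm_sq
        ml_err += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        js_err += sum((shrink * xi - t) ** 2 for xi, t in zip(x, theta))
    return ml_err / trials, js_err / trials

p = 10
ml, js = mc_risks([0.5] * p)
print(ml, js)  # ML risk is close to p * sigma^2 = 10; James-Stein risk is smaller
```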

## Some examples

In general it is difficult, and often impossible, to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

Example 3, Bounded normal mean: Consider the problem of estimating the mean of a normal vector $x \sim N(\theta,I_n \sigma^2)\,\!$ when it is known that $\|\theta\|^2 \leq M\,\!$. The Bayes estimator with respect to a prior that is uniformly distributed on the boundary of the bounding sphere is known to be minimax whenever $M \leq n\,\!$. The analytical expression for this estimator is

$\delta^M=\frac{nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)}x, \,$

where $J_{n}(t)\,\!$ is the modified Bessel function of the first kind of order $n$.

## Asymptotic minimax estimator

The difficulty of determining the exact minimax estimator has motivated the study of asymptotically minimax estimators: an estimator $\delta'$ is called $c$-asymptotically (or approximately) minimax if

$\sup_{\theta\in\Theta} R(\theta,\delta')\leq c \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta).$

For many estimation problems, especially in the non-parametric estimation setting, various approximate minimax estimators have been established. The design of an approximate minimax estimator is intimately related to the geometry of $\Theta$, such as its metric entropy number.

## Relationship to robust optimization

Robust optimization is an approach to solving optimization problems under uncertainty in the knowledge of underlying parameters.[1][2] For instance, MMSE Bayesian estimation of a parameter requires knowledge of the parameter correlation function. If knowledge of this correlation function is not perfectly available, a popular minimax robust optimization approach[3] is to define a set characterizing the uncertainty about the correlation function, and then to pursue a minimax optimization over the uncertainty set and the estimator, respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. For instance, a recent study dealing with such techniques in the area of signal processing can be found in [4].

R. Fandom Noubiap and W. Seidel (2001) developed an algorithm for calculating a Gamma-minimax decision rule when Gamma is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in Gamma. Gamma-minimax decision rules are of interest in robustness studies in Bayesian statistics.

## References

• E. L. Lehmann and G. Casella (1998), Theory of Point Estimation, 2nd ed. New York: Springer-Verlag.
• F. Perron and E. Marchand (2002), "On the minimax estimator of a bounded normal mean," Statistics and Probability Letters 58: 327–333.
• J. O. Berger (1985), Statistical Decision Theory and Bayesian Analysis, 2nd ed. New York: Springer-Verlag. ISBN 0-387-96098-8.
• R. Fandom Noubiap and W. Seidel (2001), "An Algorithm for Calculating Gamma-Minimax Decision Rules under Generalized Moment Conditions," Annals of Statistics 29 (4): 1094–1116.
• Stein, C. (1981). "Estimation of the mean of a multivariate normal distribution". Annals of Statistics 9 (6): 1135–1151. doi:10.1214/aos/1176345632. MR 630098. Zbl 0476.62035.
1. ^ S. A. Kassam and H. V. Poor (1985), "Robust Techniques for Signal Processing: A Survey," Proceedings of the IEEE, vol. 73, pp. 433–481, March 1985.
2. ^ A. Ben-Tal, L. El Ghaoui, and A. Nemirovski (2009), "Robust Optimization", Princeton University Press, 2009.
3. ^ S. Verdu and H. V. Poor (1984), "On Minimax Robustness: A general approach and applications," IEEE Transactions on Information Theory, vol. 30, pp. 328–340, March 1984.
4. ^ M. Danish Nisar. "Minimax Robustness in Signal Processing for Communications", Shaker Verlag, ISBN 978-3-8440-0332-1, August 2011.