# Minimax estimator

In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) ${\displaystyle \theta \in \Theta }$ from observations ${\displaystyle x\in {\mathcal {X}},}$ an estimator (estimation rule) ${\displaystyle \delta ^{M}\,\!}$ is called minimax if its maximal risk is minimal among all estimators of ${\displaystyle \theta \,\!}$. In a sense this means that ${\displaystyle \delta ^{M}\,\!}$ is an estimator which performs best in the worst possible case allowed in the problem.

## Problem setup

Consider the problem of estimating a deterministic (not Bayesian) parameter ${\displaystyle \theta \in \Theta }$ from noisy or corrupt data ${\displaystyle x\in {\mathcal {X}}}$ related through the conditional probability distribution ${\displaystyle P(x|\theta )\,\!}$. Our goal is to find a "good" estimator ${\displaystyle \delta (x)\,\!}$ for estimating the parameter ${\displaystyle \theta \,\!}$, which minimizes some given risk function ${\displaystyle R(\theta ,\delta )\,\!}$. Here the risk function is the expectation of some loss function ${\displaystyle L(\theta ,\delta )\,\!}$ with respect to ${\displaystyle P(x|\theta )\,\!}$. A popular example for a loss function is the squared error loss ${\displaystyle L(\theta ,\delta )=\|\theta -\delta \|^{2}\,\!}$, and the risk function for this loss is the mean squared error (MSE).

Unfortunately in general the risk cannot be minimized, since it depends on the unknown parameter ${\displaystyle \theta \,\!}$ itself (If we knew what was the actual value of ${\displaystyle \theta \,\!}$, we wouldn't need to estimate it). Therefore additional criteria for finding an optimal estimator in some sense are required. One such criterion is the minimax criteria.

## Definition

Definition : An estimator ${\displaystyle \delta ^{M}:{\mathcal {X}}\rightarrow \Theta \,\!}$ is called minimax with respect to a risk function ${\displaystyle R(\theta ,\delta )\,\!}$ if it achieves the smallest maximum risk among all estimators, meaning it satisfies

${\displaystyle \sup _{\theta \in \Theta }R(\theta ,\delta ^{M})=\inf _{\delta }\sup _{\theta \in \Theta }R(\theta ,\delta ).\,}$

## Least favorable distribution

Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a Bayes estimator with respect to a prior least favorable distribution of ${\displaystyle \theta \,\!}$. To demonstrate this notion denote the average risk of the Bayes estimator ${\displaystyle \delta _{\pi }\,\!}$ with respect to a prior distribution ${\displaystyle \pi \,\!}$ as

${\displaystyle r_{\pi }=\int R(\theta ,\delta _{\pi })\,d\pi (\theta )\,}$

Definition: A prior distribution ${\displaystyle \pi \,\!}$ is called least favorable if for any other distribution ${\displaystyle \pi '\,\!}$ the average risk satisfies ${\displaystyle r_{\pi }\geq r_{\pi '}\,}$.

Theorem 1: If ${\displaystyle r_{\pi }=\sup _{\theta }R(\theta ,\delta _{\pi }),\,}$ then:

1. ${\displaystyle \delta _{\pi }\,\!}$ is minimax.
2. If ${\displaystyle \delta _{\pi }\,\!}$ is a unique Bayes estimator, it is also the unique minimax estimator.
3. ${\displaystyle \pi \,\!}$ is least favorable.

Corollary: If a Bayes estimator has constant risk, it is minimax. Note that this is not a necessary condition.

Example 1, Unfair coin: Consider the problem of estimating the "success" rate of a Binomial variable, ${\displaystyle x\sim B(n,\theta )\,\!}$. This may be viewed as estimating the rate at which an unfair coin falls on "heads" or "tails". In this case the Bayes estimator with respect to a Beta-distributed prior, ${\displaystyle \theta \sim {\text{Beta}}({\sqrt {n}}/2,{\sqrt {n}}/2)\,}$ is

${\displaystyle \delta ^{M}={\frac {x+0.5{\sqrt {n}}}{n+{\sqrt {n}}}},\,}$

with constant Bayes risk

${\displaystyle r={\frac {1}{4(1+{\sqrt {n}})^{2}}}\,}$

and, according to the Corollary, is minimax.

Definition: A sequence of prior distributions ${\displaystyle {\pi }_{n}\,\!}$ is called least favorable if for any other distribution ${\displaystyle \pi '\,\!}$,

${\displaystyle \lim _{n\rightarrow \infty }r_{\pi _{n}}\geq r_{\pi '}.\,}$

Theorem 2: If there are a sequence of priors ${\displaystyle \pi _{n}\,\!}$ and an estimator ${\displaystyle \delta \,\!}$ such that ${\displaystyle \sup _{\theta }R(\theta ,\delta )=\lim _{n\rightarrow \infty }r_{\pi _{n}}\,\!}$, then :

1. ${\displaystyle \delta \,\!}$ is minimax.
2. The sequence ${\displaystyle {\pi }_{n}\,\!}$ is least favorable.

Notice that no uniqueness is guaranteed here. For example, the ML estimator from the previous example may be attained as the limit of Bayes estimators with respect to a uniform prior, ${\displaystyle \pi _{n}\sim U[-n,n]\,\!}$ with increasing support and also with respect to a zero mean normal prior ${\displaystyle \pi _{n}\sim N(0,n\sigma ^{2})\,\!}$ with increasing variance. So neither the resulting ML estimator is unique minimax nor the least favorable prior is unique.

Example 2: Consider the problem of estimating the mean of ${\displaystyle p\,\!}$ dimensional Gaussian random vector, ${\displaystyle x\sim N(\theta ,I_{p}\sigma ^{2})\,\!}$. The Maximum likelihood (ML) estimator for ${\displaystyle \theta \,\!}$ in this case is simply ${\displaystyle \delta _{ML}=x\,\!}$, and its risk is

${\displaystyle R(\theta ,\delta _{ML})=E{\|\delta _{ML}-\theta \|^{2}}=\sum \limits _{1}^{p}E{(x_{i}-\theta _{i})^{2}}=p\sigma ^{2}.\,}$
MSE of maximum likelihood estimator versus James–Stein estimator

The risk is constant, but the ML estimator is actually not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence ${\displaystyle \pi _{n}\sim N(0,n\sigma ^{2})\,\!}$, and, hence, indeed minimax according to Theorem 2 . Nonetheless, minimaxity does not always imply admissibility. In fact in this example, the ML estimator is known to be inadmissible (not admissible) whenever ${\displaystyle p>2\,\!}$. The famous James–Stein estimator dominates the ML whenever ${\displaystyle p>2\,\!}$. Though both estimators have the same risk ${\displaystyle p\sigma ^{2}\,\!}$ when ${\displaystyle \|\theta \|\rightarrow \infty \,\!}$, and they are both minimax, the James–Stein estimator has smaller risk for any finite ${\displaystyle \|\theta \|\,\!}$. This fact is illustrated in the following figure.

## Some examples

In general it is difficult, often even impossible to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

Example 3, Bounded Normal Mean: When estimating the Mean of a Normal Vector ${\displaystyle x\sim N(\theta ,I_{n}\sigma ^{2})\,\!}$, where it is known that ${\displaystyle \|\theta \|^{2}\leq M\,\!}$. The Bayes estimator with respect to a prior which is uniformly distributed on the edge of the bounding sphere is known to be minimax whenever ${\displaystyle M\leq n\,\!}$. The analytical expression for this estimator is

${\displaystyle \delta ^{M}={\frac {nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)}},\,}$

where ${\displaystyle J_{n}(t)\,\!}$, is the modified Bessel function of the first kind of order n.

## Asymptotic minimax estimator

The difficulty of determining the exact minimax estimator has motivated the study of estimators of asymptotic minimax --- an estimator ${\displaystyle \delta '}$ is called ${\displaystyle c}$-asymptotic (or approximate) minimax if

${\displaystyle \sup _{\theta \in \Theta }R(\theta ,\delta ')\leq c\inf _{\delta }\sup _{\theta \in \Theta }R(\theta ,\delta ).}$

For many estimation problems, especially in the non-parametric estimation setting, various approximate minimax estimators have been established. The design of approximate minimax estimator is intimately related to the geometry, such as the metric entropy number, of ${\displaystyle \Theta }$.

## Randomised minimax estimator

Sometimes, a minimax estimator may take the form of a randomised decision rule. An example is shown on the left. The parameter space has just two elements and each point on the graph corresponds to the risk of a decision rule: the x-coordinate is the risk when the parameter is ${\displaystyle \theta _{1}}$ and the y-coordinate is the risk when the parameter is ${\displaystyle \theta _{2}}$. In this decision problem, the minimax estimator lies on a line segment connecting two deterministic estimators. Choosing ${\displaystyle \delta _{1}}$ with probability ${\displaystyle 1-p}$ and ${\displaystyle \delta _{2}}$ with probability ${\displaystyle p}$ minimises the supremum risk.

## Relationship to Robust Optimization

Robust optimization is an approach to solve optimization problems under uncertainty in the knowledge of underlying parameters,.[1][2] For instance, the MMSE Bayesian estimation of a parameter requires the knowledge of parameter correlation function. If the knowledge of this correlation function is not perfectly available, a popular minimax robust optimization approach[3] is to define a set characterizing the uncertainty about the correlation function, and then pursuing a minimax optimization over the uncertainty set and the estimator respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. For instance, a recent study dealing with such techniques in the area of signal processing can be found in.[4]

In R. Fandom Noubiap and W. Seidel (2001) an algorithm for calculating a Gamma-minimax decision rule has been developed, when Gamma is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in Gamma. Gamma-minimax decision rules are of interest in robustness studies in Bayesian statistics.

## References

• E. L. Lehmann and G. Casella (1998), Theory of Point Estimation, 2nd ed. New York: Springer-Verlag.
• F. Perron and E. Marchand (2002), "On the minimax estimator of a bounded normal mean," Statistics and Probability Letters 58: 327–333.
• J. O. Berger (1985), Statistical Decision Theory and Bayesian Analysis, 2nd ed. New York: Springer-Verlag. ISBN 0-387-96098-8.
• R. Fandom Noubiap and W. Seidel (2001), An Algorithm for Calculating Gamma-Minimax Decision Rules under Generalized Moment Conditions, Annals of Statistics, Aug., 2001, vol. 29, no. 4, p. 1094–1116
• Stein, C. (1981). "Estimation of the mean of a multivariate normal distribution". Annals of Statistics. 9 (6): 1135–1151. doi:10.1214/aos/1176345632. MR 0630098. Zbl 0476.62035.
1. ^ S. A. Kassam and H. V. Poor (1985), "Robust Techniques for Signal Processing: A Survey," Proceedings of the IEEE, vol. 73, pp. 433–481, March 1985.
2. ^ A. Ben-Tal, L. El Ghaoui, and A. Nemirovski (2009), "Robust Optimization", Princeton University Press, 2009.
3. ^ S. Verdu and H. V. Poor (1984), "On Minimax Robustness: A general approach and applications," IEEE Transactions on Information Theory, vol. 30, pp. 328–340, March 1984.
4. ^ M. Danish Nisar. "Minimax Robustness in Signal Processing for Communications", Shaker Verlag, ISBN 978-3-8440-0332-1, August 2011.