= Minimax estimator =

In statistical decision theory, a minimax estimator $\delta^M \,\!$ is an estimator which performs best in the worst possible case allowed in a problem. In the problem of estimating a deterministic parameter (vector) $\theta \in \Theta$ from observations $x \in \mathcal{X},$ an estimator (estimation rule) $\delta^M \,\!$ is called minimax if its maximal risk is minimal among all estimators of $\theta \,\!$.

==Definition==
Definition: An estimator $\delta^M:\mathcal{X} \rightarrow \Theta \,\!$ is called minimax with respect to a risk function $R(\theta,\delta) \,\!$ if it achieves the smallest maximum risk among all estimators, that is, if it satisfies

 $\sup_{\theta \in \Theta} R(\theta,\delta^M) = \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta). \,$

==Problem setup==
Consider the problem of estimating a deterministic (not Bayesian) parameter $\theta \in \Theta$ from noisy or corrupt data $x \in \mathcal{X}$ related through the conditional probability distribution $P(x\mid\theta)\,\!$. The goal is to find a "good" estimator $\delta(x) \,\!$ of the parameter $\theta \,\!$, one that minimizes some given risk function $R(\theta,\delta) \,\!$. The risk function (technically a functional, since $R$ takes the function $\delta$ as an argument) is the expectation of some loss function $L(\theta,\delta) \,\!$ with respect to $P(x\mid\theta)\,\!$. A popular example of a loss function is the squared error loss $L(\theta,\delta)= \|\theta-\delta\|^2 \,\!$, and the risk function for this loss is the mean squared error (MSE).

In general, the risk cannot be minimized because it depends on the unknown parameter $\theta \,\!$ itself, and if the actual value of $\theta \,\!$ were known, there would be no need to estimate it. Therefore, additional criteria for finding an optimal estimator in some sense are required. One such criterion is the minimax criterion.
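The dependence of the risk on the unknown $\theta$ can be made concrete with a small sketch. It assumes a binomial observation $x \sim B(n,\theta)$ under squared error loss and compares two illustrative estimators, $x/n$ and the constant $1/2$. The estimators, the value $n = 10$ and the function names are assumptions chosen for illustration, not part of the text above.

```python
# Exact risk functions under squared error loss for x ~ Binomial(n, theta).
# Both estimators and the value n = 10 are illustrative assumptions.

def risk_mle(theta, n):
    """MSE of delta(x) = x / n: it is unbiased, so the risk equals its variance."""
    return theta * (1.0 - theta) / n

def risk_half(theta, n):
    """MSE of the constant estimator delta(x) = 1/2: pure squared bias."""
    return (theta - 0.5) ** 2

n = 10
grid = [i / 100 for i in range(101)]  # theta values covering [0, 1]

# Neither estimator has smaller risk for every theta.
for theta in (0.1, 0.5, 0.9):
    print(theta, risk_mle(theta, n), risk_half(theta, n))

# The minimax criterion compares worst-case (maximum) risks instead.
max_risk_mle = max(risk_mle(t, n) for t in grid)
max_risk_half = max(risk_half(t, n) for t in grid)
print(max_risk_mle, max_risk_half)
```

The constant estimator wins near $\theta = 1/2$ and loses elsewhere, so pointwise comparison gives no ranking. Comparing maximum risks does rank the two.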

==Least favorable distribution==
Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a Bayes estimator with respect to a least favorable prior distribution of $\theta \,\!$. To demonstrate this notion, denote the average risk of the Bayes estimator $\delta_\pi \,\!$ with respect to a prior distribution $\pi \,\!$ as

 $r_\pi = \int R(\theta,\delta_\pi) \, d\pi(\theta) \,$

Definition: A prior distribution $\pi \,\!$ is called least favorable if for every other distribution $\pi ' \,\!$ the average risk satisfies $r_\pi \geq r_{\pi '} \,$.

Theorem 1: If $r_\pi = \sup_\theta R(\theta,\delta_\pi), \,$ then:

1. $\delta_\pi\,\!$ is minimax.
2. If $\delta_\pi\,\!$ is a unique Bayes estimator, it is also the unique minimax estimator.
3. $\pi\,\!$ is least favorable.

Corollary: If a Bayes estimator has constant risk, it is minimax. This is not a necessary condition.

Example 1: Unfair coin: Consider the problem of estimating the "success" rate of a binomial variable, $x \sim B(n,\theta)\,\!$. This may be viewed as estimating the rate at which an unfair coin falls on "heads". Under squared error loss, the Bayes estimator with respect to a Beta-distributed prior, $\theta \sim \text{Beta}(\sqrt{n}/2,\sqrt{n}/2) \,$, is

$\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}}, \,$

with constant Bayes risk

$r=\frac{1}{4(1+\sqrt{n})^2} \,$

and, according to the Corollary, is minimax.
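The constant-risk claim in Example 1 can be checked numerically. The sketch below assumes squared error loss and computes the exact risk by summing over the binomial probability mass function. The sample size $n = 16$, the grid of $\theta$ values and the helper names are illustrative assumptions.

```python
from math import comb, sqrt

def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

def exact_risk(estimator, n, theta):
    """Mean squared error E[(estimator(x) - theta)^2] for x ~ Binomial(n, theta)."""
    return sum(binom_pmf(x, n, theta) * (estimator(x) - theta) ** 2
               for x in range(n + 1))

n = 16  # illustrative sample size (an assumption)

def minimax_est(x):
    """Bayes estimator for the Beta(sqrt(n)/2, sqrt(n)/2) prior from Example 1."""
    return (x + 0.5 * sqrt(n)) / (n + sqrt(n))

def mle(x):
    """Maximum likelihood estimator x / n, included for comparison."""
    return x / n

thetas = [i / 20 for i in range(21)]
minimax_risks = [exact_risk(minimax_est, n, t) for t in thetas]
mle_risks = [exact_risk(mle, n, t) for t in thetas]

# The stated constant risk 1 / (4 (1 + sqrt(n))^2).
constant = 1.0 / (4.0 * (1.0 + sqrt(n)) ** 2)
print(min(minimax_risks), max(minimax_risks), constant)
print(max(mle_risks))
```

On this grid the risk of the Bayes estimator is flat in $\theta$ and matches the stated constant. The maximum likelihood estimator $x/n$ has a larger worst-case risk, attained at $\theta = 1/2$.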

Definition: A sequence of prior distributions $\pi_n\,\!$ is called least favorable if for any other distribution $\pi '\,\!$,
$\lim_{n \rightarrow \infty} r_{\pi_n} \geq r_{\pi '}. \,$

Theorem 2: If there exist a sequence of priors $\pi_n\,\!$ and an estimator $\delta\,\!$ such that
$\sup_\theta R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n} \,\!$, then:

1. $\delta\,\!$ is minimax.
2. The sequence $\pi_n\,\!$ is least favorable.

No uniqueness is guaranteed. For example, the ML estimator in Example 2 below may be attained as the limit of Bayes estimators with respect to a uniform prior, $\pi_n \sim U[-n,n]\,\!$, with increasing support, and also with respect to a zero-mean normal prior, $\pi_n \sim N(0,n \sigma^2) \,\!$, with increasing variance. Thus the resulting ML estimator need not be the unique minimax estimator, and the least favorable sequence of priors is not unique.

Example 2: Consider the problem of estimating the mean of a $p\,\!$-dimensional Gaussian random vector, $x \sim N(\theta,I_p \sigma^2)\,\!$. The maximum likelihood (ML) estimator for $\theta\,\!$ in this case is $\delta_\text{ML}=x\,\!$, and its risk is

 $R(\theta,\delta_\text{ML})=E\left[\|\delta_\text{ML}-\theta\|^2\right]=\sum_{i=1}^p E\left[(x_i-\theta_i)^2\right]=p \sigma^2. \,$

The risk is constant, but the ML estimator is not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence $\pi_n \sim N(0,n \sigma^2) \,\!$ and hence is minimax according to Theorem 2. Minimaxity does not always imply admissibility: in this example, the ML estimator is known to be inadmissible whenever $p >2\,\!$, because the James–Stein estimator dominates it. Although both estimators have the same risk $p \sigma^2\,\!$ as $\|\theta\| \rightarrow \infty\,\!$, and both are minimax, the James–Stein estimator has smaller risk for any finite $\|\theta\|\,\!$.
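The comparison between the ML and James–Stein estimators can be explored with a Monte Carlo sketch. The text does not state the estimator's formula, so the code uses the standard form $\left(1 - (p-2)\sigma^2/\|x\|^2\right)x$. The dimension, variance, trial count, seed and choices of $\theta$ are illustrative assumptions, and the printed risks are simulation estimates rather than exact values.

```python
import random

def james_stein(x, sigma2):
    """Standard James-Stein form (1 - (p - 2) * sigma2 / ||x||^2) * x, for p > 2."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) * sigma2 / norm2
    return [shrink * v for v in x]

def mc_risk(estimator, theta, sigma2, trials, rng):
    """Monte Carlo estimate of E||estimator(x) - theta||^2, x ~ N(theta, sigma2 * I)."""
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(t, sd) for t in theta]
        est = estimator(x)
        total += sum((e - t) ** 2 for e, t in zip(est, theta))
    return total / trials

# Illustrative settings (assumptions): p = 10, sigma^2 = 1, 5000 trials, fixed seed.
p, sigma2, trials = 10, 1.0, 5000
rng = random.Random(0)

for scale in (0.0, 1.0, 5.0):
    theta = [scale] * p  # ||theta|| grows with scale
    risk_ml = mc_risk(lambda x: x, theta, sigma2, trials, rng)
    risk_js = mc_risk(lambda x: james_stein(x, sigma2), theta, sigma2, trials, rng)
    print(scale, risk_ml, risk_js)
```

The estimated ML risk should stay near $p\sigma^2$ for every $\theta$. The James–Stein advantage is largest near the origin and shrinks as $\|\theta\|$ grows, where simulation noise can mask it.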

==Examples==
In general, it is difficult, and often impossible, to determine the minimax estimator. Nonetheless, a minimax estimator has been determined in many cases.

Example 3: Bounded normal mean: Consider estimating the mean of a normal vector $x \sim N(\theta,I_n \sigma^2)\,\!$ when it is known that $\|\theta\|^2 \leq M\,\!$. The Bayes estimator with respect to a prior which is uniformly distributed on the boundary of the bounding sphere is known to be minimax whenever $M \leq n\,\!$. The analytical expression for this estimator is

$\delta^M(x)=\frac{MJ_{n+1}(M\|x\|)}{\|x\|J_{n}(M\|x\|)}x, \,$

where $J_n(t)\,\!$ is the modified Bessel function of the first kind of order $n$.

==Asymptotic minimax estimator==
The difficulty of determining the exact minimax estimator has motivated the study of asymptotically minimax estimators: an estimator $\delta'$ is called $c$-asymptotic (or approximate) minimax if

$\sup_{\theta\in\Theta} R(\theta,\delta')\leq c \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta).$

For many estimation problems, especially in the non-parametric estimation setting, various approximate minimax estimators have been established. The design of the approximate minimax estimator is intimately related to the geometry, such as the metric entropy number, of $\Theta$.
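As a hedged illustration of the constant $c$, the binomial problem of Example 1 can be reused. There the minimax risk under squared error loss is $1/(4(1+\sqrt{n})^2)$, while the estimator $x/n$ has maximum risk $1/(4n)$, attained at $\theta = 1/2$. The sketch below computes the ratio of the two. Treating $x/n$ as the candidate $\delta'$ is an assumption made for illustration.

```python
from math import sqrt

def approximation_constant(n):
    """Worst-case risk of x / n divided by the minimax risk, for Binomial(n, theta)."""
    max_risk_mle = 1.0 / (4.0 * n)                     # sup of theta (1 - theta) / n
    minimax_risk = 1.0 / (4.0 * (1.0 + sqrt(n)) ** 2)  # constant risk from Example 1
    return max_risk_mle / minimax_risk

for n in (4, 100, 10000):
    print(n, approximation_constant(n))
```

The ratio equals $(1 + 1/\sqrt{n})^2$, which decreases towards 1 as $n$ grows. In this example $x/n$ therefore satisfies the inequality above with a constant $c$ that approaches 1.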

== Randomized minimax estimator ==

Sometimes, a minimax estimator may take the form of a randomized decision rule. As an illustration, suppose the parameter space has two elements, $\theta_1$ and $\theta_2$, and represent each decision rule by a point in the plane: the x-coordinate is the risk when the parameter is $\theta_1$ and the y-coordinate is the risk when the parameter is $\theta_2$. In such a decision problem, the minimax estimator may lie on the line segment connecting two deterministic estimators $\delta_1$ and $\delta_2$; choosing $\delta_1$ with probability $1 - p$ and $\delta_2$ with probability $p$, for a suitable $p$, then minimises the supremum risk.
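A toy two-parameter problem shows how randomization can lower the worst-case risk. The risk pairs for $\delta_1$ and $\delta_2$ below are hypothetical numbers chosen for illustration. The sketch searches only over mixtures of these two rules, so it is not a general minimax solver.

```python
# Hypothetical risk points (risk at theta_1, risk at theta_2) for two
# deterministic rules; these numbers are assumptions chosen for illustration.
RISK_DELTA_1 = (1.0, 4.0)
RISK_DELTA_2 = (3.0, 0.0)

def mixed_risk(p):
    """Risk pair of the rule that uses delta_1 with prob. 1 - p and delta_2 with prob. p."""
    return tuple((1.0 - p) * r1 + p * r2
                 for r1, r2 in zip(RISK_DELTA_1, RISK_DELTA_2))

def worst_case(p):
    """Supremum risk over the two parameter values."""
    return max(mixed_risk(p))

# Grid search over the mixing probability p in [0, 1].
grid = [k / 1000 for k in range(1001)]
best_p = min(grid, key=worst_case)

print(best_p, mixed_risk(best_p))
print(worst_case(0.0), worst_case(1.0))  # the two deterministic rules
```

With these numbers the best mixture equalizes the two risks, and its worst-case risk is lower than that of either deterministic rule.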

==Relationship to robust optimization==
Robust optimization is an approach to solving optimization problems under uncertainty in the knowledge of underlying parameters. For instance, the MMSE Bayesian estimation of a parameter requires knowledge of the parameter correlation function. If this correlation function is not perfectly known, a popular minimax robust optimization approach is to define a set characterizing the uncertainty about the correlation function, and then to pursue a minimax optimization over the uncertainty set and the estimator, respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. Such techniques have been studied, for instance, in the area of signal processing.

R. Fandom Noubiap and W. Seidel (2001) developed an algorithm for calculating a Gamma-minimax decision rule when Gamma is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in Gamma. Gamma-minimax decision rules are of interest in robustness studies in Bayesian statistics.
