# Extremum estimator

In statistics and econometrics, extremum estimators are a wide class of estimators for parametric models that are computed by maximizing (or minimizing) an objective function that depends on the data. The general theory of extremum estimators was developed by Amemiya (1985).

## Definition

An estimator $\hat\theta$ is called an extremum estimator if there is an objective function $\hat{Q}_n$ such that

$\hat\theta = \underset{\theta\in\Theta}{\operatorname{arg\;max}}\ \widehat{Q}_n(\theta),$

where Θ is the parameter space, that is, the set of possible parameter values. Sometimes a slightly weaker definition is given:

$\widehat Q_n(\hat\theta) \geq \max_{\theta\in\Theta}\,\widehat Q_n(\theta) - o_p(1),$

where $o_p(1)$ denotes a random variable that converges in probability to zero. Under this weaker definition, $\hat\theta$ need not be the exact maximizer of the objective function; it only has to come sufficiently close to it.
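
The weaker definition matters in practice because numerical maximizers rarely return the exact argmax. The sketch below, which rests on assumptions not in the text (simulated normal data, the least-squares objective $\hat{Q}_n(\theta) = -\frac1n\sum_{i=1}^n (x_i-\theta)^2$, and Θ = [-5, 5] approximated by a grid of step 0.01), takes the exact maximum over the grid. Over the whole interval the maximizer is the sample mean, so the grid point falls short of the true maximum by at most $(0.01/2)^2$, a slack that plays the role of the $o_p(1)$ term.

```python
import random

def q_n(theta, xs):
    # Hypothetical sample objective: minus the mean squared deviation,
    # Q_n(theta) = -(1/n) * sum_i (x_i - theta)^2
    return -sum((x - theta) ** 2 for x in xs) / len(xs)

def grid_argmax(objective, grid):
    # Exact argmax over the finite grid; only an approximate argmax over the interval.
    return max(grid, key=objective)

rng = random.Random(0)
xs = [rng.gauss(2.0, 1.0) for _ in range(500)]  # simulated data (assumed)

step = 0.01
grid = [-5.0 + step * k for k in range(1001)]   # grid over Theta = [-5, 5]
theta_hat = grid_argmax(lambda t: q_n(t, xs), grid)

# Over the interval the maximizer is the sample mean (it lies inside [-5, 5] here),
# and Q_n(x_bar) - Q_n(theta_hat) = (x_bar - theta_hat)^2 <= (step / 2)^2.
x_bar = sum(xs) / len(xs)
shortfall = q_n(x_bar, xs) - q_n(theta_hat, xs)
print(theta_hat, x_bar, shortfall)
```

A finer grid shrinks the shortfall, but any fixed grid only delivers an approximate maximizer in the sense of the weaker definition.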

The theory of extremum estimators does not prescribe a particular objective function. Different models call for different objective functions, and the framework allows the theoretical properties of all such estimators to be analysed from a unified perspective. The theory only specifies the properties that the objective function must possess; once a particular objective function is chosen, one only has to verify that it satisfies those properties.

## Consistency

If the set Θ is not compact (for example, Θ = ℝ), then even if the limit of the objective function is uniquely maximized at $\theta_0$, this maximum may not be well-separated, in which case the estimator $\hat\theta$ can fail to be consistent.
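
This failure can be reproduced with a deterministic toy construction, which is purely illustrative and not taken from the text. Take Θ = ℝ and the limit $Q_0(\theta) = -\theta^2/(1+\theta^4)$, which has a unique maximum at $\theta_0 = 0$ but returns towards $Q_0(\theta_0) = 0$ as $|\theta| \to \infty$. A hypothetical sample objective adds a bump of height $2/n$ at $\theta = n$, so $\sup_\theta |\hat{Q}_n(\theta) - Q_0(\theta)| = 2/n \to 0$, yet the maximizer drifts off to infinity. A wide finite grid stands in for ℝ.

```python
import math

def q0(theta):
    # Limiting objective on Theta = R: unique maximum Q0(0) = 0,
    # but Q0(theta) -> 0 again as |theta| -> infinity (maximum not well-separated).
    return -theta ** 2 / (1 + theta ** 4)

def q_n(theta, n):
    # Hypothetical sample objective: Q0 plus a bump of height 2/n centred at theta = n,
    # so sup_theta |Q_n(theta) - Q0(theta)| = 2/n -> 0 (uniform convergence holds).
    return q0(theta) + (2 / n) * math.exp(-(theta - n) ** 2)

# Finite stand-in for Theta = R: a grid on [-200, 200], wide enough for the n used below.
GRID = [k / 100 for k in range(-20_000, 20_001)]

def theta_hat(n):
    # Grid argmax of the sample objective.
    return max(GRID, key=lambda t: q_n(t, n))

for n in (5, 20, 100):
    # The maximizer sits near theta = n instead of approaching theta0 = 0.
    print(n, 2 / n, theta_hat(n))
```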

Suppose that the set Θ is compact and that there is a limiting function $Q_0(\theta)$ such that $\hat{Q}_n(\theta)$ converges to $Q_0(\theta)$ in probability uniformly over Θ, and that $Q_0(\theta)$ is continuous and has a unique maximum at $\theta = \theta_0$. Then $\hat\theta$ is consistent for $\theta_0$.[1]
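
A small simulation can illustrate the theorem under assumptions that are not in the text: i.i.d. data from an exponential distribution with rate 1, the least-absolute-deviations objective $\hat{Q}_n(\theta) = -\frac1n\sum_{i=1}^n |x_i-\theta|$ (an M-estimator), and the compact parameter space Θ = [-10, 10]. Its limit $Q_0(\theta) = -\operatorname{E}|X-\theta|$ is continuous with a unique maximum at the population median $\theta_0 = \ln 2$, so the conditions hold and the estimate should settle near ln 2 as n grows. Golden-section search is used as the maximizer, which is valid here because $\hat{Q}_n$ is concave; the seed and sample sizes are arbitrary.

```python
import math
import random

def golden_section_max(f, lo, hi, tol=1e-6):
    # Golden-section search for the maximum of a unimodal function on [lo, hi].
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:   # keep [a, d]; the old c becomes the new d
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:          # keep [c, b]; the old d becomes the new c
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def q_n(theta, xs):
    # Least-absolute-deviations objective: Q_n(theta) = -(1/n) * sum_i |x_i - theta|.
    return -sum(abs(x - theta) for x in xs) / len(xs)

THETA = (-10.0, 10.0)  # compact parameter space (assumed)
THETA0 = math.log(2)   # median of the Exp(1) distribution simulated below

rng = random.Random(42)
for n in (100, 1_000, 20_000):
    xs = [rng.expovariate(1.0) for _ in range(n)]
    est = golden_section_max(lambda t: q_n(t, xs), *THETA)
    print(n, est, abs(est - THETA0))
```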

The uniform convergence in probability of $\scriptstyle\hat{Q}_n(\theta)$ means that

$\sup_{\theta\in\Theta} \big| \hat{Q}_n(\theta) - Q_0(\theta) \big| \ \xrightarrow{p}\ 0.$
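
The supremum can be approximated on a grid to see the convergence in a simple case. This sketch assumes X ~ N(2, 1), the least-squares objective $\hat{Q}_n(\theta) = -\frac1n\sum_{i=1}^n (x_i-\theta)^2$ with limit $Q_0(\theta) = -\operatorname{E}[(X-\theta)^2] = -(1 + (2-\theta)^2)$, and Θ = [-5, 5]; all values are hypothetical, and the grid maximum only approximates the supremum. The printed gap should typically shrink as n grows, at roughly the $1/\sqrt{n}$ rate suggested by the central limit theorem.

```python
import random

MU, SIGMA = 2.0, 1.0  # hypothetical data-generating values: X ~ N(MU, SIGMA^2)

def q0(theta):
    # Limit of Q_n(theta): -E[(X - theta)^2] = -(SIGMA^2 + (MU - theta)^2)
    return -(SIGMA ** 2 + (MU - theta) ** 2)

def sup_gap(n, grid, rng):
    xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n

    def q_n(theta):
        # -(1/n) * sum_i (x_i - theta)^2, rewritten via sample moments
        return -(m2 - 2 * theta * m1 + theta ** 2)

    # Approximate sup over Theta by the maximum over the grid points
    return max(abs(q_n(t) - q0(t)) for t in grid)

GRID = [-5.0 + 0.05 * k for k in range(201)]  # compact Theta = [-5, 5]

rng = random.Random(1)
for n in (100, 1_000, 10_000, 100_000):
    print(n, sup_gap(n, GRID, rng))
```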

The requirement that Θ be compact can be replaced with the weaker assumption that the maximum of $Q_0$ is well-separated: there should be no points θ that are far from $\theta_0$ but have $Q_0(\theta)$ close to $Q_0(\theta_0)$. Formally, for any sequence $\{\theta_i\}$ such that $Q_0(\theta_i) \to Q_0(\theta_0)$, it must be true that $\theta_i \to \theta_0$.
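
The well-separation condition is often stated in an equivalent ε-form, which also shows why compactness is sufficient. The following is a sketch, with ‖·‖ standing for whatever norm or metric is placed on Θ (an assumption; the text does not fix one):

$\forall\,\varepsilon > 0:\quad \sup_{\theta\in\Theta:\ \lVert\theta-\theta_0\rVert \geq \varepsilon} Q_0(\theta) < Q_0(\theta_0).$

If Θ is compact and $Q_0$ is continuous, the set $K_\varepsilon = \{\theta\in\Theta : \lVert\theta-\theta_0\rVert \geq \varepsilon\}$ is a closed subset of a compact set and hence compact, so $Q_0$ attains its supremum over $K_\varepsilon$ at some $\theta^* \neq \theta_0$, and uniqueness of the maximum gives $Q_0(\theta^*) < Q_0(\theta_0)$. Without compactness the condition can fail: with Θ = ℝ and $Q_0(\theta) = -\theta^2/(1+\theta^4)$, the maximum is unique at $\theta_0 = 0$, yet $\theta_i = i$ gives $Q_0(\theta_i) \to 0 = Q_0(\theta_0)$ while $\theta_i \to \infty$.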

## Examples

• Maximum likelihood estimator uses the objective function
$\hat{Q}_n(\theta) = \frac1n \sum_{i=1}^n \ln f(x_i|\theta),$
where f(·|θ) is the density function of the distribution from which the observations are drawn.
• Generalized method of moments estimator is defined through the objective function
$\hat{Q}_n(\theta) = - \Bigg(\frac1n\sum_{i=1}^n g(x_i,\theta)\Bigg)' \hat{W}_n \Bigg(\frac1n\sum_{i=1}^n g(x_i,\theta)\Bigg),$
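where g(·, θ) is the moment function of the model, satisfying the moment condition $\operatorname{E}[g(x,\theta_0)] = 0$, and $\hat{W}_n$ is a positive-definite weighting matrix.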
• Minimum distance estimator
• M-estimators
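
The maximum likelihood and GMM objectives listed above can be sketched side by side for one concrete model. The sketch assumes i.i.d. exponential data with density f(x|λ) = λe^{-λx} and true λ0 = 2, the hypothetical moment function g(x, λ) = (x - 1/λ, x² - 2/λ²), an identity weighting matrix, and Θ = [0.05, 20] searched on a grid; none of these choices come from the text. Both objectives are evaluated through sample moments, which is algebraically identical to averaging over observations, and the same grid maximizer serves both, reflecting the unified treatment of extremum estimators.

```python
import math
import random

def mle_objective(lam, x_bar):
    # (1/n) * sum_i ln f(x_i | lam) for f(x | lam) = lam * exp(-lam * x),
    # which simplifies to ln(lam) - lam * x_bar.
    return math.log(lam) - lam * x_bar

def gmm_objective(lam, x_bar, x2_bar, w):
    # Sample average of g(x, lam) = (x - 1/lam, x^2 - 2/lam^2); both entries have mean zero at lam0.
    g_bar = (x_bar - 1 / lam, x2_bar - 2 / lam ** 2)
    quad = sum(g_bar[i] * w[i][j] * g_bar[j] for i in range(2) for j in range(2))
    return -quad  # Q_n(lam) = -g_bar' W g_bar

def grid_argmax(objective, grid):
    return max(grid, key=objective)

rng = random.Random(7)
xs = [rng.expovariate(2.0) for _ in range(20_000)]  # simulated data, lam0 = 2 (assumed)
x_bar = sum(xs) / len(xs)
x2_bar = sum(x * x for x in xs) / len(xs)

W = [[1.0, 0.0], [0.0, 1.0]]                       # hypothetical weighting matrix (identity)
GRID = [0.05 + 0.0005 * k for k in range(39_901)]  # grid over Theta = [0.05, 20]

lam_mle = grid_argmax(lambda t: mle_objective(t, x_bar), GRID)
lam_gmm = grid_argmax(lambda t: gmm_objective(t, x_bar, x2_bar, W), GRID)
print(lam_mle, 1 / x_bar, lam_gmm)  # the MLE has the closed form 1 / x_bar
```

Swapping in another objective, such as a minimum distance criterion, only changes the function passed to `grid_argmax`.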