# Method of moments (statistics)

See method of moments (probability theory) for an account of a technique for proving convergence in distribution.

In statistics, the method of moments is a method of estimating population parameters. One starts by deriving equations that relate the population moments (i.e., the expected values of powers of the random variable under consideration) to the parameters of interest. A sample is then drawn, and the population moments are estimated from it. Finally, the equations are solved for the parameters of interest, using the sample moments in place of the (unknown) population moments. The solutions are the estimates of those parameters. The method of moments was introduced by Karl Pearson in 1894.

## Method

Suppose that the problem is to estimate ${\displaystyle k}$ unknown parameters ${\displaystyle \theta _{1},\theta _{2},\dots ,\theta _{k}}$ characterizing the distribution ${\displaystyle f_{W}(w;\theta )}$ of the random variable ${\displaystyle W}$.[1] Suppose the first ${\displaystyle k}$ moments of the true distribution (the "population moments") can be expressed as functions of the ${\displaystyle \theta }$s:

${\displaystyle \mu _{1}\equiv E[W]=g_{1}(\theta _{1},\theta _{2},\dots ,\theta _{k}),}$
${\displaystyle \mu _{2}\equiv E[W^{2}]=g_{2}(\theta _{1},\theta _{2},\dots ,\theta _{k}),}$
${\displaystyle \vdots }$
${\displaystyle \mu _{k}\equiv E[W^{k}]=g_{k}(\theta _{1},\theta _{2},\dots ,\theta _{k}).}$

Suppose a sample of size ${\displaystyle n}$ is drawn, resulting in the values ${\displaystyle w_{1},\dots ,w_{n}}$. For ${\displaystyle j=1,\dots ,k}$, let

${\displaystyle {\hat {\mu }}_{j}={\frac {1}{n}}\sum _{i=1}^{n}w_{i}^{j}}$

be the ${\displaystyle j}$-th sample moment, an estimate of ${\displaystyle \mu _{j}}$. The method of moments estimator for ${\displaystyle \theta _{1},\theta _{2},\dots ,\theta _{k}}$, denoted by ${\displaystyle {\hat {\theta }}_{1},{\hat {\theta }}_{2},\dots ,{\hat {\theta }}_{k}}$, is defined as the solution (if there is one) to the equations:

${\displaystyle {\hat {\mu }}_{1}=g_{1}({\hat {\theta }}_{1},{\hat {\theta }}_{2},\dots ,{\hat {\theta }}_{k}),}$
${\displaystyle {\hat {\mu }}_{2}=g_{2}({\hat {\theta }}_{1},{\hat {\theta }}_{2},\dots ,{\hat {\theta }}_{k}),}$
${\displaystyle \vdots }$
${\displaystyle {\hat {\mu }}_{k}=g_{k}({\hat {\theta }}_{1},{\hat {\theta }}_{2},\dots ,{\hat {\theta }}_{k}).}$
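
As an illustration of the procedure above, the following Python sketch applies it to a two-parameter family. It assumes, purely for the sake of example, that the data follow a gamma distribution with shape ${\displaystyle \alpha }$ and scale ${\displaystyle \beta }$, for which ${\displaystyle E[W]=\alpha \beta }$ and ${\displaystyle E[W^{2}]=\alpha (\alpha +1)\beta ^{2}}$. The function names, the seed, and the "true" values 3 and 2 are illustrative choices, not part of the method itself.

```python
import random


def sample_moment(values, j):
    """j-th raw sample moment: (1/n) * sum of w_i ** j."""
    return sum(w ** j for w in values) / len(values)


def gamma_method_of_moments(values):
    """Solve m1 = shape*scale and m2 = shape*(shape+1)*scale**2 for (shape, scale).

    Assumes a gamma model; this is an illustrative choice of family.
    """
    m1 = sample_moment(values, 1)
    m2 = sample_moment(values, 2)
    spread = m2 - m1 ** 2  # estimates shape * scale**2 (the biased sample variance)
    if spread <= 0:
        raise ValueError("moment equations have no solution for this sample")
    shape = m1 ** 2 / spread
    scale = spread / m1
    return shape, scale


# Simulated data with shape 3 and scale 2 (hypothetical values, fixed seed).
rng = random.Random(0)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]

shape_hat, scale_hat = gamma_method_of_moments(sample)
print(f"shape estimate: {shape_hat:.3f}, scale estimate: {scale_hat:.3f}")
```

With a sample this large the estimates should land near the simulated values, which is consistent with (though of course no proof of) the consistency property of the estimator.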

The method of moments is fairly simple and yields consistent estimators (under very weak assumptions), though these estimators are often biased.

In some respects, when estimating parameters of a known family of probability distributions, this method was superseded by Fisher's method of maximum likelihood, because maximum likelihood estimators have a higher probability of being close to the quantities to be estimated and are more often unbiased.

However, in some cases the likelihood equations may be intractable without computers, whereas the method-of-moments estimators can be quickly and easily calculated by hand.

Estimates by the method of moments may be used as the first approximation to the solutions of the likelihood equations, and successive improved approximations may then be found by the Newton–Raphson method. In this way the method of moments can assist in finding maximum likelihood estimates.
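
The following Python sketch shows one way this two-stage approach might look. It assumes a gamma model (an illustrative choice), whose shape parameter ${\displaystyle \alpha }$ has no closed-form maximum likelihood estimate: it must satisfy ${\displaystyle \ln \alpha -\psi (\alpha )=\ln {\bar {w}}-{\overline {\ln w}}}$, where ${\displaystyle \psi }$ is the digamma function. The `digamma` and `trigamma` helpers are hand-written series approximations so that the example needs only the standard library, and the halving safeguard is an addition for robustness rather than part of plain Newton–Raphson.

```python
import math
import random


def digamma(x):
    """Approximate psi(x) for x > 0: shift x up by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Approximate psi'(x) for x > 0 by the same recurrence-plus-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def gamma_score(shape, log_gap):
    """Left side minus right side of the likelihood equation for the gamma shape."""
    return math.log(shape) - digamma(shape) - log_gap


def gamma_shape_mle(log_gap, start, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration on gamma_score, started from `start`."""
    k = start
    for _ in range(max_iter):
        slope = 1.0 / k - trigamma(k)  # derivative of the score; always negative
        k_new = k - gamma_score(k, log_gap) / slope
        if k_new <= 0:  # safeguard (an addition): stay inside the parameter space
            k_new = k / 2
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k


# Simulated data with shape 3 and scale 2 (hypothetical values, fixed seed).
rng = random.Random(1)
data = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]
n = len(data)
mean = sum(data) / n
variance = sum((w - mean) ** 2 for w in data) / n

shape_start = mean ** 2 / variance  # method-of-moments estimate as first approximation
log_gap = math.log(mean) - sum(math.log(w) for w in data) / n
shape_mle = gamma_shape_mle(log_gap, shape_start)
scale_mle = mean / shape_mle  # given the shape, the scale MLE is mean / shape

print(f"start (moments): {shape_start:.4f}, refined (likelihood): {shape_mle:.4f}")
```

Because the starting value is already close to the root, the iteration typically needs only a handful of steps here; a poor starting value could require the safeguard or more iterations.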

In some cases, infrequent with large samples but not so infrequent with small samples, the estimates given by the method of moments fall outside the parameter space, and it does not make sense to rely on them. That problem never arises in the method of maximum likelihood. Also, estimates by the method of moments are not necessarily sufficient statistics, i.e., they sometimes fail to take into account all relevant information in the sample.
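
A small Python sketch can make the parameter-space problem concrete. It assumes a binomial model in which both the number of trials ${\displaystyle N}$ and the success probability ${\displaystyle p}$ are unknown (an illustrative choice), so that ${\displaystyle E[W]=Np}$ and ${\displaystyle \operatorname {Var} (W)=Np(1-p)}$. Solving the moment equations gives ${\displaystyle {\hat {p}}=1-{\hat {\sigma }}^{2}/{\hat {\mu }}_{1}}$ and ${\displaystyle {\hat {N}}={\hat {\mu }}_{1}/{\hat {p}}}$, which become negative whenever the sample variance exceeds the sample mean. The function names and the two tiny data sets are made up for the example.

```python
def binomial_method_of_moments(values):
    """Solve m1 = N*p and variance = N*p*(1-p) for (N, p), with both unknown."""
    n = len(values)
    m1 = sum(values) / n
    m2 = sum(w * w for w in values) / n
    variance = m2 - m1 ** 2
    if m1 == 0 or variance == m1:
        raise ValueError("moment equations have no finite solution for this sample")
    p_hat = 1.0 - variance / m1
    trials_hat = m1 / p_hat
    return trials_hat, p_hat


def in_parameter_space(trials, p):
    """Check N > 0 and 0 < p <= 1 (ignoring the requirement that N be an integer)."""
    return trials > 0 and 0 < p <= 1


# Hypothetical small samples of counts.
well_behaved = [3, 4, 5, 4]   # variance below the mean
overdispersed = [0, 0, 1, 7]  # variance above the mean

for name, sample in [("well_behaved", well_behaved), ("overdispersed", overdispersed)]:
    trials_hat, p_hat = binomial_method_of_moments(sample)
    valid = in_parameter_space(trials_hat, p_hat)
    print(f"{name}: N = {trials_hat:.3f}, p = {p_hat:.3f}, valid = {valid}")
```

For the second sample the estimate of ${\displaystyle p}$ comes out as −3.25, which is not a probability, so the moment-based estimate cannot be used as it stands.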

When estimating other structural parameters (e.g., parameters of a utility function, instead of parameters of a known probability distribution), appropriate probability distributions may not be known, and moment-based estimates may be preferred to maximum likelihood estimation.