Maximum spacing estimation

Maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a statistical method for fitting the parameters of a mathematical model to data. The concept underlying the method is based on the probability integral transform: a set of independent random samples drawn from any random variable should, on average, be uniformly distributed with respect to the cumulative distribution function of that random variable. The MPS method chooses the parameter values that make the observed data as uniform as possible, according to a specific quantitative measure of uniformity.

One of the most common methods for estimating the parameters of a distribution from data, the method of maximum likelihood (ML), can break down in various cases, such as certain mixtures of continuous distributions or heavy-tailed continuous distributions where the location or scale parameters are unknown. Maximum spacing estimation is a method, consistent with Kullback–Leibler information theory, that can be used to estimate parameters in these situations. The method was derived independently by Russell Cheng and Nik Amin, then at the University of Wales Institute of Science and Technology, and by Bo Ranneby, then at the Swedish University of Agricultural Sciences.

The method has also been discussed in the literature of various applied sciences, such as hydrology (Hall 2004) and econometrics (Anatolyev 2005), as well as in various mathematical and statistical journals.

Definition

Theory

Several justifications have been given for maximum spacing methods. Ranneby (1984) justifies the method by demonstrating that it is an estimator of the Kullback–Leibler divergence, similar to maximum likelihood estimation, but with more robust properties for various classes of problems. Cheng & Amin (1983) explain that, by the probability integral transform, at the true parameter the values of the cumulative distribution function at the observations behave like a sample from the uniform distribution, so the "spacings" between them should be roughly even. Ideally, the differences between the values of the cumulative distribution function at consecutive observations would all be equal. Because the spacings always sum to 1, this is exactly the case that maximizes their geometric mean, so solving for the parameters that maximize the geometric mean achieves the "best" fit as defined this way.

Formal definition

Given an independent and identically distributed (iid) random sample of size $n$ from a statistical population with a univariate distribution, let the cumulative distribution function be:

F(x;\theta ^{0})\colon \theta ^{0}\in \Theta ,\Theta \subseteq \mathbb {R} ^{k}(k\geq 1)

Let $\textstyle X_{1},\ldots ,X_{n}$ be the ordered sample, i.e. take the sample and place the observations in size order from smallest to largest.

Let $\scriptstyle \theta \in \Theta$ be an estimator for $\theta ^{0}$ . Then $F(x;\theta )$ is an estimator for $F(x;\theta ^{0})$ .

Pyke (1972)[note 1] defines the first-order spacings as:

D_{i}(\theta )=F(X_{i};\theta )-F(X_{i-1};\theta ),\quad D_{1}(\theta )=F(X_{1};\theta ),\quad D_{n+1}(\theta )=1-F(X_{n};\theta )

This may be thought of as the "spacing" between the values of the distribution function at adjacent order statistics, starting at 0 and ending at 1, as $D_{1}=F(X_{1};\theta )-0$ .
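
As a minimal sketch of this definition, the spacings can be computed by padding the sorted CDF values with 0 and 1 and taking consecutive differences. The helper names `spacings` and `exp_cdf`, the exponential model, and the rate 0.5 are illustrative assumptions, not part of the sources cited here.

```python
import math

def spacings(sample, cdf):
    """Return the n+1 first-order spacings D_1..D_{n+1} of `sample` under `cdf`."""
    xs = sorted(sample)
    # Pad the CDF values with 0 and 1 so that D_1 = F(X_1) and D_{n+1} = 1 - F(X_n).
    u = [0.0] + [cdf(x) for x in xs] + [1.0]
    return [b - a for a, b in zip(u, u[1:])]

def exp_cdf(x, rate=0.5):
    # CDF of an exponential distribution; the rate 0.5 is an arbitrary illustrative value.
    return 1.0 - math.exp(-rate * x)

d = spacings([2.0, 4.0], exp_cdf)  # three spacings for a sample of size two
```

Whatever the model, the spacings are non-negative and sum to 1, which is what makes their geometric mean a sensible measure of evenness.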

Let $G_{n}(\theta )$ be the statistic defined as the geometric mean of the first-order spacings of the ordered sample, and let $S_{n}(\theta )$ be the natural logarithm of $G_{n}(\theta )$ .

{\begin{aligned}G_{n}(\theta )&=\left\{\prod _{i=1}^{n+1}{D_{i}}(\theta )\right\}^{\frac {1}{n+1}}\\S_{n}(\theta )&=\ln {G_{n}(\theta )}\\S_{n}(\theta )&={\frac {1}{n+1}}\sum _{i=1}^{n+1}\ln {D_{i}}(\theta )\\{\hat {\theta }}&={\underset {\theta \in \Theta }{\operatorname {arg\,max} }}\;S_{n}(\theta )\end{aligned}}

In other words, if there exists a $\scriptstyle {\hat {\theta }}\in \Theta$ that maximizes $S_{n}(\theta )$ , $\scriptstyle {\hat {\theta }}$ is the maximum spacing estimator of $\theta ^{0}$ .

In practice, optimization is usually performed by minimizing $\scriptstyle -\!S_{n}(\theta )$ , similar to maximum likelihood procedures, which usually minimize the negative log-likelihood.
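
The following sketch illustrates the procedure for a one-parameter model by minimizing the negative of $S_{n}(\theta )$ . It assumes an exponential distribution with unknown rate, no tied observations, and an objective that is unimodal on the search bracket; the function names and the use of golden-section search are choices made for this illustration only.

```python
import math

def log_spacing_mean(sample, cdf):
    """S_n: mean of the log first-order spacings (assumes no tied observations)."""
    xs = sorted(sample)
    u = [0.0] + [cdf(x) for x in xs] + [1.0]
    return sum(math.log(b - a) for a, b in zip(u, u[1:])) / (len(xs) + 1)

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            # The minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:
            # The minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0

def fit_exponential_rate(sample, lo=1e-6, hi=10.0):
    """Maximum spacing estimate of an exponential rate, found by minimizing -S_n."""
    def neg_s(rate):
        return -log_spacing_mean(sample, lambda x: 1.0 - math.exp(-rate * x))
    return golden_section_min(neg_s, lo, hi)

rate_hat = fit_exponential_rate([2.0, 4.0])
# rate_hat is about 0.2554, i.e. -ln(0.6)/2; the ML estimate for this sample is 1/3.
```

For this two-point sample the maximizer can be checked by hand: writing $y=e^{-2\lambda }$ , the product of the spacings is $y^{3}(1-y)^{2}$ , which is largest at $y=3/5$ . For models with several parameters a general-purpose numerical optimizer would normally replace the one-dimensional search.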

Ranneby (1984) defines $S_{n}(\theta )$ somewhat differently as:

S_{n}(\theta )={\frac {1}{n+1}}\sum _{i=1}^{n+1}\ln {\left\{(n+1)\cdot D_{i}(\theta )\right\}}

The maximum spacing estimate under this statistic is identical to that under Cheng & Amin's original definition, as the $(n+1)$ term is constant with respect to $\theta$ . Similarly, Cheng & Stephens (1989), when discussing goodness of fit, use Moran's statistic, which is the Cheng & Amin definition multiplied by $\scriptstyle -\!(n+1)$ , so the estimate is found by minimization instead of maximization.
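
Both equivalences follow from expanding the definitions. In the lines below, the superscript $R$ marking Ranneby's version and the symbol $M(\theta )$ for Moran's statistic are notation adopted for this illustration:

S_{n}^{R}(\theta )={\frac {1}{n+1}}\sum _{i=1}^{n+1}\left\{\ln(n+1)+\ln D_{i}(\theta )\right\}=\ln(n+1)+S_{n}(\theta )

M(\theta )=-(n+1)\,S_{n}(\theta )=-\sum _{i=1}^{n+1}\ln D_{i}(\theta )

Adding a constant, or multiplying by a negative constant and switching from maximization to minimization, leaves the optimizing value of $\theta$ unchanged.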

Properties

Consistency

The maximum spacing estimator is a consistent estimator in that it converges in probability to the true value of the parameter, $\theta ^{0}$ , as the sample size increases to infinity. The consistency of maximum spacing estimation holds under much more general conditions than for maximum likelihood estimators.

Efficiency

Maximum spacing estimators are asymptotically at least as efficient as maximum likelihood estimators, where the latter exist, and MSEs may exist in cases where MLEs do not.

Sensitivity

Maximum spacing estimators are sensitive to closely spaced observations, and especially to ties. Given:

X_{i+k}=X_{i+k-1}=\ldots =X_{i}

we get

D_{i+k}(\theta )=D_{i+k-1}(\theta )=\ldots =D_{i+1}(\theta )=0

When the ties are due to multiple observations, Cheng & Amin (1983) show that the repeated spacings (those that would otherwise be zero) should be replaced by the corresponding likelihood, that is, $f_{i}(\theta )$ should be substituted for $D_{i}(\theta )$ , as:

\lim _{x_{i}\to x_{i-1}}{\frac {\int _{x_{i-1}}^{x_{i}}f(t;\theta )\,dt}{x_{i}-x_{i-1}}}=f(x_{i-1},\theta )=f(x_{i},\theta )

since $x_{i}=x_{i-1}$ .

When ties are due to rounding error, Cheng & Stephens (1989) suggest another method to remove the effects.[note 2] Given $r$ tied observations from $x_{i}$ to $x_{i+r-1}$ , let $\delta$ represent the round-off error. All of the true values should then fall in the range $\scriptstyle x\pm \delta$ . The corresponding points on the distribution should now fall between $\scriptstyle y_{L}=F(x-\delta ,{\hat {\theta }})$ and $\scriptstyle y_{U}=F(x+\delta ,{\hat {\theta }})$ .

Set:

D_{j}={\frac {y_{U}-y_{L}}{r-1}}\quad (j=i+1,\ldots ,i+r-1)

This is equivalent to assuming that the rounded values are uniformly spaced in the interval.
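
The rounding adjustment can be sketched as follows. The function name, the clamped uniform CDF, and the sample values are hypothetical; following the description above, only the zero spacings inside each run of tied values are replaced, and the neighbouring spacings are left as they are.

```python
def spacings_with_rounding_fix(sample, cdf, delta):
    """First-order spacings, with the zero spacings inside each run of r tied
    values replaced by (y_U - y_L) / (r - 1), where y_L = F(x - delta) and
    y_U = F(x + delta)."""
    xs = sorted(sample)
    u = [0.0] + [cdf(x) for x in xs] + [1.0]
    # d[j] is the spacing between xs[j-1] and xs[j] (0-based), i.e. D_{j+1} in the text.
    d = [b - a for a, b in zip(u, u[1:])]
    i = 0
    while i < len(xs):
        # Length r of the run of observations tied with xs[i].
        r = 1
        while i + r < len(xs) and xs[i + r] == xs[i]:
            r += 1
        if r > 1:
            fill = (cdf(xs[i] + delta) - cdf(xs[i] - delta)) / (r - 1)
            # Replace only the r - 1 zero spacings between the tied values.
            for j in range(i + 1, i + r):
                d[j] = fill
        i += r
    return d

def uniform_cdf(x):
    # CDF of a uniform distribution on [0, 4]; an arbitrary illustrative model.
    return min(max(x / 4.0, 0.0), 1.0)

fixed = spacings_with_rounding_fix([1.0, 2.0, 2.0, 2.0, 3.0], uniform_cdf, delta=0.05)
```

In this example the three tied values at 2.0 would otherwise produce two zero spacings, whose logarithms are undefined; after the adjustment every spacing is strictly positive.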

The method is also sensitive to secondary clustering. This can occur, for example, when a set of observations is thought to come from a single normal distribution but in fact comes from a mixture of normals with different means, or when the data are thought to come from an exponential distribution but actually come from a gamma distribution. In the latter case, smaller spacings may occur in the lower tail. A high value of the Moran statistic $M(\theta )$ would indicate this secondary clustering effect, and should prompt a closer look at the data.

Goodness of fit

The statistic $S_{n}(\theta )$ is closely related to the Moran or Moran–Darling statistic, $M(\theta )$ , which can be used to test goodness of fit.[note 3] It has been shown that the statistic, when defined as:

M_{n}(\theta )=-(n+1)\,S_{n}(\theta )=-\sum _{j=1}^{n+1}\ln {D_{j}(\theta )}

is asymptotically normal, and a chi-squared approximation exists for small samples. When the true parameter $\theta ^{0}$ is known, the statistic is approximately normally distributed with mean and variance:

{\begin{aligned}\mu _{M}&\approx (n+1)(\ln(n+1)+\gamma )-{\frac {1}{2}}-{\frac {1}{12(n+1)}}\\\sigma _{M}^{2}&\approx (n+1)\left({\frac {\pi ^{2}}{6}}-1\right)-{\frac {1}{2}}-{\frac {1}{6(n+1)}}\end{aligned}}

where $\gamma$ is the Euler–Mascheroni constant, which is approximately 0.57722.[note 4]

This distribution can be approximated by that of $A$ , where:

{\begin{aligned}C_{1}&=\mu _{M}-{\sqrt {\frac {\sigma _{M}^{2}n}{2}}}\\C_{2}&={\sqrt {\frac {\sigma _{M}^{2}}{2n}}}\\A&=C_{1}+C_{2}\chi _{n}^{2}\end{aligned}}

where $\scriptstyle \chi _{n}^{2}$ follows a chi-squared distribution with $n$ degrees of freedom.
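
As a check on these constants, recall that a chi-squared variable with $n$ degrees of freedom has mean $n$ and variance $2n$ . The approximating variable $A$ therefore reproduces the first two moments of the statistic:

{\begin{aligned}\operatorname {E} [A]&=C_{1}+nC_{2}=\mu _{M}-{\sqrt {\frac {\sigma _{M}^{2}n}{2}}}+n{\sqrt {\frac {\sigma _{M}^{2}}{2n}}}=\mu _{M}\\\operatorname {Var} [A]&=2n\,C_{2}^{2}=2n\cdot {\frac {\sigma _{M}^{2}}{2n}}=\sigma _{M}^{2}\end{aligned}}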

Therefore, to test the hypothesis $H_{0}$ that a random sample of $n$ values comes from the distribution $F(x,\theta )$ , the statistic $\scriptstyle T(\theta )={\frac {M(\theta )-C_{1}}{C_{2}}}$ can be calculated, and $H_{0}$ should be rejected at significance level $\alpha$ if the value is greater than the critical value of the appropriate chi-squared distribution.

Where $\theta ^{0}$ is being estimated by $\scriptstyle {\hat {\theta }}$ , Cheng & Stephens (1989) showed that $\scriptstyle M_{n}({\hat {\theta }})$ has the same asymptotic mean and variance as in the known case. However, the test statistic to be used is:

T({\hat {\theta }})={\frac {M({\hat {\theta }})+{\frac {k}{2}}-C_{1}}{C_{2}}}

where $k$ is the number of parameters in the estimate $\scriptstyle {\hat {\theta }}$ .
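
The test statistic can be assembled directly from the formulas in this section. The sketch below is illustrative only: the function name and the constant `EULER_GAMMA` are introduced here, the spacings are assumed to be strictly positive, and the final comparison against a chi-squared critical value with $n$ degrees of freedom is left to a table or a statistics library.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def moran_test_statistic(spacings, k=0):
    """Return (M, T) for Moran's goodness-of-fit test.

    spacings: the n+1 first-order spacings D_1..D_{n+1}, all strictly positive.
    k: number of parameters estimated from the data (0 if theta is known).
    """
    n = len(spacings) - 1
    m_stat = -sum(math.log(d) for d in spacings)
    # Approximate mean and variance of M, as given in the text.
    mu = (n + 1) * (math.log(n + 1) + EULER_GAMMA) - 0.5 - 1.0 / (12 * (n + 1))
    var = (n + 1) * (math.pi ** 2 / 6 - 1) - 0.5 - 1.0 / (6 * (n + 1))
    # Constants of the chi-squared approximation A = C1 + C2 * chi2_n.
    c1 = mu - math.sqrt(var * n / 2)
    c2 = math.sqrt(var / (2 * n))
    t_stat = (m_stat + k / 2 - c1) / c2
    return m_stat, t_stat

# Perfectly even spacings for n = 9 give the smallest possible value of M.
even = [0.1] * 10
m_known, t_known = moran_test_statistic(even)          # theta known, k = 0
m_fitted, t_fitted = moran_test_statistic(even, k=2)   # two estimated parameters
```

The hypothesis would be rejected at level $\alpha$ when the returned `t_stat` exceeds the upper $\alpha$ critical value of the chi-squared distribution with $n$ degrees of freedom.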

Generalized maximum spacing

Alternate measures and spacings

Ranneby and Ekström (1997) generalized the method to approximate other measures besides the Kullback–Leibler divergence. Ekström (1997) further expanded the method to investigate properties of estimators using higher-order spacings, where an $m$ -th order spacing is defined as $F(X_{j+m})-F(X_{j})$ .
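
A short sketch of higher-order spacings is given below. Padding the CDF values with 0 and 1 at the ends is an assumption carried over from the first-order definition in this article rather than a convention confirmed from Ekström's paper, and the function name is hypothetical.

```python
def m_order_spacings(sample, cdf, m):
    """Overlapping m-th order spacings F(X_{j+m}) - F(X_j) of the sorted sample,
    with the CDF values padded by 0 and 1 at the ends (an assumed convention)."""
    xs = sorted(sample)
    u = [0.0] + [cdf(x) for x in xs] + [1.0]
    return [u[j + m] - u[j] for j in range(len(u) - m)]

# With m = 1 this reduces to the n + 1 first-order spacings.
first = m_order_spacings([0.25, 0.5, 0.75], lambda x: x, 1)
second = m_order_spacings([0.25, 0.5, 0.75], lambda x: x, 2)
```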

Multivariate distributions

Ranneby et al. (2005) extend maximum spacing methods to the multivariate case. As there is no natural order for $\mathbb {R} ^{k}(k>1)$ , they discuss two alternative approaches: a geometric approach based on Dirichlet cells and a probabilistic approach based on a "nearest neighbor ball" metric.

Notes

^ The actual definition is sourced to Pyke (1965), but without direct access to that paper, sourcing is given to Pyke's later paper, which defines the spacings in passing. -- Editor
^ There appear to be some minor typographical errors in the paper. For example, in section 4.2, equation (4.1), the rounding replacement for $D_{j}$ should not have the log term. In section 1, equation (1.2), $D_{j}$ is defined to be the spacing itself, and $M(\theta )$ is the negative sum of the logs of $D_{j}$ . If $D_{j}$ is logged at this step, the result is always ≤ 0, as the difference between two adjacent points on a cumulative distribution is always ≤ 1, and strictly < 1 unless there are only two points at the bookends. Also, in section 4.3, on page 392, calculation shows that it is the variance $\textstyle {\tilde {\sigma ^{2}}}$ which has an MPS estimate of 6.87, not the standard deviation $\textstyle {\tilde {\sigma }}$ . -- Editor
^ The literature refers to related statistics as Moran or Moran–Darling statistics. For example, Cheng & Stephens (1989) analyze the form $\scriptstyle M(\theta )=-\sum _{j=1}^{n+1}\log {D_{j}(\theta )}$ where $D_{j}(\theta )$ is defined as above. Wong & Li (2006) use the same form as well. However, Beirlant et al. (1997) use the form $\scriptstyle M_{n}=-\sum _{j=0}^{n}\ln {((n+1)(X_{n,j+1}-X_{n,j}))}$ , with the additional factor of $(n+1)$ inside the logged summation. The extra factors will make a difference in terms of the expected mean and variance of the statistic. For consistency, this article will continue to use the Cheng & Amin/Wong & Li form. -- Editor
^ Wong & Li (2006) leave out the Euler–Mascheroni constant from their description. -- Editor

References

Anatolyev, Stanislav (2005). "An Alternative to Maximum Likelihood Based on Spacings" (PDF). Econometric Theory. 21 (2). Cambridge University Press: 472–476. doi:10.1017/S0266466605050255. ISSN 0266-4666. Retrieved 2009-01-21.

Beirlant, J. (1997). "Nonparametric entropy estimation: an overview" (PDF). International Journal of Mathematical and Statistical Sciences. 6 (1): 17–40. ISSN 1055-7490. Retrieved 2008-12-31. Note: Linked paper is the updated 2001 version.

Cheng, R.C.H. (1983). "Estimating Parameters in Continuous Univariate Distributions with a Shifted Origin". Journal of the Royal Statistical Society Series B. 45 (3). Royal Statistical Society: 394–403. ISSN 0035-9246.

Cheng, R.C.H. (1989). "A goodness-of-fit test using Moran's statistic with estimated parameters". Biometrika. 76 (2). Oxford University Press: 386–392. doi:10.1093/biomet/76.2.385. ISSN 0006-3444.

Ekström, Magnus (1997). "Generalized Maximum Spacing Estimates" (PostScript). Research Report. 6. Umeå University. ISSN 0345-3928. Retrieved 2008-12-30.

Hall, M.J. (2004). "The construction of confidence intervals for frequency analysis using resampling techniques" (PDF). Hydrology and Earth System Sciences. 8 (2). European Geosciences Union: 235–246. ISSN 1027-5606. Retrieved 2009-01-21.

Pyke, Ronald (1965). "Spacings". Journal of the Royal Statistical Society Series B. 27. Royal Statistical Society: 395–449. ISSN 0035-9246.

Pyke, Ronald (1972). "Spacings Revisited" (PDF). Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. 1. University of California Press: 417–427. ISSN 0097-0433. MR 0405709. Zbl 0234.62008. Retrieved 2008-12-30.

Ranneby, Bo (1984). "The Maximum Spacing Method. An Estimation Method Related to the Maximum Likelihood Method". Scandinavian Journal of Statistics. 11 (2): 93–112. ISSN 0303-6898.

Ranneby, Bo (1997). "Maximum Spacing Estimates Based on Different Metrics" (PostScript). Research Report. 5. Umeå University. ISSN 0345-3928. Retrieved 2008-12-30.

Ranneby, Bo (2005). "The maximum spacing estimation for multivariate observations" (PDF). Journal of Statistical Planning and Inference. 129 (1–2). Elsevier: 427–446. doi:10.1016/j.jspi.2004.06.059. ISSN 0378-3758. Retrieved 2008-12-31.

Wong, T.S.T. (2006). "A note on the estimation of extreme value distributions using maximum product of spacings" (PDF). IMS Lecture Notes–Monograph Series. 52. Institute of Mathematical Statistics: 272–283. doi:10.1214/074921706000001102. arXiv:math/0702830v1. Retrieved 2008-12-31.
