Median absolute deviation

From Wikipedia, the free encyclopedia

In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.

For a univariate data set X_1, X_2, ..., X_n, the MAD is defined as the median of the absolute deviations from the data's median:


\operatorname{MAD} = \operatorname{median}_{i}\left(\ \left| X_{i} - \operatorname{median}_{j} (X_{j}) \right|\ \right), \,

that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values.

Example

Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.
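The calculation in this example can be reproduced with a short Python sketch. The helper name `mad` is chosen here for illustration and is not taken from any particular library; the sketch uses only the standard-library `statistics.median`.

```python
from statistics import median

def mad(data):
    """Median absolute deviation: median of |x - median(data)|."""
    center = median(data)
    # Absolute deviations of every observation from the sample median.
    deviations = [abs(x - center) for x in data]
    return median(deviations)

sample = [1, 1, 2, 2, 4, 6, 9]
print(median(sample))  # 2
print(mad(sample))     # 1
```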

Uses

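The median absolute deviation is a measure of statistical dispersion. It is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily and outliers can influence it strongly. In the MAD, the deviations of a small number of outliers are irrelevant.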
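As a small illustration of how the MAD behaves as a robust measure of variability, the following sketch compares it with the sample standard deviation when one observation is replaced by an extreme value. The contaminated data set and the helper `mad` are made up for this illustration.

```python
from statistics import median, stdev

def mad(data):
    """Median absolute deviation: median of |x - median(data)|."""
    center = median(data)
    return median([abs(x - center) for x in data])

clean = [1, 1, 2, 2, 4, 6, 9]
# Same data with the largest value replaced by a gross outlier (illustrative).
contaminated = [1, 1, 2, 2, 4, 6, 900]

print(mad(clean), mad(contaminated))      # the MAD is unchanged by the outlier
print(stdev(clean), stdev(contaminated))  # the standard deviation grows sharply
```

The MAD depends only on the middle of the sorted absolute deviations, so a single extreme value does not move it, while the standard deviation is dominated by that value.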

Relation to standard deviation

To use the MAD as a consistent estimator of the standard deviation σ, one takes

\hat{\sigma}=K\cdot \operatorname{MAD}, \,

where K is a constant scale factor, which depends on the distribution.

For normally distributed data, K is taken to be 1/Φ^{-1}(3/4) ≈ 1.4826, where Φ^{-1} is the inverse of the cumulative distribution function of the standard normal distribution, i.e., its quantile function. This is because the MAD satisfies:

\frac 12 =P(|X-\mu|\le \operatorname{MAD})=P\left(\left|\frac{X-\mu}{\sigma}\right|\le \frac {\operatorname{MAD}}\sigma\right)=P\left(|Z|\le \frac {\operatorname{MAD}}\sigma\right).

Therefore we must have Φ(MAD/σ) − Φ(−MAD/σ) = 1/2. Since Φ(−MAD/σ) = 1 − Φ(MAD/σ), this gives 2Φ(MAD/σ) − 1 = 1/2, so Φ(MAD/σ) = 3/4 and MAD/σ = Φ^{-1}(3/4), from which we obtain the scale factor K = 1/Φ^{-1}(3/4).

Hence

\sigma \approx 1.4826\ \operatorname{MAD}. \,
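The numerical value of the scale factor can be checked with Python's standard library. This is a minimal sketch; `NormalDist().inv_cdf` is the standard normal quantile function, and the variable name `K` follows the notation used here.

```python
from statistics import NormalDist

# Quantile function of the standard normal distribution evaluated at 3/4.
q75 = NormalDist().inv_cdf(0.75)

# Scale factor relating the MAD to the standard deviation for normal data.
K = 1 / q75

print(round(q75, 4))  # 0.6745
print(round(K, 4))    # 1.4826
```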

In other words, for large samples of normally distributed X_i, the expectation of 1.4826 times the MAD is approximately equal to the population standard deviation.
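A quick simulation can illustrate this approximation. The sketch below draws a large normal sample with a known standard deviation and compares it with the scaled MAD; the sample size, seed, mean, and value of σ are arbitrary choices made for this illustration.

```python
import random
from statistics import NormalDist, median

def mad(data):
    """Median absolute deviation: median of |x - median(data)|."""
    center = median(data)
    return median([abs(x - center) for x in data])

rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 2.0             # true standard deviation (assumed for the demo)
sample = [rng.gauss(10.0, sigma) for _ in range(100_000)]

K = 1 / NormalDist().inv_cdf(0.75)  # approximately 1.4826
sigma_hat = K * mad(sample)

print(sigma_hat)  # expected to be close to sigma for a sample this large
```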

The factor 1.4826 ≈ 1/Φ^{-1}(3/4) is the reciprocal of the normal inverse cumulative distribution function, Φ^{-1}(P), evaluated at probability P = 3/4.[1]

The population MAD

The population MAD is defined analogously to the sample MAD, but is based on the complete distribution rather than on a sample. For a distribution that is symmetric about zero, the population MAD is the 75th percentile of the distribution.

Unlike the variance, which may be infinite or undefined, the population MAD is always a finite number. For example, the standard Cauchy distribution has undefined variance, but its MAD is 1.
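The 75th-percentile characterization makes these population values easy to check numerically. The sketch below assumes the well-known closed-form quantile function of the standard Cauchy distribution, tan(π(p − 1/2)); the function name `cauchy_quantile` is introduced here for illustration only.

```python
import math
from statistics import NormalDist

def cauchy_quantile(p):
    """Quantile function of the standard Cauchy distribution."""
    return math.tan(math.pi * (p - 0.5))

# Both distributions are symmetric about zero, so the population MAD
# equals the 75th percentile.
cauchy_mad = cauchy_quantile(0.75)
normal_mad = NormalDist().inv_cdf(0.75)

print(cauchy_mad)  # approximately 1
print(normal_mad)  # approximately 0.6745
```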

The earliest known mention of the concept of the MAD occurred in 1816, in a paper by Carl Friedrich Gauss on the determination of the accuracy of numerical observations.[2][3]

Notes

  1. ^ [1]
  2. ^ Gauss, Carl Friedrich (1816). "Bestimmung der Genauigkeit der Beobachtungen". Zeitschrift für Astronomie und verwandte Wissenschaften 1: 187–197. 
  3. ^ Walker, Helen (1931). Studies in the History of the Statistical Method. Baltimore, MD: Williams & Wilkins Co. pp. 24–25. 

References

  • Hoaglin, David C.; Frederick Mosteller and John W. Tukey (1983). Understanding Robust and Exploratory Data Analysis. John Wiley & Sons. pp. 404–414. ISBN 0-471-09777-2. 
  • Russell, Roberta S.; Bernard W. Taylor III. (2006). Operations Management. John Wiley & Sons. pp. 497–498. ISBN 0-471-69209-3. 
  • Venables, W.N.; B.D. Ripley (1999). Modern Applied Statistics with S-PLUS. Springer. p. 128. ISBN 0-387-98825-4.