# Truncated mean

A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both. This number of points to be discarded is usually given as a percentage of the total number of points, but may also be given as a fixed number of points.

For most statistical applications, 5 to 25 percent of the ends are discarded; the 25% trimmed mean (when the lowest 25% and the highest 25% are discarded) is known as the interquartile mean. For example, given a set of 8 points, trimming by 12.5% would discard the minimum and maximum value in the sample: the smallest and largest values, and would compute the mean of the remaining 6 points.

The median can be regarded as a fully truncated mean and is most robust. As with other trimmed estimators, the main advantage of the trimmed mean is robustness and higher efficiency for mixed distributions and heavy-tailed distribution (like the Cauchy distribution), at the cost of lower efficiency for some other less heavily-tailed distributions (such as the normal distribution). For intermediate distributions the differences between the efficiency of the mean and the median are not very big, e.g. for the student-t distribution with 2 degrees of freedom the variances for mean and median are nearly equal.

## Terminology

In some regions of Central Europe it is also known as a Windsor mean, but this name should not be confused with the Winsorized mean: in the latter, the observations that the trimmed mean would discard are instead replaced by the largest/smallest of the remaining values.

Discarding only the maximum and minimum is known as the modified mean, particularly in management statistics.[1]

## Interpolation

When the percentage of points to discard does not yield a whole number, the trimmed mean may be defined by interpolation, generally linear interpolation, between the nearest whole numbers. For example, if you need to calculate the 15% trimmed mean of a sample containing 10 entries, strictly this would mean discarding 1 point from each end (equivalent to the 10% trimmed mean). If interpolating, one would instead compute the 10% trimmed mean (discarding 1 point from each end) and the 20% trimmed mean (discarding 2 points from each end), and then interpolating, in this case averaging these two values. Similarly, if interpolating the 12% trimmed mean, one would take the weighted average: weight the 10% trimmed mean by 0.8 and the 20% trimmed mean by 0.2.

The truncated mean is a useful estimator because it is less sensitive to outliers than the mean but will still give a reasonable estimate of central tendency or mean for many statistical models. In this regard it is referred to as a robust estimator.

One situation in which it can be advantageous to use a truncated mean is when estimating the location parameter of a Cauchy distribution, a bell shaped probability distribution with (much) fatter tails than a normal distribution. It can be shown that the truncated mean of the middle 24% sample order statistics (i.e., truncate the sample by 38%) produces an estimate for the population location parameter that is more efficient than using either the sample median or the full sample mean.[2][3] However, due to the fat tails of the Cauchy distribution, the efficiency of the estimator decreases as more of the sample gets used in the estimate.[2][3] Note that for the Cauchy distribution, neither the truncated mean, full sample mean or sample median represents a maximum likelihood estimator, nor are any as asymptotically efficient as the maximum likelihood estimator; however, the maximum likelihood estimate is more difficult to compute, leaving the truncated mean as a useful alternative.[3][4]

## Drawbacks

The truncated mean uses more information from the distribution or sample than the median, but unless the underlying distribution is symmetric, the truncated mean of a sample is unlikely to produce an unbiased estimator for either the mean or the median.

## Examples

The scoring method used in many sports that are evaluated by a panel of judges is a truncated mean: discard the lowest and the highest scores; calculate the mean value of the remaining scores.[5]

The Libor benchmark interest rate is calculated as a trimmed mean: given 18 response, the top 4 and bottom 4 are discarded, and the remaining 10 are averaged (yielding trim factor of 4/18 ≈ 22%).[6]

Consider the data set consisting of:

$\{92, 19, \mathbf{101}, 58, \mathbf{1053}, 91, 26, 78, 10, 13, \mathbf{-40}, \mathbf{101}, 86, 85, 15, 89, 89, 28, \mathbf{-5}, 41\} \qquad (N = 20, mean = 101.5)$

The 5th percentile (-6.75) lies between −40 and −5, while the 95th percentile (148.6) lies between 101 and 1053 (values shown in bold). Then, a 5% trimmed mean would result in the following:

$\{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, -5, 41\} \qquad (N = 18, mean = 56.5)$

This example can be compared with the one using the Winsorising procedure.

Python can calculate the trimmed mean using NumPy and SciPy libraries :

import scipy.stats
import numpy as np
a = np.array([92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41])
lower_limit = np.percentile(a, 5) # computes the qth percentile (5th in this case) of the data in the array a
upper_limit = np.percentile(a, 95)
scipy.stats.tmean(a, limits=(lower_limit,upper_limit ), inclusive=(True, True)) # inclusive determines whether values exactly equal to the lower or upper limits are included