Central tendency

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 81.98.35.149 (talk) at 22:43, 31 March 2013 (Undid revision 548036122 by 76.101.238.233 (talk)vandalism). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around some value.[1] A measure of central tendency is any of a number of ways of specifying this "central value". In practical statistical analysis, the terms are often used before one has chosen even a preliminary form of analysis: thus an initial objective might be to "choose an appropriate measure of central tendency".

In the simplest cases, the measure of central tendency is an average of a set of measurements, the word average being variously construed as mean, median, or other measure of location, depending on the context. However, the term applies to multidimensional data as well as to univariate data, and to situations where a transformation of the data values for some or all dimensions would usually be considered necessary. In the latter cases, the notion of a "central location" is retained by converting an "average" computed for the transformed data back to the original units. In addition, there are several different kinds of calculations for central tendency, and the appropriate kind depends on the type of data (level of measurement).
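
The idea of averaging on a transformed scale and converting back to the original units can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `back_transformed_mean`, the choice of a logarithmic transformation, and the sample values are all assumptions made for the example.

```python
import math

def back_transformed_mean(values, forward=math.log, inverse=math.exp):
    """Average the transformed values, then map the result back to the original units.

    `forward` and `inverse` are assumed to be inverse functions of each other;
    the default pair (log, exp) requires strictly positive values.
    """
    transformed = [forward(v) for v in values]
    return inverse(sum(transformed) / len(transformed))

# Hypothetical sample spanning several orders of magnitude.
data = [1.0, 10.0, 100.0]

# Central value computed on the log scale and expressed in the original units.
log_scale_center = back_transformed_mean(data)

# Plain arithmetic mean on the original scale, for comparison.
arithmetic_center = sum(data) / len(data)
```

With the logarithm as the transformation, the back-transformed value coincides with the geometric mean of the data, which here is much smaller than the arithmetic mean because the largest value no longer dominates.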

Both "central tendency" and "measure of central tendency" apply to either statistical populations or to samples from a population.

Measures of central tendency

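The following may be applied to one-dimensional data, possibly after a transformation of the data values. Some of these measures involve their own implicit transformation: for example, the geometric mean of positive values is the exponential of the arithmetic mean of their logarithms.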

  • Arithmetic mean (or simply, mean) – the sum of all measurements divided by the number of observations in the data set
  • Median – the middle value that separates the higher half from the lower half of the data set
  • Mode – the most frequent value in the data set
  • Geometric mean – the nth root of the product of the data values, where n is the number of values
  • Harmonic mean – the reciprocal of the arithmetic mean of the reciprocals of the data values
  • Weighted mean – an arithmetic mean in which some data elements are given more weight than others
  • Truncated mean – the arithmetic mean of the data values after a certain number or proportion of the highest and lowest values have been discarded
  • Midrange – the arithmetic mean of the maximum and minimum values of the data set
  • Midhinge – the arithmetic mean of the first and third quartiles
  • Trimean – the weighted arithmetic mean of the median and the first and third quartiles, with the median given twice the weight of each quartile
  • Winsorized mean – an arithmetic mean in which extreme values are replaced by values closer to the median
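
The measures listed above can be computed with a short Python sketch. The standard-library `statistics` module supplies the mean, median, mode, geometric mean, harmonic mean and quantiles; the remaining helper functions (`midrange`, `truncated_mean`, `winsorized_mean`, `weighted_mean`, `midhinge`, `trimean`) are hypothetical names written for this illustration. Quartiles have several competing definitions, so the "inclusive" method used here is only one possible convention, and the sample data are invented.

```python
import statistics

def midrange(xs):
    """Arithmetic mean of the minimum and maximum."""
    return (min(xs) + max(xs)) / 2

def truncated_mean(xs, k):
    """Mean after discarding the k lowest and k highest values (requires 2*k < len(xs))."""
    s = sorted(xs)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def winsorized_mean(xs, k):
    """Mean after replacing the k lowest and k highest values by their nearest kept neighbours."""
    s = sorted(xs)
    low, high = s[k], s[len(s) - k - 1]
    clipped = [min(max(x, low), high) for x in s]
    return sum(clipped) / len(clipped)

def weighted_mean(xs, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, weights)) / sum(weights)

def quartiles(xs):
    """First, second and third quartiles (assumption: the 'inclusive' convention)."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return q1, q2, q3

def midhinge(xs):
    """Arithmetic mean of the first and third quartiles."""
    q1, _, q3 = quartiles(xs)
    return (q1 + q3) / 2

def trimean(xs):
    """Weighted mean of the quartiles and the median, with the median counted twice."""
    q1, q2, q3 = quartiles(xs)
    return (q1 + 2 * q2 + q3) / 4

# Invented sample with a mild right skew.
data = [1, 2, 2, 3, 4, 7, 9]

summary = {
    "arithmetic mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "geometric mean": statistics.geometric_mean(data),
    "harmonic mean": statistics.harmonic_mean(data),
    "weighted mean": weighted_mean(data, [1, 1, 1, 1, 1, 1, 3]),
    "truncated mean": truncated_mean(data, 1),
    "midrange": midrange(data),
    "midhinge": midhinge(data),
    "trimean": trimean(data),
    "winsorized mean": winsorized_mean(data, 1),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Running the sketch on a skewed sample such as this one shows how the measures can disagree: those that depend on the extremes (the midrange, and to a lesser extent the arithmetic mean) are pulled toward the long tail, while the median and the trimmed or Winsorized means stay closer to the bulk of the data.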

Any of the above may be applied to each dimension of multidimensional data. In addition, there is the

  • Geometric median – the point that minimizes the sum of distances to the data points. This is the same as the median when applied to one-dimensional data, but it is not the same as taking the median of each dimension independently.
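
The difference between the geometric median and the coordinate-wise median can be illustrated numerically. The sketch below approximates the geometric median with Weiszfeld's iteration, which is one common choice and not the only one. The function names, the starting point (the centroid), the stopping tolerance and the four sample points are assumptions made for this example, and the handling of an iterate that lands exactly on a data point is deliberately simplified.

```python
import math
import statistics

def geometric_median(points, max_iter=1000, tol=1e-12):
    """Approximate the geometric median of a list of points with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the centroid (coordinate-wise arithmetic mean).
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(max_iter):
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for p in points:
            d = math.dist(p, current)
            if d == 0.0:
                # Simplification: stop if the iterate coincides with a data point.
                return list(p)
            w = 1.0 / d  # closer points receive larger weights
            weight_total += w
            for i in range(dim):
                weighted_sum[i] += w * p[i]
        updated = [s / weight_total for s in weighted_sum]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current

def sum_of_distances(center, points):
    """Objective minimized by the geometric median."""
    return sum(math.dist(center, p) for p in points)

# Invented sample: four points forming a convex quadrilateral.
points = [(0.0, 0.0), (4.0, 0.0), (6.0, 6.0), (0.0, 1.0)]

gm = geometric_median(points)
per_axis = [statistics.median(p[i] for p in points) for i in range(2)]

print("geometric median:      ", gm, sum_of_distances(gm, points))
print("coordinate-wise median:", per_axis, sum_of_distances(per_axis, points))
```

For four points in convex position the geometric median lies at the intersection of the diagonals, here (0.8, 0.8), whereas the median taken independently in each dimension is (2.0, 0.5) and gives a larger sum of distances.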

References

  1. ^ Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9