Calibration (statistics)

From Wikipedia, the free encyclopedia

In statistics, the term calibration has two main uses, each denoting a special type of statistical inference problem. "Calibration" can mean:

  • A reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variable is used to predict a corresponding explanatory variable.[1]
  • Procedures in statistical classification to determine class membership probabilities which assess the uncertainty of a given new observation belonging to each of the already established classes.

In addition, "calibration" is used in statistics with the usual general meaning of calibration. For example, model calibration can also refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model.

In regression

The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to estimate other values of the independent variable from new observations of the dependent variable.[2][3][4] This is also known as "inverse regression";[5] see also sliced inverse regression.

One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.
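The difference between the two approaches can be sketched numerically. The following is only a minimal illustration under stated assumptions: a straight-line relationship, simulated data, and hypothetical names (`fit_line`, `classical_estimate`, `inverse_estimate`, `gap`) that do not come from the cited sources. The "classical" estimate fits the observation on the known age and inverts the fitted line; the "inverse" estimate fits the age on the observation directly.

```python
import random

def fit_line(u, v):
    """Ordinary least squares fit of v ~ intercept + slope * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope

# Simulated calibration data (an assumption for illustration only):
# known ages and noisy observations that are caused by those ages.
random.seed(0)
ages = [float(i) for i in range(1, 31)]
obs = [2.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in ages]

# Classical approach: minimise the error in the observation, then invert.
a, b = fit_line(ages, obs)          # obs ~ a + b * age

def classical_estimate(y_new):
    return (y_new - a) / b

# Inverse approach: minimise the error in the date directly.
c, d = fit_line(obs, ages)          # age ~ c + d * obs

def inverse_estimate(y_new):
    return c + d * y_new

def gap(y_new):
    """Absolute disagreement between the two age estimates."""
    return abs(classical_estimate(y_new) - inverse_estimate(y_new))

obs_mean = sum(obs) / len(obs)
for offset in (0.0, 2.0, 20.0):
    y_new = obs_mean + offset
    print(offset, classical_estimate(y_new), inverse_estimate(y_new), gap(y_new))
```

In this straight-line setting the two estimates coincide at the mean of the calibration observations, and their disagreement grows in proportion to the distance of the new observation from that mean, which is why extrapolation widens the difference.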

In classification

Calibration in classification means transforming classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009).[6]
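As a rough illustration of such a transformation in the two-class case, the sketch below fits a sigmoid that maps raw scores to probabilities, in the spirit of the logistic approach associated with Platt (1999). It is a simplified assumption-laden example, not that paper's exact procedure: it uses plain gradient descent and omits refinements such as smoothed targets. The scores, labels, and function names (`fit_sigmoid_calibrator`, `log_loss`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_sigmoid_calibrator(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def log_loss(scores, labels, a, b):
    """Mean negative log-likelihood of the labels under the fitted sigmoid."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(a * s + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)

# Hypothetical held-out classifier scores with their true class labels.
scores = [-3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_sigmoid_calibrator(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
for s, p in zip(scores, calibrated):
    print(s, round(p, 3))
```

The calibrator is fitted on data not used to train the classifier, so that the mapping reflects how the scores behave on unseen observations.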

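The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case:

  • Assignment value approach, see Garczarek (2002)[7]
  • Bayes approach, see Bennett (2002)[8]
  • Isotonic regression, see Zadrozny and Elkan (2002)[9]
  • Logistic regression, see Lewis and Gale (1994)[10] and Platt (1999)[11]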

The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities when there are more than two classes:

  • Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998)[12]
  • Dirichlet calibration, see Gebel (2009)[6]
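The first item can be sketched as follows. This is a minimal illustration of the iterative coupling scheme described by Hastie and Tibshirani (1998), under simplifying assumptions: every pair of classes is weighted equally, and each pairwise estimate lies strictly between 0 and 1. The function name `pairwise_coupling` and the example numbers are hypothetical. Here `r[i][j]` stands for a binary classifier's estimate of the probability of class i given that the observation belongs to class i or class j.

```python
def pairwise_coupling(r, n_iter=1000, tol=1e-12):
    """Combine pairwise estimates r[i][j] ~ P(i | i or j) into class probabilities."""
    k = len(r)
    p = [1.0 / k] * k                     # start from the uniform distribution
    for _ in range(n_iter):
        p_old = list(p)
        for i in range(k):
            # Observed pairwise support for class i versus the support
            # implied by the current estimate p.
            observed = sum(r[i][j] for j in range(k) if j != i)
            implied = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= observed / implied
            total = sum(p)
            p = [v / total for v in p]    # renormalise after each update
        if max(abs(u - v) for u, v in zip(p, p_old)) < tol:
            break
    return p

# Hypothetical check: build pairwise estimates that are exactly consistent
# with a known distribution and see whether coupling recovers it.
p_true = [0.5, 0.3, 0.2]
k = len(p_true)
r = [[0.0 if i == j else p_true[i] / (p_true[i] + p_true[j]) for j in range(k)]
     for i in range(k)]

p_hat = pairwise_coupling(r)
print(p_hat)
```

In practice the pairwise estimates come from separately trained binary classifiers and are generally not mutually consistent, so the coupled probabilities are a compromise rather than an exact recovery.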

References

  1. Upton, G., Cook, I. (2006) Oxford Dictionary of Statistics, OUP. ISBN 978-0-19-954145-4
  2. Brown, P.J. (1994) Measurement, Regression and Calibration, OUP. ISBN 0-19-852245-2
  3. Ng, K. H., Pooi, A. H. (2008) "Calibration Intervals in Linear Regression Models", Communications in Statistics - Theory and Methods, 37 (11), 1688–1696.
  4. Hardin, J. W., Schmiediche, H., Carroll, R. J. (2003) "The regression-calibration method for fitting generalized linear models with additive measurement error", Stata Journal, 3 (4), 361–372.
  5. Draper, N.R., Smith, H. (1998) Applied Regression Analysis, 3rd Edition, Wiley. ISBN 0-471-17082-8
  6. M. Gebel, Multivariate calibration of classifier scores into the probability space, Dissertation, Universität Dortmund, 2009.
  7. U. M. Garczarek, Classification Rules in Standardized Partition Spaces, Dissertation, Universität Dortmund, 2002.
  8. P. N. Bennett, Using asymmetric distributions to improve text classifier probability estimates: A comparison of new and standard parametric methods, Technical Report CMU-CS-02-126, Carnegie Mellon, School of Computer Science, 2002.
  9. B. Zadrozny and C. Elkan, Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining, 694–699, Edmonton, ACM Press, 2002.
  10. D. D. Lewis and W. A. Gale, A Sequential Algorithm for Training Text Classifiers. In: W. B. Croft and C. J. van Rijsbergen (eds.), Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), 3–12. New York, Springer-Verlag, 1994.
  11. J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: A. J. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans (eds.), Advances in Large Margin Classifiers, 61–74. Cambridge, MIT Press, 1999.
  12. T. Hastie and R. Tibshirani, Classification by pairwise coupling. In: M. I. Jordan, M. J. Kearns and S. A. Solla (eds.), Advances in Neural Information Processing Systems, volume 10, Cambridge, MIT Press, 1998.