Bayes classifier

In statistical classification, the Bayes classifier minimises the probability of misclassification over all possible classifiers.[1]


Suppose a pair (X,Y) of random variables takes values in \mathbb{R}^d \times \{1,2,\dots,K\}, where Y is the class label of X. Assume that the conditional distribution of X, given that the label Y takes the value r, is

X \mid Y=r \sim P_r \quad \text{for} \quad r=1,2,\dots,K,

where "\sim" means "is distributed as", and where P_r denotes a probability distribution.

A classifier is a rule that assigns to an observation X=x a guess or estimate of the unobserved label Y. In theoretical terms, a classifier is a measurable function C: \mathbb{R}^d \to \{1,2,\dots,K\}, with the interpretation that C classifies the point x to the class C(x). The probability of misclassification, or risk, of a classifier C is defined as

\mathcal{R}(C)  = \operatorname{P}\{C(X) \neq Y\}.
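As an illustration, when X takes only finitely many values the risk can be computed by summing the joint probability of every pair (x, r) that the classifier gets wrong. The following Python sketch assumes a hypothetical joint distribution JOINT with d = 1 and K = 2; the table, the function risk and the two example classifiers are illustrative names and values, not part of the cited source.

```python
# Hypothetical joint distribution P(X = x, Y = r) on a finite subset of R (d = 1, K = 2).
# The numbers are made up for illustration and sum to 1.
JOINT = {
    (0.0, 1): 0.30, (0.0, 2): 0.10,
    (1.0, 1): 0.10, (1.0, 2): 0.20,
    (2.0, 1): 0.05, (2.0, 2): 0.25,
}


def risk(classifier, joint):
    """Return P{C(X) != Y}: the total joint mass of misclassified pairs."""
    return sum(p for (x, r), p in joint.items() if classifier(x) != r)


def threshold_rule(x):
    """Illustrative classifier: predict class 2 when x > 1.5, otherwise class 1."""
    return 2 if x > 1.5 else 1


def always_one(x):
    """Illustrative classifier that ignores x and always predicts class 1."""
    return 1


if __name__ == "__main__":
    print("risk of threshold_rule:", risk(threshold_rule, JOINT))
    print("risk of always_one:   ", risk(always_one, JOINT))
```

Under these assumed probabilities the threshold rule misclassifies the pairs (0, 2), (1, 2) and (2, 1), so its risk is about 0.35, while the constant rule has risk about 0.55.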

The Bayes classifier is

C^\text{Bayes}(x) = \underset{r \in \{1,2,\dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y=r \mid X=x).
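When the class-conditional distributions and the class probabilities are known, the rule above can be evaluated directly through Bayes' theorem. The following Python sketch assumes, purely for illustration, that d = 1, K = 2, each P_r is a normal distribution, and both classes are equally likely a priori; PRIORS, PARAMS, normal_pdf, posterior and bayes_classifier are hypothetical names introduced here.

```python
import math

# Hypothetical model (an assumption, not from the article): K = 2 classes, d = 1,
# P_r = Normal(mean_r, sd_r^2) and prior probabilities P(Y = r) = PRIORS[r].
PRIORS = {1: 0.5, 2: 0.5}
PARAMS = {1: (0.0, 1.0), 2: (2.0, 1.0)}  # class -> (mean, standard deviation)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x):
    """Return {r: P(Y = r | X = x)} computed with Bayes' theorem."""
    weights = {r: PRIORS[r] * normal_pdf(x, *PARAMS[r]) for r in PRIORS}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def bayes_classifier(x):
    """Return the class with the largest posterior probability at x."""
    post = posterior(x)
    return max(post, key=post.get)


if __name__ == "__main__":
    for x in (-1.0, 0.5, 1.5, 3.0):
        print(x, bayes_classifier(x), posterior(x))
```

With equal priors and equal standard deviations, as assumed here, the decision boundary lies midway between the two means, so points below 1 are assigned to class 1 and points above 1 to class 2.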

In practice, as in most of statistics, the difficulties and subtleties lie in modelling the probability distributions effectively, in this case \operatorname{P}(Y=r \mid X=x), which is rarely known exactly. The Bayes classifier therefore serves mainly as a theoretical benchmark in statistical classification.
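The benchmark role rests on a short argument, sketched here for an arbitrary classifier C. The symbol \operatorname{E} denotes expectation with respect to X and is notation introduced for this sketch. The first equality follows by conditioning on X, since C(X) is determined by X; the inequality holds because the posterior probability of any single class cannot exceed the maximum over classes; the final equality holds because the Bayes classifier attains that maximum at every x.

```latex
\mathcal{R}(C)
  = 1 - \operatorname{E}\bigl[\operatorname{P}(Y = C(X) \mid X)\bigr]
  \ge 1 - \operatorname{E}\Bigl[\max_{r \in \{1,2,\dots,K\}} \operatorname{P}(Y = r \mid X)\Bigr]
  = \mathcal{R}(C^\text{Bayes}).
```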

The excess risk of a general classifier C (possibly depending on some training data) is defined as \mathcal{R}(C) - \mathcal{R}(C^\text{Bayes}). Because the Bayes classifier minimises the risk, the excess risk is non-negative, which makes it a natural quantity for comparing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.
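The excess risk can be computed exactly when the joint distribution is known and X takes finitely many values. The Python sketch below assumes a hypothetical joint distribution JOINT with d = 1 and K = 2; the table and the helper names risk, bayes_rule and excess_risk are illustrative and not taken from the cited source. It uses the fact that maximising the joint probability P(X = x, Y = r) over r gives the same class as maximising the posterior, because the two differ only by the factor P(X = x).

```python
# Hypothetical joint distribution P(X = x, Y = r); the values are made up and sum to 1.
JOINT = {
    (0.0, 1): 0.30, (0.0, 2): 0.10,
    (1.0, 1): 0.10, (1.0, 2): 0.20,
    (2.0, 1): 0.05, (2.0, 2): 0.25,
}


def risk(classifier, joint):
    """Return P{C(X) != Y}: the total joint mass of misclassified pairs."""
    return sum(p for (x, r), p in joint.items() if classifier(x) != r)


def bayes_rule(joint):
    """Build the Bayes classifier for a finite joint distribution.

    For each x, pick the class r with the largest joint probability, which is
    also the class with the largest posterior probability P(Y = r | X = x).
    """
    best = {}
    for (x, r), p in joint.items():
        if x not in best or p > joint[(x, best[x])]:
            best[x] = r
    return lambda x: best[x]


def excess_risk(classifier, joint):
    """Return R(C) - R(C_Bayes), which is never negative."""
    return risk(classifier, joint) - risk(bayes_rule(joint), joint)


def always_one(x):
    """Illustrative classifier that ignores x and always predicts class 1."""
    return 1


if __name__ == "__main__":
    print("Bayes risk:              ", risk(bayes_rule(JOINT), JOINT))
    print("excess risk of always_one:", excess_risk(always_one, JOINT))
```

For this assumed table the Bayes risk is about 0.25 and the constant rule has risk about 0.55, giving an excess risk of about 0.30; the Bayes classifier itself has excess risk zero.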

References

  1. Devroye, L., Györfi, L. & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. ISBN 0-387-94618-7.