Bayes classifier

In statistical classification, the Bayes classifier is the classifier with the smallest probability of misclassification among all classifiers that use the same set of features.^[1]

Definition

Suppose a pair $(X,Y)$ takes values in $\mathbb {R} ^{d}\times \{1,2,\dots ,K\}$, where $Y$ is the class label of $X$. Assume that the conditional distribution of $X$, given that the label $Y$ takes the value $r$, is given by

X\mid Y=r\sim P_{r}\quad {\text{for}}\quad r=1,2,\dots ,K

where "$\sim$" means "is distributed as" and $P_{r}$ denotes a probability distribution.

A classifier is a rule that assigns to an observation $X=x$ a guess or estimate of the unobserved label $Y$. In theoretical terms, a classifier is a measurable function $C:\mathbb {R} ^{d}\to \{1,2,\dots ,K\}$, with the interpretation that $C$ assigns the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as

{\mathcal {R}}(C)=\operatorname {P} \{C(X)\neq Y\}.

The Bayes classifier is

C^{\text{Bayes}}(x)={\underset {r\in \{1,2,\dots ,K\}}{\operatorname {argmax} }}\operatorname {P} (Y=r\mid X=x).
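A brief sketch of why this rule minimizes the risk, using the standard conditioning argument. The expectation symbol $\operatorname {E}$ is notation introduced here for illustration and does not appear elsewhere in this article. The first line rewrites the risk of any classifier $C$, the second bounds the conditional probability of a correct guess at each point $x$, and the third combines the two.

```latex
\begin{aligned}
{\mathcal {R}}(C) &= 1-\operatorname {E} \left[\operatorname {P} (Y=C(X)\mid X)\right],\\
\operatorname {P} (Y=C(x)\mid X=x) &\leq \max _{r}\operatorname {P} (Y=r\mid X=x)=\operatorname {P} (Y=C^{\text{Bayes}}(x)\mid X=x),\\
{\mathcal {R}}(C) &\geq 1-\operatorname {E} \left[\max _{r}\operatorname {P} (Y=r\mid X)\right]={\mathcal {R}}(C^{\text{Bayes}}).
\end{aligned}
```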

In practice, as in most of statistics, the difficulties and subtleties lie in modeling the probability distributions effectively, in this case $\operatorname {P} (Y=r\mid X=x)$. Because no classifier can achieve a lower risk, the Bayes classifier is a useful benchmark in statistical classification.
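As a minimal sketch of the definition, suppose the class priors and the class-conditional distributions $P_{r}$ are fully known. The example below assumes two classes in one dimension with Gaussian class-conditional densities; these choices, and the names `priors`, `means`, `stds`, `posterior` and `bayes_classifier`, are illustrative assumptions and not part of the definition. The posterior is obtained from Bayes' theorem, so $\operatorname {P} (Y=r\mid X=x)$ is proportional to the prior of class $r$ times the density of $P_{r}$ at $x$.

```python
import math

# Illustrative assumptions (not from the article): K = 2 classes in one
# dimension, with known priors and Gaussian class-conditional densities P_r.
priors = {1: 0.5, 2: 0.5}
means = {1: 0.0, 2: 2.0}
stds = {1: 1.0, 2: 1.0}


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x):
    """Return {r: P(Y = r | X = x)} computed with Bayes' theorem."""
    joint = {r: priors[r] * gaussian_pdf(x, means[r], stds[r]) for r in priors}
    total = sum(joint.values())
    return {r: joint[r] / total for r in joint}


def bayes_classifier(x):
    """Return the label with the largest posterior probability at x."""
    post = posterior(x)
    return max(post, key=post.get)
```

With equal priors and equal variances, the decision boundary in this toy setup is the midpoint of the two means, so `bayes_classifier(0.3)` returns 1 and `bayes_classifier(1.7)` returns 2. With real data the priors and densities are unknown and must be estimated, which is where the modeling difficulty arises.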

The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as ${\mathcal {R}}(C)-{\mathcal {R}}(C^{\text{Bayes}})$. Because the Bayes classifier minimizes the risk, this quantity is non-negative, and it is used to assess the performance of different classification techniques. A classifier is said to be consistent if its excess risk converges to zero as the size of the training data set tends to infinity.^[2]
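The risk and excess risk can be computed exactly when the joint distribution is a small known table. The sketch below assumes a hypothetical setting in which $X$ takes only the values 0, 1, 2 and $Y$ takes the labels 1, 2; the probabilities in `joint` and the comparison classifier `always_one` are invented for illustration.

```python
# Illustrative assumption (not from the article): X takes only the values
# 0, 1, 2 and Y takes the labels 1, 2, with joint table P(X = x, Y = r).
joint = {
    (0, 1): 0.30, (0, 2): 0.10,
    (1, 1): 0.10, (1, 2): 0.20,
    (2, 1): 0.05, (2, 2): 0.25,
}
labels = sorted({r for _, r in joint})


def risk(classifier):
    """Exact misclassification probability P(C(X) != Y)."""
    return sum(p for (x, r), p in joint.items() if classifier(x) != r)


def bayes_classifier(x):
    """argmax over r of P(Y = r | X = x).

    The joint probability P(X = x, Y = r) has the same argmax in r,
    because it differs from the posterior only by the factor P(X = x).
    """
    return max(labels, key=lambda r: joint[(x, r)])


def always_one(x):
    """A deliberately poor classifier that ignores x."""
    return 1


bayes_risk = risk(bayes_classifier)
excess_risk = risk(always_one) - bayes_risk
```

In this table the Bayes classifier maps 0 to label 1 and both 1 and 2 to label 2, giving a Bayes risk of 0.25. The constant classifier has risk 0.55, so its excess risk is 0.30, up to floating-point rounding.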

References

[1] Devroye, L.; Gyorfi, L.; Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer. ISBN 0-3879-4618-7.
[2] https://dl.acm.org/doi/abs/10.1109/18.243433
