# Bayes classifier

In statistical classification, the Bayes classifier is the classifier with the smallest probability of misclassification among all classifiers.[1]

## Definition

Suppose a pair $(X,Y)$ takes values in $\mathbb{R}^d \times \{1,2,\dots,K\}$, where $Y$ is the class label of $X$. Assume that the conditional distribution of $X$, given that the label $Y$ takes the value $r$, is

$X\mid Y=r \sim P_r$ for $r=1,2,\dots,K$

where "$\sim$" means "is distributed as", and where $P_r$ denotes a probability distribution.

A classifier is a rule that assigns to an observation $X=x$ a guess or estimate of the unobserved label $Y$. Formally, a classifier is a measurable function $C: \mathbb{R}^d \to \{1,2,\dots,K\}$, with the interpretation that $C$ classifies the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as

$\mathcal{R}(C) = \operatorname{P}\{C(X) \neq Y\}.$
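Because the risk is a probability over the joint distribution of $(X,Y)$, it can be approximated by simulation: draw many pairs, apply $C$, and count mistakes. The sketch below does this for a hypothetical one-dimensional two-class model (an assumption for illustration, not part of the definition): $Y$ is uniform on $\{1,2\}$, $X \mid Y=1 \sim N(-1,1)$ and $X \mid Y=2 \sim N(1,1)$. The names `sample_pair`, `estimate_risk` and `threshold_classifier` are hypothetical helpers.

```python
import random
from statistics import NormalDist

# Hypothetical model (an assumption for illustration): Y is uniform on {1, 2},
# X | Y=1 ~ N(-1, 1) and X | Y=2 ~ N(+1, 1).
CLASS_MEANS = {1: -1.0, 2: 1.0}

def sample_pair(rng):
    """Draw one pair (X, Y) from the hypothetical model."""
    y = rng.choice([1, 2])
    x = rng.gauss(CLASS_MEANS[y], 1.0)
    return x, y

def estimate_risk(classifier, n, seed=0):
    """Monte Carlo estimate of R(C) = P{C(X) != Y}: fraction of n draws misclassified."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x, y = sample_pair(rng)
        if classifier(x) != y:
            errors += 1
    return errors / n

def threshold_classifier(x):
    """Assign class 2 when x > 0, otherwise class 1."""
    return 2 if x > 0 else 1

estimate = estimate_risk(threshold_classifier, 100_000)
# For this model and this rule each class is misclassified with probability Phi(-1),
# so the exact risk is Phi(-1), about 0.1587.
exact = NormalDist().cdf(-1.0)
print(f"estimated risk {estimate:.4f}, exact {exact:.4f}")
```

A classifier that ignores $x$ entirely, such as `lambda x: 1`, has risk $1/2$ under this model, which the same estimator recovers approximately.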

The Bayes classifier is

$C^\text{Bayes}(x) = \underset{r \in \{1,2,\dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y=r \mid X=x).$
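When each $P_r$ has a density $f_r$ and the class probabilities are $\pi_r = \operatorname{P}(Y=r)$, Bayes' theorem gives $\operatorname{P}(Y=r \mid X=x) = \pi_r f_r(x) / \sum_{s=1}^K \pi_s f_s(x)$. The denominator does not depend on $r$, so the Bayes classifier equivalently picks the $r$ that maximises $\pi_r f_r(x)$. The following is a minimal sketch for $d=1$ and $K=3$, assuming known priors and Gaussian class-conditional distributions; the numbers in `PRIORS` and `CONDITIONALS` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical example (assumed values): K = 3 classes on the real line (d = 1),
# known priors pi_r = P(Y = r) and Gaussian class-conditionals P_r = N(mu_r, sigma_r^2).
PRIORS = {1: 0.5, 2: 0.3, 3: 0.2}
CONDITIONALS = {
    1: NormalDist(mu=-2.0, sigma=1.0),
    2: NormalDist(mu=0.0, sigma=1.0),
    3: NormalDist(mu=3.0, sigma=2.0),
}

def posterior(x):
    """Return {r: P(Y = r | X = x)} computed with Bayes' theorem."""
    joint = {r: PRIORS[r] * CONDITIONALS[r].pdf(x) for r in PRIORS}
    total = sum(joint.values())
    return {r: p / total for r, p in joint.items()}

def bayes_classifier(x):
    """Return the label r that maximises the posterior probability P(Y = r | X = x)."""
    post = posterior(x)
    return max(post, key=post.get)

for x in (-3.0, 0.0, 5.0):
    print(x, bayes_classifier(x))
```

Normalising by `total` is not needed for the argmax itself; it is kept so that `posterior` returns genuine conditional probabilities.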

In practice, as in most of statistics, the difficulties and subtleties lie in modeling the probability distributions effectively, in this case $\operatorname{P}(Y=r \mid X=x)$, which is usually unknown and must be estimated from data. The Bayes classifier is a useful benchmark in statistical classification.

The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as $\mathcal{R}(C) - \mathcal{R}(C^\text{Bayes}).$ Because the Bayes classifier minimises the risk, this quantity is non-negative, and it is important for assessing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.[citation needed]
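Excess risk and consistency can be illustrated on a model where everything is computable in closed form. The sketch below assumes a hypothetical model with $\operatorname{P}(Y=1)=\operatorname{P}(Y=2)=1/2$, $X \mid Y=1 \sim N(-1,1)$ and $X \mid Y=2 \sim N(1,1)$; there the Bayes classifier assigns class 2 exactly when $x > 0$. As a data-dependent classifier it uses a simple plug-in rule, chosen here only for illustration: threshold at the midpoint of the two sample means. The helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical model (an assumption for illustration): P(Y=1) = P(Y=2) = 1/2,
# X | Y=1 ~ N(-1, 1) and X | Y=2 ~ N(+1, 1). The Bayes classifier is the
# threshold rule at t = 0.
PHI = NormalDist().cdf

def risk_of_threshold(t):
    """Exact risk of the rule 'class 2 if x > t, else class 1' under the model."""
    return 0.5 * (1.0 - PHI(t + 1.0)) + 0.5 * PHI(t - 1.0)

BAYES_RISK = risk_of_threshold(0.0)  # equals Phi(-1), about 0.1587

def excess_risk(t):
    """R(C) - R(C^Bayes) for the threshold rule at t (non-negative up to rounding)."""
    return risk_of_threshold(t) - BAYES_RISK

def plug_in_threshold(n_per_class, rng):
    """Fit a threshold from n_per_class training points of each class.

    The midpoint of the two sample means is one simple plug-in choice,
    assumed here for illustration."""
    m1 = fmean(rng.gauss(-1.0, 1.0) for _ in range(n_per_class))
    m2 = fmean(rng.gauss(1.0, 1.0) for _ in range(n_per_class))
    return (m1 + m2) / 2.0

rng = random.Random(1)
for n in (10, 100, 1000, 10000):
    t_hat = plug_in_threshold(n, rng)
    print(f"n={n:>5}  threshold={t_hat:+.4f}  excess risk={excess_risk(t_hat):.6f}")
```

Since the sample means converge to $\mp 1$, the fitted threshold converges to $0$ and the excess risk tends to zero, so this plug-in rule is consistent for this particular model; the printed values for small $n$ vary with the random seed.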