# Nearest neighbour classifiers

Nearest neighbour classifiers are a class of non-parametric methods used in statistical classification (or pattern recognition). The method classifies objects based on closest training examples in the feature space.

## Statistical setting

Suppose we have pairs $(X,Y), (X_1,Y_1), \dots, (X_n, Y_n)$ taking values in $\mathbb{R}^d \times \{1,2\}$, where $Y$ is the class label of $X$, so that $X|Y=r \sim P_r$ for $r=1,2$ (and probability distributions $P_r$). Given some norm $\|\cdot\|$ on $\mathbb{R}^d$ and a point $x \in \mathbb{R}^d$, let $(X_{(1)},Y_{(1)}), \dots, (X_{(n)}, Y_{(n)})$ be a reordering of the training data such that $\|X_{(1)}-x\| \leq \dots \leq \|X_{(n)}-x\|$.

## The $1$-nearest neighbour classifier

The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier that assigns a point $x$ to the class of its closest neighbour in the feature space, that is $C_n^{1nn}(x) = Y_{(1)}$.

As the size of training data set approaches infinity, the one nearest neighbour classifier guarantees an error rate of no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).

## The $k$-nearest neighbour classifier

The $k$-nearest neighbour classifier assigns a point $x$ to a particular class based on a majority vote among the classes of the $k$ nearest training points to $x$.

### Properties

There are many results on the error rate of the $k$ nearest neighbour classifiers.[1] The $k$-nearest neighbour classifier is strongly (that is for any joint distribution on $(X, Y)$) consistent provided $k:=k_n$ diverges and $k_n/n$ converges to zero as $n \to \infty$.

Let $C_n^{knn}$ denote the $k$ nearest neighbour classifier based on a training set of size $n$. Under certain regularity conditions, the excess risk yields the following asymptotic expansion[2]

$\mathcal{R}_\mathcal{R}(C^{knn}_{n}) - \mathcal{R}_{\mathcal{R}}(C^{Bayes}) = \left\{B_1\frac 1 k + B_2 \left(\frac k n\right)^{4/d}\right\} \{1+o(1)\},$

for some constants $B_1$ and $B_2$.

The choice $k^* = \lfloor B n^{\frac 4 {d+4}} \rfloor$ offers a trade off between the two terms in the above display, for which the $k^*$-nearest neighbour error converges the Bayes error at the optimal (minimax) rate $\mathcal{O}(n^{-\frac 4 {d+4}})$.

## The weighted nearest neighbour classifier

The $k$-nearest neighbour classifier can be viewed as assigning the $k$ nearest neighbours a weight $1/k$ and all others $0$ weight. This can be generalised to weighted nearest neighbour classifiers. That is, where the $i$th nearest neighbour is assigned a weight $w_{ni}$, with $\sum_{i=1}^n w_{ni} = 1$. An analogous result on the strong consistency of weighted nearest neighbour classifiers also holds.[3]

Let $C^{wnn}_n$ denote the weighted nearest classifier with weights $\{w_{ni}\}_{i=1}^n$. Subject to regularity conditions on to class distributions the excess risk has the following asymptotic expansion[2]

$\mathcal{R}_\mathcal{R}(C^{wnn}_{n}) - \mathcal{R}_{\mathcal{R}}(C^{Bayes}) = \left(B_1 s_n^2 + B_2 t_n^2\right) \{1+o(1)\},$

for constants $B_1$ and $B_2$ where $s_n^2 = \sum_{i=1}^n w_{ni}^2$ and $t_n = n^{-2/d}\sum_{i=1}^n w_{ni}\{i^{1+2/d} - (i-1)^{1+2/d}\}$.

The optimal weighting scheme $\{w_{ni}^*\}_{i=1}^n$, that balances the two terms in the display above, is given as follows: set $k^* = \lfloor B n^{\frac 4 {d+4}} \rfloor$,

$w_{ni}^* = \frac 1 {k^*} \left[1 + \frac d 2 - \frac d {2{k^*}^{2/d}} \{ i ^{1+2/d} - (i-1)^{1+2/d}\}\right]$ for $i=1,2,\dots,k^*$ and
$w^*_{ni} = 0$ for $i = k^*+1,\dots,n$.

With optimal weights the dominant term in the asymptotic expansion of the excess risk is $\mathcal{O}(n^{-\frac 4 {d+4}})$. Similar results are true when using a bagged nearest neighbour classifier.

## References

1. ^ Devroye, L., Gyorfi, L. & Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer. ISBN 0-3879-4618-7.
2. ^ a b Samworth R. J. (2012). "Optimal weighted nearest neighbour classifiers". Annals of Statistics 40 (5): 2733–2763. doi:10.1214/12-AOS1049.
3. ^ Stone C. J. (1977). "Consistent nonparametric regression". Annals of Statistics 5 (4): 595–620. doi:10.1214/aos/1176343886.