Nearest neighbour classifiers

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Nearest neighbour classifiers are a class of non-parametric methods used in statistical classification (or pattern recognition). The method classifies objects based on closest training examples in the feature space.

Statistical setting[edit]

Suppose we have pairs (X,Y), (X_1,Y_1), \dots, (X_n, Y_n) taking values in \mathbb{R}^d \times \{1,2\}, where Y is the class label of X, so that X|Y=r \sim P_r for r=1,2 (and probability distributions P_r). Given some norm \|\cdot\| on \mathbb{R}^d and a point x \in \mathbb{R}^d, let (X_{(1)},Y_{(1)}), \dots, (X_{(n)}, Y_{(n)}) be a reordering of the training data such that  \|X_{(1)}-x\| \leq \dots \leq \|X_{(n)}-x\| .

The 1-nearest neighbour classifier[edit]

The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier that assigns a point x to the class of its closest neighbour in the feature space, that is C_n^{1nn}(x) =  Y_{(1)}.

As the size of training data set approaches infinity, the one nearest neighbour classifier guarantees an error rate of no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).

The k-nearest neighbour classifier[edit]

The k-nearest neighbour classifier assigns a point x to a particular class based on a majority vote among the classes of the k nearest training points to x.

Properties[edit]

There are many results on the error rate of the k nearest neighbour classifiers.[1] The k-nearest neighbour classifier is strongly (that is for any joint distribution on  (X, Y)) consistent provided k:=k_n diverges and k_n/n converges to zero as n \to \infty.

Let  C_n^{knn} denote the  k nearest neighbour classifier based on a training set of size  n. Under certain regularity conditions, the excess risk yields the following asymptotic expansion[2]

\mathcal{R}_\mathcal{R}(C^{knn}_{n}) - \mathcal{R}_{\mathcal{R}}(C^{Bayes}) = \left\{B_1\frac 1 k + B_2 \left(\frac k n\right)^{4/d}\right\} \{1+o(1)\},

for some constants  B_1 and  B_2 .

The choice k^* = \lfloor B n^{\frac 4 {d+4}} \rfloor offers a trade off between the two terms in the above display, for which the k^*-nearest neighbour error converges the Bayes error at the optimal (minimax) rate \mathcal{O}(n^{-\frac 4 {d+4}}).

The weighted nearest neighbour classifier[edit]

The k-nearest neighbour classifier can be viewed as assigning the k nearest neighbours a weight 1/k and all others 0 weight. This can be generalised to weighted nearest neighbour classifiers. That is, where the ith nearest neighbour is assigned a weight w_{ni}, with \sum_{i=1}^n w_{ni} = 1. An analogous result on the strong consistency of weighted nearest neighbour classifiers also holds.[3]

Let C^{wnn}_n denote the weighted nearest classifier with weights \{w_{ni}\}_{i=1}^n. Subject to regularity conditions on to class distributions the excess risk has the following asymptotic expansion[2]

\mathcal{R}_\mathcal{R}(C^{wnn}_{n}) - \mathcal{R}_{\mathcal{R}}(C^{Bayes}) = \left(B_1 s_n^2 + B_2 t_n^2\right) \{1+o(1)\},

for constants B_1 and B_2 where s_n^2 = \sum_{i=1}^n w_{ni}^2 and t_n = n^{-2/d}\sum_{i=1}^n w_{ni}\{i^{1+2/d} - (i-1)^{1+2/d}\}.

The optimal weighting scheme \{w_{ni}^*\}_{i=1}^n, that balances the two terms in the display above, is given as follows: set k^* = \lfloor B n^{\frac 4 {d+4}} \rfloor,

w_{ni}^*  = \frac 1 {k^*} \left[1 + \frac d 2 - \frac d {2{k^*}^{2/d}} \{ i ^{1+2/d} - (i-1)^{1+2/d}\}\right] for i=1,2,\dots,k^* and
 w^*_{ni} = 0 for  i = k^*+1,\dots,n.

With optimal weights the dominant term in the asymptotic expansion of the excess risk is \mathcal{O}(n^{-\frac 4 {d+4}}). Similar results are true when using a bagged nearest neighbour classifier.

References[edit]

  1. ^ Devroye, L., Gyorfi, L. & Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer. ISBN 0-3879-4618-7. 
  2. ^ a b Samworth R. J. (2012). "Optimal weighted nearest neighbour classifiers". Annals of Statistics 40 (5): 2733–2763. doi:10.1214/12-AOS1049. 
  3. ^ Stone C. J. (1977). "Consistent nonparametric regression". Annals of Statistics 5 (4): 595–620. doi:10.1214/aos/1176343886. 

Further reading[edit]