User:Qymwill/Draft of the page for C-1SVM algorithm

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In machine learning, competing one class support vector machines (C-1SVMs) are supervised learning models designed for binary and multiclass classification. In the algorithm, multiple one class support vector machines are trained, and different one-class support vector machines model the distributions of samples from different classes and hence may compete with each other. C-1SVM is not a binary SVM, not a one class SVM either. It is a set of competing one-class SVMs. For binary classification, two one-class support vector machines are trained and maintained.

Given a set of training examples and their corresponding labels, a C-1SVM training algorithm builds several C-1SVM models that can cooperatively label new examples. Based on the idea of maximum-margin hyperplane used in the binary SVM, a C-1SVM model tries to find a hyperplane which can enclose as many target class examples as possible. New examples are predicted to belong to a category based on which hyperplane they fall in.

By combining one-class support vector machine and on-line learning, C-1SVM was firstly introduced by Minglun Gong and Li Cheng as a support vector method for foreground segmentation of live videos[1] in computer vision and image processing.

Algorithm[edit]

C-1SVM algorithm mainly involves two parts: training and classification. C-1SVM training is a kind of iterative process: given the one-class SVM models that are partially trained using received examples, when the model receives a new example with a label, the example is used to update the one-class SVM model of the corresponding class. The process stops until there are no changes in those one-class SVM models. C-1SVM classification is achieved by defining a score function for each class, and a new example belongs to the class with the maximum score.

One Class Support Vector Machine[edit]

The one-class SVM is proposed by Bernhard Sch¨olkopf, John C. Platt, John Shawe-Taylor and Alex J. Smola.[2] It is an unsupervised learning method that learns a decision function for novelty detection: classifying new data as similar or different to the training set. C-1SVM algorithm trian a one-class SVM for each class. Each one-class SVM model trained by C-1SVM algorithm can determine if new examples belong to its corresponding class or not. Finally, the trained one-class SVM models make classifications cooperatively.

On-line Learning[edit]

Binary SVM training using a large set of examples is a classical offline learning because the quadratic objective function that SVM aims to minimize would not change once the training process begins. Previous research[3] have shown that a similar solution can be obtained by following on-line learning scheme, in which training examples are repetitively inputted into an on-line learner following a repeated sequence. C-1SVM uses on-line learning because it can produce a partially trained C-1SVM model at each time point during the training process, and the partially trained model can be used by the new received example at each time point. These partially trained models are gradually refined towards the final solution. Because of the quicker feedback provided by partially trained C-1SVMs in the loop, on-line learning can help C-1SVM training achieve faster convergence than offline learning.

The online learner in C-1SVM training algorithm follows the one proposed by Li Cheng, S.V. N. Vishwanathan, Dale Schuurmans, Shaojun Wang, Terry Caelli.[4] Denote training examples in a time sequence as (x_t)_{t=1}^T and their corresponding labels as (y_t)_{t=1}^T, T is the number of training examples. Let f_t(\cdot) be a score function of example t and S_i(\cdot) be the i_{th} support vector of classes. The kernel function used in on-line learning is denoted as k(\cdot,\cdot). When a new labeled example x_t is received, the score function is defined as

f_t(x_t)=\sum_{i=1} \text{min}(\alpha_i,C) \chi(S_i(y_t) \neq x_t)k(S_i(y_t),x_t)

The weight of x_t is updated as

\alpha_t=\frac{\gamma -f_t(x_t)}{k(x_t,x_t)}

where \gamma is the margin that is often set to 1, C>0 is the cut-off value which is used to constrain the weight of x_t, and \chi(\cdot) is an indicator function: \chi(true)=1 and \chi(false)=0. If \alpha_t >0, x_t is added into the corresponding support vector set, else if x_t is already in the support vector set, then erase it from the set.

Three common kernel functions include:

When receiving an unlabeled example x_{new}, C-1SVM label it as the class with the maximum score that is computed using the score function defined above.

Example[edit]

Take the binary classification for example, and there are two samples in each class. Denote them as class\text{ } 1: x_1=(1,1),x_2=(2,2) and class\text{ } 2: x_3=(-1,-1),x_4=(-2,-2). Input the samples in the sequence: x_1,x_2,x_3,x_4,x_1,x_2,x_3,x_4,\dots. Both of the support vector sets are initially empty. The cut-off C=0.5, margin \gamma=1 and Gaussian function with standard deviation \sigma=1.0 is selected as the kernel function in the example. The algorithm can work as follows:

Step 1: receive x_1, since the support vector set of the first class is empty, the score function f_1(x_1)=0, then \alpha_1=1>0, and the support vector set S(1) becomes \{(x_1,\alpha_1)\}.
Step 2: receive x_2, compute f_2(x_2)=0.184, then \alpha_2=0.816>0, and the support vector set S(1) becomes \{(x_1,\alpha_1),(x_2,\alpha_2)\}.
Step 3: receive x_3, f_3(x_3)=0 and \alpha_3=1>0, the support vector set S(2) becomes \{(x_3,\alpha_3)\}.
Step 4: receive x_4, compute f_4(x_4)=0.184, then \alpha_4=0.816>0, and the support vector set S(2) becomes \{(x_3,\alpha_3),(x_4,\alpha_4)\}.

Following the above four steps, input samples in the same sequence and repeat the process until the two support vector sets do not change.

For classification, when an unlabeled sample x_{new} arrives, follow the defined score function, compute score[1] using trained support vectors in S(1) and score[2] using trained support vectors in S(2). If score[1]\geq score[2], then label x_{new} as class \text{ }1, else label it as class \text{ }2.

Pseudocode[edit]

The algorithm includes two parts: training and classification.

Training[edit]

Input: training examples x and labels y, total class number n

Output: support vectors of each class and corresponding weights

1  procedure C-1SVMtraining(x,y,n)
2  for i \leftarrow 1 \text{ to } n
3      allocate a set S(i) to store support vectors for the i_{th} class
4  while there are changes for support vector sets do
5     for t\leftarrow 1 \text{ to }T
6         receive an example x_t and its label y_t
7         compute score f_t(x_t)\leftarrow \sum_{i=1} \text{min}(\alpha_i,C) \chi(S_i(y_t) \neq x_t)k(S_i(y_t),x_t)
8         update weight \alpha_t\leftarrow \frac{\gamma -f_t(x_t)}{k(x_t,x_t)}
9         if there is a vector in S(y_t) equals to x_t
10           if \alpha_t \leq 0
11              delete (x_t,\alpha_t) from S(y_t)
12        else
13           if \alpha_t>0
14              push (x_t,\alpha_t) into S(y_t)
15 return S(1),S(2),\dots,S(n)

Classification[edit]

Input: trained support vector sets S, new unlabeled example x_{new}, total class number n

Output: label y_{new}

1 procedure C-1SVMclassification(S,x_{new},n)
2 for i \leftarrow \text{ to } n
3     score[i] \leftarrow \sum_{i=1}\text{min}(\alpha_i,C)\chi(S_j(i)\neq x_{new})k(S_j(i),x_{new})
4 y_{new}\leftarrow \{1,2,\dots,n|score[y_{new}]\text{ is the maximun}\}

Support vector sets of classes can be implemented by first in first out queue.

Binary SVMs vs C-1SVMs[edit]

Fig. 1. Comparison between a binary SVM and C-1SVMs. White dots represent the training examples of the first class and black dots represent the training examples of the second class, while red dot denotes an unlabeled example.

For binary classification, binary SVM algorithm only train one SVM model giving one hyperplane. However, C-1SVM algorithm totally train two one-class SVMs. By doing so, two hyperplanes enclosing the training examples more tightly are returned, which can help toward better detecting and handling of ambiguous cases.

A detailed comparison is given in Fig.1. Denote while dots as the first class and black dots as the second class. The line between white dots and black dots represents the hyperplane trained by binary SVM. The circles enclosing white dots or black dots are hyperplanes trained by C-1SVMs. In case (a), binary SVM classifies the red dot as the first class, whereas C-1SVMs classifies it as unknown because neither of one-class SVM hyperplanes include it as inlier. In case (b), because the distance between the red dot and the hyperline of binary SVM is too small, binary SVM does not have high confidence to label it, whereas C-1SVMs can correctly label it as the second class.

Complexity[edit]

Training: denote the total number of the optimal support vectors as m. Considering the worst case performance, we only add or delete one support vector in the loop of line 5 in the procedure C-1SVMtraining. Hence, m loops are required to convergence at most. Further, for each loop, T examples are received. Finally the running time could be O(mT). If all of the examples are support vectors in the worst case, O(T) space is required to store them.

Classification: to make classification, scores of all classes are computed. The running time and required space is both \Theta(n).

Applications[edit]

Most classification problems in machine learning can be solved by C-1SVMs. Similar to the SVM, C-1SVMs can also be applied in Computer Vision, Data Mining, Speech Recognition. For example, Gong and Cheng use C-1SVM algorithm to solve foreground segmentation of live videos[1], a problem in Computer Vision. They trained two one-class SVMs, one is for foreground pixels, and another one is for background pixels. When a new unlabeled pixel arrives, label it as foreground pixel or background pixel cooperatively using foreground and background one-class SVM.

References[edit]

  1. ^ a b Gong, Minglun; Cheng, Li (2011). "Foreground segmentation of live videos using locally competing 1SVMs". IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 
  2. ^ Sch¨olkopf, Bernhard; Platt, John C.; Shawe-Taylor, John; Smola, Alex J. (2001). "Estimating the support of a high-demensional distribution". Neural Computation, Vol. 13: 1443–1472. 
  3. ^ Bottou, Leon; Bousquet, Olivier (2008). "The Tradeoffs of Large Scale Learning". Advances in Neural Information Processing Systems. 
  4. ^ Cheng, Li; Vishwanathan, S.V. N.; Schuurmans, Dale; Wang, Shaojun; Caelli, Terry (2007). "Implicit Online Learning with Kernels". Advances in Neural Information Processing Systems. 

Category:Machine learning Category:Support vector machines Category:Classification algorithms