One-class classification

In machine learning, one-class classification, also known as unary classification or class-modelling, tries to identify objects of a specific class amongst all objects by learning primarily from a training set containing only objects of that class,[1] although there exist variants of one-class classifiers in which counter-examples are used to further refine the classification boundary. This differs from, and is more difficult than, the traditional classification problem, which tries to distinguish between two or more classes using a training set that contains objects from all of the classes. An example is classifying the operational status of a nuclear plant as 'normal':[2] in this scenario there are few, if any, examples of catastrophic system states; only the statistics of normal operation are known. The term one-class classification was coined by Moya & Hush (1996),[3] and many applications can be found in the scientific literature, for example outlier detection, anomaly detection and novelty detection. A feature of one-class classification is that it uses only sample points from the assigned class, so a representative sample of the non-target classes is not strictly required.[4]
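
As an illustration of learning from target-class samples only, the following minimal Python sketch fits a centroid-and-radius model to examples of "normal" operation and flags anything outside the radius as non-target. The distance-to-centroid model, the function names fit_one_class and is_target, the 0.95 quantile and the synthetic two-dimensional data are all assumptions made for this example; they are not taken from any of the cited works, and practical one-class classifiers are usually more elaborate.

```python
import math
import random


def fit_one_class(targets, quantile=0.95):
    """Fit a centroid-and-radius model from target-class samples only."""
    dim = len(targets[0])
    centroid = [sum(p[i] for p in targets) / len(targets) for i in range(dim)]
    # Sorted distances of the training targets to their own centroid.
    dists = sorted(math.dist(p, centroid) for p in targets)
    # Radius chosen so that roughly `quantile` of the targets fall inside it.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def is_target(model, point):
    """Return True if `point` lies within the learned radius."""
    centroid, radius = model
    return math.dist(point, centroid) <= radius


# Synthetic "normal operation" readings (assumed data, fixed seed).
rng = random.Random(0)
normal_states = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]

model = fit_one_class(normal_states)
print(is_target(model, (0.1, -0.2)))  # a reading close to normal operation
print(is_target(model, (8.0, 8.0)))   # a reading far from anything seen in training
```

No counter-examples are used at any point: the boundary is set entirely by how tightly the target samples cluster, which is why the quantile acts as a trade-off between rejecting true targets and accepting outliers.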

While many of the above applications focus on removing a small number of outliers or anomalies, one can also learn the other extreme, in which the single class covers a small coherent subset of the data, using an information bottleneck approach.[5]

PU learning

A similar problem is PU learning, in which a binary classifier is learned in a semi-supervised way from only positive and unlabeled sample points.[6]

In PU learning, two sets of examples are assumed to be available for training: the positive set P and a mixed set U, which is assumed to contain both positive and negative samples, but without these being labeled as such. This contrasts with other forms of semi-supervised learning, where it is assumed that a labeled set containing examples of both classes is available in addition to unlabeled samples. A variety of techniques exist to adapt supervised classifiers to the PU learning setting, including variants of the EM algorithm. PU learning has been successfully applied to text,[7][8][9] time series,[10] and bioinformatics tasks.[11]
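
One simple way to adapt a supervised classifier to this setting can be sketched in Python as a two-step heuristic: first treat the unlabeled points farthest from the positive examples as "reliable negatives", then train an ordinary binary classifier on the positives against those negatives. This is a sketch under stated assumptions, not the algorithm of any cited reference and not an EM variant; the function name pu_two_step, the neg_fraction parameter, the nearest-centroid classifier and the synthetic data are all hypothetical choices made for this example. The variables P and U hold the positive set and the mixed set.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def pu_two_step(positives, unlabeled, neg_fraction=0.3):
    """Two-step PU heuristic returning a function point -> bool (True = positive)."""
    pos_center = centroid(positives)
    # Step 1: unlabeled points farthest from the positive centroid are
    # assumed to be reliable negatives.
    ranked = sorted(unlabeled, key=lambda p: math.dist(p, pos_center), reverse=True)
    n_neg = max(1, int(neg_fraction * len(unlabeled)))
    neg_center = centroid(ranked[:n_neg])

    # Step 2: nearest-centroid binary classifier, positives vs reliable negatives.
    def classify(point):
        return math.dist(point, pos_center) < math.dist(point, neg_center)

    return classify


rng = random.Random(1)


def sample(cx, cy, n):
    """Draw n synthetic 2-D points around (cx, cy)."""
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]


P = sample(0, 0, 100)                      # labeled positives
U = sample(0, 0, 100) + sample(6, 6, 100)  # mixed set; true labels hidden from the learner

classify = pu_two_step(P, U)
print(classify((0.3, -0.4)), classify((6.2, 5.8)))
```

The heuristic depends on the positives and negatives being reasonably well separated; when the classes overlap, the "reliable negatives" chosen in the first step can be contaminated with positives.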

References

  1. Oliveri, Paolo (2017). "Class-modelling in food analytical chemistry: Development, sampling, optimisation and validation issues – A tutorial". Analytica Chimica Acta. 982: 9–19. doi:10.1016/j.aca.2017.05.013.
  2. Tax, D. (2001). One-class classification: Concept-learning in the absence of counter-examples. Doctoral dissertation, Delft University of Technology, The Netherlands.
  3. Moya, M.; Hush, D. (1996). "Network constraints and multi-objective optimization for one-class classification". Neural Networks. 9 (3): 463–474. doi:10.1016/0893-6080(95)00120-4.
  4. Rodionova, Oxana Ye; Oliveri, Paolo; Pomerantsev, Alexey L. (2016-12-15). "Rigorous and compliant approaches to one-class classification". Chemometrics and Intelligent Laboratory Systems. 159: 89–96. doi:10.1016/j.chemolab.2016.10.002.
  5. Crammer, Koby (2004). "A needle in a haystack: local one-class optimization". Proceedings of the Twenty-First International Conference on Machine Learning (ICML): 26.
  6. Liu, Bing (2007). Web Data Mining. Springer. pp. 165–178.
  7. Bing Liu; Wee Sun Lee; Philip S. Yu; Xiao-Li Li (2002). "Partially supervised classification of text documents". ICML. pp. 8–12.
  8. Hwanjo Yu; Jiawei Han; Kevin Chen-Chuan Chang (2002). "PEBL: positive example based learning for web page classification using SVM". ACM SIGKDD.
  9. Xiao-Li Li; Bing Liu (2003). "Learning to classify text using positive and unlabeled data". IJCAI.
  10. Minh Nhut Nguyen; Xiao-Li Li; See-Kiong Ng (2011). "Positive Unlabeled Learning for Time Series Classification". IJCAI.
  11. Peng Yang; Xiao-Li Li; Jian-Ping Mei; Chee-Keong Kwoh; See-Kiong Ng (2012). "Positive-Unlabeled Learning for Disease Gene Identification". Bioinformatics. 28 (20).