PU learning

From Wikipedia, the free encyclopedia

In machine learning, PU learning (positive-unlabeled learning) is a collection of semi-supervised techniques for training binary classifiers using only positive and unlabeled examples.[1]

In PU learning, two sets of samples are assumed to be available for training: a positive set P and a mixed set U, which is assumed to contain both positive and negative samples, with no labels indicating which is which. This contrasts with other forms of semi-supervised learning, which assume that a labeled set containing examples of both classes is available. A variety of techniques exist to adapt supervised classifiers to the PU learning setting. PU learning has been successfully applied to text classification[2][3][4] and bioinformatics tasks.[5]
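One family of such adaptations is often described as a two-step strategy: first identify samples in U that are likely to be negative, then train an ordinary supervised classifier on P against those "reliable negatives". The following is a minimal sketch of that idea, not a reproduction of any cited method. It assumes a nearest-centroid classifier as a stand-in for the base learner, and the function names (`two_step_pu`, `score`), the `neg_fraction` parameter, and the synthetic two-dimensional data are all hypothetical choices made for illustration.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(x, pos_c, neg_c):
    """Greater than zero when x is closer to the positive centroid."""
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)


def two_step_pu(P, U, neg_fraction=0.3):
    """Illustrative two-step PU learner (assumed design, not a cited method).

    Step 1: provisionally treat all of U as negative and rank U by score.
    Step 2: keep the lowest-scoring fraction of U as reliable negatives
            and refit the classifier on P versus those samples only.
    """
    pos_c = centroid(P)
    provisional_neg_c = centroid(U)
    ranked = sorted(U, key=lambda x: score(x, pos_c, provisional_neg_c))
    k = max(1, int(neg_fraction * len(U)))
    reliable_neg_c = centroid(ranked[:k])
    return lambda x: score(x, pos_c, reliable_neg_c) > 0


def blob(rng, center, n, std=0.5):
    """Draw n points from an isotropic Gaussian around center."""
    return [[rng.gauss(c, std) for c in center] for _ in range(n)]


# Synthetic data: positives near (2, 2), negatives near (-2, -2).
rng = random.Random(0)
P = blob(rng, (2.0, 2.0), 50)
# U mixes hidden positives with negatives; no labels are kept.
U = blob(rng, (2.0, 2.0), 30) + blob(rng, (-2.0, -2.0), 70)
rng.shuffle(U)

clf = two_step_pu(P, U)
print(clf([2.0, 2.0]), clf([-2.0, -2.0]))
```

The sketch depends on the classes being well separated, so that the lowest-ranked part of U contains few hidden positives. With overlapping classes, the choice of `neg_fraction` and of the base classifier would matter considerably more.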


  1. ^ Bing Liu (2007). Web Data Mining. Springer. pp. 165–178. 
  2. ^ Bing Liu, Wee Sun Lee, Philip S. Yu and Xiao-Li Li (2002). "Partially supervised classification of text documents". ICML. pp. 8–12. 
  3. ^ Hwanjo Yu, Jiawei Han, Kevin Chen-Chuan Chang (2002). "PEBL: positive example based learning for web page classification using SVM". ACM SIGKDD. 
  4. ^ Xiao-Li Li and Bing Liu (2003). "Learning to classify text using positive and unlabeled data". IJCAI. 
  5. ^ Peng Yang, Xiao-Li Li, Jian-Ping Mei, Chee-Keong Kwoh and See-Kiong Ng (2012). "Positive-unlabeled learning for disease gene identification". Bioinformatics. 28 (20).