Multi-label classification

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple target labels must be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into more than two classes. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to vectors y, rather than scalar outputs as in the ordinary classification problem.

There are two main methods for tackling the multi-label classification problem:[1] problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of binary classification problems. Algorithm adaptation methods adapt the algorithms to directly perform multi-label classification.

Contents

Problem transformation [edit]

Several problem transformation methods exist for multi-label classification; the baseline approach[2] amounts to training one classifier per label, similar to the one-vs.-all (OvA, also one-vs.-rest, OvR) method for multiclass classification. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective binary classifiers predict a positive result. (This method has also been called the "binary relevance" method.[3])

Various other transformations exist: the label combinations (LC) transformation, creates one binary classifier for every possible label combination. Other transformation methods include RAkEL[4] and classifier chains.[3] Various problem transformation methods have been developed such as Ml-kNN,[5] a variant of the k-nearest neighbors lazy classifiers. A more detailed description of the most well known methods for multi-label classification and an extensive empirical evaluation can be found here.[6]

Evaluation [edit]

Evaluation metrics for multi-label classification are inherently different from those used in multi-class (or binary) classification, due to the inherent differences of the classification problem. The following metrics are typically used:

  • Hamming loss: the percentage of the wrong labels to the total number of labels. This is a loss function, so the optimal value is zero.
  • Label-based accuracy
  • Exact match: is the most strict metric, indicating the percentage of samples that have all their labels classified correctly.

Implementations and datasets [edit]

Java implementations of multi-label algorithms are available in the Mulan and Meka software packages, both based on Weka.

A Python implementation is available in the scikit-learn package; this wraps an arbitrary binary classifier in a one-vs.-rest (OvR) construction.

A list of commonly used multi-label data-sets is available at the Mulan website.

References [edit]

  1. ^ Grigorios Tsoumakas, Ioannis Katakis. Multi-Label Classification: An Overview. International Journal of Data Warehousing & Mining, 3(3), 1-13, July–September 2007.
  2. ^ Balasubramanian, Krishnakumar; Lebanon, Guy (2012). "The landmark selection method for multiple output prediction". ICML. 
  3. ^ a b Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank. Classifier Chains for Multi-label Classification. Machine Learning Journal. Springer. Vol. 85(3), (2011).
  4. ^ Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, Ioannis Vlahavas Multi-label Classification of Music into emotions ISMIR 2008
  5. ^ Zhang, M.L. and Zhou, Z.H. ML-KNN: A lazy learning approach to multi-label learning
  6. ^ Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, Sašo Džeroski. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition. Vol. 45(9), (2012).