In machine learning, multi-label classification and the closely related problem of multi-output classification are variants of the classification problem in which multiple target labels must be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into exactly one of more than two classes. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to binary label vectors y, rather than to the scalar outputs of the ordinary classification problem.
There are two main methods for tackling the multi-label classification problem: problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of binary classification problems, which can then be handled using ordinary binary classifiers. Algorithm adaptation methods adapt existing algorithms to perform multi-label classification directly. In other words, rather than converting the problem to a simpler one, they address it in its full form.
Several problem transformation methods exist for multi-label classification; the baseline approach, called the binary relevance method, amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. This method of dividing the task into multiple binary tasks has something in common with the one-vs.-all (OvA, or one-vs.-rest, OvR) method for multiclass classification. Note though that it is not the same method: in binary relevance we train one classifier for each label, not one classifier for each possible value for the label.
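The binary relevance method described above can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: the `MajorityClassifier` is a deliberately trivial stand-in for any real binary learner, and all function names are hypothetical.

```python
# Binary relevance: train one independent binary classifier per label,
# then output every label whose classifier predicts positive.

class MajorityClassifier:
    """Toy binary learner: predicts the majority class seen in training.
    A stand-in for any real binary classifier (illustrative only)."""
    def fit(self, X, y):
        self.positive = sum(y) * 2 >= len(y)
        return self

    def predict(self, x):
        return self.positive


def fit_binary_relevance(X, Y, make_classifier):
    """Y is a list of binary label vectors; train one classifier per
    label column, each seeing only its own 0/1 target."""
    n_labels = len(Y[0])
    models = []
    for j in range(n_labels):
        yj = [row[j] for row in Y]  # binary target for label j
        models.append(make_classifier().fit(X, yj))
    return models


def predict_binary_relevance(models, x):
    # The combined model assigns every label whose classifier fires.
    return [int(m.predict(x)) for m in models]


X = [[0], [1], [2], [3]]
Y = [[1, 0], [1, 0], [1, 1], [0, 0]]  # two labels per instance
models = fit_binary_relevance(X, Y, MajorityClassifier)
print(predict_binary_relevance(models, [2]))  # → [1, 0]
```

Note that each per-label classifier is trained in isolation, which is exactly why binary relevance cannot capture correlations between labels; classifier chains were proposed to address this.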
Various other transformations exist: the label powerset (LP) transformation maps every distinct label combination observed in the training set to a single class, turning the multi-label problem into one multiclass problem over label combinations. Other transformation methods include RAkEL and classifier chains.
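The label powerset transformation can be sketched as follows. This is an illustrative sketch: the 1-nearest-neighbour base learner is an arbitrary choice standing in for any multiclass classifier, and the function names are hypothetical.

```python
# Label powerset: encode each distinct label set as one class of a
# single multiclass problem, then decode predictions back to label sets.

def lp_fit(X, Y):
    """Map each label vector (as a tuple) to a class id; keep the
    inverse mapping so predictions can be decoded."""
    combo_to_class, class_to_combo = {}, []
    targets = []
    for labels in Y:
        key = tuple(labels)
        if key not in combo_to_class:
            combo_to_class[key] = len(class_to_combo)
            class_to_combo.append(key)
        targets.append(combo_to_class[key])
    return targets, class_to_combo


def lp_predict(X_train, targets, class_to_combo, x):
    # 1-NN over the transformed multiclass problem (illustrative base
    # learner; any multiclass classifier could be used here).
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    best = min(range(len(dists)), key=dists.__getitem__)
    return list(class_to_combo[targets[best]])


X = [[0, 0], [1, 1], [5, 5]]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 1]]
targets, class_to_combo = lp_fit(X, Y)
print(lp_predict(X, targets, class_to_combo, [4, 4]))  # → [0, 1, 1]
```

A known limitation follows directly from the construction: LP can only ever predict label combinations seen during training, and the number of classes grows with the number of distinct combinations.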
Various algorithm adaptation methods have been developed, such as kernel methods for vector output and ML-kNN, a multi-label variant of the k-nearest neighbors lazy classifier. A more detailed description of the most well-known methods for multi-label classification, together with an extensive empirical evaluation, can be found in Madjarov et al. (2012).
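The k-nearest-neighbors adaptation idea can be sketched as a per-label majority vote among neighbours. This is a simplified illustration only: the actual ML-kNN algorithm additionally estimates prior and posterior probabilities from neighbour label counts via a Bayesian rule, which this sketch omits, and the function name is hypothetical.

```python
# Simplified k-NN adaptation for multi-label data: each label is
# assigned if more than half of the k nearest training instances
# carry it. (ML-kNN proper replaces this raw vote with a Bayesian
# posterior estimated from neighbour label counts.)

def knn_multilabel_predict(X_train, Y_train, x, k=3):
    # Indices of training points sorted by squared distance to x.
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(X_train[i], x)))
    neighbours = order[:k]
    n_labels = len(Y_train[0])
    # Strict majority vote per label among the k neighbours.
    return [int(sum(Y_train[i][j] for i in neighbours) * 2 > k)
            for j in range(n_labels)]


X = [[0], [1], [2], [10], [11], [12]]
Y = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
print(knn_multilabel_predict(X, Y, [1], k=3))  # → [1, 0]
```

Being a lazy method, it does no training beyond storing the data; all work happens at prediction time.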
Evaluation metrics for multi-label classification differ from those used in multi-class (or binary) classification, since a prediction can be partially correct: some labels right, others wrong. The following metrics are typically used:
- Hamming loss: the fraction of wrong labels (labels incorrectly predicted plus relevant labels missed) out of the total number of labels. This is a loss function, so the optimal value is zero.
- Label-based accuracy
- Exact match (subset accuracy): the strictest metric, indicating the percentage of samples that have all their labels classified correctly.
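The metrics above can be computed directly from the definitions. A minimal sketch, with illustrative function names:

```python
# Hamming loss and exact-match ratio over a batch of label vectors.

def hamming_loss(Y_true, Y_pred):
    # Fraction of individual label slots that are wrong,
    # counted over all samples and all labels.
    errors = sum(t != p
                 for yt, yp in zip(Y_true, Y_pred)
                 for t, p in zip(yt, yp))
    return errors / (len(Y_true) * len(Y_true[0]))


def exact_match(Y_true, Y_pred):
    # Fraction of samples whose entire label vector is correct.
    return sum(yt == yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)


Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 0, 1], [0, 1, 1]]
print(hamming_loss(Y_true, Y_pred))  # → 1/6 ≈ 0.167 (one wrong slot of six)
print(exact_match(Y_true, Y_pred))   # → 0.5
```

The two metrics answer different questions: here the second sample is only one label away from correct, which Hamming loss rewards but exact match counts as a complete miss.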
Implementations and datasets
Clus is a decision tree and rule induction system that implements the predictive clustering framework. This framework unifies unsupervised clustering and predictive modelling and allows for a natural extension to more complex prediction settings such as multi-task learning and multi-label classification.
GUIDE is a multi-purpose machine learning algorithm for constructing classification and regression trees.
A list of commonly used multi-label datasets is available at the Mulan website.
- Grigorios Tsoumakas, Ioannis Katakis. Multi-Label Classification: An Overview. International Journal of Data Warehousing & Mining, 3(3), 1-13, July–September 2007.
- Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank. Classifier Chains for Multi-label Classification. Machine Learning Journal, Springer, Vol. 85(3), 2011.
- Krishnakumar Balasubramanian, Guy Lebanon. The Landmark Selection Method for Multiple Output Prediction. ICML 2012.
- Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, Ioannis Vlahavas. Multi-label Classification of Music into Emotions. ISMIR 2008.
- Min-Ling Zhang, Zhi-Hua Zhou. ML-KNN: A Lazy Learning Approach to Multi-label Learning.
- Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, Sašo Džeroski. An Extensive Experimental Comparison of Methods for Multi-label Learning. Pattern Recognition, Vol. 45(9), 2012.