In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple target labels may be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into precisely one of more than two classes. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 to each label), rather than scalar outputs as in the ordinary classification problem.
There are two main methods for tackling the multi-label classification problem: problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into a set of single-label (binary or multiclass) classification problems. Algorithm adaptation methods adapt existing algorithms to perform multi-label classification directly.
Problem transformation 
Several problem transformation methods exist for multi-label classification; the baseline approach amounts to training one classifier per label, similar to the one-vs.-all (OvA, also one-vs.-rest, OvR) method for multiclass classification. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective binary classifiers predict a positive result. (This method has also been called the "binary relevance" method.)
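As a minimal sketch, binary relevance can be implemented with scikit-learn's `OneVsRestClassifier`, which fits one independent binary classifier per label column of an indicator matrix (the toy data below is illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data: 4 samples, 2 features; Y is a binary indicator matrix
# with one column per label (3 labels here).
X = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
Y = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Binary relevance: one independent binary classifier per label.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

# Predictions come back as an indicator matrix of the same shape as Y;
# a label is predicted whenever its binary classifier fires.
pred = clf.predict(X)
print(pred.shape)  # (4, 3)
```

Because the per-label classifiers are trained independently, this approach cannot exploit correlations between labels; the transformations below address that limitation.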
Various other transformations exist. The label combination (LC) transformation, also known as label powerset (LP), treats every distinct label combination observed in the training set as a single class and trains one multiclass classifier over those combinations. Other transformation methods include RAkEL, which trains an ensemble of LP classifiers on small random subsets of the labels, and classifier chains, which link binary classifiers so that each receives the predictions of the preceding classifiers as additional features. The second family, algorithm adaptation, includes methods such as ML-kNN, a multi-label variant of the k-nearest neighbors lazy classifier. A detailed description of the most well-known methods for multi-label classification, together with an extensive empirical evaluation, can be found in Madjarov et al. (2012).
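The classifier chain idea can be sketched with scikit-learn's `ClassifierChain`, which fits one binary model per label and feeds each model's output to the next as an extra feature, letting later classifiers exploit label correlations (the data and chain order below are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Toy data: Y is a binary indicator matrix with one column per label.
X = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
Y = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Each classifier in the chain sees the original features plus the
# predictions of the classifiers earlier in the chain, so label
# correlations can be modeled (unlike binary relevance).
chain = ClassifierChain(LogisticRegression(), order=[0, 1, 2], random_state=0)
chain.fit(X, Y)

pred = chain.predict(X)
print(pred.shape)  # (4, 3)
```

In practice the chain order matters; a common remedy (used by RAkEL-style ensembles and ensemble classifier chains) is to average over several randomly ordered chains.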
Evaluation metrics for multi-label classification differ from those used in multiclass (or binary) classification, because each prediction is a set of labels rather than a single label. The following metrics are typically used:
- Hamming loss: the fraction of individual labels that are predicted incorrectly, averaged over samples and labels. This is a loss function, so the optimal value is zero.
- Label-based accuracy: accuracy (or precision, recall, and F1) computed separately for each label and then averaged across labels.
- Exact match (subset accuracy): the strictest metric, indicating the percentage of samples whose labels are all classified correctly.
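The first and last of these metrics are available directly in `sklearn.metrics`: `hamming_loss` computes the fraction of wrong label cells, and `accuracy_score` applied to indicator matrices computes exact match. A small worked example (the matrices are illustrative):

```python
import numpy as np
from sklearn.metrics import hamming_loss, accuracy_score

# Ground truth and predictions as binary indicator matrices
# (rows = samples, columns = labels).
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

# Hamming loss: 1 wrong cell out of 6 label cells -> 1/6.
print(hamming_loss(Y_true, Y_pred))   # ≈ 0.1667

# Exact match: 1 of 2 rows is entirely correct -> 0.5.
print(accuracy_score(Y_true, Y_pred)) # 0.5
```

Note how the two metrics can disagree: Hamming loss rewards the mostly-correct first row, while exact match counts it as a complete miss.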
Implementations and datasets 
A list of commonly used multi-label datasets is available at the Mulan website.
References 
- Grigorios Tsoumakas, Ioannis Katakis. "Multi-Label Classification: An Overview." International Journal of Data Warehousing & Mining, 3(3), 1–13, July–September 2007.
- Krishnakumar Balasubramanian, Guy Lebanon. "The Landmark Selection Method for Multiple Output Prediction." ICML 2012.
- Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank. "Classifier Chains for Multi-label Classification." Machine Learning, 85(3), 2011.
- Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, Ioannis Vlahavas. "Multi-label Classification of Music into Emotions." ISMIR 2008.
- Min-Ling Zhang, Zhi-Hua Zhou. "ML-KNN: A Lazy Learning Approach to Multi-label Learning." Pattern Recognition, 40(7), 2007.
- Gjorgji Madjarov, Dragi Kocev, Dejan Gjorgjevikj, Sašo Džeroski. "An Extensive Experimental Comparison of Methods for Multi-label Learning." Pattern Recognition, 45(9), 2012.