Multiple-instance learning

In machine learning, multiple-instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of individually labeled instances, the learner receives a set of labeled bags, each containing many instances. In the simple case of multiple-instance binary classification, a bag is labeled negative if all the instances in it are negative, and positive if at least one instance in it is positive. From a collection of labeled bags, the learner tries either (i) to induce a concept that will label individual instances correctly or (ii) to learn how to label bags without inducing the concept.
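
To make the bag-labeling rule concrete, here is a minimal Python sketch. The threshold-based instance concept is purely hypothetical, chosen only so the rule has something to apply:

    import numpy as np

    # Hypothetical instance-level concept, for illustration only:
    # an instance is "positive" if its first feature exceeds a threshold.
    def instance_concept(x, threshold=0.5):
        return x[0] > threshold

    # Standard MIL assumption: a bag is positive iff at least one of
    # its instances is positive; negative iff all of them are negative.
    def bag_label(bag, concept=instance_concept):
        return any(concept(x) for x in bag)

    negative_bag = [np.array([0.1, 0.3]), np.array([0.2, 0.9])]  # no positive instance
    positive_bag = [np.array([0.1, 0.3]), np.array([0.8, 0.4])]  # one positive instance

    print(bag_label(negative_bag))  # False
    print(bag_label(positive_bag))  # True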

Take image classification, for example, as in Amores (2013). Given an image, we want to predict its target class based on its visual content. For instance, the target class might be "beach", where the image contains both "sand" and "water". In MIL terms, the image is described as a bag X = {X_1, ..., X_N}, where each X_i is the feature vector (called an instance) extracted from the i-th of the N regions that partition the image. The bag is labeled positive ("beach") if it contains both "sand" region instances and "water" region instances.
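
To illustrate this representation, the following sketch partitions an image into a fixed grid and extracts one instance per region. The grid partition and the mean-colour features are simplifying assumptions for this example, not the feature extraction used by Amores (2013):

    import numpy as np

    def image_to_bag(image, grid=(4, 4)):
        """Partition an image into a grid of regions and return the bag
        of per-region feature vectors (here: mean colour per region, a
        deliberately simple stand-in for real region features)."""
        h, w, _ = image.shape
        rows, cols = grid
        bag = []
        for i in range(rows):
            for j in range(cols):
                region = image[i * h // rows:(i + 1) * h // rows,
                               j * w // cols:(j + 1) * w // cols]
                bag.append(region.mean(axis=(0, 1)))  # one instance X_i
        return np.stack(bag)  # shape (N, feature_dim)

    # A random array stands in for a real photo.
    bag = image_to_bag(np.random.rand(64, 64, 3))
    print(bag.shape)  # (16, 3): N = 16 instances, 3 features each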

Multiple-instance learning was originally proposed under this name by Dietterich, Lathrop & Lozano-Pérez (1997), but earlier examples of similar research exist, for instance in the work on handwritten digit recognition by Keeler, Rumelhart & Leow (1990). Recent reviews of the MIL literature include Amores (2013), an extensive review and comparative study of the different paradigms, and Foulds & Frank (2010), a thorough review of the assumptions those paradigms make.

Examples of where MIL is applied include:

  • drug activity prediction, the original motivating application of Dietterich, Lathrop & Lozano-Pérez (1997)
  • predicting binding sites of Calmodulin-binding proteins[1]
  • predicting functions for alternatively spliced isoforms (Li et al. 2014; Eksi et al. 2013)
  • natural scene and image classification (Maron & Ratan 1998)

Numerous researchers have worked on adapting classical classification techniques, such as support vector machines or boosting, to the multiple-instance setting; a simple bag-level strategy along these lines is sketched below.
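
As one illustration, the sketch below follows the simplest bag-space strategy of this kind: embed each bag into a single summary vector (here, the mean of its instances) so that an off-the-shelf SVM can be trained on bags directly. The synthetic data and the mean embedding are assumptions made for this example, not a particular published method:

    import numpy as np
    from sklearn.svm import SVC

    def embed_bag(bag):
        # Collapse a variable-size bag into one fixed-length vector;
        # the instance mean is the simplest choice, not a tuned one.
        return np.asarray(bag).mean(axis=0)

    rng = np.random.default_rng(0)
    # Synthetic bags: positive bags contain one shifted instance.
    bags, labels = [], []
    for _ in range(100):
        bag = rng.normal(size=(rng.integers(3, 8), 5))
        y = int(rng.integers(0, 2))
        if y == 1:
            bag[0] += 2.0  # inject a "positive" instance
        bags.append(bag)
        labels.append(y)

    X = np.array([embed_bag(b) for b in bags])
    clf = SVC(kernel="rbf").fit(X, labels)
    print(clf.score(X, labels))

Collapsing a bag to its mean discards which instance made the bag positive, so this baseline only works when positive instances shift the bag's overall statistics; instance-level approaches instead try to identify the positive instances themselves.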

References

  1. ^ Minhas, Fayyaz (2012). "Multiple instance learning of Calmodulin binding sites". Bioinformatics. 28 (18): i416–i422. doi:10.1093/bioinformatics/bts416.
  • Dietterich, Thomas G.; Lathrop, Richard H.; Lozano-Pérez, Tomás (1997), "Solving the multiple instance problem with axis-parallel rectangles", Artificial Intelligence, 89 (1–2): 31–71, doi:10.1016/S0004-3702(96)00034-3.
  • Amores, Jaume (2013), "Multiple instance classification: Review, taxonomy and comparative study", Artificial Intelligence, 201: 81–105, doi:10.1016/j.artint.2013.06.003.
  • Foulds, James; Frank, Eibe (2010), "A Review of Multi-Instance Learning Assumptions", Knowledge Engineering Review, 25 (1): 1–25, doi:10.1017/S026988890999035X.
  • Keeler, James D.; Rumelhart, David E.; Leow, Wee-Kheng (1990), "Integrated segmentation and recognition of hand-printed numerals", Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems (NIPS 3), pp. 557–563.
  • Li, H.D.; Menon, R.; et al. (2014), "The emerging era of genomic data integration for analyzing splice isoform function", Trends in Genetics, doi:10.1016/j.tig.2014.05.005, PMID 24951248.
  • Eksi, R.; Li, H.D.; Menon, R.; et al. (2013), "Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data", PLoS Computational Biology, 9 (11): e1003314, doi:10.1371/journal.pcbi.1003314, PMC 3820534, PMID 24244129.
  • Maron, O.; Ratan, A.L. (1998), "Multiple-instance learning for natural scene classification", Proceedings of the Fifteenth International Conference on Machine Learning, pp. 341–349.