Zero-shot learning

Zero-shot learning (ZSL) is a problem setup in machine learning in which, at test time, a learner observes samples from classes that were not observed during training and must predict the class they belong to. The problem is widely studied in computer vision, natural language processing and machine perception (for a review, see [1]). In standard generalization, a classifier is expected to correctly assign new samples to classes it has already observed during training; in ZSL, no samples from the test classes are given while training the classifier. ZSL can therefore be viewed as an extreme case of domain adaptation.

Prerequisite information for zero-shot classes

Because no training samples are available for the zero-shot classes, some form of side information about them has to be provided. This information can take several forms:

  • Learning with attributes. Here classes are accompanied by a pre-defined structured description. For example, bird descriptions could include "red head" or "long beak" [2][3]. These attributes are often organized in a structured, compositional way, and taking that structure into account improves learning [4].
  • Learning from textual descriptions. Here classes are accompanied by a free-text natural-language description, for example the Wikipedia description of the class [5][6].
  • Class-class similarity. Here classes are embedded in a continuous space. A zero-shot classifier predicts that a sample corresponds to some position in that space, and the nearest embedded class is used as the predicted class, even if no samples of that class were observed during training [7].
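
The nearest-embedded-class prediction described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the attribute vectors, the class names, the choice of cosine similarity and the hard-coded sample embedding are all invented here, and in practice the sample embedding would be the output of a model trained on the seen classes.

```python
import math

# Hypothetical class embeddings built from attributes; values are invented
# for illustration. Attribute order: [red head, long beak, webbed feet]
CLASS_EMBEDDINGS = {
    "woodpecker": [1.0, 1.0, 0.0],  # seen during training
    "duck": [0.0, 0.0, 1.0],        # seen during training
    "flamingo": [0.0, 1.0, 1.0],    # unseen: only its description is known
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_class(sample_embedding, class_embeddings):
    """Return the class whose embedding is most similar to the sample's."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(sample_embedding, class_embeddings[name]),
    )


# Stand-in for the embedding a trained model would predict for a test image.
sample_embedding = [0.1, 0.8, 0.9]
prediction = predict_class(sample_embedding, CLASS_EMBEDDINGS)
print(prediction)
```

Because the unseen class has an embedding derived from side information alone, it can be returned as a prediction even though no sample of it was used to fit the model.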

Generalized zero-shot learning

The ZSL setup above assumes that at test time only zero-shot samples are given, namely samples from new, unseen classes. In generalized zero-shot learning, samples from both new and known classes may appear at test time. This poses an additional challenge, because it is difficult to estimate whether a given sample comes from a new class or a known one. Approaches to handling this include:

  • A gating approach. Here an additional module is first trained to decide whether a given sample comes from a new class or from a known one. The gating module can output a hard decision [8], but emitting a soft probabilistic decision further improves the accuracy of this line of approaches [9].
  • Generative approaches. Here a generative model is trained to generate feature representations of the unseen classes. A standard classifier is then trained on samples from all classes, seen and unseen [10].
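
The difference between a hard and a soft gating decision can be sketched in Python as follows. This is a simplified illustration, not the method of any cited paper: the function names, the 0.5 threshold, the mixing rule and all probability values are assumptions made for the example. It supposes that one classifier covers the known classes, another covers the new classes, and a gate estimates the probability that a sample comes from a known class.

```python
def combine_hard_gating(p_known, known_probs, new_probs, threshold=0.5):
    """Route the sample to exactly one classifier based on the gate."""
    return dict(known_probs) if p_known >= threshold else dict(new_probs)


def combine_soft_gating(p_known, known_probs, new_probs):
    """Weight each classifier's class probabilities by the gate's belief."""
    combined = {cls: p_known * p for cls, p in known_probs.items()}
    for cls, p in new_probs.items():
        combined[cls] = combined.get(cls, 0.0) + (1.0 - p_known) * p
    return combined


# Hypothetical outputs for a single test sample (values are invented).
known_probs = {"woodpecker": 0.7, "duck": 0.3}   # classifier over known classes
new_probs = {"flamingo": 0.9, "pelican": 0.1}    # classifier over new classes
p_known = 0.4                                    # gate: P(sample is from a known class)

hard = combine_hard_gating(p_known, known_probs, new_probs)
soft = combine_soft_gating(p_known, known_probs, new_probs)
soft_prediction = max(soft, key=soft.get)
print(soft_prediction, soft)
```

With a hard gate, the classes handled by the rejected classifier receive no probability at all, so a wrong gating decision cannot be recovered. The soft variant keeps a single distribution over all classes, in which the gate's uncertainty is reflected.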


  1. Xian, Yongqin; Schiele, Bernt; Akata, Zeynep (2017). "Zero-shot learning - the good, the bad and the ugly". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 4582–4591. arXiv:1703.04394. Bibcode:2017arXiv170304394X.
  2. Lampert, C.H. (2009). "Learning to detect unseen object classes by between-class attribute transfer". IEEE Conference on Computer Vision and Pattern Recognition: 951–958.
  3. Romera-Paredes, Bernardino; Torr, Phillip (2015). "An embarrassingly simple approach to zero-shot learning". International Conference on Machine Learning: 2152–2161.
  4. Atzmon, Yuval; Chechik, Gal (2018). "Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning". Uncertainty in Artificial Intelligence. arXiv:1806.02664. Bibcode:2018arXiv180602664A.
  5. Hu, R Lily; Xiong, Caiming; Socher, Richard (2018). "Zero-Shot Image Classification Guided by Natural Language Descriptions of Classes: A Meta-Learning Approach". NeurIPS.
  6. Srivastava, Shashank; Labutov, Igor; Mitchell, Tom (2018). "Zero-shot Learning of Classifiers from Natural Language Quantification". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 306–316. doi:10.18653/v1/P18-1029.
  7. Frome, Andrea; et al. (2013). "Devise: A deep visual-semantic embedding model". Advances in Neural Information Processing Systems: 2121–2129.
  8. Socher, R; Ganjoo, M; Manning, C.D.; Ng, A. (2013). "Zero-shot learning through cross-modal transfer". Neural Information Processing Systems. arXiv:1301.3666. Bibcode:2013arXiv1301.3666S.
  9. Atzmon, Yuval (2019). "Adaptive Confidence Smoothing for Generalized Zero-Shot Learning". The IEEE Conference on Computer Vision and Pattern Recognition: 11671–11680. arXiv:1812.09903. Bibcode:2018arXiv181209903A.
  10. Felix, R; et al. (2018). "Multi-modal cycle-consistent generalized zero-shot learning". Proceedings of the European Conference on Computer Vision: 21–37. arXiv:1808.00136. Bibcode:2018arXiv180800136F.