Automatic image annotation

Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata, in the form of captions or keywords, to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest in a database.

This method can be regarded as a type of multi-class image classification with a very large number of classes, as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors, together with the training annotation words, is used by machine learning techniques to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Later techniques used machine translation to translate between the textual vocabulary and the 'visual vocabulary' of clustered regions known as blobs. Work following these efforts has included classification approaches, relevance models, and others.
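The correlation-learning idea described above can be illustrated with a small sketch. It assumes each training image has already been reduced to a set of discrete blob ids (the clustering step is not shown) and it scores words by summing co-occurrence-based estimates of P(word | blob). The function names, blob ids, and toy data are hypothetical, and this is a simplified illustration rather than the algorithm of any particular paper.

```python
from collections import Counter, defaultdict


def train_cooccurrence(training_set):
    """Count how often each word appears alongside each blob id."""
    counts = defaultdict(Counter)
    for blobs, words in training_set:
        for blob in blobs:
            for word in words:
                counts[blob][word] += 1
    return counts


def annotate(blobs, counts, top_k=2):
    """Score words by summing the estimated P(word | blob) over the image's blobs."""
    scores = Counter()
    for blob in blobs:
        total = sum(counts[blob].values())
        if total == 0:
            continue  # blob never seen during training
        for word, n in counts[blob].items():
            scores[word] += n / total
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


# Toy data (assumption): blob ids stand in for clustered image regions.
TRAINING_SET = [
    ({0, 1}, {"sky", "sea"}),
    ({0, 2}, {"sky", "grass"}),
    ({2, 3}, {"grass", "tiger"}),
]

counts = train_cooccurrence(TRAINING_SET)
print(annotate({2, 3}, counts))
```

Because every word in the vocabulary is a candidate label, the ranking step is what makes this behave like classification over a very large number of classes.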

The advantage of automatic image annotation over content-based image retrieval (CBIR) is that the user can specify queries more naturally [1]. CBIR generally (at present) requires users to search by image concepts such as color and texture, or by supplying example images as queries. Certain image features in example images may override the concept that the user is really focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images; manual annotation is expensive and time-consuming, especially given the large and constantly growing image databases in existence.
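To make the contrast with CBIR concrete, the following sketch shows how a plain keyword query can be answered once images carry annotations. It assumes a simple inverted index from keyword to image id; the file names, annotations, and function names are made up for illustration and do not describe any specific retrieval system.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each annotation word to the set of image ids that carry it."""
    index = defaultdict(set)
    for image_id, words in annotations.items():
        for word in words:
            index[word.lower()].add(image_id)
    return index


def search(index, query):
    """Return the ids of images annotated with every word in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical annotations, as an automatic annotator might produce them.
ANNOTATIONS = {
    "img_001.jpg": ["sky", "sea", "beach"],
    "img_002.jpg": ["tiger", "grass"],
    "img_003.jpg": ["sky", "grass", "tiger"],
}

index = build_index(ANNOTATIONS)
print(search(index, "tiger grass"))
```

The user types words rather than choosing colors, textures, or an example image, which is the more natural query style the paragraph above refers to.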

Some annotation engines are online, including the real-time tagging engine developed by Pennsylvania State University researchers, and Behold.

Some major work

  • Word co-occurrence model
Y Mori, H Takahashi, and R Oka (1999). "Proceedings of the International Workshop on Multimedia Intelligent Storage and Retrieval Management".
  • Annotation as machine translation
P Duygulu, K Barnard, N de Freitas, and D Forsyth (2002). "Proceedings of the European Conference on Computer Vision". pp. 97–112.
  • Statistical models
J Li and J Z Wang (2006). "Proc. ACM Multimedia". pp. 911–920.
J Z Wang and J Li (2002). "Proc. ACM Multimedia". pp. 436–445.
  • Automatic linguistic indexing of pictures
J Li and J Z Wang (2008). "IEEE Trans. on Pattern Analysis and Machine Intelligence".
J Li and J Z Wang (2003). "IEEE Trans. on Pattern Analysis and Machine Intelligence". pp. 1075–1088.
  • Hierarchical Aspect Cluster Model
K Barnard, D A Forsyth (2001). "Proceedings of International Conference on Computer Vision". pp. 408–415.
  • Latent Dirichlet Allocation model
D Blei, A Ng, and M Jordan (2003). "Journal of Machine Learning Research". pp. 3:993–1022.
G Carneiro, A B Chan, P Moreno, and N Vasconcelos (2006). "IEEE Trans. on Pattern Analysis and Machine Intelligence". pp. 394–410.
  • Texture similarity
R W Picard and T P Minka (1995). "Multimedia Systems".
  • Support Vector Machines
C Cusano, G Ciocca, and R Schettini (2004). "Proceedings of Internet Imaging IV".
  • Ensemble of Decision Trees and Random Subwindows
R Maree, P Geurts, J Piater, and L Wehenkel (2005). "Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition". pp. 1:34–30.
  • Maximum Entropy
J Jeon, R Manmatha (2004). "Int'l Conf on Image and Video Retrieval (CIVR 2004)". pp. 24–32.
  • Relevance models
J Jeon, V Lavrenko, and R Manmatha (2003). "Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval". pp. 119–126.
  • Relevance models using continuous probability density functions
V Lavrenko, R Manmatha, and J Jeon (2003). "Proceedings of the 16th Conference on Advances in Neural Information Processing Systems NIPS".
  • Coherent Language Model
R Jin, J Y Chai, L Si (2004). "Proceedings of MM'04".
  • Inference networks
D Metzler and R Manmatha (2004). "Proceedings of the International Conference on Image and Video Retrieval". pp. 42–50.
  • Multiple Bernoulli distribution
S Feng, R Manmatha, and V Lavrenko (2004). "IEEE Conference on Computer Vision and Pattern Recognition". pp. 1002–1009.
  • Multiple design alternatives
J Y Pan, H-J Yang, P Duygulu and C Faloutsos (2004). "Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME'04)".
  • Natural scene annotation
J Fan, Y Gao, H Luo and G Xu (2004). "Proceedings of the 27th annual international conference on Research and development in information retrieval". pp. 361–368.
  • Relevant low-level global filters
A Oliva and A Torralba (2001). "International Journal of Computer Vision". pp. 42:145–175.
  • Global image features and nonparametric density estimation
A Yavlinsky, E Schofield and S Rüger (2005). "Int'l Conf on Image and Video Retrieval (CIVR, Singapore, Jul 2005)".
  • Video semantics
N Vasconcelos and A Lippman (2001). "IEEE Transactions on Image Processing". pp. 1–17.
Ilaria Bartolini, Marco Patella, and Corrado Romani (2010). "3rd ACM International Multimedia Workshop on Automated Information Extraction in Media Production (AIEMPro10)".
  • Image Annotation Refinement
Yohan Jin, Latifur Khan, Lei Wang, and Mamoun Awad (2005). "13th Annual ACM International Conference on Multimedia (MM 05)". pp. 706–715.
Changhu Wang, Feng Jing, Lei Zhang, and Hong-Jiang Zhang (2006). "14th Annual ACM International Conference on Multimedia (MM 06)".
Changhu Wang, Feng Jing, Lei Zhang, and Hong-Jiang Zhang (2007). "IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07)".
Ilaria Bartolini and Paolo Ciaccia (2007). "Springer Adaptive Multimedia Retrieval".
Ilaria Bartolini and Paolo Ciaccia (2010). "2nd ACM International Workshop on Keyword Search on Structured Data (KEYS 2010)".
  • Automatic Image Annotation by Ensemble of Visual Descriptors
Emre Akbas and Fatos Y. Vural (2007). "Intl. Conf. on Computer Vision (CVPR) 2007, Workshop on Semantic Learning Applications in Multimedia".
  • A New Baseline for Image Annotation
Ameesh Makadia and Vladimir Pavlovic and Sanjiv Kumar (2008). "European Conference on Computer Vision (ECCV)".
  • Simultaneous Image Classification and Annotation
Chong Wang and David Blei and Li Fei-Fei (2009). "Conf. on Computer Vision and Pattern Recognition (CVPR)".
  • TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation
Matthieu Guillaumin and Thomas Mensink and Jakob Verbeek and Cordelia Schmid (2009). "Intl. Conf. on Computer Vision (ICCV)".
  • Image Annotation Using Metric Learning in Semantic Neighbourhoods
Yashaswi Verma and C. V. Jawahar (2012). "European Conference on Computer Vision (ECCV)".
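
Some of the later entries in the list above are built on nearest-neighbour models, in which a new image receives tags from visually similar training images. The sketch below illustrates only that general idea, using a plain Euclidean distance and an unweighted vote. It is not the method of any cited paper (the published approaches are considerably more elaborate, for example learning the distance metric), and the feature vectors, tags, and function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_labels(query, training_set, k=2, top_n=2):
    """Tag a query image with the most frequent words among its k nearest neighbours."""
    neighbours = sorted(
        training_set, key=lambda item: euclidean(query, item[0])
    )[:k]
    votes = Counter()
    for _, words in neighbours:
        votes.update(words)
    # Sort by descending vote count, then alphabetically so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Toy data (assumption): two-dimensional global features with their training tags.
TRAINING = [
    ((0.9, 0.1), ["sky", "sea"]),
    ((0.8, 0.2), ["sky", "cloud"]),
    ((0.1, 0.9), ["grass", "tiger"]),
]

print(transfer_labels((0.85, 0.15), TRAINING))
```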