Automatic image annotation

Automatic image annotation (also known as automatic image tagging) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

This method can be regarded as a type of multi-class image classification with a very large number of classes, as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors, together with the training annotation words, is used by machine learning techniques to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Later techniques used machine translation to translate between the textual vocabulary and the 'visual vocabulary', that is, clustered regions known as blobs. Subsequent work has included classification approaches, relevance models and so on.
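The correlation-learning idea can be illustrated with a minimal sketch. It assumes each training image has already been reduced to a list of integer blob IDs by some region clustering step, which is not shown here. The function names `train_cooccurrence` and `annotate` and the toy data are hypothetical, and the scoring rule is a simplified stand-in rather than the exact formulation of any cited paper.

```python
from collections import Counter, defaultdict


def train_cooccurrence(training_set):
    """Count how often each blob ID appears alongside each annotation word.

    training_set: iterable of (blob_ids, words) pairs, one pair per image.
    Returns a mapping blob_id -> Counter of word counts.
    """
    counts = defaultdict(Counter)
    for blob_ids, words in training_set:
        for blob in blob_ids:
            for word in words:
                counts[blob][word] += 1
    return counts


def annotate(blob_ids, counts, top_k=2):
    """Rank words by the sum of estimated P(word | blob) over the image's blobs."""
    scores = Counter()
    for blob in blob_ids:
        blob_counts = counts.get(blob)
        if not blob_counts:
            continue  # blob never seen in training: contributes nothing
        total = sum(blob_counts.values())
        for word, n in blob_counts.items():
            scores[word] += n / total
    return [word for word, _ in scores.most_common(top_k)]


# Toy training data (assumed): blob IDs per image and their annotation words.
training = [
    ([0, 1], ["sky", "sea"]),
    ([0, 2], ["sky", "grass"]),
    ([1, 3], ["sea", "sand"]),
]
counts = train_cooccurrence(training)
print(annotate([0, 1], counts))
```

Because every word attached to an image is credited to every blob in that image, frequent words tend to dominate the scores. This weakness is one motivation for the translation and relevance models mentioned above.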

The advantage of automatic image annotation over content-based image retrieval (CBIR) is that queries can be more naturally specified by the user [1]. At present, CBIR generally requires users to search by image concepts such as color and texture, or by providing example queries. Certain image features in example images may override the concept that the user is really focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which are expensive and time-consuming to produce, especially given the large and constantly growing image databases in existence.

Some annotation engines are available online, including the ALIPR.com real-time tagging engine developed by Penn State researchers, and Behold, an image search engine that indexes over 1 million images using automatically generated tags.

Some major work

Word co-occurrence model

Y Mori, H Takahashi, and R Oka (1999). "Image-to-word transformation based on dividing and vector quantizing images with words". Proceedings of the International Workshop on Multimedia Intelligent Storage and Retrieval Management.

Annotation as machine translation

P Duygulu, K Barnard, N de Freitas, and D Forsyth (2002). "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary". Proceedings of the European Conference on Computer Vision. pp. 97–112.

Statistical models

J Li and J Z Wang (2006). "Real-time Computerized Annotation of Pictures". Proc. ACM Multimedia. pp. 911–920.

J Z Wang and J Li (2002). "Learning-Based Linguistic Indexing of Pictures with 2-D MHMMs". Proc. ACM Multimedia. pp. 436–445.

Automatic linguistic indexing of pictures

J Li and J Z Wang (2003). "Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach". IEEE Trans. on Pattern Analysis and Machine Intelligence. pp. 1075–1088.

Hierarchical Aspect Cluster Model

K Barnard and D A Forsyth (2001). "Learning the Semantics of Words and Pictures". Proceedings of International Conference on Computer Vision. pp. 408–415.

Latent Dirichlet Allocation model

D Blei, A Ng, and M Jordan (2003). "Latent Dirichlet allocation" (PDF). Journal of Machine Learning Research. pp. 3:993–1022.

Texture similarity

R W Picard and T P Minka (1995). "Vision Texture for Annotation". Multimedia Systems.

Support Vector Machines

C Cusano, G Ciocca, and R Schettini (2004). "Image Annotation Using SVM". Proceedings of Internet Imaging IV.

Ensemble of Decision Trees and Random Subwindows

R Maree, P Geurts, J Piater, and L Wehenkel (2005). "Random Subwindows for Robust Image Classification". Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. pp. 1:34–40.

Maximum Entropy

J Jeon and R Manmatha (2004). "Using Maximum Entropy for Automatic Image Annotation" (PDF). Int'l Conf on Image and Video Retrieval (CIVR 2004). pp. 24–32.

Relevance models

J Jeon, V Lavrenko, and R Manmatha (2003). "Automatic image annotation and retrieval using cross-media relevance models" (PDF). Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 119–126.

Relevance models using continuous probability density functions

V Lavrenko, R Manmatha, and J Jeon (2003). "A model for learning the semantics of pictures" (PDF). Proceedings of the 16th Conference on Advances in Neural Information Processing Systems NIPS.

Coherent Language Model

R Jin, J Y Chai, and L Si (2004). "Effective Automatic Image Annotation via A Coherent Language Model and Active Learning" (PDF). Proceedings of MM'04.

Inference networks

D Metzler and R Manmatha (2004). "An inference network approach to image retrieval" (PDF). Proceedings of the International Conference on Image and Video Retrieval. pp. 42–50.

Multiple Bernoulli distribution

S Feng, R Manmatha, and V Lavrenko (2004). "Multiple Bernoulli relevance models for image and video annotation" (PDF). IEEE Conference on Computer Vision and Pattern Recognition. pp. 1002–1009.

Multiple design alternatives

J Y Pan, H-J Yang, P Duygulu, and C Faloutsos (2004). "Automatic Image Captioning" (PDF). Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME'04).

Natural scene annotation

J Fan, Y Gao, H Luo, and G Xu (2004). "Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation". Proceedings of the 27th annual international conference on Research and development in information retrieval. pp. 361–368.

Relevant low-level global filters

A Oliva and A Torralba (2001). "Modeling the shape of the scene: a holistic representation of the spatial envelope" (PDF). International Journal of Computer Vision. pp. 42:145–175.

Global image features and nonparametric density estimation

A Yavlinsky, E Schofield, and S Rüger (2005). "Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation" (PDF). Int'l Conf on Image and Video Retrieval (CIVR, Singapore, Jul 2005).

References

M Inoue (2004). "On the need for annotation-based image retrieval" (PDF). Workshop on Information Retrieval in Context. pp. 44–46.

External links

ALIPR.com - Real-time automatic tagging engine developed by Penn State researchers.
Behold Image Search - An image search engine that combines automatic annotation and content based image retrieval for over 1 million images from university websites.
