Keyword spotting

Keyword spotting (or, more simply, word spotting) is a problem that was first defined in the context of speech processing [1][2]. In speech processing, keyword spotting is the identification of keywords in utterances.

Keyword spotting is also defined as a separate but related problem in the context of document image processing [1]. In document image processing, keyword spotting is the problem of finding all instances of a query word in a scanned document image without fully recognizing the document's text.

In speech processing

The first work on keyword spotting appeared in the late 1980s [2].

There are several types of keyword spotting:

  • Keyword spotting in unconstrained speech
  • Keyword spotting in isolated word recognition

Keyword spotting in unconstrained speech arises when keywords may not be separated from other words and no grammar is enforced on the sentence containing them. Early work on this task used continuous hidden Markov modeling [2].

Keyword spotting in isolated word recognition arises when the keywords are separated from other speech by silences. The main technique applied to such problems is dynamic time warping.
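
As an illustration of dynamic time warping in this setting, the following is a minimal sketch rather than a reference implementation. It assumes each isolated segment has already been reduced to a sequence of scalar feature values (practical systems typically use multi-dimensional acoustic features), and the function names, template labels, and values are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Return the dynamic time warping cost between two scalar sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # frame-to-frame distance
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def spot_keyword(segment, templates):
    """Return the label of the template with the lowest DTW cost to `segment`."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))


# Hypothetical keyword templates and a time-stretched utterance of "yes".
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}
segment = [1, 1, 3, 3, 4, 3, 1]
print(spot_keyword(segment, templates))
```

Because the warping path may repeat frames, a slower or faster utterance of the same word can still align to its template at low cost. A practical system would also need a rejection threshold for segments that match no keyword, which this sketch omits.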

In document image processing

Keyword spotting in document image processing can be seen as an instance of the more generic problem of content-based image retrieval (CBIR). Given a query, the goal is to retrieve the most relevant instances of words in a collection of scanned documents [1]. The query may be a text string (Query-by-string keyword spotting) or a word image (Query-by-example keyword spotting).
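
The retrieval step of query-by-example spotting can be sketched as a nearest-neighbour ranking. This is an assumption-laden illustration: it supposes that every segmented word image has already been converted to a fixed-length numeric descriptor by some feature extractor not shown here, and the function name, identifiers, and descriptor values are hypothetical.

```python
import math


def rank_word_images(query_vec, collection, top_k=3):
    """Return the ids of the `top_k` word images closest to the query descriptor."""
    # Score each candidate by Euclidean distance to the query descriptor.
    scored = [(math.dist(query_vec, vec), word_id) for word_id, vec in collection.items()]
    scored.sort()  # smallest distance first, i.e. most relevant first
    return [word_id for _, word_id in scored[:top_k]]


# Hypothetical descriptors for three word images in a scanned collection.
collection = {"w1": [0.0, 0.0], "w2": [1.0, 1.0], "w3": [5.0, 5.0]}
print(rank_word_images([0.9, 1.1], collection, top_k=2))
```

A query-by-string system would need an additional step that maps the text string into the same descriptor space before ranking.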


References

  1. Giotis, A.P.; Sfikas, G.; Gatos, B.; Nikou, C. (2017). "A survey of document image word spotting techniques". Pattern Recognition. 68: 310–332.
  2. Rohlicek, J.; Russell, W.; Roukos, S.; Gish, H. (1989). "Continuous hidden Markov modeling for speaker-independent word spotting". Proceedings of the 14th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 1: 627–630.