Voice search

Voice search, also called voice-enabled search, allows the user to search the Internet, a website, or an app with a voice command.

In a broader definition, voice search includes open-domain keyword queries on any information on the Internet, as offered for example by Google Voice Search, Cortana, Siri, and Amazon Echo.

Voice search is often interactive, involving several rounds of interaction that allow the system to ask for clarification. In this sense, voice search is a type of dialog system.


Voice searching is a method of search that lets users search with spoken commands rather than typing. It can be done on any device with voice input. There are three common ways to activate voice search:

  1. Click on the voice command icon
  2. Call out the name of the virtual assistant
  3. Press the home button or use a gesture on the interface

Activate the virtual assistant

Apple: Hey, Siri

Google: OK, Google

Amazon: Hey, Alexa

Microsoft: Hey, Cortana
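
As a rough illustration, the sketch below checks whether a transcript begins with one of the wake phrases listed above and separates the phrase from the query that follows. It assumes the speech has already been converted to text; real assistants typically detect the wake phrase from the audio itself. The names `WAKE_PHRASES`, `normalize`, and `split_wake_phrase` are hypothetical and not part of any vendor's API.

```python
import re

# Wake phrases taken from the list above, keyed by vendor.
WAKE_PHRASES = {
    "Apple": "hey siri",
    "Google": "ok google",
    "Amazon": "hey alexa",
    "Microsoft": "hey cortana",
}


def normalize(text: str) -> str:
    """Lowercase the text, replace punctuation (except apostrophes)
    with spaces, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9' ]", " ", text.lower())
    return " ".join(cleaned.split())


def split_wake_phrase(transcript: str):
    """Return (vendor, remaining query) if the transcript starts with a
    known wake phrase, otherwise (None, normalized transcript)."""
    words = normalize(transcript)
    for vendor, phrase in WAKE_PHRASES.items():
        if words == phrase or words.startswith(phrase + " "):
            return vendor, words[len(phrase):].strip()
    return None, words
```

Matching on normalized text means that "Hey, Siri" and "hey siri" are treated alike, and a transcript without a wake phrase is returned unchanged apart from normalization.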

Supported languages

Language support is essential for a system to understand users and return accurate results for what they search. It spans languages, dialects, and accents, as users want a voice assistant that both understands them and speaks to them understandably.

For example, the Google Cloud Speech-to-Text API (https://cloud.google.com/speech-to-text/docs/languages) can recognize speech in up to 119 languages.

How it works

The search works the same way as an ordinary search on a website; the difference is that the query is spoken rather than typed. The mechanism uses automatic speech recognition (ASR) for input and can also use text-to-speech (TTS) for output. Users may first need to activate the virtual assistant. The search system then detects the language the user is speaking and identifies the keywords and context of the sentence. Finally, the device returns the results in a form suited to its output: a device with a screen might display them, while a device without a screen speaks them back to the searcher.
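
The steps described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a description of how any particular assistant is implemented: the speech recognizer, search backend, and speech synthesizer are passed in as plain functions, and the names `Device`, `extract_keywords`, `voice_search`, and the `fake_*` stand-ins are all hypothetical. Keyword detection is reduced to removing a short list of stop words.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# A small stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "for", "to", "what", "me", "find"}


@dataclass
class Device:
    has_screen: bool


def extract_keywords(text: str) -> List[str]:
    """Keep the words that are not in the stop-word list."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def voice_search(
    audio: bytes,
    device: Device,
    recognize: Callable[[bytes], Tuple[str, str]],
    search: Callable[[str, List[str]], List[str]],
    speak: Callable[[str], bytes],
) -> Union[List[str], bytes]:
    """Run one voice query through ASR, keyword extraction, and search,
    then return results in a form suited to the device."""
    # ASR step: assumed to return (language code, transcribed text).
    language, text = recognize(audio)
    keywords = extract_keywords(text)
    results = search(language, keywords)
    if device.has_screen:
        return results                   # shown on the display
    return speak("; ".join(results))     # TTS: spoken back to the user


# Stand-ins for a real recognizer, search backend, and synthesizer.
def fake_recognize(audio: bytes) -> Tuple[str, str]:
    return "en", "find the weather in Paris"


def fake_search(language: str, keywords: List[str]) -> List[str]:
    return [f"[{language}] {' '.join(keywords)}"]


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


on_screen = voice_search(b"", Device(has_screen=True),
                         fake_recognize, fake_search, fake_speak)
spoken = voice_search(b"", Device(has_screen=False),
                      fake_recognize, fake_search, fake_speak)
```

Passing the recognizer, search backend, and synthesizer as parameters keeps the sketch independent of any specific speech service; the only difference between the two calls is whether the device has a screen.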
