Common Voice

From Wikipedia, the free encyclopedia
Developer(s): Mozilla Foundation
Initial release: 19 June 2017
Available in: Multilingual
License: Creative Commons CC0

Common Voice is a crowdsourcing project started by Mozilla to create a free database for speech recognition software. Volunteers support the project by recording sample sentences with a microphone and reviewing recordings made by other users. The transcribed sentences are collected in a voice database released under CC0, a public domain dedication. This allows developers to use the database for speech-to-text applications without restrictions or costs.

Common Voice was launched as a response to the voice assistants of large companies, such as Amazon Echo, Siri and Google Assistant.

Voice database

The English Common Voice database is the second-largest freely accessible voice database after LibriSpeech. By the time the first data were published on 29 November 2017, more than 20,000 users worldwide had recorded 400,000 validated sentences, with a total length of 500 hours.[1]

In February 2019, the first batch of languages was released for use. It covered 18 languages, including English, French, German and Mandarin Chinese, as well as less widely spoken languages such as Welsh and Kabyle. In total, the release contained almost 1,400 hours of recorded voice data from more than 42,000 contributors.[2]
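
The published figures can be put in perspective with some simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the inputs are the rounded totals quoted above ("500 hours", "400,000 validated sentences", "almost 1,400 hours", "more than 42,000 contributors"), so the results are rough estimates rather than official statistics.

```python
def average_clip_seconds(total_hours: float, clip_count: int) -> float:
    """Mean length of one recording in seconds (hypothetical helper)."""
    return total_hours * 3600 / clip_count


def average_minutes_per_contributor(total_hours: float, contributor_count: int) -> float:
    """Mean amount of audio per contributor in minutes (hypothetical helper)."""
    return total_hours * 60 / contributor_count


# November 2017 English release: 500 hours across 400,000 validated sentences.
print(average_clip_seconds(500, 400_000))  # 4.5

# February 2019 release: about 1,400 hours from about 42,000 contributors.
print(average_minutes_per_contributor(1_400, 42_000))  # 2.0
```

On these rounded numbers, a validated recording lasts about 4.5 seconds on average, and the 2019 release works out to roughly two minutes of audio per contributor.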


  1. ^ "Announcing the Initial Release of Mozilla's Open Source Speech Recognition Model and Voice Dataset". blog November 29, 2017.
  2. ^ "Mozilla updates Common Voice dataset with 1,400 hours of speech across 18 languages". VentureBeat. February 28, 2019.