|Initial release||June 19, 2017|
|Available in||Multilingual (List of languages)|
|License||Creative Commons CC0|
Common Voice is a crowdsourcing project started by Mozilla to create a free database for speech recognition software. Volunteers support the project by recording sample sentences with a microphone and reviewing other users' recordings. The transcribed sentences are collected in a voice database released under the Creative Commons CC0 public domain dedication, which allows developers to use the database for voice-to-text applications without restrictions or costs.
The English Common Voice database is the second-largest freely accessible voice database after LibriSpeech. By the time the first data were published on 29 November 2017, more than 20,000 users worldwide had recorded 400,000 validated sentences, with a total length of 500 hours.
In February 2019, the first batch of languages was released for use. It covered 18 languages, including English, French, German and Mandarin Chinese as well as less widely spoken languages such as Welsh and Kabyle. In total, the release contained almost 1,400 hours of recorded voice data from more than 42,000 contributors.
- "Announcing the Initial Release of Mozilla's Open Source Speech Recognition Model and Voice Dataset". blog.mozilla.org. November 29, 2017.
- "Mozilla updates Common Voice dataset with 1,400 hours of speech across 18 languages". VentureBeat. February 28, 2019.