|Developer(s)||IVONA Software Poland|
|Available in||Polish / English / Romanian / German / Castilian Spanish / American Spanish / French / Welsh / Italian / Icelandic / Brazilian Portuguese|
IVONA is a multilingual speech synthesis system developed by the Polish IT company IVONA Software. It offers a full text-to-speech system with various APIs. IVONA Software was acquired by Amazon.com in January 2013 for its Kindle product range.
The IVONA text-to-speech system was described at Blizzard Challenge 2006 and Blizzard Challenge 2007 (the latter in a special version prepared for the Blizzard Challenge). It is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. Second, it assigns a phonetic transcription to each word, and divides and marks the text into prosodic units such as phrases, clauses, and sentences. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
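The text normalization step described above can be illustrated with a minimal Python sketch. It is not IVONA's implementation: the abbreviation table, the function names `number_to_words` and `normalize`, and the restriction to whitespace-separated tokens and integers from 0 to 999 are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; a real front-end would rely on a much
# larger, language-specific lexicon and on context to resolve ambiguity.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "No.": "Number"}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Replace known abbreviations and short digit strings with words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)  # leave everything else untouched
    return " ".join(out)


print(normalize("Dr. Smith lives at 42 Main St."))
```

A production front-end would also have to handle punctuation attached to tokens, dates, currency, ordinals, and abbreviations whose expansion depends on context; the sketch only shows the basic idea of mapping symbols to written-out words.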
Unit selection synthesis
IVONA uses Unit Selection with Limited Time-scale Modification (USLTM), described in the developers' Blizzard Challenge 2006 paper. Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The division into segments is done using a specially modified speech recognizer. An index of the units in the speech database is then created based on the segmentation and on acoustic parameters such as the fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection).
Unit selection generally produces the most natural-sounding output, because it applies digital signal processing (DSP) to the recorded speech only at concatenation points, and DSP often makes recorded speech sound less natural.
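The search for the best chain of candidate units is commonly formulated as a dynamic-programming (Viterbi-style) search that minimizes a target cost plus a join cost. The Python sketch below illustrates that general formulation only; it is not the USLTM algorithm, and the `Unit` fields, the cost functions, and their weights are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str       # phone label of the recorded segment
    pitch: float     # fundamental frequency in Hz
    duration: float  # length in seconds


def target_cost(target: Unit, cand: Unit) -> float:
    """How far a candidate is from the desired pitch and duration (assumed weights)."""
    return abs(target.pitch - cand.pitch) / 100.0 + abs(target.duration - cand.duration) * 10.0


def join_cost(left: Unit, right: Unit) -> float:
    """Penalty for a pitch jump at the concatenation point (assumed weight)."""
    return abs(left.pitch - right.pitch) / 100.0


def select_units(targets, database):
    """Return (chain, total_cost) for the cheapest chain of database units."""
    if not targets:
        return [], 0.0
    candidates = [[u for u in database if u.phone == t.phone] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for at least one target phone")

    # best[i][j] = (cheapest cost of a chain ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), -1) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for cand in candidates[i]:
            prev_cost, prev_idx = min(
                (best[i - 1][k][0] + join_cost(prev, cand), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(targets[i], cand), prev_idx))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    chain.reverse()
    return chain, total


# Hypothetical toy database with two recordings of each phone.
database = [
    Unit("a", 120.0, 0.1), Unit("a", 200.0, 0.1),
    Unit("b", 200.0, 0.1), Unit("b", 125.0, 0.1),
]
targets = [Unit("a", 120.0, 0.1), Unit("b", 120.0, 0.1)]
chain, total = select_units(targets, database)
print([(u.phone, u.pitch) for u in chain], round(total, 3))
```

The join cost is what distinguishes this from picking the closest candidate for each target independently: a unit that matches its target well can still be rejected if it would create an audible discontinuity with its neighbors.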
Generated speech quality
The IVONA text-to-speech system received the highest mean opinion score (MOS) among all participating speech synthesizers at the Blizzard Challenge 2007 scientific contest in Bonn, Germany. The sentences read by IVONA were evaluated by experts, by a group of British and American students, and by volunteers recruited via the Internet. IVONA's average MOS was 3.9 points; a recording of a real person scored 4.7.
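A mean opinion score is the arithmetic mean of listener ratings, conventionally given on a scale from 1 (bad) to 5 (excellent). The short Python sketch below shows the calculation; the function name and the ratings are hypothetical and are not data from the Blizzard Challenge.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings on the 1..5 opinion scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return float(mean(ratings))


# Hypothetical ratings from five listeners for one synthesized sentence.
example_ratings = [4, 4, 3, 5, 4]
print(mean_opinion_score(example_ratings))
```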
Voices and languages
IVONA currently speaks seventeen different languages with over forty voices.
American English: Salli, Ivy, Kimberly, Kendra, Jennifer, Joey, Eric and Chipmunk Skippy
American Spanish: Penélope, Miguel
British English: Emma, Amy and Brian
Welsh English: Geraint, Gwyneth
Welsh: Geraint, Gwyneth
German: Marlene, Hans
French: Céline, Mathieu
Castilian Spanish: Conchita, Enrique
Icelandic: Dóra, Karl
Italian: Carla, Giorgio
Australian English: Nicole, Russell
Canadian French: Chantal
Dutch: Lotte, Ruben
Brazilian Portuguese: Vitória, Ricardo
Polish: Agnieszka, Maja, Ewa, Jacek and Jan
Danish: Naja, Mads
IVONA is compatible with Windows, Unix, Android, Tizen, and iOS-based systems.
See also
- Speech synthesis
- Natural language processing
- Speech processing
- Speech recognition
- List of screen readers
References
- Amazon.com Announces Acquisition of IVONA Software.
- Osowski, Lukasz; Kaszczuk, Michal. IVO Blizzard 2006 Entry. Blizzard Challenge 2006 Workshop.
- Kaszczuk, Michal; Osowski, Lukasz. The IVO software Blizzard 2007 entry: improving Ivona speech synthesis system.
- Black, Alan W. Perfect synthesis for all of the people all of the time. IEEE TTS Workshop 2002.
- Clark, Robert A. J.; Podsiadlo, Monika; Fraser, Mark; Mayo, Catherine; King, Simon. Statistical analysis of the Blizzard Challenge 2007 listening test results. (IVONA is identified as system P.)
- Bennett, Christina L.; Black, Alan W. Blizzard Challenge 2006: Results. (IVONA is identified as system K.)