| Property | Value |
|---|---|
| Initial release | February 2006 |
| Stable release | 1.48.04 / April 6, 2014 |
| Operating system | Linux, Windows, Mac OS X, RISC OS |
| License | GNU GPL v3+ |
eSpeak is a compact, open-source software speech synthesizer for Linux, Windows, and other platforms. It uses formant synthesis, which lets it support many languages in a small footprint. Much of the programming for eSpeak's languages was based on information found on Wikipedia, with some subsequent feedback from native speakers. Projects using eSpeak include NVDA, Ubuntu and OLPC, and it has also been used by Google Translate.
eSpeak is derived from "Speak", a British English speech synthesizer for Acorn RISC OS computers, originally written by Jonathan Duddington in 1995.
A rewritten version for Linux appeared in February 2006 and a Windows SAPI 5 version in January 2007. Subsequent development has added and improved support for additional languages.
The quality of the language voices varies greatly. Some have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.
eSpeak provides two methods of synthesis: the original eSpeak synthesizer and a Klatt synthesizer. In addition, eSpeak can serve as a front end to MBROLA diphone voices, supplying text-to-phoneme translation and prosody.
The eSpeak and Klatt synthesizers use different types of formant synthesis.
The eSpeak synthesizer creates voiced speech sounds such as vowels and sonorant consonants by adding together sine waves to make the formant peaks. Unvoiced consonants such as /s/ are made by playing recorded sounds. Voiced consonants such as /z/ are made by mixing a synthesized voiced sound with a recorded unvoiced sound.
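The additive idea above can be illustrated with a toy sketch: a vowel is built by summing sine waves at the harmonics of a fundamental frequency, and each harmonic is weighted so that energy piles up around the formant frequencies. This is not eSpeak's actual code. The bell-shaped (Gaussian) formant envelope, the function name `additive_vowel`, and the formant values in the example are all assumptions chosen for illustration.

```python
import numpy as np

def additive_vowel(f0, formants, duration=0.5, sr=22050):
    """Toy additive synthesis (assumed, not eSpeak's implementation).

    Sums sine waves at harmonics of f0; each harmonic's amplitude comes from
    a spectral envelope made of one Gaussian bump per (centre_hz, width_hz)
    formant, so the summed signal has peaks at the formant frequencies.
    """
    t = np.arange(int(duration * sr)) / sr
    signal = np.zeros_like(t)
    n_harmonics = int((sr / 2) // f0)  # keep every harmonic at or below Nyquist
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        # Envelope value at this harmonic: sum of the formant bumps
        amp = sum(np.exp(-0.5 * ((freq - fc) / bw) ** 2) for fc, bw in formants)
        signal += amp * np.sin(2 * np.pi * freq * t)
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal  # normalise to [-1, 1]

# Illustrative, roughly /a/-like formants (centre Hz, width Hz); values are assumptions
vowel = additive_vowel(100.0, [(700, 80), (1200, 90), (2600, 120)])
```

Raising `f0` changes the perceived pitch while the formant envelope, and therefore the vowel quality, stays the same.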
The Klatt synthesizer mostly uses the same formant data as the eSpeak synthesizer. It produces voiced sounds by starting with a waveform which is rich in harmonics (simulating the vibration of the vocal cords) and then applying digital filters in order to produce speech sounds.
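For contrast, the source-filter approach can be sketched with the second-order digital resonator described in Klatt's 1980 paper, y[n] = A·x[n] + B·y[n−1] + C·y[n−2], with C = −exp(−2π·BW·T), B = 2·exp(−π·BW·T)·cos(2π·F·T) and A = 1 − B − C. The sketch below is an assumption-laden illustration, not eSpeak's Klatt code: it uses a plain impulse train as the harmonic-rich source (a stand-in for a real glottal waveform), and the helper names `resonator` and `klatt_style_vowel` are hypothetical.

```python
import numpy as np

def resonator(x, freq, bw, sr):
    """Second-order resonator in Klatt's form: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / sr
    C = -np.exp(-2 * np.pi * bw * T)
    B = 2 * np.exp(-np.pi * bw * T) * np.cos(2 * np.pi * freq * T)
    A = 1 - B - C  # normalises the gain at 0 Hz to 1
    y = np.zeros(len(x))
    y1 = y2 = 0.0  # previous two outputs
    for n, xn in enumerate(x):
        yn = A * xn + B * y1 + C * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def klatt_style_vowel(f0, formants, duration=0.5, sr=22050):
    """Hypothetical helper: impulse-train source passed through cascaded resonators."""
    n = int(duration * sr)
    source = np.zeros(n)
    source[::int(round(sr / f0))] = 1.0  # one pulse per pitch period (assumed source)
    out = source
    for freq, bw in formants:  # cascade: each resonator filters the previous output
        out = resonator(out, freq, bw, sr)
    return out / np.max(np.abs(out))

vowel = klatt_style_vowel(100.0, [(700, 80), (1200, 90), (2600, 120)])
```

Here the harmonics come for free from the pulse train, and the filters, rather than explicit per-harmonic amplitudes, shape the spectrum.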
eSpeak can be used as a command-line program or as a shared library.
It supports Speech Synthesis Markup Language (SSML).
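A minimal sketch of driving the command-line program from Python, assuming the `espeak` binary is on the PATH and accepts the commonly documented options `-v` (voice), `-s` (speed in words per minute), `-m` (interpret SSML markup) and `-w` (write a WAV file). The helper `espeak_command` is hypothetical, and the SSML snippet is only an example input; check `espeak --help` for the options your build actually supports.

```python
import shutil
import subprocess

def espeak_command(text, voice="en", wav_path=None, ssml=False, speed=None):
    """Hypothetical helper: build an argv list for the espeak command-line program."""
    cmd = ["espeak", "-v", voice]
    if speed is not None:
        cmd += ["-s", str(speed)]  # speaking rate, words per minute
    if ssml:
        cmd.append("-m")  # treat the input text as SSML markup
    if wav_path is not None:
        cmd += ["-w", wav_path]  # write audio to a WAV file instead of playing it
    cmd.append(text)
    return cmd

if __name__ == "__main__" and shutil.which("espeak"):  # run only if espeak is installed
    ssml_text = '<speak>Hello <break time="500ms"/> world</speak>'
    subprocess.run(espeak_command(ssml_text, wav_path="hello.wav", ssml=True), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems when the text contains SSML angle brackets or quotes.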
Language voices are identified by the language's ISO 639-1 code. They can be modified by "voice variants". These are text files which can change characteristics such as pitch range, add effects such as echo, whisper and croaky voice, or make systematic adjustments to formant frequencies to change the sound of the voice. For example, "af" is the Afrikaans voice. "af+f2" is the Afrikaans voice modified with the "f2" voice variant which changes the formants and the pitch range to give a female sound.
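The `language+variant` naming convention can be captured in a couple of small helpers. These are hypothetical utilities for illustration only; eSpeak itself just accepts the combined string (for example as the value of `-v`), and this sketch assumes a single `+` separates the language code from the variant name.

```python
from typing import NamedTuple, Optional

class VoiceSpec(NamedTuple):
    language: str           # ISO 639-1 code, e.g. "af"
    variant: Optional[str]  # voice variant name, e.g. "f2", or None for the base voice

def parse_voice(name: str) -> VoiceSpec:
    """Split a voice name such as "af+f2" into its language and variant parts."""
    lang, sep, variant = name.partition("+")
    if not lang or (sep and not variant):
        raise ValueError(f"malformed voice name: {name!r}")
    return VoiceSpec(lang, variant if sep else None)

def format_voice(spec: VoiceSpec) -> str:
    """Inverse of parse_voice: rebuild the "language+variant" string."""
    return spec.language if spec.variant is None else f"{spec.language}+{spec.variant}"

female_afrikaans = parse_voice("af+f2")
```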
eSpeak uses an ASCII representation of phoneme names that is loosely based on the Kirshenbaum system.
Phonetic representations can be embedded in text input by enclosing them in double square brackets. For example, `espeak -v en "Hello [[w3:ld]]"` says "Hello world" in English.
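The sketch below shows how an ASCII phoneme string like `w3:ld` can be split into symbols with a greedy longest-match tokenizer (so that the two-character `3:` is read as one long vowel) and mapped to IPA. The table is a small illustrative subset of English phoneme mnemonics with approximate IPA values, assumed for this example rather than taken from eSpeak's phoneme files; the helper names are hypothetical.

```python
# Illustrative subset of English phoneme mnemonics -> approximate IPA (assumed mapping)
ESPEAK_TO_IPA = {
    "3:": "ɜː", "i:": "iː", "u:": "uː", "A:": "ɑː", "O:": "ɔː",
    "@": "ə", "I": "ɪ", "E": "ɛ", "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ", "N": "ŋ",
    "w": "w", "l": "l", "d": "d", "h": "h", "s": "s", "t": "t", "k": "k",
    "'": "ˈ", ",": "ˌ",  # primary and secondary stress marks
}
LONGEST = max(map(len, ESPEAK_TO_IPA))

def tokenize(phonemes):
    """Greedy longest-match split of an ASCII phoneme string into known symbols."""
    tokens, i = [], 0
    while i < len(phonemes):
        for size in range(LONGEST, 0, -1):
            chunk = phonemes[i:i + size]
            if chunk in ESPEAK_TO_IPA:
                tokens.append(chunk)
                i += len(chunk)
                break
        else:
            raise ValueError(f"unknown phoneme at position {i}: {phonemes[i:]!r}")
    return tokens

def to_ipa(phonemes):
    return "".join(ESPEAK_TO_IPA[t] for t in tokenize(phonemes))

def inline_phonemes(phonemes):
    """Wrap a phoneme string in the double square brackets eSpeak reads as phonemes."""
    return f"[[{phonemes}]]"

text = f"Hello {inline_phonemes('w3:ld')}"  # -> "Hello [[w3:ld]]"
```

Longest-match matters because several mnemonics share a prefix: without it, `3:` would be misread as `3` followed by a stray `:`.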
eSpeak provides text-to-speech synthesis for the following languages, with quality varying from language to language:
Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malay, Malayalam, Mandarin, Nepalese, Norwegian, Persian (Farsi), Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.
- The download page at http://espeak.sourceforge.net/download.html provides Mac OS X and RISC OS binaries, and the source distribution includes notes on compiling for DOS, generic Unix, and Windows Mobile.
- Dennis H. Klatt (1980). "Software for a cascade/parallel formant synthesizer". Journal of the Acoustical Society of America, 67(3).