ESpeak

From Wikipedia, the free encyclopedia
Revision as of 17:28, 19 September 2022

eSpeakNG
Original author(s): Jonathan Duddington
Developer(s): Reece Dunn
Initial release: February 2006; 18 years ago (2006-02)
Stable release: 1.51 / 2 April 2022; 2 years ago (2022-04-02)
Repository: github.com/espeak-ng/espeak-ng/
Written in: C
Operating system: Linux, Windows, macOS, FreeBSD
Type: Speech synthesizer
License: GPLv3
Website: github.com/espeak-ng/espeak-ng/

eSpeakNG is a free and open-source, cross-platform, compact, software speech synthesizer. It uses a formant synthesis method, providing many languages in a relatively small file size. Much of the programming for eSpeakNG's language support is implemented using rule files with feedback from native speakers.

Because of its small size and many languages, it is included in the NVDA[1] open-source screen reader for Windows, as well as in Android,[2] Ubuntu[3] and other Linux distributions. Its predecessor eSpeak was recommended by Microsoft in 2016[4] and was used by Google Translate for 27 languages in 2010;[5] 17 of these were subsequently replaced by proprietary voices.[6]

The quality of the language voices varies greatly. In eSpeakNG's predecessor eSpeak, the initial versions of some languages were based on information found on Wikipedia.[7] Some languages have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.

History

Isotype for ESpeak.

In 1995, Jonathan Duddington released the Speak speech synthesizer for RISC OS computers supporting British English.[8] On 17 February 2006, Speak 1.05 was released under the GPLv2 license, initially for Linux, with a Windows SAPI 5 version added in January 2007.[9] Development on Speak continued until version 1.14, when it was renamed to eSpeak.

Development of eSpeak continued from 1.16 (there was no 1.15 release)[9] with the addition of an eSpeakEdit program for editing and building the eSpeak voice data. These were only available as separate source and binary downloads up to eSpeak 1.24. The 1.24.02 version of eSpeak was the first to be version-controlled using Subversion,[10] with separate source and binary downloads made available on SourceForge.[9] From eSpeak 1.27, eSpeak was updated to use the GPLv3 license.[11] The last official eSpeak release was 1.48.04 for Windows and Linux, 1.47.06 for RISC OS and 1.45.04 for macOS.[12] The last development release of eSpeak was 1.48.15 on 16 April 2015.[13]

eSpeak uses the Usenet scheme to represent phonemes with ASCII characters.[14]

eSpeak NG

On 25 June 2010,[15] Reece Dunn started a fork of eSpeak on GitHub using the 1.43.46 release. This started off as an effort to make it easier to build eSpeak on Linux and other POSIX platforms.

On 4 October 2015 (6 months after the 1.48.15 release of eSpeak), this fork started diverging more significantly from the original eSpeak.[16][17]

On 8 December 2015, there were discussions on the eSpeak mailing list about the lack of activity from Jonathan Duddington in the 8 months since the last eSpeak development release. This evolved into discussions about continuing development of eSpeak in Jonathan's absence.[18][19] The result was the creation of the espeak-ng (Next Generation) fork, using the GitHub version of eSpeak as the basis for future development.

On 11 December 2015, the espeak-ng fork was started.[20] The first release of espeak-ng was 1.49.0 on 10 September 2016,[21] containing significant code cleanup, bug fixes, and language updates.

Features

eSpeakNG can be used as a command-line program, or as a shared library.

It supports Speech Synthesis Markup Language (SSML).
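As a rough illustration of SSML input, the sketch below builds a small SSML document with Python's standard library. The element names (`speak`, `break`, `prosody`) come from the SSML specification; which elements and attribute values a given eSpeakNG build honours, and the `-m` markup option mentioned in the comment, are assumptions to check against the installed version.

```python
import xml.etree.ElementTree as ET


def build_ssml(text_before, emphasized, pause_ms=300):
    """Return an SSML string: plain text, a pause, then slower, higher-pitched text."""
    speak = ET.Element("speak", {"version": "1.0"})
    speak.text = text_before
    # <break> inserts a pause of the given length.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    # <prosody> changes rate and pitch for the text it wraps.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = emphasized
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello", "world")
# Assumed usage: pass the string to espeak-ng with its markup option, e.g.
#   espeak-ng -m "<the SSML string>"
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps special characters in the spoken text escaped correctly.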

Language voices are identified by the language's ISO 639-1 code. They can be modified by "voice variants". These are text files which can change characteristics such as pitch range, add effects such as echo, whisper and croaky voice, or make systematic adjustments to formant frequencies to change the sound of the voice. For example, "af" is the Afrikaans voice. "af+f2" is the Afrikaans voice modified with the "f2" voice variant which changes the formants and the pitch range to give a female sound.
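The `language+variant` naming described above can be handled with two small helpers. This is a minimal sketch: the function names `split_voice` and `join_voice` are hypothetical and not part of eSpeakNG, and only the `af` and `af+f2` forms given in the text are assumed.

```python
def split_voice(voice):
    """Split 'af+f2' into ('af', 'f2'); a bare 'af' gives ('af', None)."""
    language, sep, variant = voice.partition("+")
    return language, (variant if sep else None)


def join_voice(language, variant=None):
    """Build a voice name such as 'af+f2' from a language code and an optional variant."""
    return f"{language}+{variant}" if variant else language


print(split_voice("af+f2"))
print(join_voice("af", "f2"))
```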

eSpeakNG uses an ASCII representation of phoneme names which is loosely based on the Usenet system.

Phonetic representations can be included in text input by enclosing them in double square brackets. For example, espeak-ng -v en "Hello [[w3:ld]]" will say "Hello world" in English.
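The command line from the example can also be assembled programmatically. The sketch below only builds the argument list; the helper names are hypothetical, and `speak` assumes an `espeak-ng` executable is installed and on the `PATH`, which is why it is defined but not called here.

```python
import shutil
import subprocess


def with_phonemes(text, phonemes):
    """Append a phonetic representation in double square brackets to plain text."""
    return f"{text} [[{phonemes}]]"


def espeak_command(text, voice="en"):
    """Return the argument list for the invocation shown in the text (-v selects the voice)."""
    return ["espeak-ng", "-v", voice, text]


def speak(command):
    """Run the command if espeak-ng is available; return True if it was run."""
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=False)
    return True


cmd = espeak_command(with_phonemes("Hello", "w3:ld"))
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the square brackets and apostrophes that appear in phoneme strings.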

Synthesis method

ESpeakNG intro by eSpeakNG in English

eSpeakNG can be used as a text-to-speech translator in different ways, depending on which text-to-speech translation step the user wants to use.

Step 1: text-to-phoneme translation

There are many languages (notably English) which don't have straightforward one-to-one rules between writing and pronunciation; therefore, the first step in text-to-speech generation has to be text-to-phoneme translation.

  1. Input text is translated into pronunciation phonemes (e.g. the input text xerox is translated into zi@r0ks for pronunciation).
  2. Pronunciation phonemes are synthesized into sound (e.g. zi@r0ks is voiced in a monotone way).
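The first step can be pictured as an exception dictionary consulted before a set of spelling rules. The sketch below is a toy illustration only: the `xerox` entry is taken from the example above, while the three rules in `RULES` and the longest-match strategy are assumptions made for illustration and do not reproduce eSpeakNG's actual rule files or their format.

```python
# Whole-word exceptions are checked first (entry taken from the text's example).
EXCEPTIONS = {"xerox": "zi@r0ks"}

# Hypothetical letter-group rules; letters without a rule are copied unchanged.
RULES = {"sh": "S", "ee": "i:", "i": "I"}


def to_phonemes(word):
    """Translate one word to phoneme mnemonics using exceptions, then longest-match rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    longest = max(len(group) for group in RULES)
    out = []
    pos = 0
    while pos < len(word):
        # Try the longest letter group first so "sh" wins over "s".
        for size in range(longest, 0, -1):
            chunk = word[pos:pos + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                pos += size
                break
        else:
            out.append(word[pos])
            pos += 1
    return "".join(out)


for sample in ("xerox", "ship", "sheep"):
    print(sample, "->", to_phonemes(sample))
```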

To add intonation to speech, prosody data are necessary (e.g. syllable stress, falling or rising pitch of the fundamental frequency, pauses), along with other information that allows more human-sounding, non-monotonous speech to be synthesized. For example, in eSpeakNG format a stressed syllable is marked with an apostrophe: z'i@r0ks, which produces more natural speech with intonation.

For comparison, two samples without and with prosody data:

  1. [[DIs Iz m0noUntoUn spi:tS]] is spoken in a monotone way
  2. [[DIs Iz 'Int@n,eItI2d sp'i:tS]] is spoken with intonation
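The difference between the two samples lies in the stress marks. The sketch below removes them from a phoneme string to obtain a monotone version. Treating the comma as a secondary-stress mark is an assumption based on its position in the sample; the text itself only describes the apostrophe. The function names are hypothetical.

```python
# Apostrophe: stressed syllable (from the text). Comma: assumed secondary stress.
STRESS_MARKS = "',"


def strip_stress(phonemes):
    """Return the phoneme string with all stress marks removed."""
    return phonemes.translate(str.maketrans("", "", STRESS_MARKS))


def stressed_syllable_count(phonemes):
    """Count syllables marked as stressed with an apostrophe."""
    return phonemes.count("'")


sample = "DIs Iz 'Int@n,eItI2d sp'i:tS"
print(strip_stress(sample))
print(stressed_syllable_count(sample))
```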

If eSpeakNG is used only to generate prosody data, those data can be used as input for MBROLA diphone voices.

Step 2: sound synthesis from prosody data

eSpeakNG provides two different types of formant speech synthesis: its own eSpeakNG synthesizer and a Klatt synthesizer:[22]

  1. The eSpeakNG synthesizer creates voiced speech sounds such as vowels and sonorant consonants by additive synthesis, adding together sine waves to make the total sound. Unvoiced consonants, e.g. /s/, are made by playing recorded sounds,[23] because they are rich in harmonics, which makes additive synthesis less effective. Voiced consonants such as /z/ are made by mixing a synthesized voiced sound with a recorded sample of unvoiced sound.
  2. The Klatt synthesizer mostly uses the same formant data as the eSpeakNG synthesizer. However, it also produces sounds by subtractive synthesis: it starts with generated noise, which is rich in harmonics, and then applies digital filters and enveloping to obtain the frequency spectrum and sound envelope needed for a particular consonant (s, t, k) or sonorant (l, m, n) sound.
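The two approaches in the list above can be contrasted in a few lines. The following is a simplified sketch, not eSpeakNG's code: the additive part sums harmonics of a fundamental, weighting each by its distance from assumed formant peaks, and the subtractive part passes white noise through a two-pole digital resonator of the kind described in Klatt's paper. The sample rate, formant values and weighting curve are all illustrative assumptions.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def additive_vowel(f0, formants, duration_ms=50):
    """Additive synthesis: sum sine-wave harmonics of f0 shaped by formant peaks.

    formants is a list of (centre_hz, bandwidth_hz, amplitude) tuples.
    """
    n = SAMPLE_RATE * duration_ms // 1000
    partials = []
    for k in range(1, int((SAMPLE_RATE / 2) // f0) + 1):
        freq = k * f0
        # Illustrative weighting: harmonics near a formant centre are louder.
        weight = sum(amp / (1 + ((freq - centre) / bw) ** 2)
                     for centre, bw, amp in formants)
        partials.append((freq, weight))
    return [sum(w * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, w in partials)
            for t in range(n)]


def resonator(signal, centre, bandwidth):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at 0 Hz."""
    period = 1.0 / SAMPLE_RATE
    c = -math.exp(-2 * math.pi * bandwidth * period)
    b = 2 * math.exp(-math.pi * bandwidth * period) * math.cos(2 * math.pi * centre * period)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def subtractive_fricative(centre, bandwidth, duration_ms=50, seed=0):
    """Subtractive synthesis: filter white noise to leave energy around `centre`."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE * duration_ms // 1000)]
    return resonator(noise, centre, bandwidth)


# Hypothetical formant values for an open vowel and a hiss-like noise band.
vowel = additive_vowel(120, [(700, 100, 1.0), (1200, 120, 0.5), (2600, 160, 0.25)])
hiss = subtractive_fricative(5000, 1000)
print(len(vowel), len(hiss))
```

The additive path needs one sine wave per harmonic, which is manageable for a periodic voiced sound but a poor fit for noise-like sounds; the subtractive path starts from noise and removes what is not wanted.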

For the MBROLA voices, eSpeakNG converts the text to phonemes and associated pitch contours. It passes these to the MBROLA program using the PHO file format and captures the audio output produced by MBROLA. That audio is then handled by eSpeakNG.
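A PHO file is plain text with one phoneme per line: the phoneme name, its duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz). The sketch below formats such lines. The helper names and the sample segments are hypothetical, and valid phoneme names depend on the MBROLA voice database in use.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one PHO line: name, duration, then (position %, pitch Hz) pairs."""
    fields = [phoneme, str(duration_ms)]
    for position_pct, pitch_hz in pitch_points:
        fields += [str(position_pct), str(pitch_hz)]
    return " ".join(fields)


def to_pho(segments):
    """Join segments (tuples accepted by pho_line) into PHO file text."""
    return "\n".join(pho_line(*segment) for segment in segments) + "\n"


# Hypothetical segments; "_" denotes silence in PHO files.
segments = [
    ("_", 100),
    ("h", 60),
    ("@", 80, [(0, 120), (100, 110)]),
    ("_", 100),
]
print(to_pho(segments), end="")
```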

Languages

eSpeakNG performs text-to-speech synthesis for the following languages:[24][25]

  1. Abaza
  2. Abenaki
  3. Achinese
  4. Adyghe
  5. Afar
  6. Afrikaans[26]
  7. Albanian[27]
  8. Amharic
  9. Apache
  10. Arabela
  11. Ancient Greek
  12. Arabic[note 1]
  13. Aragonese[28]
  14. Arapaho
  15. Armenian (Eastern Armenian)
  16. Armenian (Western Armenian)
  17. Aromanian
  18. Assamese
  19. Assiniboine
  20. Avaric
  21. Awadhi
  22. Aymara
  23. Azerbaijani
  24. Bashkir
  25. Basque
  26. Basic English
  27. Belarusian
  28. Bengali
  29. Bhojpuri
  30. Bicolano
  31. Bodo
  32. Bishnupriya Manipuri
  33. Bosnian
  34. Bulgarian[28]
  35. Breton
  36. Burmese
  37. Caddo
  38. Cahuilla
  39. Cantonese[28]
  40. Carrier
  41. Catalan[28]
  42. Catawba
  43. Cayuga
  44. Cebuano
  45. Chamorro
  46. Chechen
  47. Cherokee
  48. Cheyenne
  49. Chhattisgarhi
  50. Chichewa
  51. Chickasaw
  52. Chinese (Mandarin)
  53. Chipewyan
  54. Chippewa
  55. Chitonga
  56. Chittagonian
  57. Choctaw
  58. Conestoga
  59. Corsican
  60. Croatian[28]
  61. Crow
  62. Czech
  63. Chuvash
  64. Church Slavonic
  65. Crimean Tatar
  66. Dakota
  67. Danish[28]
  68. Dari
  69. Divehi
  70. Dogri
  71. Dogrib
  72. Dutch[28]
  73. Dzongkha
  74. Edo
  75. English (American)[28]
  76. English (British)
  77. English (Caribbean)
  78. English (Lancastrian)
  79. English (Received Pronunciation)
  80. English (Scottish)
  81. English (West Midlands)
  82. Esperanto[28]
  83. Estonian[28]
  84. Ewe
  85. Eyak
  86. Finnish[28]
  87. Filipino
  88. Fon
  89. Fox
  90. French (Belgian)[28]
  91. French (Canada)
  92. French (France)
  93. French (Swiss)
  94. Frisian
  95. Gagauz
  96. Galician
  97. Garhwali
  98. Garifuna
  99. Garo
  100. Georgian[28]
  101. German[28]
  102. Greek (Modern)[28]
  103. Greenlandic
  104. Guarani
  105. Gujarati
  106. Gwichin
  107. Haida
  108. Haisla
  109. Hakka Chinese[note 3]
  110. Haitian Creole
  111. Hän
  112. Haryanvi
  113. Hausa
  114. Hawaiian
  115. Hebrew
  116. Hidatsa
  117. High Valyrian
  118. Hiligaynon
  119. Hindi[28]
  120. Hmong
  121. Ho-Chunk
  122. Hopi
  123. Hungarian[28]
  124. Hunsrik
  125. Iban
  126. Ibibio
  127. Icelandic[28]
  128. Igbo
  129. Iloko
  130. Indonesian[28]
  131. Ido
  132. Interlingua
  133. Interlingue
  134. Irish[28]
  135. Italian[28]
  136. Japanese[note 4][29]
  137. Javanese
  138. Judaeo-Spanish
  139. Kannada[28]
  140. Kansa
  141. Kashmiri
  142. Kazakh
  143. Khakas
  144. Khmer
  145. Klingon
  146. Kʼicheʼ
  147. Kirundi
  148. Kikuyu
  149. Kinyarwanda
  150. Konkani[30]
  151. Korean
  152. Krio
  153. Kumyk
  154. Kurdish[28]
  155. Kyrgyz
  156. Quechua
  157. Ladakhi
  158. Lakota
  159. Lao
  160. Latin
  161. Latgalian
  162. Latvian[28]
  163. Lang Belta
  164. Lingua Franca Nova
  165. Lepcha
  166. Lezgi
  167. Limbu
  168. Limburgish
  169. Lingala
  170. Lithuanian
  171. Lojban[28]
  172. Luganda
  173. Luxembourgish
  174. Macedonian
  175. Madurese
  176. Magahi
  177. Maithili
  178. Makassarese
  179. Malagasy
  180. Malay[28]
  181. Malayalam[28]
  182. Maltese
  183. Mandan
  184. Manipuri
  185. Māori
  186. Marathi[28]
  187. Mohawk
  188. Moldovan
  189. Mon
  190. Mongolian
  191. Nahuatl (Classical)
  192. Navajo
  193. Nepali[28]
  194. Norwegian (Bokmål)[28]
  195. Northern Sotho
  196. Novial
  197. Nogai
  198. Nottoway
  199. Old English
  200. Odia
  201. Omaha-Ponca
  202. Oneida
  203. Onondaga
  204. Oromo
  205. Occitan
  206. Papiamento
  207. Palauan
  208. Pashto
  209. Pawnee
  210. Persian[28]
  211. Persian (Latin alphabet)[note 2]
  212. Polish[28]
  213. Portuguese (Brazilian)[28]
  214. Portuguese (Portugal)
  215. Punjabi[31]
  216. Pyash (a constructed language)
  217. Quapaw
  218. Romanian[28]
  219. Raramuri
  220. Russian[28]
  221. Russian (Latvia)
  222. Sadri
  223. Salar
  224. Samoan
  225. Sanskrit
  226. Santali
  227. Scottish Gaelic
  228. Seneca
  229. Serbian[28]
  230. Shan (Tai Yai)
  231. Sharda
  232. Sesotho
  233. Shipibo
  234. Shona
  235. Sindhi
  236. Sinhala
  237. Slovak[28]
  238. Slovenian
  239. Somali
  240. Spanish (Spain)[28]
  241. Spanish (Latin American)
  242. Spanish (United States)
  243. Stoney
  244. Sundanese
  245. Swahili[26]
  246. Swedish[28]
  247. Sylheti
  248. Tajik
  249. Tamil[28]
  250. Tatar
  251. Tetum
  252. Telugu
  253. Tibetan
  254. Tswana
  255. Thai
  256. Tuvan
  257. Turkmen
  258. Turkish[28]
  259. Tatar
  260. Uyghur
  261. Ukrainian
  262. Urarina
  263. Urdu
  264. Uzbek
  265. Vietnamese (Central Vietnamese)[28]
  266. Vietnamese (Northern Vietnamese)
  267. Vietnamese (Southern Vietnamese)
  268. Volapük
  269. Wayuu
  270. Welsh
  271. Wolof
  272. Xavante
  273. Xhosa
  274. Yiddish
  275. Yoruba
  276. Yucateco
  277. Zulu
  278. Zuni

Notes

  1. Currently, only fully diacritized Arabic is supported.
  2. Persian written using English (Latin) characters.
  3. Currently, only Pha̍k-fa-sṳ is supported.
  4. Currently, only Hiragana and Katakana are supported.

References

  1. ^ Switch to eSpeak NG in NVDA distribution #5651
  2. ^ eSpeak TTS for Android
  3. ^ espeak-ng package in Ubuntu
  4. ^ "Download voices for Immersive Reader, Read Mode, and Read Aloud".
  5. ^ Google blog, Giving a voice to more languages on Google Translate, May 2010
  6. ^ Google blog, Listen to us now, December 2010.
  7. ^ eSpeak Speech Synthesizer 3. LANGUAGES
  8. ^ http://espeak.sourceforge.net/
  9. ^ a b c "ESpeak: Speech synthesis - Browse /Espeak at SourceForge.net".
  10. ^ Subversion history (revision 1)
  11. ^ Subversion history (revision 56)
  12. ^ "Espeak: Downloads".
  13. ^ http://espeak.sourceforge.net/test/latest.html
  14. ^ van Leussen, Jan-Wilem; Tromp, Maarten (26 July 2007). "Latin to Speech" (Document). p. 6.
  15. ^ "Build: Allow portaudio 18 and 19 to be switched easily. · rhdunn/Espeak@63daaec". GitHub.
  16. ^ "Espeakedit: Fix argument processing for unicode argv types · rhdunn/Espeak@61522a1". GitHub.
  17. ^ "Switch to eSpeak NG in NVDA distribution · Issue #5651 · nvaccess/Nvda". GitHub.
  18. ^ Taking ownership of the eSpeak project and its future
  19. ^ Vote for new main eSpeak developer
  20. ^ Rebrand the espeak program to espeak-ng.
  21. ^ espeak-ng 1.49.0
  22. ^ Klatt, Dennis H. (1980). "Software for a cascade/parallel formant synthesizer" (PDF). Journal of the Acoustical Society of America, 67(3), March 1980.
  23. ^ List of recorded fricatives in eSpeakNG
  24. ^ "ESpeak NG Text-to-Speech". GitHub. 13 February 2022.
  25. ^ "ESpeak NG Text-to-Speech". GitHub. 22 October 2021.
  26. ^ a b Butgereit, L., & Botha, A. (2009, May). Hadeda: The noisy way to practice spelling vocabulary using a cell phone. In The IST-Africa 2009 Conference, Kampala, Uganda.
  27. ^ Hamiti, M., & Kastrati, R. (2014). Adapting eSpeak for converting text into speech in Albanian. International Journal of Computer Science Issues (IJCSI), 11(4), 21.
  28. ^ Kayte, S., & Gawali, D. B. (2015). Marathi Speech Synthesis: A review. International Journal on Recent and Innovation Trends in Computing and Communication, 3(6), 3708-3711.
  29. ^ Pronk, R. (2013). Adding Japanese language synthesis support to the eSpeak system. University of Amsterdam.
  30. ^ Mohanan, S., Salkar, S., Naik, G., Dessai, N. F., & Naik, S. (2012). Text Reader for Konkani Language. Automation and Autonomous System, 4(8), 409-414.
  31. ^ Kaur, R., & Sharma, D. (2016). An Improved System for Converting Text into Speech for Punjabi Language using eSpeak. International Research Journal of Engineering and Technology, 3(4), 500-504.