
Esperanto phonology

From Wikipedia, the free encyclopedia

Esperanto is a constructed international auxiliary language designed to have a simple phonology. The creator of Esperanto, L. L. Zamenhof, described Esperanto pronunciation by comparing the sounds of Esperanto with the sounds of several major European languages.

With over a century of use, Esperanto has developed a phonological norm, including accepted details of phonetics,[1] phonotactics,[2] and intonation,[3] so that it is now possible to speak of proper Esperanto pronunciation and of properly formed words independently of the languages originally used to describe it. This norm accepts only minor allophonic variation.[4]



The original Esperanto lexicon contains:

  • 23 consonants (including ĥ /x/, which has become rare, and 4 affricates)
  • 11 vowels (5 simple vowels and 6 diphthongs).

A few additional sounds found in loan words, such as /ou̯/, are not stable.


              Labial   Alveolar     Post-alveolar   Palatal   Velar   Glottal
Nasal         m        n
Plosive       p b      t d                                    k ɡ
Affricate              t͜s (d͜z)     t͜ʃ d͜ʒ
Fricative     f v      s z          ʃ ʒ                       x       h
Approximant            l                            j
Trill                  r

The uncommon affricate /d͜z/ does not have a distinct letter in the orthography, but is written with the digraph dz, as in edzo ('husband'). Not everyone agrees with Kalocsay & Waringhien that edzo and peco are a near rhyme, differing only in voicing, or on the status of /d͡z/ as a phoneme; Wennergren considers it to be a simple sequence of /d/ + /z/.[4] The phoneme /x/ has been largely replaced with /k/ and is now found mostly in loanwords and a very few established words such as ĉeĥo ('a Czech'; cf. ĉeko 'a check'). The letter ŭ is sometimes used as a consonant in onomatopoeia and unassimilated foreign names, in addition to the second element in diphthongs, which some argue is consonantal /w/ rather than vocalic /u̯/ (see below).



Esperanto has between 5 and 11 vowels, depending on analysis: 5 monophthongs and up to 6 diphthongs.

Monophthongs
        Front   Back
Close   i       u
Mid     e       o
Open    a

Diphthongs
        Front   Back
Close   ui̯
Mid     ei̯
Open    ai̯

There are six historically stable diphthongs: /ai̯/, /oi̯/, /ui̯/, /ei̯/ and /au̯/, /eu̯/. However, some authors such as John C. Wells regard them as vowel–consonant sequences – /aj/, /oj/, /uj/, /ej/, /aw/, /ew/ – while Wennergren regards /aj/, /oj/, /uj/, /ej/ as vowel–consonant sequences and only /au̯/, /eu̯/ as diphthongs, there otherwise being no /w/ in Esperanto.[5]



The Esperanto sound inventory and phonotactics are very close to those of Yiddish, Belarusian and Polish, which were personally important to Zamenhof, the creator of Esperanto. The primary difference is the absence of palatalization, although this was present in Proto-Esperanto (nacjes, now nacioj 'nations'; familje, now familio 'family') and arguably survives marginally in the affectionate suffixes -njo and -ĉjo, and in the interjection tju! [note 1] Apart from this, the consonant inventory is identical to that of Eastern Yiddish. Minor differences from Belarusian are that g is pronounced as a stop, [ɡ], rather than as a fricative, [ɣ] (in Belarusian, the stop pronunciation is found in recent loan words), and that Esperanto distinguishes /x/ and /h/, a distinction that Yiddish makes but that Belarusian (and Polish) do not. As in Belarusian, Esperanto /v/ is found in syllable onsets and /u̯/ in syllable codas; however, unlike Belarusian, /v/ does not become /u̯/ if forced into coda position through compounding. According to Kalocsay & Waringhien, if Esperanto /v/ does appear before a voiceless consonant, it will devoice to /f/, as in Yiddish.[6] However, Zamenhof avoided such situations by adding an epenthetic vowel: lavobaseno ('washbasin'), not *lavbaseno or *laŭbaseno. The Esperanto vowel inventory is essentially that of Belarusian.[note 1] Zamenhof's Litvish dialect of Yiddish (that of Białystok) has an additional schwa and diphthong but no uj.

Orthography and pronunciation


The Esperanto alphabet is nearly phonemic. The letters, along with the IPA and nearest English equivalent of their principal allophones, are:[7]

Consonants
Letter   English    IPA
b        b          [b]
c        bits       [t͡s]
ĉ        choose     [t͡ʃ]
d        d          [d]
f        f          [f]
g        go         [ɡ]
ĝ        gem        [d͡ʒ]
h        h          [h]
ĥ        loch       [x]
j        young      [j]
ĵ        pleasure   [ʒ]
k        k          [k]
l        l          [l]
m        m          [m]
n        n          [n]
p        p          [p]
r        r (rhotic sound, usually rolled r)   [r]
s        s          [s]
ŝ        ship       [ʃ]
t        t          [t]
v        v          [v]
z        z          [z]

Simple vowels
Letter   English    IPA
a        spa        [a]
e        bet        [e]
i        machine    [i]
o        fork       [o]
u        rude       [u]

Diphthongs
Letter   English    IPA
aj       sky        [ai̯]
aŭ       now        [au̯]
ej       grey       [ei̯]
eŭ       (eh-oo)    [eu̯]
oj       boy        [oi̯]
uj       gooey      [ui̯]

Ŭ may be a consonant:
  • in the Esperanto name for the letter, ŭo
  • in foreign names, where it is more properly Esperantized to v [v]
  • occasionally in mimesis, as in ŭa! (waa!)
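Because the alphabet is nearly phonemic, the letter table above can be sketched as a simple lookup. The following is a minimal illustration, not a full transcriber: the names `LETTER_TO_IPA` and `to_ipa` are hypothetical, stress and allophony are ignored, and ŭ is rendered as [w] and j as [j], following the vowel–consonant analysis attributed to Wells earlier in the article rather than the diphthong notation of the table.

```python
import unicodedata

# One principal allophone per letter, taken from the table above.
# Assumption: j and ŭ are treated as consonants [j] and [w] everywhere,
# so diphthongs come out as vowel + glide (e.g. "aj" -> "aj", not "ai̯").
LETTER_TO_IPA = {
    "a": "a", "b": "b", "c": "t͡s", "ĉ": "t͡ʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "ĝ": "d͡ʒ", "h": "h", "ĥ": "x", "i": "i",
    "j": "j", "ĵ": "ʒ", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "o", "p": "p", "r": "r", "s": "s", "ŝ": "ʃ", "t": "t",
    "u": "u", "ŭ": "w", "v": "v", "z": "z",
}


def to_ipa(word: str) -> str:
    """Map each Esperanto letter to its principal IPA value."""
    # NFC keeps accented letters such as ĉ as single code points.
    normalized = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet (hyphens, apostrophes) pass through.
    return "".join(LETTER_TO_IPA.get(ch, ch) for ch in normalized)


print(to_ipa("ŝipo"))   # ʃipo
print(to_ipa("ĥoro"))   # xoro
```

A per-letter lookup is enough here only because each letter has one principal value; the assimilation effects described later in the article would need context-sensitive rules.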

Minimal pairs


Esperanto has many minimal pairs between the voiced and voiceless plosives, b d g and p t k; for example, pagi "pay" vs. paki "pack", baro "bar" vs. paro "pair", teko "briefcase" vs. deko "group of ten".

On the other hand, several distinctions between Esperanto consonants carry very light functional loads, though they are not in complementary distribution and therefore not allophones. The practical effect of this is that people who do not control these distinctions are still able to communicate without difficulty. These minor distinctions are ĵ /ʒ/ vs. ĝ /d͡ʒ/, contrasted in aĵo ('concrete thing') vs. aĝo ('age'); k /k/ vs. ĥ /x/ vs. h /h/, contrasted in koro ('heart') vs. ĥoro ('chorus') vs. horo ('hour'), and in the prefix ek- (inchoative) vs. eĥo ('echo'); dz /d͡z/ vs. z /z/, not contrasted in basic vocabulary; and c /t͡s/ vs. ĉ /t͡ʃ/, found in a few minimal pairs such as caro ('tzar'), ĉar ('because'); ci ('thou'), ĉi (proximate particle used with deictics); celo ('goal'), ĉelo ('cell'); -eco ('-ness'), eĉ ('even'); etc.

Belarusian seems to have provided the model for Esperanto's diphthongs, as well as the complementary distribution of v (restricted to the onset of a syllable) and ŭ (occurring only as a vocalic offglide), although this was modified slightly, with Belarusian oŭ corresponding to Esperanto ov (as in bovlo), and ŭ being restricted to the sequences aŭ, eŭ in Esperanto. Although v and ŭ may both occur between vowels, as in naŭa ('ninth') and nava ('of naves'), the diphthongal distinction holds: [ˈnau̯.a] vs. [ˈna.va]. (However, Zamenhof did allow initial ŭ in onomatopoeic words such as ŭa 'wah!'.) The semivowel j likewise does not occur after the vowel i, but is also restricted from occurring before i in the same morpheme, whereas the Belarusian letter i represents /ji/. Later exceptions to these patterns, such as poŭpo ('poop deck'), ŭato ('watt'), East Asian proper names beginning with ⟨Ŭ⟩, and jida ('Yiddish'), are marginal.[note 2]

The distinction between e and ej carries a light functional load, in the core vocabulary perhaps only distinctive before alveolar sonorants, such as kejlo ('peg'), kelo ('cellar'); mejlo ('mile'), melo ('badger'); Rejno ('Rhine'), reno ('kidney'). The recent borrowing gejo ('homosexual') could contrast with the ambisexual prefix ge- if used in compounds with a following consonant, and could also create confusion between geja paro ('homosexual couple') and gea paro ('heterosexual couple'), which are both pronounceable as [ˈɡeja ˈparo]. Eŭ is also uncommon, and very seldom contrastive: eŭro ('a euro') vs. ero ('a bit').

Stress and prosody


Within a word, stress is on the syllable with the second-to-last vowel, such as the li in familio [famiˈli.o] ('family'). An exception is when the final -o of a noun is elided, usually for poetic reasons, because this does not affect the placement of the stress: famili' [famiˈli].
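The placement rule is mechanical enough to sketch in a few lines. The example below is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, only the five letters a, e, i, o, u count as vowels (so j and ŭ never carry stress), and a trailing apostrophe is taken to mark an elided final -o.

```python
VOWELS = set("aeiou")  # j and ŭ are glides here, not syllable nuclei


def stressed_vowel_index(word: str) -> int:
    """Return the index of the vowel that carries primary stress."""
    w = word.lower()
    # A trailing apostrophe marks poetic elision of the final -o; the
    # stress stays where it was, which is now the last written vowel.
    elided = w.endswith(("'", "’"))
    positions = [i for i, ch in enumerate(w) if ch in VOWELS]
    if not positions:
        raise ValueError(f"no vowel in {word!r}")
    if elided or len(positions) == 1:
        return positions[-1]
    return positions[-2]  # second-to-last vowel


def mark_stress(word: str) -> str:
    """Uppercase the stressed vowel, e.g. 'familio' -> 'familIo'."""
    i = stressed_vowel_index(word)
    return word[:i] + word[i].upper() + word[i + 1:]


print(mark_stress("familio"))  # familIo
print(mark_stress("famili'"))  # familI'
```

Counting vowels rather than syllables sidesteps syllabification entirely, which is why the rule is usually stated in terms of the second-to-last vowel.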

On the rare occasions that stress needed to be specified, as in explanatory material or with proper names, Zamenhof used an acute accent.[citation needed] The most common such proper name is Zamenhof's own: Zámenhof. If the stress falls on the last syllable, it is common for an apostrophe to be used, as in poetic elision: Oĝalan'.

There is no set rule for which other syllables might receive stress in a polysyllabic word, or which monosyllabic words are stressed in a clause. Morphology, semantic load, and rhythm all play a role. By default, Esperanto is trochaic; stress tends to hit alternate syllables: Ésperánto. However, derivation tends to leave such "secondary" stress unchanged, at least for many speakers: Ésperantísto or Espérantísto (or for some just Esperantísto). Similarly, compound words generally retain their original stress. They never stress an epenthetic vowel: thus vórto-provízo, not *vortó-provízo.

Within a clause, rhythm also plays a role. However, referential words (lexical words and pronouns) attract stress, whereas "connecting" words such as prepositions tend not to: dónu al mí or dónu al mi ('give to me'), not *dónu ál mi. In Ĉu vi vídas la húndon kiu kúras preter la dómo? ('Do you see the dog that's running past the house?'), the function words do not take stress, not even two-syllable kiu ('which') or preter ('beyond'). The verb esti ('to be') behaves similarly, as can be seen by the occasional elision of the e in poetry or rapid speech: Mi ne 'stas ĉi tie! ('I'm not here!') Phonological words do not necessarily match orthographic words. Pronouns, prepositions, the article, and other monosyllabic function words are generally pronounced as a unit with the following word: mihávas ('I have'), laknábo ('the boy'), delvórto ('of the word'), ĉetáblo ('at table'). Exceptions include kaj 'and', which may be pronounced more distinctly when it has a larger scope than the following word or phrase.[8]

Within poetry, of course, the meter determines stress: Hó, mia kór', ne bátu máltrankvíle ('Oh my heart, do not beat uneasily').

Emphasis and contrast may override normal stress. Pronouns frequently take stress because of this. In a simple question like Ĉú vi vídis? ('Did you see?'), the pronoun hardly needs to be said and is unstressed; compare Né, dónu al mí ('No, give it to me'). Within a word, a prefix that wasn't heard correctly may be stressed upon repetition: Né, ne tíen! Iru máldekstren, mi diris! ('No, not over there! Go left, I said!'). Because stress doesn't distinguish words in Esperanto, shifting it to an unexpected syllable calls attention to that syllable, but doesn't cause confusion as it might in English.

As in many languages, initialisms behave unusually. When grammatical, they may be unstressed: k.t.p. [kotopo] ('et cetera'); when used as proper names, they tend to be idiosyncratic: UEA [ˈuˈeˈa], [ˈu.e.a], or [u.eˈa], but rarely *[u.ˈe.a]. This seems to be a way of indicating that the term is not a normal word. However, full acronyms tend to have regular stress: Tejo [ˈte.jo].

Lexical tone is not phonemic. Nor is clausal intonation, as question particles and changes in word order serve many of the functions that intonation performs in English.



A syllable in Esperanto is generally of the form (s/ŝ)(C)(C)V(C)(C). That is, it may have an onset of up to three consonants; it must have a nucleus of a single vowel or diphthong (except in onomatopoeic words such as zzz!); and it may have a coda of one (occasionally two) consonants.
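The template can be written as a regular expression and used to test candidate syllables. This is a rough sketch under several assumptions: the names `SYLLABLE` and `fits_template` are hypothetical, a diphthong is modelled as a vowel followed by j or ŭ, onset ŭ and syllabic consonants are left out, and the finer restrictions discussed below (no coda h, the sonority of clusters) are not enforced.

```python
import re

V = "aeiou"
C = "bcĉdfgĝhĥjĵklmnprsŝtvz"  # ŭ is handled only as an offglide

# (s/ŝ)(C)(C) V (offglide) (C)(C): a third onset consonant is only
# possible when the first one is s or ŝ.
SYLLABLE = re.compile(rf"[sŝ]?[{C}]{{0,2}}[{V}][jŭ]?[{C}]{{0,2}}")


def fits_template(syllable: str) -> bool:
    """True if the whole string is one syllable of the template's shape."""
    return SYLLABLE.fullmatch(syllable.lower()) is not None


# in-stru-as ('teaches'), split by hand into syllables
print([fits_template(s) for s in ("in", "stru", "as")])  # [True, True, True]
print(fits_template("kstren"))  # False
```

The check accepts shapes, not words: it says nothing about whether a given syllable is attested, only whether it fits the template.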

Any consonant may occur initially, with the exception of j before i (though there is now one word that violates this restriction, jida ('Yiddish') which contrasts with ida "of an offspring").

Any consonant except h may close a syllable, though coda ĝ and ĵ are rare in monomorphemes (they contrast in aĝ' 'age' vs. aĵ' 'thing'). Within a morpheme, there may be a maximum of four sequential consonants, as for example in instruas ('teaches'), dekstren ('to the right'). Long clusters generally include a sibilant such as s or one of the liquids l or r.

Geminate consonants generally only occur in polymorphemic words, such as mal-longa ('short'), ek-kuŝi ('to flop down'), mis-skribi ('to mis-write'); in ethnonyms such as finno ('a Finn'), gallo ('a Gaul') (now more commonly gaŭlo); in proper names such as Ŝillero ('Schiller'), Buddo ('Buddha', now more commonly Budao); and in a handful of unstable borrowings such as matĉo ('a sports match'). In compounds of lexical words, Zamenhof separated identical consonants with an epenthetic vowel, as in vivovespero ('the evening of life'), never *vivvespero.

Word-final consonants occur, though final voiced obstruents are generally rejected. For example, Latin ad ('to') became Esperanto al, and Polish od ('than') morphed into Esperanto ol ('than'). Sonorants and voiceless obstruents, on the other hand, are found in many of the numerals: cent ('hundred'), ok ('eight'), sep ('seven'), ses ('six'), kvin ('five'), kvar ('four'); also dum ('during'), eĉ ('even'). Even the poetic elision of final -o is rarely seen if it would leave a final voiced obstruent. A very few words with final voiced obstruents do occur, such as sed ('but') and apud ('next to'), but in such cases there is no minimal-pair contrast with a voiceless counterpart (that is, there is no *set or *aput to cause confusion). This is because many people, including the Slavs and Germans, do not contrast voicing in final obstruents. For similar reasons, sequences of obstruents with mixed voicing are not found in Zamenhofian compounds, apart from numerals and grammatical forms, thus longatempe 'for a long time', not *longtempe. (Note that /v/ is an exception to this rule, like in the Slavic languages. It is effectively ambiguous between fricative and approximant. The other exception is /kz/, which is commonly treated as /ɡz/.)

Syllabic consonants occur only as interjections and onomatopoeia: fr!, sss!, ŝŝ!, hm!.

All triconsonantal onsets begin with a sibilant, s or ŝ. Disregarding proper names, such as Vladimiro, the following initial consonant clusters occur:

  • Stop + liquid – bl, br; pl, pr; dr; tr; gl, gr; kl, kr
  • Voiceless fricative + liquid – fl, fr; sl; ŝl, ŝr
  • Voiceless sibilant + voiceless stop (+ liquid) – sc [st͡s], sp, spl, spr; st, str; sk, skl, skr; ŝp, ŝpr; ŝt, ŝtr
  • Obstruent + nasal – gn, kn, sm, sn, ŝm, ŝn
  • Obstruent + /v/ – gv, kv, sv, ŝv

And more marginally,

  • Consonant + /j/ – (tj), ĉj, fj, vj, nj
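The cluster inventory above can be treated as a lookup table for classifying a word's onset. The sketch below simply encodes the lists as given; the names `INITIAL_CLUSTERS`, `MARGINAL_CLUSTERS`, `initial_cluster`, and `classify_onset` are hypothetical, and the category labels are an illustrative choice rather than established terminology.

```python
VOWELS = set("aeiou")

# Initial clusters exactly as listed above.
INITIAL_CLUSTERS = {
    "bl", "br", "pl", "pr", "dr", "tr", "gl", "gr", "kl", "kr",   # stop + liquid
    "fl", "fr", "sl", "ŝl", "ŝr",                                 # fricative + liquid
    "sc", "sp", "spl", "spr", "st", "str", "sk", "skl", "skr",
    "ŝp", "ŝpr", "ŝt", "ŝtr",                                     # sibilant + stop (+ liquid)
    "gn", "kn", "sm", "sn", "ŝm", "ŝn",                           # obstruent + nasal
    "gv", "kv", "sv", "ŝv",                                       # obstruent + /v/
}
MARGINAL_CLUSTERS = {"tj", "ĉj", "fj", "vj", "nj"}                # consonant + /j/


def initial_cluster(word: str) -> str:
    """Return the consonants that precede the first vowel."""
    w = word.lower()
    for i, ch in enumerate(w):
        if ch in VOWELS:
            return w[:i]
    return w


def classify_onset(word: str) -> str:
    onset = initial_cluster(word)
    if len(onset) <= 1:
        return "simple"
    if onset in INITIAL_CLUSTERS:
        return "listed"
    if onset in MARGINAL_CLUSTERS:
        return "marginal"
    return "unlisted"


print(classify_onset("strato"))  # listed
print(classify_onset("tlaspo"))  # unlisted
```

An "unlisted" result does not mean a word is impossible; it only flags the rarer clusters found in technical and learned vocabulary.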

Although it does not occur initially, the sequence ⟨dz⟩ is pronounced as an affricate, as in edzo [ˈe.d͡zo] ('a husband') with an open first syllable [e], not as *[ed.zo].

In addition, initial ⟨pf⟩ occurs in German-derived pfenigo ('penny'), ⟨kŝ⟩ in Sanskrit kŝatrio ('kshatriya'), and several additional uncommon initial clusters occur in technical words of Greek origin, such as mn-, pn-, ks-, ps-, sf-, ft-, kt-, pt-, bd-, as in sfinktero ('a sphincter', which also has the coda ⟨nk⟩). Quite a few more clusters turn up in sufficiently obscure words, such as ⟨tl⟩ in tlaspo "Thlaspi" (a genus of herb), and Aztec deities such as Tlaloko ('Tlaloc'). (The /l/ phonemes are presumably devoiced in these words.)

As this might suggest, greater phonotactic diversity and complexity is tolerated in learnèd than in quotidian words, almost as if "difficult" phonotactics were an iconic indication of "difficult" vocabulary. Diconsonantal codas, for example, generally only occur in technical terms, proper names, and in geographical and ethnic terms: konjunkcio ('a conjunction'), arkta ('Arctic'), istmo ('isthmus').

However, there is a strong tendency for more basic terms to avoid coda clusters, although cent ('hundred'), post ('after'), sankta ('holy'), and the prefix eks- ('ex-') (which can be used as an interjection: Eks la reĝo! 'Down with the king!') are exceptions. Even when coda clusters occur in the source languages, they are often eliminated in Esperanto. For instance, many European languages have words relating to "body" with a root of korps-. This root gave rise to two words in Esperanto, neither of which keeps the full cluster: korpuso ('a military corps') (retaining the original Latin u), and korpo ('a biological body') (losing the s).

Many ordinary roots end in two or three consonants, such as cikl-o ('a bicycle'), ŝultr-o ('a shoulder'), pingl-o ('a needle'), tranĉ-i ('to cut'). However, these roots do not normally entail coda clusters except when followed by another consonant in compounds, or with poetic elision of the final -o. Even then, only sequences with decreasing sonority are possible, so although poetic tranĉ' occurs, *cikl', *ŝultr', and *pingl' do not. (Note that the humorous jargon Esperant' does not follow this restriction, because it elides the grammatical suffix of all nouns no matter how awkward the result.)

Within compounds, an epenthetic vowel is added to break up what would otherwise be unacceptable clusters of consonants. This vowel is most commonly the nominal affix -o, regardless of number or case, as in kant-o-birdo ('a songbird') (the root kant-, 'to sing', is inherently a verb), but other part-of-speech endings may be used when -o- is judged to be grammatically inappropriate, as in mult-e-kosta ('expensive'). There is a great deal of personal variation as to when an epenthetic vowel is used.

Allophonic variation


With only five oral and no nasal or long vowels, Esperanto allows a fair amount of allophonic variation, though the distinction between /e/ and /ei̯/, and arguably /o/ and /ou̯/, is phonemic. The /v/ may be a labiodental fricative [v] or a labiodental approximant [ʋ], again in free variation; or [w], especially in the sequences kv and gv ([kw] and [ɡw], like English "qu" and "gu"), but with [v] considered normative. Alveolar consonants t, d, n, l are acceptably either apical (as in English) or laminal (as in French, generally but incorrectly called "dental"). Postalveolars ĉ, ĝ, ŝ, ĵ may be palato-alveolar (semi-palatalized) [t̠ʃ, d̠ʒ, ʃ, ʒ] as in English and French, or retroflex (non-palatalized) [t̠ʂ d̠ʐ ʂ ʐ] as in Polish, Russian, and Mandarin Chinese. H and ĥ may be voiced [ɦ, ɣ], especially between vowels.



The consonant r can be realised in many ways, as it was defined differently in each language version of the Fundamento de Esperanto:[9]

  • In the French Fundamento, it is defined as r. The rhotic in Standard French varies from a voiced uvular fricative or approximant [ʁ] to a uvular trill [ʀ].[10]
  • In the English Fundamento, it is defined as the r in rare, which is an alveolar approximant [ɹ].
  • In the German Fundamento, it is defined as r. Most varieties of Standard German have a uvular rhotic, now usually a fricative or approximant [ʁ], rather than [ʀ]. The alveolar pronunciation [r ~ ɾ] is used in some standard German varieties of Germany, Austria, and Switzerland.
  • In the Polish Fundamento, it is defined as r, which is a flap [ɾ].
  • In the Russian Fundamento, it is defined as r (Cyrillic р), which is an alveolar trill [r].

The most common realisation depends on the region and native language of the Esperanto speaker. For example, a very common realisation in English-speaking countries is the alveolar flap [ɾ]. Worldwide, the most common realisation is probably the alveolar trill [r]. The grammatical reference Plena Manlibro de Esperanta Gramatiko considers the uvular trill [ʀ] to be perfectly acceptable.[11] In practice, the different pronunciations are understood and accepted by experienced Esperanto speakers.

Vowel length and quality


Vowel length is not phonemic in Esperanto. Vowels tend to be long in open stressed syllables and short otherwise.[6] Adjacent stressed syllables are not allowed in compound words, and when stress disappears in such situations, it may leave behind a residue of vowel length. Vowel length is sometimes presented as an argument for the phonemic status of the affricates, because vowels tend to be short before most consonant clusters (excepting stops plus l or r, as in many European languages), but long before /ĉ/, /ĝ/, /c/, and /dz/, though again this varies by speaker, with some speakers pronouncing a short vowel before /ĝ/, /c/, /dz/ and a long vowel only before /ĉ/.[6]

Vowel quality has never been an issue for /a/, /i/ and /u/, but has been much discussed for /e/ and /o/. Zamenhof recommended pronouncing the vowels /e/ and /o/ as mid [e̞, o̞] at all times. Kalocsay and Waringhien gave more complicated recommendations.[12] For example, they recommended pronouncing stressed /e/, /o/ as short open-mid [ɛ, ɔ] in closed syllables and long close-mid [eː, oː] in open syllables. However, this is widely considered unduly elaborate, and Zamenhof's recommendation of using mid qualities is considered the norm. For many speakers, however, the pronunciation of /e/ and /o/ reflects the details of their native language.



Zamenhof noted that epenthetic glides may be inserted between dissimilar vowels, especially after high vowels as in [ˈmija] for mia ('my'), [miˈjelo] for mielo ('honey') and [ˈpluwa] for plua ('further'). This is quite common, and there is no possibility of confusion, because /ij/ and /uŭ/ do not occur in Esperanto (though more general epenthesis could cause confusion between gea and geja, as mentioned above). However, Zamenhof stated that in "severely regular" speech such epenthesis would not occur.[6]
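The glide insertion described here can be sketched as a rewrite over letter sequences. This is a simplified illustration with a hypothetical function name: it covers only the high-vowel cases exemplified above (j after i, w after u, before a different vowel) and writes the result as a respelling rather than as IPA.

```python
VOWELS = set("aeiou")


def insert_glides(word: str) -> str:
    """Insert j after i and w after u when a different vowel follows."""
    w = word.lower()
    out = []
    for i, ch in enumerate(w):
        out.append(ch)
        nxt = w[i + 1] if i + 1 < len(w) else ""
        # Only dissimilar vowel sequences trigger a glide.
        if nxt in VOWELS and nxt != ch:
            if ch == "i":
                out.append("j")
            elif ch == "u":
                out.append("w")
    return "".join(out)


print(insert_glides("mia"))    # mija
print(insert_glides("mielo"))  # mijelo
print(insert_glides("plua"))   # pluwa
```

Restricting the rule to i and u keeps gea and geja distinct, in line with the observation that only more general epenthesis would merge them.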

Epenthetic glottal stops in vowel sequences such as boao ('boa') are non-phonemic detail, allowed for the comfort of the speaker. Glottal stop is especially common in sequences of identical vowels, such as heroo [heˈroʔo] ('hero'), and praavo [praˈʔavo] ('great-grandfather'). Other speakers, however, mark the hiatus by a change of intonation, such as by raising the pitch of the stressed vowel: heróò, pràávo.

As in many languages, fricatives may become affricates after a nasal, via an epenthetic stop. Thus, the neologism senso ('sense', as in the five senses) may be pronounced the same as the fundamental word senco ('sense, meaning'), and the older term for the former, sentumo, may be preferable.

An epenthetic vowel, most commonly the schwa, can be inserted to break up clusters that might be difficult to pronounce.

Poetic elision


Vowel elision is allowed with the grammatical suffix -o of singular nominative nouns, and the a of the article la, though this rarely occurs outside of poetry: de l' kor' ('from the heart').

Normally semivowels are restricted to offglides in diphthongs. However, poetic meter may force the reduction of unstressed /i/ and /u/ to semivowels before a stressed vowel: kormilionoj [koɾmiˈli̯onoi̯]; buduaro [buˈdu̯aɾo].



Zamenhof recognized place-assimilation of nasals before another consonant, such as n before a velar, as in banko [ˈbaŋko] ('bank') and sango [ˈsaŋɡo] ('blood'), or before palatal /j/, as in panjo [ˈpaɲjo] ('mommy') and sinjoro [siɲˈjoro] ('sir'). However, he stated that "severely regular" speech would not have such variation from his ideal of 'one letter, one sound'.[6] Nonetheless, although the desirability of such allophony may be debated, the question almost never arises as to whether the m in emfazi should remain bilabial or should assimilate to labiodental f ([eɱˈfazi]), because this assimilation is nearly universal in human language. Indeed, where the orthography allows (e.g. bombono 'bonbon'), we see that assimilation can occur.
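The nasal assimilations mentioned in this paragraph can be illustrated as context-dependent substitutions. The sketch below is an assumption-laden simplification with a hypothetical function name: it handles only n before a velar (k, g, ĥ), n before j, and m before f, and it returns a mixed respelling in which only the nasal is replaced by an IPA symbol.

```python
VELARS = {"k", "g", "ĥ"}


def assimilate_nasals(word: str) -> str:
    """Replace n/m with the place-assimilated nasal before k, g, ĥ, j, f."""
    w = word.lower()
    out = []
    for i, ch in enumerate(w):
        nxt = w[i + 1] if i + 1 < len(w) else None
        if ch == "n" and nxt in VELARS:
            out.append("ŋ")   # velar nasal, as in banko
        elif ch == "n" and nxt == "j":
            out.append("ɲ")   # palatal nasal, as in panjo
        elif ch == "m" and nxt == "f":
            out.append("ɱ")   # labiodental nasal, as in emfazi
        else:
            out.append(ch)
    return "".join(out)


print(assimilate_nasals("banko"))   # baŋko
print(assimilate_nasals("panjo"))   # paɲjo
print(assimilate_nasals("emfazi"))  # eɱfazi
```

Returning the input unchanged corresponds to the "severely regular" pronunciation in which each letter keeps a single sound.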

In addition, speakers of many languages (including Zamenhof's, though not always English) have regressive voicing assimilation, when two obstruents (consonants that occur in voiced-voiceless pairs) occur next to each other. Zamenhof did not mention this directly, but did indicate it indirectly, in that he didn't create compound words with adjacent obstruents that have mixed voicing. For example, by the phonotactics of both of Zamenhof's mother tongues, Yiddish and (Belo)Russian, rozkolora ('rose-colored', 'pink') would be pronounced the same as roskolora ('dew-colored'), and so the preferred form for the former is rozokolora.[note 3] Indeed, Kalocsay & Waringhien state that when voiced and voiceless consonants are adjacent, the assimilation of one of them is "inevitable". Thus one pronounces okdek ('eighty') as /oɡdek/, as if it were spelled "ogdek"; ekzisti ('exist') as /eɡzisti/, as if it were spelled "egzisti"; ekzemple ('for example') as /eɡzemple/, subteni ('support') as /supteni/, longtempe ('for a long time') as /lonktempe/, glavsonoro ('ringing of a sword') as /ɡlafsonoro/, etc.[6][13] Such assimilation likewise occurs in words that maintain Latinate orthography, such as absolute ('absolutely'), pronounced /apsolute/, and obtuza ('obtuse'), pronounced /optuza/, despite the superficially contrastive sequences in the words apsido ('apsis') and optiko ('optics').[6][13] Instead, the debate centers on the non-Latinate orthographic sequence kz, frequently found in Latinate words like ekzemple and ekzisti above.[note 4] It is sometimes claimed that kz is properly pronounced exactly as written, with mixed voicing, [kz], despite the fact that assimilation to [ɡz] occurs in Russian, English (including the words 'example' and 'exist'), Polish (where it is even spelled ⟨gz⟩), French and many other languages. 
These two positions are called ekzismo and egzismo in Esperanto.[note 5] In practice, most Esperanto speakers assimilate kz to /ɡz/ and pronounce nk as [ŋk] when speaking fluently.[13]

Voicing assimilation
Voiceless obstruent                                   p  t  c   ĉ  k  f  s  ŝ  ĥ
Pronunciation before any voiced obstruent but v       b  d  dz  ĝ  g  v  z  ĵ  [ɣ]

Voiced obstruent                                      b  d  dz  ĝ  g  v  z  ĵ
Pronunciation before a voiceless obstruent            p  t  c   ĉ  k  f  s  ŝ
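The table can be read as a regressive rewrite rule: an obstruent takes on the voicing of the obstruent that follows it, except that v does not trigger voicing. The following sketch applies that rule to spellings and returns a respelling such as "ogdek". The names are hypothetical, the ĥ → [ɣ] row is left out because it has no letter, and the code describes the assimilating pronunciation only, not the normative one.

```python
TO_VOICED = {"p": "b", "t": "d", "c": "dz", "ĉ": "ĝ",
             "k": "g", "f": "v", "s": "z", "ŝ": "ĵ"}
TO_VOICELESS = {"b": "p", "d": "t", "ĝ": "ĉ", "g": "k",
                "v": "f", "z": "s", "ĵ": "ŝ"}
VOICED_TRIGGERS = set("bdĝgzĵ")      # v is deliberately excluded
VOICELESS_TRIGGERS = set("ptcĉkfsŝ")


def assimilate_voicing(word: str) -> str:
    """Respell a word with regressive voicing assimilation applied."""
    letters = list(word.lower())
    # Work right to left so a changed consonant can affect the one before it.
    for i in range(len(letters) - 2, -1, -1):
        ch = letters[i]
        nxt = letters[i + 1][0]  # first letter, in case of the digraph "dz"
        if nxt in VOICED_TRIGGERS and ch in TO_VOICED:
            letters[i] = TO_VOICED[ch]
        elif nxt in VOICELESS_TRIGGERS and ch in TO_VOICELESS:
            letters[i] = TO_VOICELESS[ch]
    return "".join(letters)


for w in ("okdek", "ekzisti", "subteni", "glavsonoro", "kvar"):
    print(w, "->", assimilate_voicing(w))
```

Under this rule rozkolora comes out as roskolora, which is the merger that the epenthetic vowel in rozokolora avoids.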

In compound lexical words, Zamenhof himself inserted an epenthetic vowel between obstruents with different voicing, as in rozokolora above, never *rozkolora, and longatempe, never *longtempe as with some later writers; mixed voicing only occurred with grammatical words, for example with compound numbers and with prepositions used as prefixes, as in okdek and subteni above. V is never found before any consonant in Zamenhof's writing, because that would force it to contrast with ŭ.

Similarly, mixed sibilant sequences, as in the polymorphemic disĵeti ('to scatter'), tend to assimilate in rapid speech, sometimes completely (/diĵĵeti/).

Like the generally ignored regressive devoicing in words such as absurda, progressive devoicing tends to go unnoticed within obstruent–sonorant clusters, as in plua [ˈpl̥ua] ('additional'; contrasts with blua [ˈblua] 'blue') and knabo [ˈkn̥abo] ('boy'; the kn- contrasts with gn-, as in gnomo [ˈɡnomo] 'gnome'). Partial to full devoicing of the sonorant is probably the norm for most speakers.

Voicing assimilation of affricates and fricatives before nasals, as in taĉmento ('a detachment') and the suffix -ismo ('-ism'), is both more noticeable and easier for most speakers to avoid, so [ˈizmo] for -ismo is less tolerated than [apsoˈlute] for absolute.

Loss of phonemic ĥ


The sound of ⟨ĥ⟩, /x/, was always somewhat marginal in Esperanto, and there has been a strong move to merge it into /k/, starting with suggestions from Zamenhof himself.[14][15][citation needed] Dictionaries generally cross-reference ⟨ĥ⟩ and ⟨k⟩, but the sequence ⟨rĥ⟩ (as in arĥitekturo 'architecture') was replaced by ⟨rk⟩ (arkitekturo) so completely by the early 20th century that few dictionaries even list ⟨rĥ⟩ as an option.[citation needed] The central/eastern European form for 'Chinese', ĥino, has been completely replaced with the western European form, ĉino, a unique exception to the general pattern, perhaps because the word kino ('cinematography') already existed. Other words, such as ĥemio ('chemistry') and monaĥo ('monk'), still vary but are more commonly found with ⟨k⟩ (kemio, monako). In a few cases, such as with words of Russian origin, ⟨ĥ⟩ may instead be replaced by ⟨h⟩. This merger has had only a few complications. Zamenhof gave ĥoro ('chorus') the alternative form koruso, because both koro ('heart') and horo ('hour') were taken. The two words still almost universally seen with ⟨ĥ⟩ are eĥo ('echo') and ĉeĥo ('a Czech'). Ek- and ĉeko ('check') already exist, though ekoo for eĥo is occasionally seen.

Proper names and borrowings


A common source of allophonic variation is borrowed words, especially proper names, when non-Esperantized remnants of the source-language orthography remain, or when novel sequences are created in order to avoid duplicating existing roots. For example, it is doubtful that many people fully pronounce the g in Vaŝingtono ('Washington') as either /ɡ/ or /k/, or pronounce the ⟨h⟩ in Budho ('Buddha') at all. Such situations are unstable, and in many cases dictionaries recognize that certain spellings (and therefore pronunciations) are inadvisable. For example, the physical unit "watt" was first borrowed as ŭato, to distinguish it from vato ('cotton-wool'), and this is the only form found in dictionaries in 1930. However, initial ⟨ŭ⟩ violates Esperanto phonotactics, and by 1970 there was an alternative spelling, vatto. This was also unsatisfactory, however, because of the geminate ⟨t⟩, and by 2000 the effort had been given up, with vato now the advised spelling for both 'watt' and 'cotton-wool'. Some recent dictionaries no longer even list initial ⟨ŭ⟩ in their index.[16] Likewise, several dictionaries now list the spellings Vaŝintono for 'Washington' and Budao for 'Buddha'.



Before Esperanto phonotactics became fixed, foreign words were adopted with spellings that violated the apparent intentions of Zamenhof and the norms that would develop later, such as poŭpo[note 6] ('poop deck'), ŭato[note 7] ('watt'), and matĉo[note 8] ('sports match'). Many of these coinages have proven to be unstable, and have either fallen out of use or been replaced with pronunciations more in keeping with the developing norms, such as pobo for poŭpo, vato for ŭato, and maĉo for matĉo. On the other hand, jida[note 9] ('Yiddish') was also sometimes criticized on phonotactical grounds, but was used by Zamenhof after its introduction in the Plena Vortaro as a replacement for novjuda and judgermana and is well established.




  1. ^ a b The Belarusian letters ł, l represent /l, lʲ/ (phonetically [lˠ, lʲ]), and i, y represent /ji, i/ (phonetically [ji, ɨ]), so these are accounted for by the absence of palatalization. In Yiddish, palatal(ized) consonants are restricted to Slavic loanwords, apart from /lʲ/, which is not distinct from /l/ for all speakers.
  2. ^ Poŭpo, ŭato, and names such as Ŭakajama ('Wakayama') are more fully assimilated as pobo, vato, and Vakajama.
  3. ^ The voiceless paired obstruents in Esperanto are /p t c ĉ k f s ŝ/, and their voiced analogues are /b d dz ĝ g v z ĵ/. There are also unpaired /ĥ h/. The /v/ is a partial exception in that it is only avoided as the first element of a consonant sequence, with no difficulty distinguishing /kv/ from /gv/. This again follows the pattern of (Belo)Russian, where /v/ has characteristics of both consonant and vowel.
  4. ^ Words that begin with ex in their source language generally become ekz- in Esperanto, to distinguish them from the common prefix eks-.
  5. ^ Orthographic gz does not occur in Esperanto, except in the nonce word egzismo itself.
  6. ^ Violation: coining of a new diphthong.
  7. ^ Violation: the use of ⟨ŭ⟩ as a "w" at the beginning of a syllable.
  8. ^ Violation: the use of geminate consonants outside of compound words.
  9. ^ In that /j/ does not occur before the vowel /i/ in other words, and this sequence is difficult for many people to distinguish from ida.


  1. ^ Burkina, O. (2005): "Rimarkoj pri la prononca normo en Esperanto", Lingvaj kaj historiaj analizoj. Aktoj de la 28-a Esperantologia Konferenco en la 90-a Universala Kongreso de Esperanto
  2. ^ "PMEG: Specialaj elparolaj reguloj". bertilow.com (in Esperanto). Section: Duoblaj literoj.
  3. ^ John C. Wells, La frazmelodio en internacia perspektivo (.doc document).
  4. ^ a b "PMEG: Bazaj elparolaj reguloj". bertilow.com (in Esperanto).
  5. ^ "Duonvokaloj kaj diftongoj". Lingva Kritiko (in Esperanto).
  6. ^ a b c d e f g Plena analiza gramatiko, §17
  7. ^ The pronunciations of the consonants given here are those found at the beginnings of words and between vowels, voicing assimilation.
  8. ^ Edmond Privat, Esprimo de sentoj en Esperanto 1980:10
  9. ^ Wennergren, Bertilo. "Fundamento de Esperanto". www.akademio-de-esperanto.org (in Esperanto). Retrieved 16 April 2018.
  10. ^ Fougeron, Cecile; Smith, Caroline L. (1993). "French". Journal of the International Phonetic Association. 23 (2): 73–76. doi:10.1017/S0025100300004874. S2CID 249404451. p. 75.
  11. ^ Wennergren, Bertilo. "PMEG: Bazaj elparolaj reguloj". bertilow.com (in Esperanto). Retrieved 16 April 2018.
  12. ^ Plena Analiza Gramatiko de Esperanto 4th edition, 1980
  13. ^ a b c Miroslav Malovec, 1999, Gramatiko de Esperanto, §2.9.
  14. ^ Chris Gledhill. "Regularity and Representation in Spelling: the case of Esperanto". Journal of the Simplified Spelling Society 1994-1 pp 17–23.
  15. ^ R. Bartholdt and A. Christen, H. Res. 415 "A resolution providing for the study of Esperanto as an auxiliary language". Hearings before the Committee on Education, House of Representatives, 63rd Congress, 2nd Session 1914 March 17.
  16. ^ For instance, the Reta Vortaro didn't list ⟨ŭ⟩ for years, until it added an entry for ŭoŭ 'wow!' in 2011.