Cantonese phonology

From Wikipedia, the free encyclopedia

Standard Cantonese pronunciation is that of Guangzhou, also known as Canton, the capital of Guangdong Province. Hong Kong Cantonese is related to the Guangzhou dialect, and the two diverge only slightly. Yue dialects in other parts of Guangdong and Guangxi provinces, such as Taishanese, may be considered divergent to a greater degree.


Cantonese uses about 1,760 syllables to cover the pronunciations of more than 10,000 Chinese characters. Most syllables are represented by standard Chinese characters; however, a few are written with colloquial Cantonese characters. Compared with other languages, Cantonese has a relatively simple syllable structure: a syllable contains one tone-carrying vowel with at most one consonant on either side.[1] On average, each Cantonese syllable corresponds to about six different Chinese characters.


A Cantonese syllable usually includes an initial (onset) and a final (rime/rhyme). The Cantonese syllabary has about 630 syllables.

Some syllables, such as /kʷeŋ˥/, /ɛː˨/ and /ei˨/, are no longer common. Others, such as /kʷek˥/ and /kʷʰek˥/, or /kʷaːŋ˧˥/ and /kɐŋ˧˥/, have traditionally had two equally correct pronunciations, but speakers are starting to use only one of them (usually because the unused pronunciation is almost unique to that word), so the unused sounds are effectively disappearing from the language. Some, such as /kʷʰɔːk˧/, /pʰuːi˥/, /tsɵi˥/ and /kaː˥/, have nonstandard alternative pronunciations that have become mainstream (/kʷʰɔːŋ˧/, /puːi˥/, /jɵi˥/ and /kʰɛː˥/ respectively), again removing some sounds from everyday use. Still others, such as /faːk˧/, /fɐŋ˩/ and /tɐp˥/, are popularly but erroneously believed to be made-up or borrowed words representing sounds of modern vernacular Cantonese, when in fact they had these sounds before the vernacular usages became popular.

On the other hand, new words circulating in Hong Kong use combinations of sounds that had not appeared in Cantonese before, such as /kɛt˥/, borrowed from the English word get meaning "to understand". The final /ɛːt/ was never an accepted final in standard Cantonese and remains nonstandard, although it had already appeared in vernacular Cantonese before this borrowing, notably in /pʰɛːt˨/, the measure word for gooey or sticky substances such as mud, poop, glue and chewing gum.

Initial consonants

Initials (or onsets) are the 19 consonants that may occur at the beginning of a syllable. Some syllables have no initial and are said to have a null initial. The following is the inventory of Cantonese initials in IPA:

                        Labial   Dental/Alveolar   Palatal   Velar (plain)  Velar (labialized)  Glottal
Nasal                   m        n[A]                        ŋ[A]
Stop, plain             p        t                           k              kʷ[B]               (ʔ)[C]
Stop, aspirated         pʰ       tʰ                          kʰ             kʷʰ[B]
Affricate, plain                 t͡s
Affricate, aspirated             t͡sʰ
Fricative               f        s                                                              h
Approximant                      l[A]              j[B]                     w[B]

Note the aspiration contrast and the lack of voicing contrast in stops.

  A. In casual speech, many native speakers do not distinguish between /n/ and /l/, nor between /ŋ/ and the null initial.[2] Usually they pronounce only /l/ and the null initial. See the discussion of phonological shift below.
  B. Some linguists prefer to analyze /j/ and /w/ as part of the finals, making them analogous to the /i/ and /u/ medials of Mandarin, especially in comparative phonological studies. However, since such medials would occur only after the null initial, /k/ or /kʰ/, treating them as part of the initials (/j/, /w/, /kʷ/, /kʷʰ/) greatly reduces the number of finals at the cost of adding only four initials.
  C. Some linguists analyze a glottal stop /ʔ/ in place of the null initial when a syllable begins with a vowel.
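The division of a syllable into an optional initial, a final and a tone can be illustrated with a short parser. This is a minimal sketch that assumes Jyutping romanization (mentioned in this article only by name); the spelling-to-IPA mapping follows the standard Jyutping initials, and the names `split_syllable` and `INITIALS` are hypothetical. Syllabic nasals (m, ng) are treated as finals with a null initial.

```python
import re

# Jyutping spellings of the 19 initials, mapped to the IPA symbols of the
# consonant table. Two-letter spellings come first so that the longest
# match wins ("ng" before "n", "gw" before "g", "kw" before "k").
INITIALS = {
    "ng": "ŋ", "gw": "kʷ", "kw": "kʷʰ",
    "b": "p", "p": "pʰ", "m": "m", "f": "f",
    "d": "t", "t": "tʰ", "n": "n", "l": "l",
    "g": "k", "k": "kʰ", "h": "h", "w": "w",
    "z": "t͡s", "c": "t͡sʰ", "s": "s", "j": "j",
}

# Syllabic nasals (e.g. m4, ng5) are whole finals with a null initial.
SYLLABIC_NASALS = {"m", "ng"}

# Letters followed by a single tone digit 1-6.
SYLLABLE = re.compile(r"([a-z]+)([1-6])")


def split_syllable(jyutping: str) -> tuple[str, str, int]:
    """Split one Jyutping syllable into (initial, final, tone).

    The initial is returned as "" for a null-initial syllable.
    """
    match = SYLLABLE.fullmatch(jyutping.lower())
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    letters, tone = match.group(1), int(match.group(2))
    if letters in SYLLABIC_NASALS:
        return "", letters, tone
    # Dicts preserve insertion order, so two-letter initials are tried first.
    for spelling in INITIALS:
        if letters.startswith(spelling) and len(letters) > len(spelling):
            return spelling, letters[len(spelling):], tone
    return "", letters, tone
```

For example, `split_syllable("gwong2")` returns `("gw", "ong", 2)`, and `INITIALS["gw"]` gives the IPA initial /kʷ/. Trying the longest spelling first is what keeps "ng", "gw" and "kw" from being misread as "n", "g" and "k".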

Coronal consonants vary in position from dental to alveolar, with /t/ and /tʰ/ more likely to be dental. The coronal affricates and fricative /t͡s/, /t͡sʰ/ and /s/ are alveolar, and articulatory findings indicate that they are palatalized before the close front vowels /iː/ and /yː/.[3] The affricates /t͡s/ and /t͡sʰ/ also tend to be palatalized before the central rounded vowels /œː/ and /ɵ/.[4] Historically, a separate alveolo-palatal sibilant series existed, as discussed below.

Vowels and finals

Chart of monophthongs used in Cantonese, from Zee (1999:59)
Chart of diphthongs used in Cantonese, from Zee (1999:59)

Finals (or rimes/rhymes) are the part of the syllable after the initial. A final is typically composed of a main vowel (nucleus) and a terminal (coda).

Eleven vowel analysis

Because acoustic findings show that the traditionally transcribed near-close finals ([ɪŋ], [ɪk], [ʊŋ], [ʊk]) are pronounced in the mid region,[5] sources such as Bauer & Benedict (1997:46–47) prefer to analyze them as close-mid ([eŋ], [ek], [oŋ], [ok]), which results in eleven vowel phonemes. In this analysis, vowel length is a key contrastive feature of the vowels.

         Front unrounded  Front rounded    Central          Back
         short   long     short   long     short   long     short   long
Close            /iː/             /yː/                               /uː/
Mid      /e/     /ɛː/     /ɵ/     /œː/                      /o/      /ɔː/
Open                                       /ɐ/     /aː/

The following chart lists all the finals in Cantonese as represented in IPA.[6]

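Main vowels: /aː/, /ɐ/, /ɛː/, /e/, /œː/, /ɵ/, /ɔː/, /o/, /iː/, /yː/, /uː/ (plus the syllabic nasals)
Monophthongs: aː, ɛː, œː, ɔː, iː, yː, uː
Diphthongs ending in /i/ ([i, y]): aːi, ɐi, ei, ɵy, ɔːi, uːi
Diphthongs ending in /u/: aːu, ɐu, ɛːu[note], ou, iːu
Nasal coda /m/: aːm, ɐm, ɛːm[note], iːm; syllabic m̩
Nasal coda /n/: aːn, ɐn, ɛːn[note], ɵn, ɔːn, iːn, yːn, uːn
Nasal coda /ŋ/: aːŋ, ɐŋ, ɛːŋ, eŋ, œːŋ, ɔːŋ, oŋ; syllabic ŋ̩
Checked coda /p/: aːp, ɐp, ɛːp[note], iːp
Checked coda /t/: aːt, ɐt, ɛːt[note], ɵt, ɔːt, iːt, yːt, uːt
Checked coda /k/: aːk, ɐk, ɛːk, ek, œːk, ɔːk, ok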

Eight vowel analysis

Some sources prefer to keep the near-close finals ([ɪŋ], [ɪk], [ʊŋ], [ʊk]) as traditionally transcribed and analyze the long-short pairs [ɛː, e], [ɔː, o], [œː, ɵ], [iː, ɪ] and [uː, ʊ] as allophones of the same phonemes, resulting in an eight-vowel system instead.[7] In this analysis, vowel length is mainly allophonic and is contrastive only in the open vowels.

         Front unrounded  Front rounded    Central          Back
                                           short   long
Close    /i/ [iː, ɪ]      /y/ [yː]                          /u/ [uː, ʊ]
Mid      /e/ [ɛː, e]      /ø/ [œː, ɵ]                       /o/ [ɔː, o]
Open                                       /ɐ/     /aː/

The following chart lists all the finals in Cantonese as represented in IPA.[7]

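Main vowels: /aː/, /ɐ/, /e/ [ɛː, e], /ø/ [œː, ɵ], /o/ [ɔː, o], /i/ [iː, ɪ], /y/ [yː], /u/ [uː, ʊ] (plus the syllabic nasals)
Monophthongs: aː, ɛː, œː, ɔː, iː, yː, uː
Diphthongs ending in /i/ ([i, y]): aːi, ɐi, ei, ɵy, ɔːi, uːi
Diphthongs ending in /u/: aːu, ɐu, ɛːu[note], ou, iːu
Nasal coda /m/: aːm, ɐm, ɛːm[note], iːm; syllabic m̩
Nasal coda /n/: aːn, ɐn, ɛːn[note], ɵn, ɔːn, iːn, yːn, uːn
Nasal coda /ŋ/: aːŋ, ɐŋ, ɛːŋ, œːŋ, ɔːŋ, ɪŋ, ʊŋ; syllabic ŋ̩
Checked coda /p/: aːp, ɐp, ɛːp[note], iːp
Checked coda /t/: aːt, ɐt, ɛːt[note], ɵt, ɔːt, iːt, yːt, uːt
Checked coda /k/: aːk, ɐk, ɛːk, œːk, ɔːk, ɪk, ʊk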

Other notes

Note: Finals /ɛːu/,[8] /ɛːm/, /ɛːn/, /ɛːp/ and /ɛːt/ appear only in colloquial pronunciations of characters.[9] They are absent from some analyses and romanization systems.

The diphthong ending /i/ is rounded (realized as [y]) after rounded vowels.[8] Nasal consonants can also form syllables on their own; these are known as syllabic nasals. The stop codas /p, t, k/ are unreleased ([p̚, t̚, k̚]).

If the three checked tones are counted as separate tones, the stop codas /p, t, k/ are in complementary distribution with the nasal codas /m, n, ŋ/: stop codas occur only with the checked tones, and nasal codas only with the others.
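Counting the checked tones separately makes the manner of the coda predictable from the tone: a checked tone implies a stop, any other tone a nasal at the same place of articulation. The sketch below, with hypothetical names, encodes that prediction for the three homorganic pairs. It illustrates the distribution only and does not check that every nasal final has an attested stop counterpart.

```python
# Homorganic nasal/stop pairs: each nasal coda and the stop coda made at
# the same place of articulation.
NASAL_TO_STOP = {"m": "p", "n": "t", "ŋ": "k"}
STOP_TO_NASAL = {stop: nasal for nasal, stop in NASAL_TO_STOP.items()}


def coda_for(coda: str, checked: bool) -> str:
    """Predict the surface coda from its place of articulation and tone type.

    `coda` may be given as either member of a pair; a checked tone selects
    the stop, and any other tone selects the nasal.
    """
    nasal = STOP_TO_NASAL.get(coda, coda)
    if nasal not in NASAL_TO_STOP:
        raise ValueError(f"not a nasal or stop coda: {coda!r}")
    return NASAL_TO_STOP[nasal] if checked else nasal
```

For instance, `coda_for("m", checked=True)` returns "p", matching the pair /aːm/ and /aːp/ in the finals chart.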

Tones

Relative fundamental-frequency contours for six Cantonese tones with examples and Jyutping/Yale tone numbers (modified from Francis (2008))

Cantonese uses tone contours to distinguish words, with the number of possible tones depending on the type of final. While Guangzhou Cantonese generally distinguishes between high-falling and high level tones, the two have merged in Hong Kong Cantonese and Macau Cantonese, yielding a system of six different tones in syllables ending in a semi-vowel or nasal consonant. (Some of these have more than one realization, but such differences are not used to distinguish words.) In finals that end in a stop consonant, the number of tones is reduced to three; in Chinese descriptions, these "checked tones" are treated separately by diachronic convention, so that Cantonese is traditionally said to have nine tones. However, phonetically these are a conflation of tone and final consonant; the number of phonemic tones is six in Hong Kong and seven in Guangzhou.[10]

Each line gives the Yale or Jyutping tone number, the tone name, its description, any example characters, and the syllable written with tone letters, with IPA diacritics and with Yale diacritics.

Non-stop coda:
  1  dark flat; high level or high falling; 詩, 思; siː˥, siː˥˧; síː, sîː; sī, sì
  2  dark rising; medium rising; siː˧˥; sǐː; sí
  3  dark departing; medium level; siː˧; sīː; si
  4  light flat; low falling or very low level; siː˨˩, siː˩; si̖ː, sı̏ː; sìh
  5  light rising; low rising; siː˩˧; si̗ː; síh
  6  light departing; low level; siː˨; sìː; sih
Stop coda:
  7 (or 1)  upper dark entering; high level; sɪk̚˥; sɪ́k̚; sīk
  8 (or 3)  lower dark entering; medium level; sɛːk̚˧; sɛ̄ːk̚; sek
  9 (or 6)  light entering; low level; sɪk̚˨; sɪ̀k̚; sihk
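The "7 (or 1), 8 (or 3), 9 (or 6)" numbering in the table amounts to a simple conversion rule: a checked syllable takes the number of the non-checked tone it matches in pitch, shifted into the range 7 to 9. The sketch below assumes the six-tone (Hong Kong) numbering and Jyutping-style finals, so a checked final ends in p, t or k; following the text, it rejects any other tone on a stop-coda final. Changed tones are not modeled, and the function names are hypothetical.

```python
# Checked (entering) tones: six-tone number -> traditional nine-tone number,
# following the "7 (or 1), 8 (or 3), 9 (or 6)" row of the table.
SIX_TO_NINE = {1: 7, 3: 8, 6: 9}
NINE_TO_SIX = {nine: six for six, nine in SIX_TO_NINE.items()}
STOP_CODAS = ("p", "t", "k")


def to_nine_tones(final: str, tone: int) -> int:
    """Renumber a six-tone number in the traditional nine-tone system."""
    if tone not in range(1, 7):
        raise ValueError(f"six-tone numbers run from 1 to 6, got {tone}")
    if not final.endswith(STOP_CODAS):
        return tone  # non-stop codas keep their number
    if tone not in SIX_TO_NINE:
        raise ValueError(f"tone {tone} is not one of the three checked tones")
    return SIX_TO_NINE[tone]


def to_six_tones(tone: int) -> int:
    """Collapse a nine-tone number onto the six phonemic tones."""
    if tone not in range(1, 10):
        raise ValueError(f"nine-tone numbers run from 1 to 9, got {tone}")
    return NINE_TO_SIX.get(tone, tone)
```

With this rule, the final "ek" in tone 3 becomes tone 8, while "i" in tone 3 stays tone 3; `to_six_tones` undoes the renumbering.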

For the purposes of meter in Chinese poetry, the first and fourth tones are "flat/level tones" (平聲), while the rest are "oblique tones" (仄聲). This follows their regular evolution from the four tones of Middle Chinese.

The first tone can be either high level or high falling, usually without affecting the meaning of the word. Most speakers are not consciously aware of when they use high level and when they use high falling. Most Hong Kong speakers have merged the two. In Guangzhou, the high falling tone is also disappearing but is still prevalent in certain words; for example, in traditional Yale romanization with diacritics, sàam (high falling) means the number three (三), whereas sāam (high level) means shirt (衫).[11]

The relative pitch of the tones varies from speaker to speaker, so descriptions vary from one source to another. The difference between the high and mid level tones (1 and 3) is about twice that between the mid and low level tones (3 and 6): about 60 Hz versus 30 Hz. The low falling tone (4) starts at the same pitch as the low level tone (6) but then drops; as is common with falling tones, it is shorter than the three level tones. The two rising tones (2 and 5) both start at the level of tone 6 but rise to the levels of tones 1 and 3 respectively.[12]
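Chao tone letters, as used in the tone table, write each contour as a sequence of pitch levels from ˥ (highest) to ˩ (lowest), so level, rising and falling shapes can be read off mechanically. A minimal sketch with hypothetical helper names; it compares only the first and last pitch, so it would not identify a dipping contour, and it does not handle the right-facing bars used for changed tones.

```python
# Chao tone letters, from highest pitch (5) to lowest (1).
TONE_LETTERS = {"˥": 5, "˦": 4, "˧": 3, "˨": 2, "˩": 1}


def chao_digits(letters: str) -> str:
    """Convert tone letters such as "˧˥" to Chao digits such as "35"."""
    return "".join(str(TONE_LETTERS[ch]) for ch in letters)


def contour(letters: str) -> str:
    """Classify a tone-letter sequence by comparing its first and last pitch."""
    levels = [TONE_LETTERS[ch] for ch in letters]
    if not levels:
        raise ValueError("empty tone-letter sequence")
    if levels[-1] > levels[0]:
        return "rising"
    if levels[-1] < levels[0]:
        return "falling"
    return "level"
```

Applied to the table, tone 2 (˧˥, digits 35) comes out as rising and the low falling variant of tone 4 (˨˩, digits 21) as falling.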

Tones 3, 4, 5 and 6 become dipping on the last syllable of an interrogative or exclamatory sentence. For example, 眞係? "really?" is pronounced [tsɐn˥ hɐi˨˥].

When pronounced in Cantonese, the digits "394052786" give the nine tones in order (Yale romanization: saam1, gau2, sei3, ling4, ng5, yi6, chat7, baat8, luk9), providing a mnemonic for remembering the nine tones.
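The mnemonic can be checked mechanically: reading the digits in order should yield tone numbers 1 through 9. The readings below are transcribed from the text; only the nine digits in the mnemonic are included (the digit 1 is absent), and the names are hypothetical.

```python
# Yale readings of the digits in the mnemonic, with nine-tone numbers,
# as listed in the text (the digit 1 is not part of the mnemonic).
DIGIT_READINGS = {
    "3": ("saam", 1), "9": ("gau", 2), "4": ("sei", 3),
    "0": ("ling", 4), "5": ("ng", 5), "2": ("yi", 6),
    "7": ("chat", 7), "8": ("baat", 8), "6": ("luk", 9),
}


def read_digits(digits: str) -> list[str]:
    """Return the Yale reading, with tone number, of each digit."""
    readings = []
    for digit in digits:
        syllable, tone = DIGIT_READINGS[digit]
        readings.append(f"{syllable}{tone}")
    return readings


def tones_of(digits: str) -> list[int]:
    """Return just the tone numbers of a digit string."""
    return [DIGIT_READINGS[digit][1] for digit in digits]
```

`tones_of("394052786")` returns `[1, 2, 3, 4, 5, 6, 7, 8, 9]`, confirming that the string steps through every tone exactly once.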

Like other Yue dialects, Cantonese preserves an analog to the voicing distinction of Middle Chinese in the manner shown in the chart below.

Middle Chinese          Cantonese
Tone       Initial      Nucleus   Tone name               Contour    Number
Level      voiceless              dark level              ˥, ˥˧      1
Level      voiced                 light level             ˨˩, ˩      4
Rising     voiceless              dark rising             ˧˥         2
Rising     voiced                 light rising            ˩˧         5
Departing  voiceless              dark departing          ˧          3
Departing  voiced                 light departing         ˨          6
Entering   voiceless    short     upper dark entering     ˥          7 (1)
Entering   voiceless    long      lower dark entering     ˧          8 (3)
Entering   voiced                 light entering          ˨          9 (6)

The distinction of voiced and voiceless consonants found in Middle Chinese was preserved by the distinction of tones in Cantonese. The difference in vowel length further caused the splitting of the dark entering tone, making Cantonese (as well as other Yue Chinese branches) one of the few Chinese varieties to have further split a tone after the voicing-related splitting of the four tones of Middle Chinese.[13][14]
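The correspondence table reads as a small decision procedure: the Middle Chinese tone category selects a pair of Cantonese tones, the voicing of the Middle Chinese initial selects one member of the pair, and, for voiceless entering-tone syllables only, Cantonese vowel length decides between tones 7 and 8. The sketch encodes only the regular correspondences shown in the table; the function name and the string labels are hypothetical.

```python
# Regular correspondences for the three non-entering Middle Chinese tones:
# (Middle Chinese tone, voiced initial?) -> Cantonese tone number.
NON_ENTERING = {
    ("level", False): 1, ("level", True): 4,
    ("rising", False): 2, ("rising", True): 5,
    ("departing", False): 3, ("departing", True): 6,
}


def cantonese_tone(mc_tone: str, voiced: bool, long_vowel: bool = False) -> int:
    """Predict a Cantonese tone (nine-tone numbering) from the Middle Chinese
    tone, the voicing of its initial and, for voiceless entering syllables,
    the Cantonese vowel length."""
    if mc_tone == "entering":
        if voiced:
            return 9  # light entering
        return 8 if long_vowel else 7  # lower vs upper dark entering
    try:
        return NON_ENTERING[(mc_tone, voiced)]
    except KeyError:
        raise ValueError(f"unknown Middle Chinese tone: {mc_tone!r}") from None
```

Only the entering category consults `long_vowel`, which mirrors the extra split described above.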

Cantonese is unusual in that vowel length can affect both the rime and the tone. Some linguists[who?] believe that the vowel length feature may have roots in Old Chinese.

Cantonese also has two changed tones, which add a diminutive-like meaning of "that familiar example" to a standard word. For example, the word for "silver" (銀, /ŋɐn˩/) in a changed tone (/ŋɐn˩꜔꜒/; right-facing tone bars denote changed tones) means "coin". Changed tones are comparable to the diminutive suffixes of Mandarin. They are also used in compounds, in reduplications (擒擒青 /kɐm˩ kɐm˩ tʃʰɛːŋ˥/ → /kɐm˩ kɐm˩꜔꜒ tʃʰɛːŋ˥/ "in a hurry") and in direct address to family members (妹妹 /muːy˨ muːy˨/ → /muːy˨꜖ muːy˨꜔꜒/ "sister").[15] The two changed tones are high level, like tone 1, and mid rising, like tone 2, though for some people the mid rising changed tone does not rise as high as tone 2. The high level changed tone is more common among speakers who have a high falling tone; for other speakers, mid rising (or its variant realization) is the main changed tone, in which case it applies only to syllables whose tone is neither high level nor mid rising (that is, only tones 3, 4, 5 and 6 in Yale and Jyutping romanization can take changed tones).[16] However, in certain specific vocatives, the changed tone is indeed high level (tone 1), even for speakers without a phonemically distinct high falling tone.[17]
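For speakers whose main changed tone is mid rising, the rule above acts as a filter on the input tone. A minimal sketch under that assumption; it does not model the high level changed tone or the vocative cases, the function name is hypothetical, and the "2*" label is hypothetical notation chosen to keep the changed tone distinct from lexical tone 2.

```python
# Tones that can take the mid-rising changed tone, per the text
# (Yale/Jyutping six-tone numbering).
CHANGEABLE = {3, 4, 5, 6}


def apply_changed_tone(tone: int) -> str:
    """Return the tone label after the mid-rising changed tone applies.

    "2*" marks the changed tone, keeping it distinct from lexical tone 2;
    tones 1 and 2 are returned unchanged.
    """
    if tone not in range(1, 7):
        raise ValueError(f"expected a tone from 1 to 6, got {tone}")
    return "2*" if tone in CHANGEABLE else str(tone)
```

The word for "silver", /ŋɐn˩/, carries tone 4, so `apply_changed_tone(4)` gives "2*", the changed form meaning "coin".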

Historical change

As in other languages, Cantonese pronunciation changes constantly, as more and more native speakers change how they pronounce certain sounds.

One past shift in Cantonese was the loss of the distinction between alveolar and alveolo-palatal (sometimes termed postalveolar) sibilants, which occurred during the late 19th and early 20th centuries. Many Cantonese dictionaries and pronunciation guides published before the 1950s documented this distinction, but no modern Cantonese dictionary records it.

Publications that documented this distinction include:

  • Williams, S., A Tonic Dictionary of the Chinese Language in the Canton Dialect, 1856.
  • Cowles, R., A Pocket Dictionary of Cantonese, 1914.
  • Meyer, B. and Wempe, T., The Student's Cantonese-English Dictionary, 3rd edition, 1947.
  • Chao, Y., Cantonese Primer, 1947.

The sibilants depalatalized, causing many words that were once distinct to sound the same. For comparison, modern Standard Mandarin still has this distinction, with most Cantonese alveolo-palatal sibilants corresponding to Mandarin retroflex sibilants. For instance:

Sibilant category       Modern Cantonese      Pre-1950s Cantonese          Standard Mandarin
Unaspirated affricate   /tsœːŋ/ (alveolar)    /tsœːŋ/ (alveolar)           /tɕiɑŋ/ (alveolo-palatal)
                        /tsœːŋ/ (alveolar)    /tɕœːŋ/ (alveolo-palatal)    /tʂɑŋ/ (retroflex)
Aspirated affricate     /tsʰœːŋ/ (alveolar)   /tsʰœːŋ/ (alveolar)          /tɕʰiɑŋ/ (alveolo-palatal)
                        /tsʰœːŋ/ (alveolar)   /tɕʰœːŋ/ (alveolo-palatal)   /tʂʰɑŋ/ (retroflex)
Fricative               /sœːŋ/ (alveolar)     /sœːŋ/ (alveolar)            /ɕiɑŋ/ (alveolo-palatal)
                        /sœːŋ/ (alveolar)     /ɕœːŋ/ (alveolo-palatal)     /ʂɑŋ/ (retroflex)

Even though the aforementioned references observed the distinction, most of them also noted that the depalatalization phenomenon was already occurring at the time. Williams (1856) writes:

The initials ch and ts are constantly confounded, and some persons are absolutely unable to detect the difference, more frequently identifying the words under ts as ch, than contrariwise.

Cowles (1914) adds:

"s" initial may be heard for "sh" initial and vice versa.

A vestige of this palatalization difference is sometimes reflected in the romanization of Cantonese names in Hong Kong. For instance, many names are spelled with sh even though the "sh sound" (/ɕ/) is no longer used in pronouncing them. Examples include the surname 石 (/sɛːk˨/), which is often romanized as Shek, and place names such as Sha Tin (沙田; /saː˥ tʰiːn˩/).

In Mandarin, the alveolo-palatal sibilants are in complementary distribution with the retroflex sibilants, occurring only before /i/ or /y/. Mandarin also retains the medials in which /i/ and /y/ can occur, as the examples above show, whereas Cantonese lost its medials at some point in its history, reducing speakers' ability to distinguish its sibilant initials.

Many younger Hong Kong speakers today do not distinguish phoneme pairs such as /n/ vs. /l/ and /ŋ/ vs. the null initial,[2] merging one sound into the other: for example, /nei˨˧/ is pronounced /lei˨˧/, and /ŋɔː˨˧/ is pronounced /ɔː˨˧/. Another incipient sound change is the loss of the distinctions between /kʷ/ and /k/ and between /kʷʰ/ and /kʰ/, so that, for example, /kʷɔːk˧/ is pronounced [kɔːk̚˧].[18] Although these changes are often considered substandard and denounced as "lazy sounds" or "lazy pronunciation" (懶音), they are becoming more common and are influencing other Cantonese-speaking regions (see Hong Kong Cantonese).[citation needed]
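These mergers can be sketched as rewrite rules on romanized syllables. This minimal example assumes Jyutping input and uses a hypothetical function name. Restricting the /kʷ/ and /kʷʰ/ merger to syllables whose vowel is o is an assumption drawn from the /kʷɔːk˧/ example rather than a rule stated in the text, and the output shows merged forms only, not which speakers use them.

```python
import re

# Letters followed by a single Jyutping tone digit.
SYLLABLE = re.compile(r"([a-z]+)([1-6])")


def lazy_pronunciation(jyutping: str) -> str:
    """Apply the n/l, ng/null and gw/kw mergers to one Jyutping syllable."""
    match = SYLLABLE.fullmatch(jyutping)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    letters, tone = match.groups()
    if letters == "ng":
        return jyutping  # syllabic nasal: there is no initial to drop
    if letters.startswith("ng"):
        letters = letters[2:]  # /ŋ/ merges with the null initial
    elif letters.startswith("n"):
        letters = "l" + letters[1:]  # /n/ merges with /l/
    elif letters.startswith(("gwo", "kwo")):
        # /kʷ/, /kʷʰ/ -> /k/, /kʰ/; limited to "o" here (an assumption)
        letters = letters[0] + letters[2:]
    return letters + tone
```

So "nei5" becomes "lei5", "ngo5" becomes "o5" and "gwok3" becomes "gok3", while the syllabic nasal "ng5" is left alone because it has no initial to lose.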

Assimilation also occurs in certain contexts: 肚餓 is sometimes read as [tʰoŋ˩˧ ŋɔː˨] rather than [tʰou̯˩˧ ŋɔː˨], and 雪櫃 as [sɛːk˧ kʷɐi̯˨] rather than [syːt˧ kʷɐi̯˨], but these sound changes are limited to those particular words.[citation needed]

References


  1. "WALS Online - Chapter Syllable Structure".
  2. Yip & Matthews (2001:3–4)
  3. Lee, W.-S.; Zee, E. (2010). "Articulatory characteristics of the coronal stop, affricate, and fricative in Cantonese". Journal of Chinese Linguistics. 38 (2): 336–372. JSTOR 23754137.
  4. Bauer & Benedict (1997:28–29)
  5. Zee, Eric (2003), "Frequency Analysis of the Vowels in Cantonese from 50 Male and 50 Female Speakers" (PDF), Proceedings of the 15th International Congress of Phonetic Sciences: 1117–1120
  6. Bauer & Benedict (1997:49)
  7. "Cantonese Transcription Schemes Conversion Tables - Finals". Research Centre for Humanities Computing, The Chinese University of Hong Kong. Retrieved March 5, 2019.
  8. Zee, Eric (1999), "An acoustical analysis of the diphthongs in Cantonese" (PDF), Proceedings of the 14th International Congress of Phonetic Sciences: 1101–1105
  9. Bauer & Benedict (1997:60)
  10. Bauer & Benedict (1997:119–120)
  11. Guan (2000:474 and 530)
  12. Jennie Lam Suk Yin, 2003, Confusion of tones in visually-impaired children using Cantonese braille (Archived by WebCite® at
  13. Norman (1988:216)
  14. Ting (1996:150)
  15. Matthews & Yip (2013, section 1.4.2)
  16. Yu (2007:191)
  17. Alan C.L. Yu. "Tonal Mapping in Cantonese Vocative Reduplication" (PDF). Retrieved 27 September 2014.
  18. Baker & Ho (2006:xvii)