Standard Chinese phonology

From Wikipedia, the free encyclopedia

This article summarizes the phonology (the sound system, or in more general terms, the pronunciation) of Standard Chinese (Standard Mandarin).

Standard Chinese is based on the Beijing dialect of Mandarin. Actual production varies widely among speakers, as they inadvertently introduce elements of their native dialects (although television and radio announcers are chosen for their pronunciation accuracy and standard accent). Elements of the sound system include not only the segments – the vowels and consonants of the language – but also the tones that are applied to each syllable. Standard Chinese has four main tones, in addition to a neutral tone used on weak syllables.

This article represents phonetic values using the International Phonetic Alphabet (IPA), noting correspondences chiefly with the pinyin system for transcription of Chinese text. For correspondences with other systems, see the relevant articles, such as Wade–Giles, bopomofo (zhuyin), Gwoyeu Romatzyh, etc., and Romanization of Chinese.

Consonants

The following table shows the consonant sounds of Standard Chinese, transcribed using the International Phonetic Alphabet. The sounds shown in parentheses are frequently not analyzed as separate phonemes; for more on these, see Palatal series and Glides, below. Excluding these, there are 19 consonant phonemes in the inventory.

Manner | Labial | Denti-alveolar | Retroflex | Alveolo-palatal | Velar
Nasal | m | n | | | ŋ
Stop | p pʰ | t tʰ | | | k kʰ
Affricate | | t͡s t͡sʰ | ʈ͡ʂ ʈ͡ʂʰ | (t͡ɕ) (t͡ɕʰ) |
Fricative | f | s | ʂ | (ɕ) | x
Approximant | | l | ɻ | (j) (ɥ) | (w)

Between pairs of stops or affricates having the same place and manner of articulation, the primary distinction is not voiced vs. voiceless (as in French), but unaspirated vs. aspirated (as in Icelandic). The unaspirated stops and affricates may however become voiced in weak syllables (see Syllable reduction, below). Such pairs are represented in the pinyin system mostly using letters which (in European languages) principally denote voiceless/voiced pairs, with the "voiceless" letter representing an aspirated sound, and the "voiced" letter an unaspirated sound – for example, pinyin p and b represent respectively /pʰ/ and /p/ (aspirated and unaspirated "p" sounds).
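The letter convention described above can be sketched as a lookup table. This is a minimal Python sketch. The dictionary `PINYIN_INITIALS` and the helper `ipa_initial` are hypothetical names. The values restate the consonant tables in this section, and the palatals j, q, x are included as surface sounds.

```python
# Hypothetical lookup from pinyin initials to IPA, restating the consonant
# tables in this article. In each stop/affricate pair, the letter that is
# "voiceless" in European languages (p, t, k, c, ch, q) marks the aspirated
# sound; the "voiced" letter (b, d, g, z, zh, j) marks the unaspirated one.
PINYIN_INITIALS = {
    "b": "p", "p": "pʰ", "m": "m", "f": "f",
    "d": "t", "t": "tʰ", "n": "n", "l": "l",
    "g": "k", "k": "kʰ", "h": "x",
    "j": "t͡ɕ", "q": "t͡ɕʰ", "x": "ɕ",
    "zh": "ʈ͡ʂ", "ch": "ʈ͡ʂʰ", "sh": "ʂ", "r": "ɻ",
    "z": "t͡s", "c": "t͡sʰ", "s": "s",
}


def ipa_initial(syllable: str) -> str:
    """Return the IPA value of a pinyin syllable's initial consonant,
    or "" for syllables with no consonant initial (e.g. "ai", "yi", "wu")."""
    # Try two-letter initials first so that "zh" is not read as "z".
    for length in (2, 1):
        prefix = syllable[:length]
        if prefix in PINYIN_INITIALS:
            return PINYIN_INITIALS[prefix]
    return ""
```

For example, `ipa_initial("bao")` gives "p" while `ipa_initial("pao")` gives "pʰ". The glides written y and w are deliberately not treated as initials here, since the article handles them separately.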

More details about the individual consonant sounds are given in the following table.

Phoneme or sound | Approximate description | Pinyin | Notes
/m/ | Like English "m" | m |
/n/ | Like English "n" | n | See Denti-alveolar and retroflex series. Can occur in the onset and/or coda of a syllable.
/ŋ/ | Like "ng" in English "sing" | ng | Occurs only in the syllable coda.
/p/ | Like English "p" but unaspirated, as in "spy" | b |
/pʰ/ | Like an aspirated English "p", as in "pie" | p |
/t/ | Like English "t" but unaspirated, as in "sty" | d | See Denti-alveolar and retroflex series.
/tʰ/ | Like an aspirated English "t", as in "tie" | t | See Denti-alveolar and retroflex series.
/k/ | Like English "k" but unaspirated, as in "ski" | g |
/kʰ/ | Like an aspirated English "k", as in "key" | k |
/t͡s/ | Like English "ts" in "cats", without aspiration | z | See Denti-alveolar and retroflex series.
/t͡sʰ/ | As above, but with aspiration | c | See Denti-alveolar and retroflex series.
/ʈ͡ʂ/ | Similar to "ch" in English "chat", but with a retroflex articulation and no aspiration | zh | See Denti-alveolar and retroflex series.
/ʈ͡ʂʰ/ | As above, but with aspiration | ch | See Denti-alveolar and retroflex series.
[t͡ɕ] | Like an unaspirated English "ch", but with a palatal (softer) pronunciation | j | See Palatal series.
[t͡ɕʰ] | As above, with aspiration | q | See Palatal series.
/f/ | Like English "f" | f |
/s/ | Like English "s", but usually with the tongue on the lower teeth | s | See Denti-alveolar and retroflex series.
/ʂ/ | Similar to English "sh", but with a retroflex articulation | sh | See Denti-alveolar and retroflex series.
[ɕ] | Similar to English "sh", but with a palatal (softer) pronunciation | x | See Palatal series.
/x/ | As "ch" in Scottish "loch"; for some speakers [h], as "h" in English "hat"[1] | h |
/l/ | Like English clear "l", as in RP "lay" (never dark, i.e. velarized) | l | See Denti-alveolar and retroflex series.
/ɻ/ | In syllable-initial position, pronounced by some speakers as [ɻ] (a retroflex approximant, similar to initial "r" in some English varieties), and by others as [ʐ] (a voiced retroflex sibilant, closer to the middle consonant of "vision") | r | For pronunciation in syllable-final position, see Rhotic coda.
[j] | Like "y" in English "you" | y, medial i | See Glides.
[ɥ] | Like "hu" in French "huit" | yu, medial ü and sometimes u | See Glides.
[w] | Like "w" in English "wet" | w, medial u | See Glides.

All of the consonants may occur as the initial sound of a syllable, with the exception of /ŋ/ (unless the zero initial is assigned to this phoneme; see below). The glides [j], [ɥ], [w] may also be medials (coming between the initial consonant and the main vowel). The only consonants that can appear in syllable coda (final) position are /n/, /ŋ/, and /ɻ/ (although [m] may occur as an allophone of /n/ before labial consonants in fast speech, and in some descriptions the second elements of diphthongs are identified with the glides). Final /n/, /ŋ/ may be pronounced without complete oral closure, resulting in a syllable that in fact ends with a long nasalized vowel.[2] See also Syllable reduction, below.
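The distributional facts in this paragraph reduce to two membership tests over the 19-phoneme inventory. This is a minimal Python sketch, and the set names and predicates are hypothetical. It treats the zero initial as no onset at all. It leaves out the fast-speech [m] allophone and the reading of diphthong offglides as glides.

```python
# The 19 consonant phonemes of the first table (palatals and glides excluded).
CONSONANTS = {
    "m", "n", "ŋ",
    "p", "pʰ", "t", "tʰ", "k", "kʰ",
    "t͡s", "t͡sʰ", "ʈ͡ʂ", "ʈ͡ʂʰ",
    "f", "s", "ʂ", "x",
    "l", "ɻ",
}
# Every consonant except /ŋ/ may begin a syllable.
ONSET_CONSONANTS = CONSONANTS - {"ŋ"}
# Only /n/, /ŋ/ and /ɻ/ may close a syllable.
CODA_CONSONANTS = {"n", "ŋ", "ɻ"}


def is_legal_onset(consonant: str) -> bool:
    return consonant in ONSET_CONSONANTS


def is_legal_coda(consonant: str) -> bool:
    return consonant in CODA_CONSONANTS
```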

Denti-alveolar and retroflex series

The consonants listed in the first table above as denti-alveolar are sometimes described as alveolars, and sometimes as dentals. The affricates and fricative are particularly often described as dentals; these are generally pronounced with the tongue on the lower teeth.[3]

The retroflex consonants (like those of Polish) are produced with the tongue blade rather than the underside of the tongue-tip, and so are considered by some authors not to be truly retroflex; they may be more accurately called laminal post-alveolar.[4] Speakers not from Beijing often lack the retroflexes in their native dialects, and may thus replace them with dentals.[5]

Palatal series

The palatal consonants [t͡ɕ, t͡ɕʰ, ɕ] (pinyin j, q, x) are pronounced by some speakers as palatalized dentals [t͡sʲ], [t͡sʰʲ], [sʲ]. This is especially common among children and females.[6]

In phonological analysis, it is often assumed that, when not followed by one of the high front vowels [i] or [y], the palatals consist of a consonant followed by a palatal glide ([j] or [ɥ]). That is, syllables represented in pinyin as beginning ji-, qi-, xi-, ju-, qu-, xu- (followed by a vowel) are taken to begin [t͡ɕ]+[j], [t͡ɕʰ]+[j], [ɕ]+[j], [t͡ɕ]+[ɥ], [t͡ɕʰ]+[ɥ], [ɕ]+[ɥ]. The actual pronunciations are more like [t͡ɕ], [t͡ɕʰ], [ɕ], [t͡ɕʷ], [t͡ɕʰʷ], [ɕʷ] (or for speakers using the dental variants, [t͡sʲ], [t͡sʰʲ], [sʲ], [t͡sɥ], [t͡sʰɥ], [sɥ]). This is consistent with the general observation (see next section) that medial glides are realized as palatalization and/or velarization of the preceding consonant (palatalization already being inherent in the case of the palatals).

On the above analysis, the palatals are in complementary distribution with the dentals [t͡s, t͡sʰ, s], with the velars [k, kʰ, x], and with the retroflexes [ʈ͡ʂ, ʈ͡ʂʰ, ʂ], as none of these can occur before high front vowels or palatal glides, whereas the palatals occur only before high front vowels or palatal glides. Therefore, linguists often prefer to classify [t͡ɕ, t͡ɕʰ, ɕ] not as independent phonemes, but as allophones of one of the other three series.[7] The existence of the above-mentioned dental variants inclines some to prefer to identify the palatals with the dentals, but identification with any of the three series is possible (unless the empty rime is identified with /i/, in which case the velars become the only candidate; see below). The Yale and Wade–Giles systems mostly treat the palatals as allophones of the retroflexes; Tongyong Pinyin mostly treats them as allophones of the dentals; and Chinese braille treats them as allophones of the velars. In standard pinyin and bopomofo, however, they are represented as a separate series.
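The complementary-distribution argument can be made concrete. The sketch below derives the palatals from underlying velars, which is the choice Chinese braille makes. Identifying them with the dentals or the retroflexes would work the same way with a different table. The names `VELAR_TO_PALATAL`, `PALATALIZING` and `surface_initial` are hypothetical.

```python
# One possible analysis: the palatals as allophones of the velars.
VELAR_TO_PALATAL = {"k": "t͡ɕ", "kʰ": "t͡ɕʰ", "x": "ɕ"}
# High front vowels and palatal glides, the only contexts for palatals.
PALATALIZING = {"i", "y", "j", "ɥ"}


def surface_initial(initial: str, next_segment: str) -> str:
    """Realize an underlying velar as a palatal before [i], [y], [j] or [ɥ];
    every other initial, and every velar elsewhere, is returned unchanged."""
    if initial in VELAR_TO_PALATAL and next_segment in PALATALIZING:
        return VELAR_TO_PALATAL[initial]
    return initial
```

With this table, pinyin xi is analyzed as /xi/ and qu as /kʰy/.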

The palatals arose historically from a merger of the dentals [t͡s, t͡sʰ, s] and velars [k, kʰ, x] before high front vowels and glides. Previously, some instances of modern [t͡ɕ(ʰ)i] were instead [k(ʰ)i], and others were [t͡s(ʰ)i]. The change took place in the last two or three centuries at different times in different areas, but not in the dialect used in the Manchu dynasty imperial court. This explains why some European transcriptions of Chinese names (especially in the postal map spelling) contain "ki-", "hi-", "tsi-" or "si-" where a palatal might be expected. Examples are "Peking" for Beijing, "Chungking" for Chongqing, "Fukien" for Fujian, "Tientsin" for Tianjin, "Sinkiang" for Xinjiang, and "Sian" for Xi'an. The complementary distribution with the retroflex series arose when syllables that had a retroflex consonant followed by a medial glide lost the medial glide.

Glides

The glides [j], [ɥ] and [w] sound respectively like the "y" in English "yes", the "(h)u" in French "huit", and the "w" in English "we". (Beijing speakers often replace initial [w] with a labiodental [ʋ], except when it is followed by [ɔ].[8]) The glides are commonly analyzed not as independent phonemes, but as consonantal allophones of the high vowels /i/, /y/ and /u/. This is possible since there is no ambiguity in interpreting a sequence like [jɑʊ̯] (pinyin yao/-iao) as /iau/, and potentially problematic sequences such as */iu/ do not occur.
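Because the nucleus of a syllable is always identifiable, the glides can be derived mechanically from a vowel-only representation. The following minimal Python sketch uses the hypothetical name `realize_glides`. It assumes the input is a list of the vowel phonemes /i y u a ə/. It ignores the allophones of the nucleus itself, so /iau/ comes out as [jaʊ̯] rather than the narrower [jɑʊ̯].

```python
# Non-nuclear realizations of the high vowels.
ONGLIDE = {"i": "j", "y": "ɥ", "u": "w"}   # before the nucleus (medial)
OFFGLIDE = {"i": "ɪ̯", "u": "ʊ̯"}          # after the nucleus (diphthong coda)


def realize_glides(vowels: list) -> str:
    """Realize an underlying vowel sequence such as ["i", "a", "u"]: the one
    non-high vowel is the nucleus, high vowels before it surface as glides,
    and high vowels after it as non-syllabic offglides. A lone high vowel
    is its own nucleus."""
    nonhigh = [k for k, v in enumerate(vowels) if v not in ONGLIDE]
    if not nonhigh:
        if len(vowels) != 1:
            # e.g. */iu/, which the text notes does not occur
            raise ValueError(f"unexpected high-vowel sequence: {vowels}")
        return vowels[0]
    if len(nonhigh) > 1:
        raise ValueError(f"more than one nucleus: {vowels}")
    nucleus = nonhigh[0]
    out = []
    for k, v in enumerate(vowels):
        if k < nucleus:
            out.append(ONGLIDE[v])
        elif k > nucleus:
            if v not in OFFGLIDE:
                raise ValueError(f"{v!r} cannot follow the nucleus")
            out.append(OFFGLIDE[v])
        else:
            out.append(v)
    return "".join(out)
```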

The glides may occur in initial position in a syllable. This occurs with [ɥ] in the syllables written yu, yuan, yue, yun and yong in pinyin; with [j] in other syllables written with initial y in pinyin (ya, yi, etc.); and with [w] in syllables written with initial w in pinyin (wa, wu, etc.). When a glide is followed by the vowel of which that glide is considered an allophone, the glide may be regarded as epenthetic (automatically inserted), and not as a separate realization of the phoneme. Hence the syllable yi, pronounced [ji], may be analyzed as consisting of the single phoneme /i/, and similarly yin may be analyzed as /in/, yu as /y/, and wu as /u/.[9]

The glides can also occur in medial position, that is, after the initial consonant but before the main vowel. Here they are represented in pinyin as vowels: for example, the i in bie represents [j], and the u in duan represents [w]. There are some restrictions on the possible consonant-glide combinations: [w] does not occur after labials (except for some speakers in bo, po, mo, fo); [j] does not occur after retroflexes and velars (or after [f]); and [ɥ] occurs medially only in lüe and nüe and after palatals (for which see above). A consonant-glide combination at the start of a syllable is articulated as a single sound. The glide is not in fact pronounced after the consonant, but is realized as palatalization [ʲ], labialization [ʷ], or both [ᶣ], of the consonant.[10] (The same modifications of initial consonants occur in syllables where they are followed by a high vowel, although normally no glide is considered to be present there. Hence a consonant is generally palatalized [ʲ] when followed by /i/, labialized [ʷ] when followed by /u/, and both [ᶣ] when followed by /y/.)
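The realization of consonant-glide combinations described in this paragraph can be sketched as follows. This is a hypothetical helper, `coarticulate`, that operates on lists of IPA segments. It writes combined palatalization and labialization as ᶣ (a superscript ɥ). Its special treatment of the palatals follows the Palatal series section, for example [t͡ɕʷ] for ju-.

```python
# Secondary articulations replacing a medial glide, or colouring a
# consonant before the corresponding high vowel ("ᶣ" is a superscript ɥ).
SECONDARY = {"j": "ʲ", "i": "ʲ", "w": "ʷ", "u": "ʷ", "ɥ": "ᶣ", "y": "ᶣ"}
GLIDES = {"j", "w", "ɥ"}
PALATALS = {"t͡ɕ", "t͡ɕʰ", "ɕ"}


def coarticulate(consonant: str, rest: list) -> str:
    """Attach the secondary articulation implied by the first segment after
    the consonant. A glide is absorbed (not pronounced separately); a high
    vowel stays as the nucleus. For the palatals, palatalization is already
    inherent, so only the labial component (if any) is added."""
    first = rest[0] if rest else ""
    mark = SECONDARY.get(first, "")
    if consonant in PALATALS:
        mark = {"ʲ": "", "ᶣ": "ʷ"}.get(mark, mark)
    if first in GLIDES:
        rest = rest[1:]
    return consonant + mark + "".join(rest)
```

For instance, bie becomes [pʲɛ] and duan becomes [tʷan].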

Non-syllabic forms of the vowels /i/ and /u/ are also found as the final element in some syllables, i.e. as the second element of a diphthong. These are notated sometimes as [j] and [w], but often rather as [ɪ̯] and [ʊ̯]. These cases are discussed below under Vowels.

Zero onset

A full syllable such as ai, in which the vowel is not preceded by any of the standard initial consonants or glides, is said to have a null initial or zero onset. This may be realized as a consonant sound: [ɣ], [ʔ], [ŋ] and [ɦ] are possibilities, and it has been suggested that such an onset be regarded as a special phoneme, or as an instance of the phoneme /ŋ/, although it can also be treated as no phoneme (absence of onset). By contrast, in the case of the particle 啊 a, which is a weak onsetless syllable, linking occurs with the previous syllable (as described under Syllable reduction, below).[11]

Rhotic coda

Main article: Erhua

Standard Chinese features syllables that end with a rhotic coda ("r"). This feature, known in Chinese as erhua, is particularly characteristic of the Beijing dialect; many other dialects do not use it as much, and some not at all.[12] It occurs in two cases:

  1. In a small number of independent words or morphemes pronounced [əɻ] or [ɑɻ], written in pinyin as er (with some tone), such as 二 èr "two", 耳 ěr "ear", and 儿 (traditional 兒) ér "son".
  2. In syllables in which the rhotic coda is added as a suffix to another morpheme. This suffix is represented by the character 儿 [兒] ("son"), to which meaning it is historically related, and in pinyin as r. The suffix combines with the final sound of the syllable, and regular but complex sound changes occur as a result (described in detail under erhua).

The "r" final can be analyzed as representing the same phoneme as the initial [ɻ~ʐ] (which is also written r in pinyin). However, the final sound is pronounced with a relatively lax tongue, and has been described as a "retroflex vowel".[13]

In dialects that do not make use of the rhotic coda, it may be omitted in pronunciation, or in some cases a different word may be selected: for example, Beijing 这儿 zhèr "here" and 那儿 nàr "there" may be replaced by the synonyms 这里 zhèli and 那里 nàli.

Syllabic consonants

The syllables written in pinyin as zi, ci, si, zhi, chi, shi, ri may be described as having a syllabic consonant in place of a vowel (syllabic [z] in the first three cases; syllabic [ɻ] in the others). For more analysis see below.

Syllabic consonants may also arise as a result of weak syllable reduction; see below. Syllabic nasal consonants are also heard in certain interjections; pronunciations of such words include [m], [n], [ŋ], [hm], [hŋ].

Vowels

Monophthongs of Mandarin Chinese as they are pronounced in Beijing (from Lee & Zee (2003:110)).
Part 1 of Mandarin Chinese diphthongs as they are pronounced in Beijing (from Lee & Zee (2003:110)).
Part 2 of Mandarin Chinese diphthongs as they are pronounced in Beijing (from Lee & Zee (2003:110)).

Standard Chinese can be analyzed as having five or six vowel phonemes: /a/, /ə/, /i/, /u/, /y/, and according to some analyses also /ɨ/. (For discussion of possible analyses, including some with even smaller numbers of vowels, see below.) The vowel /a/ is a low (open) vowel, /ə/ is a mid vowel, and /i/, /u/ and /y/ are high (close) vowels.

The precise realization of each vowel depends on its phonetic environment. In particular, the vowel /ə/ has two broad allophones [e] and [o] (corresponding respectively to pinyin e and o in most cases). These sounds can be treated as a single underlying phoneme because they are in complementary distribution. (Apparent counterexamples are provided by certain interjections, such as [ɔ], [ɛ], [jɔ], and [lɔ], but these are normally treated as special cases operating outside the normal phonemic system.[14])

Many Chinese syllables contain diphthongs. These are commonly analyzed as two-phoneme sequences, the second phoneme being either /i/ or /u/. For example, the syllable bai, pronounced [paɪ̯], is assigned the underlying representation /pai/. (In pinyin, the second element is generally written i or u, but /au/ is written ao.)


Narrow transcriptions of the vowels' allophones (the ways they are pronounced in particular phonetic environments) differ somewhat between sources. The following table provides one fairly typical set of descriptions (not including the values that occur with the rhotic coda).[15]

Phoneme | Allophone | Phonetic contexts | Description | Pinyin contexts
/a/ (pinyin a) | [ä] | Syllable-final | Open central vowel | (-)a [ä], ya/-ia [jä], wa/-ua [wä]
 | [a] | Before [n] (when not preceded by a palatal), or in diphthong with [ɪ̯] | Open front vowel; the ai diphthong is the same as English "eye" | (-)an [an], wan/-uan [wan] (not after i/j/q/x/y); (-)ai [aɪ̯], wai/-uai [waɪ̯]
 | [ɑ] | Before [ŋ], or in diphthong with [ʊ̯] | Open back vowel; the ao diphthong is similar to "ow" in English "cow" | (-)ang [ɑŋ], yang/-iang [jɑŋ], wang/-uang [wɑŋ]; (-)ao [ɑʊ̯], yao/-iao [jɑʊ̯]
 | [æ] | Before [n] in a syllable that begins with a palatal or includes a palatal glide | Similar to "a" in English "trap"; analyzed as underlying /a/ rather than /ə/ because it becomes [ɑ] when a rhotic coda is added | yan/-ian [jæn], yuan and -uan after j/q/x [ɥæn]
/ə/ (mostly pinyin e and o) | [e] | In diphthong with [ɪ̯] (note that wei can become [wi] in the first or second tone) | Close-mid front vowel; the diphthong is similar to "ey" in English "hey" | -ei [eɪ̯], wei/-ui [weɪ̯]
 | [ɛ] | Syllable-final, preceded by [j] (or palatals) | Like "e" in English "pet" (but less open than the [æ] allophone of /a/ described above) | ye/-ie [jɛ]
 | [œ] | Syllable-final, preceded by [ɥ] (or labialized palatals) | As previous, but with lip rounding | yue/-üe and -ue after j/q/x [ɥœ]
 | [ɔ] | Syllable-final, preceded by [w] | Like British English "awe" | wo/-uo [wɔ], also in bo, po, mo, fo ([pwɔ] etc.)
 | [ɤ] | Syllable-final, not preceded by a glide or palatal | As previous, but without lip rounding | (-)e [ɤ] (by some speakers [ɰʌ])
 | [ə] | Before [n] (note that wen can become [wn̩] in the first or second tone) or [ŋ], or syllable-final in a weak syllable not preceded by a glide or palatal | Schwa, like "a" in English "about" | -en [ən], wen/-un [wən]; -eng [əŋ], weng [wəŋ]; (-)e [ə]
 | [o] | In diphthong with [u̯] (note that the syllable you may become [ju] in the first or second tone) | The diphthong is similar to American English "o" as in "no" | (-)ou [oʊ̯], you/-iu [joʊ̯]
/i/ (pinyin i, y) | [i] | As main vowel of syllable | Similar to the vowel of English "beat" | yi/-i [(j)i] (except after z/c/s/zh/ch/sh/r), yin/-in [(j)in], ying/-ing [(j)iŋ]
 | [j] | As syllable initial or medial (see Glides) | Like "y" in English "yes" | y-, -i- [j]
 | [ɪ̯] | As syllable coda (see the note on diphthongs, above) | Non-syllabic vowel, like "y" in English "hey" | -i [ɪ̯]
/y/ (pinyin ü, sometimes u) | [y] | As main vowel of syllable | Like French u or German ü | yu/-ü, or -u after j/q/x [(ɥ)y]; yun, or -un after j/q/x [(ɥ)yn]
 | [ɥ] | As syllable initial or medial (see Glides) | Like "(h)u" in French "huit" | yu-, -u- after j/q/x [ɥ]
/u/ (pinyin u, w, and o before ng) | [u] | As main vowel of syllable | Similar to the vowel of English "boot" | wu/-u [(w)u] (except after i/y/j/q/x)
 | [ʊ] | Before [ŋ] | Like "u" in English "put" | -ong [ʊŋ], yong/-iong [jʊŋ]
 | [w] | As syllable initial or medial (see Glides) | Like "w" in English "wet" | w-, -u- [w]
 | [ʊ̯] | As syllable coda (see the note on diphthongs, above) | Non-syllabic vowel, like "w" in English "low" | -u after o, or -o after a [ʊ̯]
/ɨ/ | [ɨ] | | Not always considered an independent phoneme; see below | -i in zhi, chi, shi, ri
 | [ɯ] | | | -i in zi, ci, si
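The contexts listed for /a/ in the table above are regular enough to state as a function. This minimal sketch is restricted to /a/; the other vowels would need similar case tables. The function and parameter names are hypothetical. Rhotic-coda values are excluded, as they are in the table.

```python
def allophone_of_a(medial: str, coda: str, palatal_onset: bool = False) -> str:
    """Choose the surface allophone of /a/ following the allophone table.

    medial: "", "j", "w" or "ɥ"; coda: "", "n", "ŋ", "i" or "u";
    palatal_onset: True if the syllable begins with pinyin j, q or x."""
    if coda == "":
        return "ä"  # syllable-final: open central vowel
    if coda in ("ŋ", "u"):
        return "ɑ"  # before [ŋ], or in the ao diphthong
    if coda == "n" and (palatal_onset or medial in ("j", "ɥ")):
        return "æ"  # yan/-ian; yuan and -uan after j/q/x
    if coda in ("n", "i"):
        return "a"  # an, wan/-uan, ai, wai/-uai
    raise ValueError(f"unexpected coda for /a/: {coda!r}")
```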

As a general rule, vowels in open syllables (those which have no coda following the main vowel) are pronounced long, while others are pronounced short. This does not apply to weak syllables, in which all vowels are short.[16]

The vowel ɨ or empty rime

The sound of the nucleus of the pinyin syllables zi, ci, si, zhi, chi, shi, ri is variously described. If described as a vowel, it may be specified as:

  • [ɯ], like [u] without lip rounding, in zi, ci, si;
  • [ɨ], similar to Russian ы, in zhi, chi, shi, ri.

Alternatively, the nucleus may be described not as a vowel, but as a syllabic consonant: a syllabic [z] in the syllables zi, ci, si, and a syllabic [ɻ] in zhi, chi, shi, ri.[17]

Phonologically, these syllables may be analyzed as having their own vowel phoneme, /ɨ/. However, it is possible to merge this with the phoneme /i/ (with which it is historically related), since the two are in complementary distribution – provided that the palatal series is either left unmerged, or is merged with the velars rather than the retroflex or alveolar series. (That is, [t͡ɕi], [t͡sɯ] and [ʈ͡ʂɨ] all exist, but there is neither *[ki] nor *[kɨ], so there is no problem merging both [i]~[ɨ] and [k]~[t͡ɕ] at the same time.)
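The condition in this paragraph can be checked mechanically. A merger is safe only if it never maps two attested sound combinations onto the same phonemic form. The sketch below uses only the handful of combinations cited in this paragraph (plus [ku]) as sample data. A real check would need the full syllable inventory. All names here are hypothetical.

```python
# Sample attested (onset, nucleus) combinations from the examples above:
# [t͡ɕi], [t͡sɯ] and [ʈ͡ʂɨ] occur, while *[ki] and *[kɨ] do not.
ATTESTED = {("t͡ɕ", "i"), ("t͡s", "ɯ"), ("ʈ͡ʂ", "ɨ"), ("k", "u")}

# Candidate analyses: each maps sounds to the phoneme they are assigned to.
VOWELS_MERGED = {"ɨ": "i", "ɯ": "i"}
PALATALS_AS_VELARS = {**VOWELS_MERGED, "t͡ɕ": "k"}
PALATALS_AS_RETROFLEXES = {**VOWELS_MERGED, "t͡ɕ": "ʈ͡ʂ"}
PALATALS_AS_DENTALS = {**VOWELS_MERGED, "t͡ɕ": "t͡s"}


def merger_is_safe(attested: set, merge: dict) -> bool:
    """Return True if relabelling sounds via `merge` never maps two distinct
    attested (onset, nucleus) pairs onto the same pair, i.e. the merged
    sounds are in complementary distribution and no contrast is lost."""
    seen = set()
    for onset, nucleus in attested:
        key = (merge.get(onset, onset), merge.get(nucleus, nucleus))
        if key in seen:
            return False
        seen.add(key)
    return True
```

On this sample, merging [ɨ]~[ɯ] with /i/ is safe on its own and together with palatals-as-velars. It fails when combined with palatals-as-retroflexes or palatals-as-dentals, as the paragraph states.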

Another approach is to regard the syllables assigned above to /ɨ/ as having (underlyingly) an empty nuclear slot ("empty rime", Chinese 空韵 kōngyùn), i.e. as not containing a vowel phoneme at all. This is more consistent with the syllabic consonant description of these syllables.

Alternative analyses

If all the mergers considered above are accepted, the result is a system with 19 consonant phonemes and 5 vowel phonemes.

Some linguists prefer to reduce the number of vowel phonemes still further (at the expense of including underlying glides in their systems). Edwin Pulleyblank has proposed a system which includes underlying glides, but no vowels at all.[18] More common are systems with two vowels; for example, in Mantaro Hashimoto's system,[19] there are just two vowel nuclei, /a/ and /ə/, which may be preceded by a glide /j/, /w/ or /ɥ/, and may be followed by a coda /i~j/, /u~w/, /n/ or /ŋ/ (additional sequences are afforded by the rhotic coda /ɻ/; see Erhua). The various glide+vowel+coda combinations have different surface manifestations, as shown in the table below (note that the phonetic interpretations shown here may differ slightly from those given in the allophones table above). Any of the three positions may be empty, i.e. occupied by a null meta-phoneme Ø; for example, the high vowels [i], [u] and [y] are analyzed as glide+Ø, and the vowel [ɨ] or empty rime is analyzed as having all three values null, e.g. si is analyzed as an underlying syllabic /s/.

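Nucleus | Coda | Medial Ø | Medial j | Medial w | Medial ɥ
a | Ø | ä | jä | wä |
a | i | aɪ̯ | | waɪ̯ |
a | u | ɑʊ̯ | jɑʊ̯ | |
a | n | an | jæn | wan | ɥæn
a | ŋ | ɑŋ | jɑŋ | wɑŋ |
ə | Ø | ɤ | jɛ | wɔ | ɥœ
ə | i | eɪ̯ | | weɪ̯ |
ə | u | oʊ̯ | joʊ̯ | |
ə | n | ən | in | wən | yn
ə | ŋ | əŋ | iŋ | ʊŋ ~ wəŋ (after zero onset) | jʊŋ
Ø | Ø | ɨ~ɯ | i | u | y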

Syllables

Syllables in Standard Chinese have the maximal form CGVXT, where C is the initial consonant; G is one of the glides [j], [w], [ɥ]; V is a vowel; X is a coda which may be one of [n], [ŋ], [ɻ], [i̯], [u̯]; and T is the tone. Any of C, G and X (and V, in some analyses) may be absent. C is called the "initial", G the "medial", and VXT the "final" or "rime"; sometimes the medial is considered part of the rime.
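The CGVXT template can be encoded as a small data type that rejects impossible slot fillers. This is a minimal Python sketch in which the class and field names are hypothetical and segments are IPA strings. It checks only the slot inventories stated in this paragraph. Combinations that fit the slots but never occur in the language, the point of the next paragraph, are still accepted.

```python
from dataclasses import dataclass

GLIDES = {"j", "w", "ɥ"}
CODAS = {"n", "ŋ", "ɻ", "i̯", "u̯"}
TONES = {1, 2, 3, 4, 5}  # 5 stands for the neutral tone


@dataclass(frozen=True)
class Syllable:
    """The maximal shape C G V X T; an empty string marks an absent slot."""
    initial: str  # C
    medial: str   # G
    vowel: str    # V (absent in some analyses, e.g. the empty rime)
    coda: str     # X
    tone: int     # T

    def __post_init__(self) -> None:
        if self.medial and self.medial not in GLIDES:
            raise ValueError(f"not a glide: {self.medial!r}")
        if self.coda and self.coda not in CODAS:
            raise ValueError(f"not a permitted coda: {self.coda!r}")
        if self.tone not in TONES:
            raise ValueError(f"not a tone number: {self.tone!r}")

    @property
    def final(self) -> tuple:
        """The final or rime V X T, returned as (segments, tone)."""
        return (self.vowel + self.coda, self.tone)
```

For instance, `Syllable("t", "w", "a", "n", 1)` models duān [twan˥], and its `final` is `("an", 1)`.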

Many of the possible combinations under the above scheme do not actually occur. There are only some 35 final combinations (medial+rime) in actual syllables (see pinyin finals). In all, there are only about 400 different syllables when tone is ignored, and about 1300 when tone is included. This is a far smaller number of distinct syllables than in a language such as English. Since Chinese syllables usually constitute whole words, or at least morphemes, the smallness of the syllable inventory results in large numbers of homophones.

For a list of all Standard Chinese syllables (excluding tone and rhotic coda) see the pinyin table or zhuyin table.

Full and weak syllables

Syllables can be classified as full (or strong), and weak. Weak syllables are usually grammatical markers such as 了 le, or the second syllables of some compound words (although many other compounds consist of two or more full syllables).

A full syllable carries one of the four main tones, and some degree of stress. Weak syllables are unstressed, and have neutral tone. The contrast between full and weak syllables is distinctive; there are many minimal pairs such as 要是 yàoshì "if" and 钥匙 yàoshi "key", or 大意 dà yì "main idea" and (with the same characters) dàyi "careless", the second word in each case having a weak second syllable. Some linguists consider this contrast to be primarily one of stress, while others regard it as one of tone. For further discussion, see under Neutral tone and Stress, below.

There is also a difference in syllable length. Full syllables can be analyzed as having two morae ("heavy"), the vowel being lengthened if there is no coda. Weak syllables, however, have a single mora ("light"), and are pronounced approximately 50% shorter than full syllables.[20] Any weak syllable will usually be an instance of the same morpheme (and written with the same character) as some corresponding strong syllable; the weak form will often have a modified pronunciation, however, as detailed in the following section.

Syllable reduction

Apart from differences in tone, length and stress, weak syllables are subject to certain other pronunciation changes (reduction).[21]

  • If a weak syllable begins with an unaspirated obstruent, such as (pinyin) b, d, g, z, j, that consonant may become voiced. For example, in 嘴巴 zuǐba ("mouth"), the second syllable is likely to begin with a [b] sound, rather than an unaspirated [p].
  • The vowel of a weak syllable is often reduced, becoming more central. For example, in the word zuǐba just mentioned, the final vowel may become a schwa [ə].
  • The coda (final consonant or offglide) of a weak syllable is often dropped (this is linked to the shorter, single-mora nature of weak syllables, as referred to above). If the dropped coda was a nasal consonant, the vowel may be nasalized.[22] For example, 脑袋 nǎodai ("head") may end with a monophthong [ɛ] rather than a diphthong, and 春天 chūntian ("spring") may end with a centralized and nasalized vowel [ə̃].
  • In some cases, the vowel may be dropped altogether. This may occur, particularly with high vowels, when the unstressed syllable begins with a fricative or an aspirated consonant; for example, 豆腐 dòufu ("tofu") may be said as dòu-f, and 问题 wènti ("question") as wèn-t (the remaining initial consonant is pronounced as a syllabic consonant). The same may even occur in full syllables that have low ("half-third") tone.[23] The vowel (and coda) may also be dropped after a nasal, in such words as 我们 wǒmen ("we") and 什么 shénme ("what"), which may be said as wǒm and shém – these are examples of the merger of two syllables into one, which occurs in a variety of situations in connected speech.
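Two of these reductions are regular enough to sketch: onset voicing, and coda loss with nasalization. The article describes them as optional tendencies, but the hypothetical function below applies them unconditionally, so it yields the most reduced form. Its input is assumed to be already split into onset, vowel and coda, with any medial glide left out. Vowel centralization and vowel loss are not modelled, so a centralized vowel must be supplied by the caller.

```python
# Voiced counterparts of the unaspirated stops and affricates
# (pinyin b, d, g, z, zh, j).
VOICED = {"p": "b", "t": "d", "k": "ɡ", "t͡s": "d͡z", "ʈ͡ʂ": "ɖ͡ʐ", "t͡ɕ": "d͡ʑ"}
NASAL_CODAS = {"n", "ŋ"}
NASALIZED = "\u0303"  # combining tilde


def reduce_weak(onset: str, vowel: str, coda: str) -> str:
    """Voice an unaspirated obstruent onset (aspirated onsets are kept) and
    drop the coda, nasalizing the vowel if the dropped coda was nasal."""
    onset = VOICED.get(onset, onset)
    if coda in NASAL_CODAS:
        vowel += NASALIZED
    return onset + vowel
```

For example, `reduce_weak("tʰ", "ə", "n")` gives [tʰə̃], close to the reduced second syllable of chūntian.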

The example of shénme → shém also involves assimilation, which is heard even in unreduced syllables in quick speech (for example, in guǎmbō for 广播 guǎngbō "broadcast"). A particular case of assimilation is that of the sentence-final exclamatory particle 啊 a, a weak syllable, which has different characters for its assimilated forms:

Preceding sound | Form of particle (pinyin) | Character
[ŋ], [ɨ] | a | 啊
[i], [y], [e], [o], [a] | ya (from ŋi̯a) | 呀
[u] | wa | 哇
[n] | na | 哪
le (grammatical particle) | combines to form la | 啦
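The assimilation pattern amounts to a lookup on the final sound of the preceding syllable. This is a minimal sketch, and `particle_a` is a hypothetical name. The characters are the conventional ones for these forms. The sketch treats every unlisted sound like [ŋ] and [ɨ]; that is its own assumption, not something the table states. The merger of le with a into la is left out.

```python
def particle_a(preceding: str) -> tuple:
    """Return (pinyin, character) for sentence-final 啊 a, given the last
    sound of the preceding syllable as an IPA symbol."""
    if preceding == "n":
        return ("na", "哪")
    if preceding == "u":
        return ("wa", "哇")
    if preceding in {"i", "y", "e", "o", "a"}:
        return ("ya", "呀")
    # After [ŋ] and [ɨ]; also used here as a fallback for unlisted sounds.
    return ("a", "啊")
```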

Tones

Relative pitch changes of the four full tones

Standard Chinese, like all Chinese dialects, is a tonal language. This means that in addition to consonants and vowels, the pitch contour of a syllable is used to distinguish words from each other. Many non-native speakers have difficulty mastering the tone of each character, but correct tonal pronunciation is essential for intelligibility because of the vast number of words in the language that differ only by tone (i.e. are minimal pairs with respect to tone). Statistically, tones are as important as vowels in Standard Chinese.[24]

Tonal categories

The following table shows the four main tones of Standard Chinese, together with the neutral (or fifth) tone.

Tone number | 1 | 2 | 3 | 4 | 5
Description | high | rising | low/dipping | falling | neutral
Pinyin diacritic | ā | á | ǎ | à | a
Tone letter | ˥˥ (55) | ˧˥ (35) | ˨˩, ˨˩˦ (21, 214) | ˥˩ (51) | -
IPA diacritic | á | ǎ | à, a᷉ | â | -
Tone name | yīn píng | yáng píng | shǎng | qù | qīng shēng
The four main tones of Standard Mandarin, pronounced with the syllable ma.

The Chinese names of the main four tones are respectively 阴平 [陰平] yīn píng ("dark level"), 阳平 [陽平] yáng píng ("light level"), 上 shǎng[25][26] ("rising"), and 去 qù ("departing"). As descriptions, they apply rather to the predecessor Middle Chinese tones than to the modern tones; see below. The modern Standard Chinese tones are produced as follows:

  1. First tone, or high-level tone, is a steady high sound, produced as if it were being sung instead of spoken. (In a few syllables the quality of the vowel is changed when it carries first tone; see the vowel table, above.)
  2. Second tone, or rising tone, or more specifically high-rising, is a sound that rises from middle to high pitch (as in English "What?!"). In a three-syllable expression, if the first syllable has first or second tone and the final syllable is not weak, then a second tone on the middle syllable may change to first tone.[27]
  3. Third tone, low or dipping tone, descends from mid-low to low; between other tones it may simply be low. This tone is often demonstrated as having a rise in pitch after the low fall; however, when a third-tone syllable is not said in isolation, this rise is normally heard only if it appears at the end of a sentence or before a pause, and then usually only on stressed monosyllables.[28] The third tone without the rise is sometimes called half third tone. Third tone syllables that include the rise are significantly longer than other syllables. For further variation in syllables carrying this tone, see Third tone sandhi, below. Unlike the other tones, third tone is pronounced with breathiness or murmur.[29]
  4. Fourth tone, falling tone, or high-falling, features a sharp fall from high to low (as is heard in curt commands in English, such as "Stop!"). When followed by another fourth-tone syllable, the fall may be only from high to mid-level.[30]
  5. For the neutral tone or fifth tone, see the following section.
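The optional change described under the second tone can be written as a check on a three-syllable tone sequence. This is a hypothetical helper using tone numbers 1 to 5, where 5 marks the neutral tone of a weak syllable. It always applies the change when the conditions hold, although the text says only that the tone "may" change.

```python
def middle_rising_to_level(tones: list) -> list:
    """For a three-syllable expression: if the first syllable has tone 1 or 2
    and the last syllable is not weak (tone 5), a second-tone middle syllable
    is realized with first tone. Other inputs are returned unchanged."""
    if len(tones) == 3:
        first, middle, last = tones
        if first in (1, 2) and middle == 2 and last != 5:
            return [first, 1, last]
    return list(tones)
```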

Most romanization systems, including pinyin, represent the tones as diacritics on the vowels (as does zhuyin), although some, like Wade–Giles, use superscript numbers at the end of each syllable. The tone marks and numbers are rarely used outside of language textbooks: in particular, they are usually absent in public signs, company logos, and so forth. Gwoyeu Romatzyh is a rare example of a system where tones are represented using normal letters of the alphabet (although without a one-to-one correspondence).
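A common practical task is converting numbered pinyin (hao3) into diacritic pinyin (hǎo). The sketch below is a hypothetical helper that applies the conventional pinyin placement rule, which this article does not itself state. The mark goes on a or e if present, on the o of ou, and otherwise on the last vowel letter. The helper also accepts v as a keyboard stand-in for ü.

```python
import unicodedata

# Combining diacritics for the four main tones; the neutral tone (5) is unmarked.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def mark_tone(syllable: str, tone: int) -> str:
    """Turn a toneless pinyin syllable and a tone number into diacritic form."""
    syllable = syllable.replace("v", "ü")
    if tone == 5:
        return syllable
    if tone not in TONE_MARKS:
        raise ValueError(f"not a tone number: {tone!r}")
    if "a" in syllable:
        idx = syllable.index("a")
    elif "e" in syllable:
        idx = syllable.index("e")
    elif "ou" in syllable:
        idx = syllable.index("o")
    else:
        positions = [k for k, ch in enumerate(syllable) if ch in VOWELS]
        if not positions:
            raise ValueError(f"no vowel to carry the tone: {syllable!r}")
        idx = positions[-1]  # e.g. the u of liu, the i of gui
    marked = syllable[: idx + 1] + TONE_MARKS[tone] + syllable[idx + 1 :]
    # Compose letter + combining mark into a single character, e.g. "ǎ".
    return unicodedata.normalize("NFC", marked)
```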

Neutral tone

Also called fifth tone or zeroth tone (in Chinese 轻声 [輕聲] qīng shēng, literal meaning: "light tone"), neutral tone is sometimes thought of as a lack of tone. It is associated with weak syllables, which are generally somewhat shorter than tonic syllables. The pitch of a syllable with neutral tone is determined by the tone of the preceding syllable. The following table shows the pitch at which the neutral tone is pronounced in Standard Chinese after each of the four main tones.[31] The situation differs by dialect, and in some regions, notably Taiwan, the neutral tone is relatively uncommon.

Realization of neutral tones
Tone of preceding syllable | Pitch of neutral tone[32] (5 = high, 1 = low) | Example | Pinyin | Meaning | Overall tone pattern[32]
First ˥ | ˨ (2) | 玻璃 | bōli | glass | ˥.˨ (˥꜋)
Second ˧˥ | ˧ (3) | 伯伯 | bóbo | uncle | ˧˥.˧ (˧˥꜊)
Third ˨˩ | ˦ (4) | 喇叭 | lǎba | horn | ˨˩.˦ (˨˩꜉)
Fourth ˥˩ | ˩ (1) | 兔子 | tùzi | rabbit | ˥˩.˩ (˥˩꜌)
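Because the neutral tone's pitch depends only on the preceding full tone, a sequence of tone numbers can be turned into pitch values by table lookup. This minimal Python sketch has hypothetical names and uses the half-third contour 21 for a third tone, as in the table above. The article notes that sequences of several adjacent neutral tones are not fully characterized, so such syllables are left as None rather than guessed.

```python
from typing import Optional

# Contours of the full tones (third tone in its "half third" form) and the
# pitch of a neutral-tone syllable following each, from the tables above.
FULL_CONTOUR = {1: "55", 2: "35", 3: "21", 4: "51"}
NEUTRAL_AFTER = {1: "2", 2: "3", 3: "4", 4: "1"}


def pitch_sequence(tones: list) -> list:
    """Pitch values for a sequence of tone numbers (5 = neutral tone).
    A neutral tone right after a full tone gets its pitch from the table;
    one after another neutral tone (or at the start) is left as None."""
    out: list = []
    for k, tone in enumerate(tones):
        if tone != 5:
            out.append(FULL_CONTOUR[tone])
        elif k > 0 and tones[k - 1] != 5:
            out.append(NEUTRAL_AFTER[tones[k - 1]])
        else:
            out.append(None)
    return out
```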

Although the contrast between weak and full syllables is often distinctive, the neutral tone is often not described as a full-fledged tone; some linguists feel that it results from a "spreading out" of the tone on the preceding syllable. This idea is appealing because without it, the neutral tone needs relatively complex tone sandhi rules to be made sense of; indeed, it would have to have four allotones, one for each of the four tones that could precede it. However, the "spreading" theory incompletely characterizes the neutral tone, especially in sequences where more than one neutral-tone syllable is found adjacent.[33]

Relationship between Middle Chinese and modern tones

The four tones of Middle Chinese are not in one-to-one correspondence with the modern tones. The following table shows the development of the traditional tones as reflected in modern Standard Chinese. The development of each tone depends on the initial consonant of the syllable: whether it was a voiceless consonant (denoted in the table by v−), a voiced obstruent (v+), or a sonorant (s). (The voiced–voiceless distinction has been lost in modern Standard Chinese.)

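Middle Chinese tone | Initial | Standard Chinese tone name | Tone contour
Ping (平) | v− | Yin Ping | 55
Ping (平) | s, v+ | Yang Ping | 35
Shang (上) | v−, s | Shang | 21(4)
Shang (上) | v+ | Qu | 51
Qu (去) | v−, s, v+ | Qu | 51
Ru (入) | v− | distributed with no clear pattern | –
Ru (入) | s | Qu | 51
Ru (入) | v+ | Yang Ping | 35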
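The regular developments in the table can be read as a lookup keyed by Middle Chinese tone and initial class. The following Python sketch encodes only what the table states; the ASCII labels "v-", "s" and "v+" and the names `REFLEX`, `CONTOUR` and `modern_tone` are assumptions of the sketch. Ru-tone syllables with voiceless initials return `None`, because their modern tone cannot be predicted and has to be learned word by word.

```python
# Initial classes (ASCII stand-ins for the table's labels):
#   "v-" voiceless consonant, "s" sonorant, "v+" voiced obstruent.
REFLEX = {
    ("ping", "v-"): "Yin Ping",
    ("ping", "s"): "Yang Ping",
    ("ping", "v+"): "Yang Ping",
    ("shang", "v-"): "Shang",
    ("shang", "s"): "Shang",
    ("shang", "v+"): "Qu",
    ("qu", "v-"): "Qu",
    ("qu", "s"): "Qu",
    ("qu", "v+"): "Qu",
    ("ru", "v-"): None,  # distributed with no clear pattern
    ("ru", "s"): "Qu",
    ("ru", "v+"): "Yang Ping",
}

# Modern Standard Chinese contours (Chao numbers) from the table.
CONTOUR = {"Yin Ping": "55", "Yang Ping": "35", "Shang": "21(4)", "Qu": "51"}


def modern_tone(mc_tone, initial):
    """Return (tone name, contour), or None if the reflex is unpredictable.

    Raises KeyError for labels not listed in REFLEX.
    """
    name = REFLEX[(mc_tone, initial)]
    if name is None:
        return None
    return name, CONTOUR[name]


print(modern_tone("shang", "v+"))  # ('Qu', '51')
print(modern_tone("ru", "v-"))     # None
```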

Tone sandhi

Pronunciation also varies with context according to the rules of tone sandhi. Some such changes have been noted above in the descriptions of the individual tones; however, the most prominent phenomena of this kind relate to consecutive sequences of third-tone syllables. There are also a few common words that have variable tone.

Third tone sandhi

The principal rule of third tone sandhi is:

  • When there are two consecutive third-tone syllables, the first of them is pronounced with second tone.

For example, lǎoshǔ ("mouse") is pronounced láoshǔ [lɑʊ̯˧˥ʂu˨˩]. Researchers have investigated whether the rising contour (˧˥) on the first syllable is identical to a normal second tone, and have concluded that it is, at least in terms of auditory perception.[34]
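The two-syllable rule reduces to a single check on adjacent tones. This minimal Python sketch assumes numbered-pinyin input (a tone digit after each syllable, as in "lao3"), a convention the article itself does not use; `apply_pair_sandhi` is a hypothetical name.

```python
def apply_pair_sandhi(word):
    """word: two numbered-pinyin syllables, e.g. ["lao3", "shu3"].

    If both syllables carry third tone, the first one is realized
    with second tone; otherwise the word is returned unchanged.
    """
    first, second = word
    if first.endswith("3") and second.endswith("3"):
        first = first[:-1] + "2"
    return [first, second]


print(apply_pair_sandhi(["lao3", "shu3"]))  # ['lao2', 'shu3']
```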

When there are three or more third tones in a row, the situation becomes more complicated, since a third tone that precedes a second tone resulting from third tone sandhi may or may not be subject to sandhi itself. The results may depend on word boundaries, stress, and dialectal variations. General rules for three-syllable third-tone combinations can be formulated as follows:

  1. If the first word is two syllables and the second word is one syllable, then the first two syllables become second tones. For example, bǎoguǎn hǎo takes the pronunciation báoguán hǎo [pɑʊ̯˧˥ku̯an˧˥xɑʊ̯˨˩˦].
  2. If the first word is one syllable, and the second word is two syllables, the second syllable becomes second tone, but the first syllable remains third tone. For example: lǎo bǎoguǎn takes the pronunciation lǎo báoguǎn [lɑʊ̯˨˩pɑʊ̯˧˥ku̯an˨˩˦].
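Both rules follow if sandhi is applied first inside each word and only then across the word boundary. The Python sketch below, with the hypothetical names `sandhi_within_word` and `sandhi_phrase`, assumes the phrase is already segmented into words and writes tones as numbers 1–4. It reproduces the two examples above, but it is a simplification rather than the cyclic, foot-based system described in the next paragraph, and speakers vary on longer sequences.

```python
def sandhi_within_word(tones):
    """Basic rule inside one word: a third tone directly before another
    (underlying) third tone becomes second tone."""
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def sandhi_phrase(words):
    """words: non-empty lists of tone numbers, one list per word,
    e.g. [[3, 3], [3]] for bǎoguǎn hǎo.

    Step 1 applies the rule inside each word; step 2 applies it across
    each word boundary, left to right, to the tones left by step 1.
    """
    done = [sandhi_within_word(w) for w in words]
    for left, right in zip(done, done[1:]):
        if left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return [tone for word in done for tone in word]


print(sandhi_phrase([[3, 3], [3]]))  # bǎoguǎn hǎo -> [2, 2, 3]
print(sandhi_phrase([[3], [3, 3]]))  # lǎo bǎoguǎn -> [3, 2, 3]
```

In the second case the boundary check sees the second tone already produced inside bǎoguǎn, so lǎo keeps its third tone, which is exactly the difference between rules 1 and 2.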

Some linguists have proposed more comprehensive systems of sandhi rules for sequences of multiple third tones. Duanmu, for example, proposes[35] that the modifications apply cyclically, first within rhythmic feet (trochees; see below), and that sandhi "need not apply between two cyclic branches."

Tones on special syllables

Special rules apply to the tones of the words (or morphemes) 不 bù ("not") and 一 yī ("one").

For 不:

  1. 不 is pronounced with second tone when followed by a fourth tone syllable.
    Example: 不是 (bù + shì) becomes búshì [pu˧˥ʂɨ˥˩]
  2. In other cases, 不 is pronounced with fourth tone. However, when used between words in an A-not-A question, it may become neutral in tone (e.g. 是不是 shìbushì).
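The two rules reduce to one condition on the following syllable. In this hypothetical Python helper, the name `bu_tone`, the `a_not_a` flag and the use of 0 for the neutral tone (after the "zeroth tone" name mentioned earlier) are assumptions; because the neutral pronunciation in A-not-A questions is optional, the caller has to request it explicitly.

```python
NEUTRAL = 0  # neutral tone, written as "zeroth tone"


def bu_tone(next_tone=None, a_not_a=False):
    """Tone of 不 given the tone (1-4) of the following syllable,
    or next_tone=None when 不 is final."""
    if a_not_a:
        return NEUTRAL  # optional: 不 may be neutral, as in 是不是 shìbushì
    if next_tone == 4:
        return 2        # 不是 bù + shì -> búshì
    return 4            # all other cases keep the citation tone bù


print(bu_tone(4), bu_tone(1))  # 2 4
```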

For 一:

  1. 一 is pronounced with second tone when followed by a fourth tone syllable.
    Example: 一定 (yī + dìng) becomes yídìng [i˧˥tiŋ˥˩]
  2. Before a first, second or third tone syllable, 一 is pronounced with fourth tone.
    Examples: 一天 (yī + tiān) becomes yìtiān [i˥˩tʰi̯ɛn˥], 一年 (yī + nián) becomes yìnián [i˥˩ni̯ɛn˧˥], 一起 (yī + qǐ) becomes yìqǐ [i˥˩t͡ɕʰi˨˩˦].
  3. When final, or when it comes at the end of a multi-syllable word (regardless of the tone of the following word), 一 is pronounced with first tone. It also has first tone when used as an ordinal number (or part of one), and when it is immediately followed by any digit (including another 一; hence both syllables of the word 一一 yīyī and its compounds have first tone).
  4. When 一 is used between two reduplicated words, it may become neutral in tone (e.g. 看一看 kànyikàn).
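Because these conditions overlap, the order in which they are checked matters. This Python sketch (the function `yi_tone` and its flags are hypothetical) checks the optional reduplication case first, then the first-tone contexts of rule 3, then the tone of the following syllable. It writes the neutral tone as 0, after the "zeroth tone" name used earlier in the article, and leaves the neutral pronunciation as the caller's choice.

```python
NEUTRAL = 0  # neutral tone, written as "zeroth tone"


def yi_tone(next_tone=None, word_final=False, ordinal=False,
            next_is_digit=False, reduplication=False):
    """Tone of 一 under rules 1-4 above.

    next_tone: citation tone (1-4) of the following syllable,
    or None when 一 is final.
    """
    if reduplication:                  # rule 4 (optional): 看一看 kànyikàn
        return NEUTRAL
    if next_tone is None or word_final or ordinal or next_is_digit:
        return 1                       # rule 3: yī
    if next_tone == 4:
        return 2                       # rule 1: 一定 yídìng
    if next_tone in (1, 2, 3):
        return 4                       # rule 2: 一天 yìtiān, 一年 yìnián, 一起 yìqǐ
    raise ValueError("next_tone must be 1, 2, 3, 4 or None")


print(yi_tone(4), yi_tone(1), yi_tone())  # 2 4 1
```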

The numbers 七 qī ("seven") and 八 bā ("eight") sometimes show tonal behavior similar to that of 一, but for most modern speakers they are always pronounced with first tone. (All of these numbers, and 不, were historically Ru-tone syllables with voiceless initials, and as noted above, such syllables do not have predictable reflexes in modern Chinese; this may account for the variation in tone on these words.)[36]

Stress, rhythm and intonation

Stress within words (word stress) is not felt strongly by Chinese speakers, although contrastive stress is perceived easily (and functions much the same as in other languages). One of the reasons for the weaker perception of stress in Chinese may be that variations in the fundamental frequency of speech, which in many other languages serve as a cue for stress, are used in Chinese primarily to realize the tones. Nonetheless, there is still a link between stress and pitch – the range of pitch variation (for a given tone) has been observed to be greater on syllables that carry more stress.[37]

As discussed above, weak syllables have neutral tone and are unstressed. Although this property can be contrastive, the contrast is interpreted by some as being primarily one of tone rather than stress. (Some linguists analyze Chinese as lacking word stress entirely.)[38]

Apart from this contrast between full and weak syllables, some linguists have also identified differences in levels of stress among full syllables. In some descriptions, a multi-syllable word or compound[39] is said to have the strongest stress on the final syllable, and the next strongest generally on the first syllable. Others, however, reject this analysis, noting that the apparent final-syllable stress can be ascribed purely to natural lengthening of the final syllable of a phrase, and disappears when a word is pronounced within a sentence rather than in isolation. San Duanmu[40] takes this view, and concludes that it is the first syllable that is most strongly stressed. He also notes a tendency for Chinese to produce trochees: feet consisting of a stressed syllable followed by one (or in this case sometimes more) unstressed syllables. On this view, if the effect of "final-lengthening" is factored out:

  • In words (compounds) of two syllables, the first syllable has the main stress, and the second lacks stress.
  • In words (compounds) of three syllables, the first syllable is stressed most strongly, the second lacks stress, and the third may lack stress or have secondary stress.
  • In words (compounds) of four syllables, the first syllable is stressed most strongly, the second lacks stress, and the third or fourth may lack stress or have secondary stress depending on the syntactic structure of the compound.

The positions described here as lacking stress are the positions in which weak (neutral-tone) syllables may occur, although full syllables frequently occur in these positions also.
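Under the analysis summarized above, stress can be tabulated by word length. The Python sketch below is a hypothetical encoding: the labels and the names `PATTERNS`, `stress_pattern` and `weak_slots` are assumptions. For four-syllable compounds it marks both the third and fourth syllables as optional, because the choice depends on syntactic structure, which the sketch does not model.

```python
MAIN, NONE, OPTIONAL = "main", "none", "secondary or none"

# Stress by syllable position, with final lengthening factored out.
PATTERNS = {
    2: [MAIN, NONE],
    3: [MAIN, NONE, OPTIONAL],
    4: [MAIN, NONE, OPTIONAL, OPTIONAL],  # actual choice depends on syntax
}


def stress_pattern(n_syllables):
    """Return a copy of the stress labels for a 2- to 4-syllable word."""
    if n_syllables not in PATTERNS:
        raise ValueError("only 2- to 4-syllable words are described")
    return list(PATTERNS[n_syllables])


def weak_slots(n_syllables):
    """0-based positions that may lack stress, i.e. where a weak
    (neutral-tone) syllable can occur."""
    return [i for i, s in enumerate(stress_pattern(n_syllables)) if s != MAIN]


print(weak_slots(3))  # [1, 2]
```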

This preference for a trochaic metrical structure is also cited as a reason for certain phenomena of word order variation within complex compounds, and for the strong tendency to use disyllabic words rather than monosyllables in certain positions.[41] Many Chinese monosyllables have alternative disyllabic forms with virtually identical meaning – see Chinese grammar → Word formation.

Another function of voice pitch is to carry intonation. Chinese makes frequent use of particles to express certain meanings such as doubt, query, command, etc., reducing the need to use intonation. However, intonation is still present in Chinese (expressing meanings in much the same way as in standard English), although there are varying analyses of how it interacts with the lexical tones. Some linguists describe an additional intonation rise or fall at the end of the last syllable of an utterance, while others have found that the pitch of the entire utterance is raised or lowered according to the desired intonational meaning.[42]

References

  1. ^ San Duanmu (2000), The Phonology of Standard Chinese, Oxford University Press, p. 27.
  2. ^ Duanmu (2000), p. 72.
  3. ^ Duanmu (2000), p. 27.
  4. ^ See Ladefoged & Wu (1984); Ladefoged & Maddieson (1996), pp. 150–154.
  5. ^ Duanmu (2000), p. 26.
  6. ^ Duanmu (2000), p. 33.
  7. ^ Norman, Jerry (1988). Chinese. Cambridge University Press. pp. 140–141. ISBN 978-0-521-29653-3. 
  8. ^ Duanmu (2000), p. 25.
  9. ^ Duanmu (2000), p. 274 ff.
  10. ^ Duanmu (2000), p. 28.
  11. ^ Duanmu (2000), p. 43.
  12. ^ Duanmu (2000), p. 195.
  13. ^ Duanmu (2000), p. 41.
  14. ^ Compare the normal treatment in English phonology of "hmm", "unh-unh", "shhh!" and other exclamations that violate usual phonotactic and allophonic rules.
  15. ^ A review of various descriptions of the values of the vowels can be found in Duanmu (2000), p. 37 ff.
  16. ^ Duanmu (2000), p. 42.
  17. ^ Duanmu (2000), p. 36.
  18. ^ Duanmu (2000), p. 37.
  19. ^ Hashimoto, Mantaro (1970). "Notes on Mandarin Phonology". In Jakobson, Roman; Kawamoto, Shigeo. Studies in General and Oriental Linguistics. Tokyo: TEC. pp. 207–220. ISBN 978-0-404-20311-5. 
  20. ^ Duanmu (2000), p. 88.
  21. ^ Yip, Po-ching (2000). The Chinese lexicon: a comprehensive survey. Psychology Press. p. 29. ISBN 978-0-415-15174-0. 
  22. ^ Duanmu (2000), p. 88.
  23. ^ Duanmu (2000), p. 258.
  24. ^ Surendran, Dinoj and Levow, Gina-Anne (2004), "The functional load of tone in Mandarin is as high as that of vowels", Proceedings of the International Conference on Speech Prosody 2004, Nara, Japan, pp. 99–102.
  25. ^ "上聲 - 教育部重編國語辭典修訂本". 中華民國教育部. 1994. Retrieved 2010-05-15. 
  26. ^ 《古代汉语词典》编写组 (2002). 古代汉语大词典大字本. 北京: 商务印书馆. p. 1369. ISBN 978-7-100-03515-6. 
  27. ^ Yuen-Ren Chao (1968), A Grammar of Spoken Chinese, p. 27.
  28. ^ Duanmu (2000), p. 222.
  29. ^ Duanmu (2000), p. 213.
  30. ^ Chao (1968), p. 28.
  31. ^ Wang Jialing (2004), The Neutral Tone in Trisyllabic Sequences in Chinese Dialects, Tianjin Normal University.
  32. ^ The second notation given, which may require additional font support to display properly, uses modified Chao tone letters composed of staves plus dots.
  33. ^ Yiya Chen and Yi Xu, Pitch Target of Mandarin Neutral Tone (abstract), presented at the 8th Conference on Laboratory Phonology
  34. ^ Duanmu (2000), p. 237.
  35. ^ Duanmu (2000), p. 248.
  36. ^ Duanmu (2000), p. 228.
  37. ^ Duanmu (2000), p. 134, p. 231.
  38. ^ Duanmu (2000), p. 134.
  39. ^ The concepts of "word" and "compound" in Chinese are not easily defined.
  40. ^ Duanmu (2000), p. 136 ff.
  41. ^ Duanmu (2000), pp. 145–194.
  42. ^ Duanmu (2000), p. 234.

Further reading

  • Lee, Wai-Sum; Zee, Eric (2003), "Standard Chinese (Beijing)", Journal of the International Phonetic Association 33 (1): 109–112, doi:10.1017/S0025100303001208 

External links