Standard Chinese phonology

From Wikipedia, the free encyclopedia

This article summarizes the phonology (the sound system, or in more general terms, the pronunciation) of Standard Chinese (Standard Mandarin).

Standard Chinese is based on the Beijing dialect of Mandarin. Actual production varies widely among speakers, as they inadvertently introduce elements of their native dialects (although television and radio announcers are chosen for their pronunciation accuracy and standard accent). Elements of the sound system include not only the segments – the vowels and consonants of the language – but also the tones that are applied to each syllable. Standard Chinese has four main tones, in addition to a neutral tone used on weak syllables.

This article represents phonetic values using the International Phonetic Alphabet (IPA), noting correspondences chiefly with the pinyin system for transcription of Chinese text. For correspondences with other systems, see the relevant articles, such as Wade–Giles, bopomofo (zhuyin), Gwoyeu Romatzyh, etc., and Romanization of Chinese.


The following table shows the consonant sounds of Standard Chinese, transcribed using the International Phonetic Alphabet (IPA). The sounds shown in parentheses are frequently not analyzed as separate phonemes; for more on these, see Palatal series and Glides, below. Excluding these, there are 19 consonant phonemes in the inventory.

Consonant phonemes of Standard Chinese
             Labial    Denti-alveolar   Retroflex    Alveolo-palatal   Velar
Nasal        m         n                                               ŋ
Stop         p  pʰ     t  tʰ                                           k  kʰ
Affricate              t͡s  t͡sʰ         ʈ͡ʂ  ʈ͡ʂʰ      (t͡ɕ)  (t͡ɕʰ)
Fricative    f         s                ʂ            (ɕ)               x
Approximant            l                ɻ            (j)  (ɥ)          (w)

Between pairs of consonants having the same place and manner of articulation, the primary distinction is not voiced vs. voiceless (as in English), but unaspirated vs. aspirated. The unaspirated stops and affricates may however be voiced in unstressed syllables.[1] Such pairs are represented in the pinyin system mostly using letters which (in European languages) principally denote voiceless/voiced pairs, with the "voiceless" letter representing an aspirated sound, and the "voiced" letter an unaspirated sound. Thus pinyin p and b represent respectively /pʰ/ and /p/ (aspirated and unaspirated "p" sounds), t and d represent /tʰ/ and /t/, k and g represent /kʰ/ and /k/, c and z represent /t͡sʰ/ and /t͡s/, ch and zh represent /ʈ͡ʂʰ/ and /ʈ͡ʂ/ (similar to the English "ch" sound, respectively with and without aspiration), and q and j represent /t͡ɕʰ/ and /t͡ɕ/ (a palatal variant of the "ch" sound, again with and without aspiration).

The phonemes /m/, /n/, /ŋ/, /f/, /s/ and /l/ are written m, n, ng, f, s and l in pinyin, being pronounced as these letters would be in English. /ʂ/ and [ɕ], respectively retroflex and palatal variants of the English "sh" sound, are written sh and x in pinyin. /x/ (the final sound of the Scottish "loch", although it is also sometimes pronounced more like [h], the English "h" sound[2]) is written h in pinyin. The glides [j], [ɥ] and [w] often appear as y, yu and w in pinyin, but see further under Glides below.

The phoneme /ɻ/, in syllable-initial position, is pronounced by some speakers as [ɻ] (a retroflex approximant, similar to initial "r" in some varieties of English), and by some as [ʐ] (a voiced retroflex fricative, closer to the middle consonant sound of "vision"). It is written r in pinyin. For its occurrence in syllable-final position, see Rhotic coda, below.
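
The pinyin-to-IPA correspondences for initial consonants described above can be gathered into a simple lookup table. The following Python sketch is illustrative only: the names PINYIN_INITIALS, initial_to_ipa and is_aspirated are hypothetical, and the table records a single transcription per letter (for example [ɻ] for r, ignoring the [ʐ] variant).

```python
# Pinyin initial -> IPA, following the correspondences stated in the text.
# Names and layout are illustrative assumptions, not a standard API.
PINYIN_INITIALS = {
    # unaspirated / aspirated pairs
    "b": "p", "p": "pʰ",
    "d": "t", "t": "tʰ",
    "g": "k", "k": "kʰ",
    "z": "t͡s", "c": "t͡sʰ",
    "zh": "ʈ͡ʂ", "ch": "ʈ͡ʂʰ",
    "j": "t͡ɕ", "q": "t͡ɕʰ",
    # nasals, fricatives and approximants
    "m": "m", "n": "n", "f": "f", "s": "s", "l": "l",
    "sh": "ʂ", "x": "ɕ", "h": "x", "r": "ɻ",
}


def initial_to_ipa(initial: str) -> str:
    """Return the IPA value of a pinyin initial (raises KeyError if unknown)."""
    return PINYIN_INITIALS[initial]


def is_aspirated(initial: str) -> bool:
    """True if the initial is transcribed with the aspiration mark."""
    return PINYIN_INITIALS[initial].endswith("ʰ")
```

For instance, is_aspirated("p") is true and is_aspirated("b") is false, which mirrors the point that the "voiceless" letter of each pair stands for the aspirated sound.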

For the retroflex consonants, the constriction is produced with the upper surface of the tongue tip, making them laminal rather than apical post-alveolars.[3] Speakers not from Beijing often lack retroflex consonants in their native dialects, and may thus replace them with dentals.[4]

All of the consonants may occur as the initial sound of a syllable, with the exception of /ŋ/ (unless the zero initial is assigned to this phoneme; see below). The glides [j], [ɥ], [w] may also be medials (coming between the initial consonant and the main vowel). The only consonants that can appear in syllable coda (final) position are /n/, /ŋ/, and /ɻ/ (although [m] may occur as an allophone of /n/ before labial consonants in fast speech). Final /n/, /ŋ/ may be pronounced without complete oral closure, resulting in a syllable that in fact ends with a long nasalized vowel.[5]

Palatal series

The alveolo-palatal consonants [t͡ɕ, t͡ɕʰ, ɕ] arose historically from a merger of the alveolar consonants [t͡s, t͡sʰ, s] and the velar consonants [k, kʰ, x] before high front vowels and glides. The resulting palatals are in complementary distribution with the other two series, and with the retroflex consonants [ʈ͡ʂ, ʈ͡ʂʰ, ʂ], none of which now occur in a high front environment. Some linguists prefer to classify [t͡ɕ, t͡ɕʰ, ɕ] as allophones of one of the other three series.[6] The Yale and Wade–Giles systems mostly treat the palatals as allophones of the retroflex consonants; Tongyong Pinyin mostly treats them as allophones of the alveolars; and Chinese braille treats them as allophones of the velars. In Hanyu Pinyin and bopomofo, however, they are represented as a separate sequence.

The collapse of the velar and alveolar sibilant series into the alveolo-palatal in palatalizing environments happened only a few centuries ago. Before then, some instances of modern [t͡ɕ(ʰ)i] were instead [k(ʰ)i], and others were [t͡s(ʰ)i]. The change took place in the last two or three centuries at different times in different areas, but not in the dialect used in the Manchu dynasty imperial court. This explains why some European transcriptions of Chinese names (especially in the postal map spelling) contain "ki-", "hi-", "tsi-" or "si-". Examples are "Peking" for Beijing; "Chungking" for Chongqing; "Fukien" for Fujian (a province); "Tientsin" for Tianjin; "Sinkiang" for Xinjiang; "Sian" for Xi'an. The complementary distribution with the retroflex series arose when syllables that had a retroflex consonant followed by a medial glide lost the medial glide.

Glides


The glides [j], [ɥ] and [w] sound respectively like the "y" in English "yes", the "(h)u" in French "huit", and the "w" in English "we". (Beijing speakers often replace initial [w] with a labiodental [ʋ], except when it is followed by [o].[7]) The glides are commonly analyzed not as independent phonemes, but as consonantal allophones of the high vowels /i/, /y/ and /u/. This is possible since there is no ambiguity in interpreting a sequence like [jɑʊ̯] (pinyin yao/-iao) as /iau/, and potentially problematic sequences such as */iu/ do not occur.

The glides may occur in initial position in a syllable. This occurs with [ɥ] in the syllables written yu, yuan, yue, yun and yong in pinyin; with [j] in other syllables written with initial y in pinyin (ya, yi, etc.); and with [w] in syllables written with initial w in pinyin (wa, wu, etc.). When a glide is followed by the vowel of which that glide is considered an allophone, the glide may be regarded as epenthetic (automatically inserted), and not as a separate realization of the phoneme. Hence the syllable yi, pronounced [ji], may be analyzed as consisting of the single phoneme /i/, and similarly yin may be analyzed as /in/, yu as /y/, and wu as /u/.[8]

The glides can also occur in medial position, that is, after the initial consonant but before the main vowel. Here they are represented in pinyin as vowels: for example, the i in bie represents [j], and the u in duan represents [w]. There are some restrictions on the possible consonant-glide combinations: [w] does not occur after labials (except for some speakers in bo, po, mo, fo); [j] does not occur after retroflexes and velars (or after [f]); while [ɥ] occurs medially only in lüe and nüe (and in the cases noted below). A consonant-glide combination at the start of a syllable is articulated as a single sound – the glide is not in fact pronounced after the consonant, but is realized as palatalization [ʲ], labialization [ʷ], or both [ɥ], of the consonant.[9] (The same modifications of initial consonants occur in syllables where they are followed by a high vowel, although apparently no glide is present. Hence a consonant is generally palatalized [ʲ] when followed by /i/, labialized [ʷ] when followed by /u/, and both [ɥ] when followed by /y/.)

The syllables represented in pinyin as beginning with a palatal and a glide (ji-, qi-, xi-, ju-, qu-, xu-, followed by a vowel) may be pronounced either with a palatal (labialized in the -u- cases), or with a palatalized dental (additionally labialized in the -u- cases). Thus the six respective onsets may be either [t͡ɕ], [t͡ɕʰ], [ɕ], [t͡ɕʷ], [t͡ɕʰʷ], [ɕʷ], or [t͡sʲ], [t͡sʰʲ], [sʲ], [t͡sɥ], [t͡sʰɥ], [sɥ]. The second pronunciation type is especially common among children and females.[10]

Non-syllabic forms of the vowels /i/ and /u/, which are sometimes notated as glides, are also found as the final element in some syllables, i.e. as the second element of a diphthong. These are discussed below under Vowels.

Zero onset

A full syllable such as ai, which begins with a pure vowel (not a consonant or a glide), is said to have a null initial or zero onset. This may be realized as a consonant sound: [ɣ], [ʔ], [ŋ] and [ɦ] are possibilities, and it has been suggested that such an onset be regarded as a special phoneme, or as an instance of the phoneme /ŋ/, although it can also be treated as no phoneme (absence of onset). By contrast, in the case of the particle 啊 a, which is a weak (not full) onsetless syllable, linking occurs with the previous syllable (so the particle may sound like [na], [ŋa], [ja] or [wa], depending on the preceding coda).[11]

Rhotic coda

Main article: Erhua

Standard Chinese features syllables that end with a rhotic coda ("r"). This feature, known in Chinese as erhua, is particularly characteristic of the Beijing dialect; many other dialects do not use it as much, and some not at all.[12] It occurs in two cases:

  1. In a small number of independent words or morphemes pronounced [əɻ] or [ɑɻ], written in pinyin as er (with some tone), such as 二 èr "two", 耳 ěr "ear", and 儿 (traditional 兒) ér "son".
  2. In syllables in which the rhotic coda is added as a suffix to another morpheme. This suffix is represented by the character 儿 [兒] ("son"), to which meaning it is historically related, and in pinyin as r. The suffix combines with the final sound of the syllable, and regular but complex sound changes occur as a result (described in detail under erhua).

The "r" final can be analyzed as representing the same phoneme as the initial [ɻ~ʐ] (which is also written r in pinyin). However, the final sound is pronounced with a relatively lax tongue, and has been described as a "retroflex vowel".[13]

In dialects that do not make use of the rhotic coda, it may be omitted in pronunciation, or in some cases a different word may be selected: for example, Beijing 这儿 zhèr "here" and 那儿 nàr "there" may be replaced by the synonyms 这里 zhèli and 那里 nàli.

Syllabic consonants

The syllables written in pinyin as zi, ci, si, zhi, chi, shi, ri may be described as having a syllabic consonant in place of a vowel (syllabic [z] in the first three cases; syllabic [ɻ] in the others). For more analysis see under Vowels, below.

More marginally, Chinese also has syllabic nasal consonants in certain interjections; pronunciations of such words include [m], [n], [ŋ], [hm], [hŋ].

Vowels


Standard Chinese can be analyzed as having five or six vowel phonemes: /a/, /ǝ/, /i/, /u/, /y/, and according to some analyses also /ɨ/. (For discussion of possible analyses, including some with even smaller numbers of vowels, see below.) The vowel /a/ is a low (open) vowel, /ǝ/ is a mid vowel, and /i/, /u/ and /y/ are high (close) vowels. The precise realization of each vowel depends on its phonetic environment.

The vowel /ə/ has two broad allophones [e] and [o] (corresponding respectively to pinyin e and o in most cases). These sounds can be treated as a single underlying phoneme because they are in complementary distribution. (Apparent counterexamples are provided by certain interjections, such as [ɔ], [ɛ], [jɔ], and [lɔ], but these are normally treated as special cases operating outside the normal phonemic system.[14])

Many Chinese syllables contain diphthongs. These are commonly analyzed as two-phoneme sequences, the second phoneme being either /i/ or /u/. For example, the syllable bai, pronounced [paɪ̯], is assigned the underlying representation /pai/. (In pinyin, the second element is generally written i or u, but /au/ is written ao.)

The following list indicates the environments in which the vowels occur (noting the corresponding pinyin spellings), and provides a possible narrow transcription of their allophones. However, there is much variation between sources as regards the description and distribution of these allophones.

  1. For /a/ (pinyin a):
    • [a], an open front vowel, in (-)ai [aɪ̯] and wai/-uai [waɪ̯] (these are close to rhyming with English "pie")
    • [ä], an open central vowel, in (-)a [ä], ya/-ia [jä], wa/-ua [wä], and (except as noted below) in an [än] and wan/-uan [wän]
    • [ɑ], an open back vowel, in -ang [ɑŋ], yang/-iang [jɑŋ], wang/-uang [wɑŋ], (-)ao [ɑʊ̯], yao/-iao [jɑʊ̯]
    • [ɛ], an open-mid front vowel, more like English e in "pet", in yan/-ian [jɛn], and in yuan and in -uan after palatals [ɥɛn]. These sequences are normally analyzed as having underlying /a/ rather than /ǝ/, because they become [jɐɻ], [ɥɐɻ] when a rhotic coda is added
  2. For /ǝ/ (mostly pinyin e and o):
    • environments with the broad allophone [e] (mostly pinyin e):
      • [e], a close-mid front vowel, in -ei [eɪ̯] and wei/-ui [weɪ̯] (these are close to rhyming with English "bay")
      • [ɛ], open-mid front (see above), in ye/-ie [jɛ]
      • [œ], as previous but with lip rounding, in yue/-üe and in -ue after palatals [ɥœ]
    • environments with the broad allophone [o] (mostly pinyin o):
      • [ɤ], without lip rounding, in (-)ou [ɤʊ̯] and you/-iu [jɤʊ̯]
      • [ɔ], like in English "port", in wo/-uo and in bo, po, mo, fo [wɔ]
    • environments neutral between [e] and [o]:
      • [ɯ̯ʌ], or [ə] in reduced syllables, in (-)e
      • [ɤ] (see also above), in -eng [ɤŋ], weng [wɤŋ]
      • [ə], in -en [ən], wen/-un [wən]
  3. For /i/ (generally pinyin i):
    • [i], a high front vowel, in yi/-i [(j)i] (except as noted below under /ɨ/), yin/-in [(j)in], ying/-ing [(j)iŋ]
    • [j], a glide like English "y" in "yes", as a syllable initial (y-) or medial (-i-) (see Glides, above)
    • [ɪ̯], a non-syllabic vowel like English "y" in "hey", as a syllable coda (-i) (see the note on diphthongs, above)
  4. For /y/ (pinyin ü, sometimes u):
    • [y] (like French u or German ü), in yu/-ü [(ɥ)y], yun/-ün [(ɥ)yn], also written -u or -un after the palatals j, q, x
    • [ɥ], a glide like French "(h)u" in "huit", as a syllable initial or medial in the cases noted under Glides, above
  5. For /u/ (generally pinyin u, or o before ng):
    • [u], a high back rounded vowel, in wu/-u [(w)u] (except after palatals)
    • [ʊ], like in English "put", in -ong [ʊŋ], yong/-iong [jʊŋ]
    • [w], a glide like English "w" in "we", as a syllable initial (w-) or medial (-u-) (see Glides, above)
    • [ʊ̯], a non-syllabic vowel like English "w" in "low", as a syllable coda (-u, -o) (see the note on diphthongs, above)
  6. For /ɨ/ (if considered to be a phoneme):
    • [ɯ], like [u] without lip rounding, in zi, ci, si (see further below)
    • [ɨ], similar to Russian ы, in zhi, chi, shi, ri (see further below)
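
As an illustration of how such environment-dependent allophones can be stated as rules, the following Python sketch selects the allophone of /a/ from the medial glide and the coda, following item 1 of the list above. The function name a_allophone and its string-valued parameters are hypothetical, and since sources differ on the details, this encodes only the one description given here.

```python
def a_allophone(medial: str, coda: str) -> str:
    """Pick the allophone of /a/ described in the list above.

    medial: "" (none), "j", "w" or "ɥ"
    coda:   "" (open syllable), "i", "u", "n" or "ŋ"
    Assumption: only the environments listed in the text are modelled.
    """
    if coda == "n" and medial in ("j", "ɥ"):
        return "ɛ"      # yan/-ian, yuan: raised to open-mid front
    if coda == "i":
        return "a"      # (-)ai, wai/-uai: open front
    if coda in ("u", "ŋ"):
        return "ɑ"      # (-)ao, yao, -ang, yang, wang: open back
    return "ä"          # open syllables and plain an, wan: open central
```

For example, a_allophone("j", "n") gives the vowel of yan, while a_allophone("w", "n") gives the open central vowel of wan.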

As a general rule, vowels in open syllables (those which have no coda following the main vowel) are pronounced long, while others are pronounced short. This does not apply to weak syllables, in which all vowels are short.[15]

It is possible to merge the phonemes /ɨ/ and /i/, which are historically related (and are both represented by i in pinyin), since they are in complementary distribution, provided that the alveolo-palatal series is either left unmerged, or merged with the velars rather than the retroflex or alveolar series. (That is, [t͡ɕi], [t͡sɯ] and [ʈ͡ʂɨ] all exist, but there is neither *[ki] nor *[kɨ], so there is no problem merging both [i]~[ɨ] and [k]~[t͡ɕ] at the same time.)

Another approach is to regard the syllables assigned above to /ɨ/ as having (underlyingly) an empty nuclear slot ("empty rime"), i.e. as not containing a vowel phoneme at all. This is linked to an alternative description of the surface form of these syllables, wherein their nucleus is not a vowel but a syllabic consonant: a syllabic [z] in the syllables zi, ci, si, and a syllabic [ɻ] in zhi, chi, shi, ri.[16]

If all the mergers considered above are accepted, the result is a system with 19 consonant phonemes and 5 vowel phonemes.

Some linguists prefer to reduce the number of vowel phonemes still further (at the expense of including underlying glides in their systems). Edwin Pulleyblank has proposed a system which includes underlying glides, but no vowels at all.[17] More common are systems with two vowels; for example, in Mantaro Hashimoto's system,[18] the vowels [i], [u] and [y] are analyzed as surface forms of the glides /j/, /w/, /ɥ/ combined with a null meta-phoneme Ø. In this system, shown below, there are just two vowel nuclei, /a/ and /ə/; various allophones result from a preceding glide /j, w, ɥ/ (or null) and a coda /i~j, u~w, n, ŋ/ (or null; see erhua for the additional sequences afforded by the rhotic coda /ɻ/). (The minimal vowel /ɨ/ is ascribed to the surface manifestation of all three values being null, e.g. [sɨ] would be analyzed as an underlying syllabic /s/.)

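Nucleus and coda: surface forms (listed in order of medial Ø, j, w, ɥ)
Nucleus a
  Coda Ø: ä
  Coda i: aɪ̯, waɪ̯
  Coda u: ɑʊ̯, jɑʊ̯
  Coda n: än, jɛn, wän, ɥɛn
  Coda ŋ: ɑŋ, jɑŋ, wɑŋ
Nucleus ə
  Coda Ø: ɯ̯ʌ, i̯ɛ, u̯ɔ, y̯œ
  Coda i: eɪ̯, weɪ̯
  Coda u: ɤʊ̯, jɤʊ̯
  Coda n: ən, in, wən, yn
  Coda ŋ: ɤŋ, ʊŋ ~ wɤŋ (wɤŋ after zero onset)
Nucleus Ø
  Coda Ø: ɨ~ɯ, i, u, y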


Syllables in Standard Chinese have the maximal form CGVXT, where C is the initial consonant; G is one of the glides [j], [w], [ɥ]; V is a vowel; X is a coda which may be one of [n], [ŋ], [ɻ], [i̯], [u̯]; and T is the tone. Any of C, G and X (and V, in some analyses) may be absent. The first C is called the "initial", G the "medial", and VXT the "final" or "rime"; sometimes the medial is considered part of the rime.
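
The division of a syllable into initial and final can be sketched in a few lines of Python operating on toneless pinyin. This is only a rough illustration: the names INITIALS and split_syllable are hypothetical, spellings with y- and w- are treated as having no initial consonant (the glide letter stays in the final), and interjections such as ng are not handled.

```python
# Pinyin initials; the two-letter ones come first so that "zh" wins over "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syllable: str) -> tuple:
    """Split toneless pinyin into (initial, final).

    The initial is "" when the syllable has a zero or glide onset.
    Assumption: input is a well-formed lowercase pinyin syllable.
    """
    for initial in INITIALS:
        # Require something after the initial, so a bare "n" is not split.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable
```

With this sketch, split_syllable("zhuang") yields ("zh", "uang") and split_syllable("ai") yields ("", "ai"). Separating the medial from the rest of the final would need a further step, which is omitted here.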

Many of the possible combinations under the above scheme do not actually occur. There are only some 35 final combinations (medial+rime) in actual syllables (see pinyin finals). In all, there are only about 400 different syllables when tone is ignored, and about 1300 when tone is included. This is a far smaller number of distinct syllables than in a language such as English. Since Chinese syllables usually constitute whole words, or at least morphemes, the smallness of the syllable inventory results in large numbers of homophones.

For a list of all Standard Chinese syllables (excluding tone and rhotic coda) see the pinyin table or zhuyin table.

Syllables are also classified as full or weak. Weak syllables (usually grammatical markers such as 了 le, or the second syllables of some compound words) are unstressed and have neutral tone. Full syllables can be analyzed as having two morae ("heavy"), the vowel being lengthened if there is no coda. Weak syllables have a single mora ("light"), being pronounced approximately 50% shorter than full syllables. Weak syllables with a coda are subject to "rime reduction" – the coda is dropped, and the vowel may become more central (and be nasalized, if the coda was a nasal consonant).[19] For more details see Syllable reduction, below.

Syllable reduction

When a syllable is unstressed, it loses its tone; in addition, tenuis occlusives such as b d g z j become voiced (in pinyin, bb dd gg zz jj) and the vowel is reduced. When the consonant of the unstressed syllable is a nasal or a fricative, the vowel (or entire rime) may be dropped altogether. For example:[20]

Full form               Reduced form
zuǐba 'mouth'           zuǐbbə
ěrduo 'ear'             ěrddo
xǐhuan 'to like'        xǐhuə
chūqu 'to go out'       chūqə
bízi 'nose'             bízz
dōngxi 'thing'          dōngx
dòufu 'tofu'            dòuf
wǒmen 'us'              wǒm
shénme 'what'           shém

The last example involved assimilation as well, which is seen even in unreduced syllables in quick speech (for example, in guǎmbō for guǎngbō 'broadcast'). The most salient example of assimilation is the exclamatory particle ā, which even has different characters for its assimilated forms:

Preceding sound Assimilated form
-ng, -ɨ[21] ā 啊
-a, -o, -e, -i, -ü yā 呀 (from ŋā)
-u wā 哇
-le lā 啦
-n nā 哪


Relative pitch changes of the four tones

Standard Chinese, like all Chinese dialects, is a tonal language. This means that in addition to consonants and vowels, the pitch contour of a syllable is used to distinguish words from each other. Many non-native Chinese speakers have difficulties mastering the tones of each character, but correct tonal pronunciation is essential for intelligibility because of the vast number of words in the language that only differ by tone (i.e. are minimal pairs with respect to tone). Statistically, tones are as important as vowels in Standard Chinese.[22] The following are the four main tones of Standard Chinese (excluding the neutral, or fifth, tone, which is discussed in the following section):

Tone chart of Standard Chinese
Tone name          Yin Ping    Yang Ping    Shang                 Qu
Tone number        1           2            3                     4
Pinyin diacritic   ā           á            ǎ                     à
Tone letter        ˥˥ (55)     ˧˥ (35)      ˨˩, ˨˩˦ (21, 214)     ˥˩ (51)
IPA diacritic      á           ǎ            à, a᷉                  â

The four main tones of Standard Mandarin pronounced with the syllable 'ma':
  1. First tone, or high-level tone (阴平 [陰平] yīnpíng, literal meaning: "dark level"):
    is a steady high sound, as if it were being sung instead of spoken.
  2. Second tone, or rising tone, or more specifically high-rising (阳平 [陽平] yángpíng, literal meaning: "light level"):
    is a sound that rises from mid-level tone to high (like in the English "What?!").
  3. Third tone, low or dipping tone (上 shǎng,[23][24] literal meaning: "rising"):
    has a mid-low to low descent; if at the end of a sentence or before a pause, it is then followed by a rising pitch. Between other tones it may simply be low. See tone sandhi below.
  4. Fourth tone, falling tone, or high-falling (去 qù, literal meaning: "departing"):
    features a sharp fall from high to low, and is a shorter tone, similar to curt commands (like in the English "Stop!").

Most romanizations represent the tones as diacritics on the vowels (e.g., Hanyu Pinyin, Mandarin Phonetic Symbols II and Tongyong Pinyin). Zhuyin uses diacritics as well. Others, like Wade–Giles, use superscript numbers at the end of each syllable. The tone marks and numbers are rarely used outside of language textbooks: in particular, they are usually absent in public signs, company logos, and so forth. Gwoyeu Romatzyh is a rare example where tones are not represented as special symbols, but using normal letters of the alphabet (although without a one-to-one correspondence).
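
The relationship between tone numbers and pinyin diacritics can be illustrated with a short Python sketch that turns a numbered syllable such as ma3 into mǎ. The function name numbered_to_diacritic is hypothetical. The placement rule used (mark a if present, otherwise e, otherwise the o of ou, otherwise the last vowel) is the usual pinyin convention and is an assumption brought in from outside this article.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is the letter ü


def numbered_to_diacritic(syllable: str) -> str:
    """Convert e.g. 'ma3' to 'mǎ'. Tone 0 or 5 (neutral) is left unmarked.

    Assumption: the input is lowercase pinyin ending in one tone digit
    and containing at least one vowel letter.
    """
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in TONE_MARKS:
        return base
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        idx = max(i for i, ch in enumerate(base) if ch in VOWELS)
    marked = base[:idx + 1] + TONE_MARKS[tone] + base[idx + 1:]
    # Compose letter + combining mark into a single precomposed character.
    return unicodedata.normalize("NFC", marked)
```

Going the other way (from diacritics back to numbers) could be done by decomposing with NFD and looking for the same four combining marks.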

Neutral tone

Also called fifth tone or zeroth tone (in Chinese 轻声 [輕聲] qīng shēng, literal meaning: "light tone"), neutral tone is sometimes thought of as a lack of tone. It usually comes at the end of a word or phrase, and is pronounced in a light and short manner. Because of this characteristic, and because there is no standard rule for whether a syllable has a neutral tone, it is considered analogous to an unstressed syllable. The neutral tone has a large number of allophones: its pitch depends almost entirely on the tone of the preceding syllable. The situation is further complicated by the amount of dialectal variation associated with it; in some regions, notably Taiwan, the neutral tone is relatively uncommon.

Despite many examples of minimal pairs (for example, 要是 yàoshì "if" and 钥匙 yàoshi "key"), the neutral tone is sometimes described as something other than a full-fledged tone for technical reasons: some linguists feel that it results from a "spreading out" of the tone on the preceding syllable. This idea is appealing intuitively because without it, the neutral tone needs relatively complex tone sandhi rules to be made sense of; indeed, it would have to have four allotones, one for each of the four tones that could precede it. However, the "spreading" theory incompletely characterizes the neutral tone, especially in sequences where more than one neutrally toned syllable is found adjacent.[25]

The following are from Beijing dialect.[26] Other dialects may be slightly different.

Realization of neutral tones
Tone of first syllable   Pitch of neutral tone   Example         Pinyin   English meaning
1 ˥                      ˨ (2)                   玻璃 (˥.˨)      bōli     glass
2 ˧˥                     ˧ (3)                   伯伯 (˧˥.˧)     bóbo     uncle
3 ˨˩                     ˦ (4)                   喇叭 (˨˩.˦)     lǎba     horn
4 ˥˩                     ˩ (1)                   兔子 (˥˩.˩)     tùzi     rabbit
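
Because the pitch of the neutral tone depends on the preceding tone, the Beijing values above amount to a four-entry lookup. A minimal Python sketch follows; the names NEUTRAL_PITCH, neutral_pitch and neutral_tone_letter are hypothetical, and pitch levels run from 1 (lowest) to 5 (highest).

```python
# Pitch level of a neutral-tone syllable, keyed by the tone number of the
# preceding full syllable (Beijing values from the table above).
NEUTRAL_PITCH = {1: 2, 2: 3, 3: 4, 4: 1}

# Chao tone letters for levels 1..5 (index 0 is level 1): ˩ ˨ ˧ ˦ ˥
CHAO_LETTERS = "\u02e9\u02e8\u02e7\u02e6\u02e5"


def neutral_pitch(preceding_tone: int) -> int:
    """Return the pitch level (1-5) of a neutral tone after the given tone."""
    return NEUTRAL_PITCH[preceding_tone]


def neutral_tone_letter(preceding_tone: int) -> str:
    """Return the Chao tone letter for the neutral tone after the given tone."""
    return CHAO_LETTERS[neutral_pitch(preceding_tone) - 1]
```

Other dialects would need their own table, as noted above.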

Tone sandhi

Pronunciation also varies with context according to the rules of tone sandhi. The most prominent phenomenon of this kind is when there are two third tones in immediate sequence, in which case the first of them changes to a rising tone, the second tone. In the literature, this contour is often called two-thirds tone or half-third tone, though generally, in Standard Chinese, the "two-thirds tone" is the same as the second tone. If there are three third tones in series, the tone sandhi rules become more complex, and depend on word boundaries, stress, and dialectal variations.

Basic rules of tone sandhi

  1. When there are two 3rd tones (˨˩˦) in a row, the first syllable becomes 2nd tone (˧˥), and the second syllable becomes a half-3rd tone (˨˩). The half-3rd tone is a tone that only falls but does not rise.
    ex: 老鼠 (lǎoshǔ) becomes [lɑʊ̯˧˥ʂu˨˩]
  2. When there are three 3rd tones in a row, things get more complicated.
    If the first word is two syllables, and the second word is one syllable, the first two syllables become 2nd tones, and the last syllable stays 3rd tone:
    ex: 保管好 (bǎoguǎn hǎo) becomes [pɑʊ̯˧˥ku̯an˧˥xɑʊ̯˨˩˦]
    If the first word is one syllable, and the second word is two syllables, the first syllable becomes half-3rd tone (˨˩), the second syllable becomes 2nd tone, and the last syllable stays 3rd tone:
    ex: 老保管 (lǎo bǎoguǎn) becomes [lɑʊ̯˨˩pɑʊ̯˧˥ku̯an˨˩˦]
  3. When a 3rd tone is followed by a first, second or fourth tone, or most neutral tone syllables, it usually becomes a half-3rd tone.
    ex: 美妙 (měimiào) becomes [mei̯˨˩mi̯ɑʊ̯˥˩]
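
The rules above can be written out as two small Python functions, one for a pair of syllables (rules 1 and 3) and one for three third tones in a row (rule 2). This is a sketch of the rules exactly as stated here, not a general sandhi engine: the names HALF3, pair_sandhi and triple_third_sandhi are hypothetical, tones are given as numbers 1 to 4 with 0 for the neutral tone, and the "usually" and "most" hedges in rule 3 are ignored.

```python
HALF3 = "half-3"  # the half-third tone (21), which falls but does not rise


def pair_sandhi(first: int, second: int) -> list:
    """Apply rules 1 and 3 to a two-syllable sequence of tones."""
    if first == 3 and second == 3:
        return [2, HALF3]        # rule 1: 3 + 3 -> 2 + half-3
    if first == 3:
        return [HALF3, second]   # rule 3: 3 before tone 1, 2, 4 or neutral
    return [first, second]       # no third-tone sandhi applies


def triple_third_sandhi(first_word_len: int) -> list:
    """Apply rule 2 to three third tones, given the length of the first word."""
    if first_word_len == 2:      # two-syllable word + one-syllable word
        return [2, 2, 3]
    if first_word_len == 1:      # one-syllable word + two-syllable word
        return [HALF3, 2, 3]
    raise ValueError("first_word_len must be 1 or 2")
```

Here pair_sandhi(3, 3) corresponds to the lǎoshǔ example and pair_sandhi(3, 4) to měimiào; the two branches of triple_third_sandhi correspond to the bǎoguǎn hǎo and lǎo bǎoguǎn examples.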

Rules for "一" and "不"

"一" (yī) and "不" (bù) have special rules which do not apply to other Chinese characters:

  1. When in front of a 4th tone syllable, "一" becomes 2nd tone.
    ex: 一定 (yīdìng becomes yídìng [i˧˥tiŋ˥˩])
  2. When in front of a non-4th tone syllable, "一" becomes 4th tone.
    ex. (1st tone): 一天 (yītiān → yìtiān [i˥˩tʰi̯ɛn˥])
    ex. (2nd tone): 一年 (yīnián → yìnián [i˥˩ni̯ɛn˧˥])
    ex. (3rd tone): 一起 (yīqǐ → yìqǐ [i˥˩t͡ɕʰi˨˩˦])
  3. When "一" falls between two words, it becomes neutral tone.
    ex: 看一看 (kànyīkàn) becomes kànyikàn
  4. When counting sequentially, and for all other situations "一" retains its root tone value of 1st tone. This includes when 一 is used at the end of a multi-syllable word (regardless of the first tone of the next word), and when 一 is immediately followed by any digit, including another 一; hence 一 also retains its root tone value of 1st tone in both syllables of the word "一一". For instance, 一一对应 is pronounced as yīyīduìyìng.
  5. When 一 is part of a cardinal number, it is pronounced as 4th tone when before or , but in an ordinal number it is pronounced as 1st tone in these contexts.
  6. "不" becomes 2nd tone only when followed by a 4th tone syllable.
    ex: 不是 (bùshì) becomes búshì [pu˧˥ʂɨ˥˩]
  7. When "不" comes between two words in a yes-no question, it loses its tone (becomes neutral in tone).[citation needed]
    ex: 是不是 (shìbùshì) becomes shìbushì[citation needed]
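
The rules for yī and bù can be sketched as two Python functions that return the surface tone from the tone of the following syllable. The function names and keyword parameters are hypothetical, tones are numbered 1 to 4 with 0 for neutral, and rule 5 (cardinal versus ordinal numbers) is not modelled.

```python
def yi_tone(next_tone=None, *, between_repeated=False, counting=False) -> int:
    """Surface tone of yī (base tone 1) under rules 1-4 above.

    next_tone: tone of the following syllable, or None if there is none.
    between_repeated: yī stands between two words, as in kàn yi kàn.
    counting: yī is used in counting or as a digit before another digit.
    """
    if between_repeated:
        return 0                 # rule 3: neutral tone
    if counting or next_tone is None:
        return 1                 # rule 4: keeps its base tone
    if next_tone == 4:
        return 2                 # rule 1: 2nd tone before a 4th tone
    if next_tone in (1, 2, 3):
        return 4                 # rule 2: 4th tone before tones 1, 2, 3
    return 1                     # rule 4: all other situations


def bu_tone(next_tone=None, *, in_question=False) -> int:
    """Surface tone of bù (base tone 4) under rules 6-7 above.

    in_question: bù stands between two words in a yes-no question.
    """
    if in_question:
        return 0                 # rule 7: neutral tone
    if next_tone == 4:
        return 2                 # rule 6: 2nd tone before a 4th tone
    return 4                     # otherwise keeps its base tone
```

For example, yi_tone(4) gives 2 as in yídìng, yi_tone(1) gives 4 as in yìtiān, and bu_tone(4) gives 2 as in búshì.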

Relationship between Middle Chinese and modern tones

V- = unvoiced initial consonant
L = sonorant initial consonant
V+ = voiced initial consonant (not sonorant)

Middle Chinese tone   Initial      Standard Chinese tone    Tone contour
Ping (平)             V-           Yin Ping (陰平, 1)        55
Ping (平)             L, V+        Yang Ping (陽平, 2)       35
Shang (上)            V-, L        Shang (上, 3)             214
Shang (上)            V+           Qu (去, 4)                51
Qu (去)               V-, L, V+    Qu (去, 4)                51
Ru (入)               V-           with no pattern
Ru (入)               L            to Qu                     to 51
Ru (入)               V+           to Yang Ping              to 35
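
The correspondence between Middle Chinese tone categories and modern tones can be expressed as a lookup keyed by the Middle Chinese tone and the class of the initial. The Python sketch below encodes the commonly stated correspondences as read from the table above; the names MODERN_TONE, CONTOUR and modern_tone are hypothetical, and None stands for the Ru tone with unvoiced initials, which was redistributed with no pattern.

```python
# (Middle Chinese tone, initial class) -> modern tone number.
# Initial classes: "V-" unvoiced, "L" sonorant, "V+" voiced non-sonorant.
MODERN_TONE = {
    ("ping", "V-"): 1, ("ping", "L"): 2, ("ping", "V+"): 2,
    ("shang", "V-"): 3, ("shang", "L"): 3, ("shang", "V+"): 4,
    ("qu", "V-"): 4, ("qu", "L"): 4, ("qu", "V+"): 4,
    ("ru", "V-"): None,  # no regular outcome
    ("ru", "L"): 4, ("ru", "V+"): 2,
}

# Modern tone number -> tone contour in Chao digits.
CONTOUR = {1: "55", 2: "35", 3: "214", 4: "51"}


def modern_tone(mc_tone: str, initial: str):
    """Return the regular modern tone (1-4), or None if there is no pattern."""
    return MODERN_TONE[(mc_tone, initial)]
```

For example, modern_tone("shang", "V+") returns 4, reflecting the shift of Shang-tone syllables with voiced obstruent initials to the Qu tone.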

Stress and rhythm

Stress within words (word stress) is not felt strongly by Chinese speakers (this may be because variations in the fundamental frequency of speech, which in many other languages serve as a cue for stress, are used in Chinese primarily to realize the tones). However, contrastive stress (where the cue may be the range of pitch variation) is perceived easily, and functions as in other languages.[27]

As discussed above, weak syllables have neutral tone and are unstressed. This property can be contrastive, e.g. 大意 dà yì means "main idea", but pronounced dàyi, with a weak second syllable, it means "careless". However, this contrast is interpreted by some as being primarily one of tone rather than stress. (Some linguists analyze Chinese as lacking word stress entirely.)[28]

Apart from this contrast between full and weak syllables, some linguists have also identified differences in levels of stress among full syllables. In some descriptions, a multi-syllable word or compound[29] is said to have the strongest stress on the final syllable, and the next strongest generally on the first syllable. Others, however, reject this analysis, noting that the apparent final-syllable stress can be ascribed purely to natural lengthening of the final syllable of a phrase, and disappears when a word is pronounced within a sentence rather than in isolation. San Duanmu[30] takes this view, and concludes that it is the first syllable that is most strongly stressed. He also notes a tendency for Chinese to produce trochees, that is, feet consisting of a stressed syllable followed by one (or in this case sometimes more) unstressed syllables. On this view, if the effect of "final-lengthening" is factored out:

  • In words (compounds) of two syllables, the first syllable has the main stress, and the second lacks stress.
  • In words (compounds) of three syllables, the first syllable is stressed most strongly, the second lacks stress, and the third may lack stress or have secondary stress.
  • In words (compounds) of four syllables, the first syllable is stressed most strongly, the second lacks stress, and the third or fourth may have secondary stress depending on the syntactic structure of the compound.

The positions described here as lacking stress are the positions in which weak (neutral-tone) syllables may occur, although full syllables frequently occur in these positions also.

This preference for a trochaic metrical structure is also cited as a reason for certain phenomena of word order variation within complex compounds, and for the strong tendency to use disyllabic words rather than monosyllables in certain positions.[31] Many Chinese monosyllables have alternative disyllabic forms with virtually identical meaning – see Chinese grammar → Word formation.


  1. ^ San Duanmu (2000), The Phonology of Standard Chinese, Oxford University Press, p. 27.
  2. ^ Duanmu (2000), p. 27.
  3. ^ See Ladefoged & Wu 1984; Ladefoged & Maddieson 1996, pp. 150–154.
  4. ^ Duanmu (2000), p. 26.
  5. ^ Duanmu (2000), p. 72.
  6. ^ Norman, Jerry (1988). Chinese. Cambridge University Press. pp. 140–141. ISBN 978-0-521-29653-3. 
  7. ^ Duanmu (2000), p. 25.
  8. ^ Duanmu (2000), p. 274 ff.
  9. ^ Duanmu (2000), p. 28.
  10. ^ Duanmu (2000), p. 33.
  11. ^ Duanmu (2000), p. 43.
  12. ^ Duanmu (2000), p. 195.
  13. ^ Duanmu (2000), p. 41.
  14. ^ Compare the normal treatment in English phonology of "hmm", "unh-unh", "shhh!" and other exclamations that violate usual phonotactic and allophonic rules.
  15. ^ Duanmu (2000), p. 42.
  16. ^ Duanmu (2000), p. 36.
  17. ^ Duanmu (2000), p. 37.
  18. ^ Hashimoto, Mantaro (1970). "Notes on Mandarin Phonology". In Jakobson, Roman; Kawamoto, Shigeo. Studies in General and Oriental Linguistics. Tokyo: TEC. pp. 207–220. ISBN 978-0-404-20311-5. 
  19. ^ Duanmu (2000), p. 88.
  20. ^ Yip, Po-ching (2000). The Chinese lexicon: a comprehensive survey. Psychology Press. p. 29. ISBN 978-0-415-15174-0. 
  21. ^ The vowel of si, zi, ci, shi, zhi, chi
  22. ^ Surendran, Dinoj and Levow, Gina-Anne (2004), "The functional load of tone in Mandarin is as high as that of vowels", Proceedings of the International Conference on Speech Prosody 2004, Nara, Japan, pp. 99–102.
  23. ^ "上聲 - 教育部重編國語辭典修訂本". 中華民國教育部. 1994. Retrieved 2010-05-15. 
  24. ^ 《古代汉语词典》编写组 (2002). 古代汉语大词典大字本. 北京: 商务印书馆. p. 1369. ISBN 978-7-100-03515-6. 
  25. ^ Yiya Chen and Yi Xu, Pitch Target of Mandarin Neutral Tone (abstract), presented at the 8th Conference on Laboratory Phonology
  26. ^ Wang Jialing, The Neutral Tone in Trisyllabic Sequences in Chinese Dialects, Tianjin Normal University, 2004
  27. ^ Duanmu (2000), p. 134.
  28. ^ Duanmu (2000), p. 134.
  29. ^ The concepts of "word" and "compound" in Chinese are not easily defined.
  30. ^ Duanmu (2000), p. 136 ff.
  31. ^ Duanmu (2000), pp. 145–194.