Standard Chinese phonology
This article summarizes the phonology of Standard Chinese, also known as Standard Mandarin. Actual production varies widely among speakers, as people inadvertently introduce elements of their native dialects. By contrast, television and radio announcers are chosen for their pronunciation accuracy and standard accent.
The following is the consonant inventory of Standard Chinese, transcribed in the International Phonetic Alphabet (IPA):
|Manner||Labial||Alveolar||Retroflex||Alveolo-palatal²||Velar|
|Nasal||m||n||–||–||ŋ|
|Stop||p pʰ||t tʰ||–||–||k kʰ|
|Affricate||–||t͡s t͡sʰ||ʈ͡ʂ ʈ͡ʂʰ||t͡ɕ t͡ɕʰ||–|
|Fricative||f||s||ʂ||ɕ||x|
|Approximant||w³||l||ɻ¹||j³ ɥ³||–|
¹ /ɻ/ varies between [ɻ] and [ʐ] (a voiced retroflex fricative), depending on the speaker.
² These are not always considered independent phonemes. See below.
³ These are commonly viewed not as independent phonemes but as either (1) consonantal allophones of "medial" high vowels (i.e. when another vowel follows); or (2) epenthetic (automatically inserted) glides before "main" high vowels (i.e. not followed by another vowel).
For the retroflex consonants, Standard Chinese "speakers produce the constriction for this sound with the upper surface of the tip of the tongue, making it a laminal rather than an apical post-alveolar." (Ladefoged & Wu 1984; Ladefoged & Maddieson 1996:150-154).
The alveolo-palatal consonants [t͡ɕ, t͡ɕʰ, ɕ] arose historically from a merger of the alveolar consonants [t͡s, t͡sʰ, s] and the velar consonants [k, kʰ, x] before high front vowels and glides. The resulting palatals are in complementary distribution with the other two series, and with the retroflex consonants [ʈ͡ʂ, ʈ͡ʂʰ, ʂ], none of which now occur in a high front environment. Some linguists therefore prefer to classify [t͡ɕ, t͡ɕʰ, ɕ] as allophones of one of the other three series. The Yale and Wade-Giles systems mostly treat the palatals as allophones of the retroflex consonants; Tongyong Pinyin mostly treats them as allophones of the alveolars; and Chinese braille treats them as allophones of the velars. Hanyu Pinyin and bopomofo, however, treat them as a separate series.
The collapse of the velar and alveolar sibilant series into the alveolo-palatal series in palatalizing environments happened only in the last two or three centuries. Before then, some instances of modern [t͡ɕ(ʰ)i] were instead [k(ʰ)i], and others were [t͡s(ʰ)i]. The change took place at different times in different areas, but not in the dialect used at the Manchu (Qing) imperial court. This explains why some European transcriptions of Chinese names (especially in the postal map spelling) contain "ki-", "hi-", "tsi-" or "si-". Examples are "Peking" for Beijing, "Chungking" for Chongqing, "Fukien" for Fujian (a province), "Tientsin" for Tianjin, "Sinkiang" for Xinjiang, and "Sian" for Xi'an. The complementary distribution with the retroflex series arose when syllables in which a retroflex consonant was followed by a medial glide lost that glide.
Some speakers pronounce [t͡ɕ, t͡ɕʰ, ɕ] as [t͡sj, t͡sʰj, sj]. This pronunciation is characteristic of the speech of young women, though some men use it too; it is considered rather effeminate and may also be regarded as substandard.
The null initial, written in pinyin as an apostrophe when it occurs word-medially before a, o or e (as in Xi'an), is most commonly realized as [ɰ], though [n], [ŋ], [ɣ], and [ʔ] are common in nonstandard Mandarin dialects; in some of those dialects, these consonants correspond to the Standard Chinese null initial but contrast with a null initial of their own.
Under most analyses, Standard Chinese has five or six vowel phonemes:
- /a/
- /ə/: sounding more like [e] before and after front glides (/j/, /ɥ/), [o] before and after /w/, and [ə] elsewhere
- /i/
- /u/
- /y/ (a front rounded vowel, as in French or German)
- Possibly /ɨ/ (only after alveolar and retroflex sibilants, where it is often pronounced as a syllabic fricative)
The pronunciation of each of these sounds varies somewhat depending on context. In a very narrow transcription, the following phones may be distinguished:
- For /a/:
- [a], in Pinyin an, uan, ai, uai [an], [wan], [aɪ̯], [waɪ̯]
- [ä], in Pinyin a, ia, ua [ä], [jä], [wä] (Depending on whether the sound after it is front or back, some may pronounce it more like [a] or [ɑ], respectively)
- [ɑ], in Pinyin ang, iang, uang, ao, iao [ɑŋ], [jɑŋ], [wɑŋ], [ɑʊ̯], [jɑʊ̯]
- For broad allophone [e] of /ə/:
- [e], in Pinyin ei, wei/ui [eɪ̯], [weɪ̯] (Some may pronounce it as a more central [e̽])
- [ɛ], in Pinyin ie, ian [jɛ], [jɛn], and an interjection ê [ɛ]
- [œ̜], in Pinyin yue/üe, yuan/üan [ɥœ̜], [ɥœ̜n]
- For broad allophone [o] of /ə/:
- [o], in Pinyin ou, you/iu [oʊ̯], [joʊ̯] (Some may pronounce it more central and unrounded: [ɤ̹])
- [ɔ], in Pinyin uo [wɔ] and an interjection o [ɔ] (Some may pronounce it closer)
- For broad allophone [ə] of /ə/:
- [ɤ], in Pinyin e, eng, weng [ɤ], [ɤŋ], [wɤŋ] (Some pronounce this as [ɰʌ])
- [ə], in Pinyin en, wen/un [ən], [wən]
- For /i/:
- [i], in Pinyin yi/i, yin/in, ying/ing [i], [in], [iŋ]
- For some speakers, [iə̯] in Pinyin ing [iə̯ŋ] (in place of [iŋ])
- For /y/:
- [y], in Pinyin yu/ü, yun/ün [y], [yn]
- For some speakers, [yɪ̯] in Pinyin yun/ün [yɪ̯n] (in place of [yn])
- For /u/:
- [ʊ], in Pinyin ong, iong [ʊŋ], [jʊŋ]
- [u], in Pinyin wu/u [u]
- For /ɨ/:
- [ɯ] (Pinyin i), after the alveolar sibilants /t͡s t͡sʰ s/ (zi, ci, si)
- [ɨ] (Pinyin i), after the retroflex sibilants /ʈ͡ʂ ʈ͡ʂʰ ʂ ʐ/ (zhi, chi, shi, ri)
Note that Pinyin often has more than one way to spell the same set of sounds. When two variants are given, the first is used when the syllable final occurs as a syllable by itself, while the second is the form used after a consonant. Other such variants exist but are not listed above:
- For any Pinyin syllable final listed above with only one form and beginning with u or i, substitute w or y (respectively) when it appears as a syllable by itself.
- The Pinyin letter ü is written u after j, q and x (and after y in standalone forms such as yu); it keeps its diaeresis only after l and n, the only consonants after which it contrasts with u.
- The Pinyin final uo is written o after b, p, m and f.
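The two-form spellings in the vowel list and the unwritten rules above amount to a small respelling procedure. The following is a minimal sketch under stated assumptions: the `spell` function and its input convention (finals given in their post-consonant form, with ü for /y/ and un for /wən/) are hypothetical, and other orthographic details such as tone marks and the apostrophe are ignored.

```python
# Two-form finals from the vowel list: post-consonant form -> standalone form.
STANDALONE = {
    "i": "yi", "in": "yin", "ing": "ying",
    "u": "wu", "ü": "yu", "ün": "yun",
    "ui": "wei", "iu": "you", "un": "wen",
    "üe": "yue", "üan": "yuan",
}


def spell(initial: str, final: str) -> str:
    """Return the Pinyin spelling of initial + final ("" = null initial)."""
    if initial == "":
        if final in STANDALONE:
            return STANDALONE[final]
        # Single-form finals beginning with i or u take y or w instead.
        if final.startswith("i"):
            return "y" + final[1:]
        if final.startswith("u"):
            return "w" + final[1:]
        return final
    if "ü" in final and initial not in ("l", "n"):
        final = final.replace("ü", "u")   # ü only contrasts with u after l, n
    if final == "uo" and initial in ("b", "p", "m", "f"):
        final = "o"                        # uo is written o after b, p, m, f
    return initial + final
```

For example, `spell("", "ia")` gives ya, `spell("j", "üan")` gives juan, `spell("n", "ü")` keeps nü, and `spell("b", "uo")` gives bo.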
Interjections such as [ɔ] and [ɛ], which contrast with the word [ə], suggest that each of these sounds must be treated as a separate phoneme. In practice, however, most analyses treat them as special cases operating outside the normal phonemic system (similar to the usual treatment of "hmm", "unh-unh", "shhh!" and other English exclamations that violate ordinary phonotactic and allophonic rules). Examples of such contrasts are [ɛ] versus [ɔ] (e.g. the interjections 喔, 哦 and 噢) versus [ɰʌ] (e.g. 饿 "hungry", 鹅 "goose"); [jɛ] (e.g. 夜 "night", 爷 "grandfather") versus [jɔ] (e.g. the interjection 哟); and [lə] (e.g. 乐 "glad") versus [lo] (e.g. the interjection 咯).
It would also be possible to merge /ɨ/ and /i/, which are historically related, since they are also in complementary distribution, provided that the alveolo-palatal series is either left unmerged, or merged with the velars rather than the retroflex or alveolar series. (That is, [t͡ɕi], [t͡sɯ] and [ʈ͡ʂɨ] all exist, but there is neither *[ki] nor *[kɨ], so there is no problem merging both [i]~[ɨ] and [k]~[t͡ɕ] at the same time.) The result is a five-vowel system of /a/, /ə/, /i/, /u/, and /y/.
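The complementary-distribution argument can be checked mechanically. The sketch below uses a small hand-picked sample of syllables (an assumption: it is not a full syllable inventory, and tie bars are omitted), maps the alveolo-palatals onto another series, and tests whether [i], [ɯ] and [ɨ] still never share a preceding initial. The names and data are illustrative only.

```python
from itertools import combinations

# Toy sample of (initial, vowel) pairs in narrow IPA; "" is the null initial.
SYLLABLES = [
    ("tɕ", "i"), ("tɕʰ", "i"), ("ɕ", "i"),              # ji, qi, xi
    ("m", "i"), ("l", "i"), ("", "i"),                   # mi, li, yi
    ("ts", "ɯ"), ("tsʰ", "ɯ"), ("s", "ɯ"),              # zi, ci, si
    ("ʈʂ", "ɨ"), ("ʈʂʰ", "ɨ"), ("ʂ", "ɨ"), ("ɻ", "ɨ"),  # zhi, chi, shi, ri
]

# Candidate mergers of the alveolo-palatal series into another series.
TO_VELAR = {"tɕ": "k", "tɕʰ": "kʰ", "ɕ": "x"}
TO_RETROFLEX = {"tɕ": "ʈʂ", "tɕʰ": "ʈʂʰ", "ɕ": "ʂ"}
TO_ALVEOLAR = {"tɕ": "ts", "tɕʰ": "tsʰ", "ɕ": "s"}
HIGH_VOWELS = ("i", "ɯ", "ɨ")


def initials_before(vowel, consonant_map):
    """Initials (after applying the merger) that occur before `vowel`."""
    return {consonant_map.get(c, c) for c, v in SYLLABLES if v == vowel}


def can_merge(vowels, consonant_map=None):
    """True if `vowels` stay in complementary distribution after the merger."""
    consonant_map = consonant_map or {}
    envs = [initials_before(v, consonant_map) for v in vowels]
    return all(a.isdisjoint(b) for a, b in combinations(envs, 2))


print(can_merge(HIGH_VOWELS))                 # True: series left unmerged
print(can_merge(HIGH_VOWELS, TO_VELAR))       # True
print(can_merge(HIGH_VOWELS, TO_RETROFLEX))   # False: ʈʂ precedes both i and ɨ
print(can_merge(HIGH_VOWELS, TO_ALVEOLAR))    # False: ts precedes both i and ɯ
```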
See syllabic fricative for a discussion of the Chinese phoneme /ɨ/ and its varying pronunciations and transcriptions.
The medials /j, w, ɥ/ can also be merged with the high vowels /i, u, y/: there is no ambiguity in interpreting a sequence like [jɑʊ̯] as /iɑʊ̯/, and potentially problematic sequences such as */iu/ never occur. This results in a minimal system with 19 consonants and 5 vowels.
An alternative and potentially more abstract system that sometimes appears in the linguistic literature (e.g. in Mantaro Hashimoto and Edwin Pulleyblank) takes the opposite approach, analyzing the vowels /i/, /u/ and /y/ as the surface form of the glides /j, w, ɥ/ combined with a null meta-phoneme Ø. In this system, shown below, there are just two vowel nuclei, /a/ and /ə/; various allophones result from a preceding glide /j, w, ɥ/ (or null) and a coda /i~j, u~w, n, ŋ/ (or null; see erhua for the additional sequences afforded by the rhotic coda /ɻ/). (The vowel /ɨ/ is taken to be the surface form of a syllable whose medial, nucleus and coda are all null; e.g. [sɨ] would be analyzed as an underlying syllabic /s/.)
|Nucleus||Coda||Medial Ø||Medial j||Medial w||Medial ɥ|
|ə||Ø||ɤ ²||jɛ||wɔ ¹||ɥœ̜|
¹ Both pinyin and zhuyin have an additional "o", used after "b p m f", which is distinguished from "uo", used after everything else. "o" is generally placed in the null-medial (Ø) column instead of the /w/ column. However, in Beijing pronunciation, "o" and "uo" are identical.
² Another way to represent this final is [ɰʌ], which reflects Beijing pronunciation.
³ /wɤŋ/ is pronounced [ʊŋ] when it follows an initial.
The sequence [jɛn] can be considered to be phonemically either /jən/ or /jan/; likewise [ɥɛn] could be either /ɥən/ or /ɥan/. Since [jɛn] and [ɥɛn] become [jɐɻ] and [ɥɐɻ] with the addition of the suffix /ɻ/, and [ɐ] is an a-like vowel rather than a mid vowel, the /a/ interpretation (/jan/, /ɥan/) is generally preferred.
Syllables in Standard Chinese have the maximal form CGVCT, where the first C is the initial consonant; G is one of the glides /j w ɥ/; V is a vowel (or diphthong); the second C is a coda, /n ŋ ɻ/ (if diphthongs like ou, ai are analyzed as V) or /n ŋ ɻ j w/ (if not); and T is the tone. In traditional Chinese phonology, the first C is called the "initial", G the "medial", and VCT the "final" or "rime"; sometimes the medial is considered part of the rime.
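The CGVCT template can be illustrated with a small parser. This is a sketch under stated assumptions: input is toneless Pinyin letters followed by a tone digit (e.g. zhuang1, a common typing convention rather than part of Hanyu Pinyin proper); diphthongs such as ai and ou are treated as V, so the coda is limited to n, ng or r; and the y/w/ü spelling conventions and the abbreviations iu, ui, un are undone before slotting. The `Syllable` and `parse` names are hypothetical.

```python
import re
from dataclasses import dataclass

# Pinyin initials, digraphs first so that "zh" wins over "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


@dataclass
class Syllable:
    initial: str  # C: initial consonant, "" for the null initial
    medial: str   # G: i, u or ü standing for the glides /j w ɥ/, or ""
    vowel: str    # V: nucleus, with diphthongs such as ai, ou kept whole
    coda: str     # C: n, ng or r, or ""
    tone: int     # T: 1-4, or 0 when no digit is given (neutral tone)


def parse(syllable: str) -> Syllable:
    m = re.fullmatch(r"([a-zü]+)([0-4]?)", syllable)
    if m is None:
        raise ValueError(f"not a numbered Pinyin syllable: {syllable!r}")
    letters, tone = m.group(1), int(m.group(2) or 0)
    initial = next((i for i in INITIALS if letters.startswith(i)), "")
    final = letters[len(initial):]

    # Undo the spelling conventions so that the final is written in full.
    if initial == "":
        if final.startswith("yu"):
            final = "ü" + final[2:]        # yu, yue, yuan, yun
        elif final.startswith("yi"):
            final = final[1:]              # yi, yin, ying
        elif final.startswith("y"):
            final = "i" + final[1:]        # ya, ye, yao, you, yong, ...
        elif final.startswith("wu"):
            final = final[1:]              # wu
        elif final.startswith("w"):
            final = "u" + final[1:]        # wa, wo, wei, wen, wang, ...
    elif initial in ("j", "q", "x") and final.startswith("u"):
        final = "ü" + final[1:]            # ju, xue, quan, jun
    final = {"iu": "iou", "ui": "uei", "un": "uen"}.get(final, final)

    # G: a high vowel letter followed by another vowel letter is the medial.
    medial = ""
    if len(final) > 1 and final[0] in "iuü" and final[1] in "aeo":
        medial, final = final[0], final[1:]

    # C: strip a nasal or rhotic coda, leaving at least one vowel letter.
    coda = ""
    for c in ("ng", "n", "r"):
        if final.endswith(c) and len(final) > len(c):
            coda, final = c, final[: -len(c)]
            break
    return Syllable(initial, medial, final, coda, tone)
```

For instance, `parse("zhuang1")` yields initial zh, medial u, vowel a, coda ng and tone 1, while `parse("shui3")` recovers the full final uei.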
Not counting tone distinctions or the rhotic coda, there are some 35 finals in Standard Chinese.
The rhotic coda 
Standard Chinese also uses the rhotic consonant /ɻ/ as a coda. This usage is characteristic of Mandarin dialects, especially the Beijing dialect; most other varieties of Chinese lack this sound. In Chinese, this feature is known as Erhua. There are two cases in which it is used:
- In a small number of words, such as 二 èr "two" and 耳 ěr "ear". All of these words are pronounced [ɑɻ] with no initial consonant.
- As a noun suffix -兒/-儿 -r. The suffix combines with the final, and regular but complex changes occur as a result.
The "r" final must be distinguished from the retroflex initial consonant written ⟨r⟩ in pinyin (as in ⟨ri⟩) and [ʐ] in IPA. The English sentence "The star rode a donkey" in some rhotic accents and the Standard Chinese sentence 我女兒入醫院/我女儿入医院 Wǒ nǚ'ér rù yīyuàn "My daughter entered/enters the hospital" both contain two kinds of r: the first is pronounced with a relatively lax tongue, while the second involves active retraction of the tongue and contact with the top of the mouth.
In other Mandarin dialects, the rhotic consonant is sometimes replaced by another syllable, such as li, in words that indicate locations. For example, 這兒/这儿 zhèr "here" and 那兒/那儿 nàr "there" become 這裡/这里 zhèli and 那裡/那里 nàli, respectively.
Standard Chinese, like all Chinese dialects, is a tonal language. This means that tones, just like consonants and vowels, are used to distinguish words from each other. Many non-native speakers have difficulty mastering the tones of each character, but correct tonal pronunciation is essential for intelligibility because of the vast number of words in the language that differ only by tone (i.e. are minimal pairs with respect to tone). Measured by functional load, tones are about as important as vowels in Standard Chinese. The following are the four tones of Standard Chinese:
|Tone name||Yin Ping||Yang Ping||Shang||Qu|
|Tone letter||˥˥ (55)||˧˥ (35)||˨˩, ˨˩˦ (21, 214)||˥˩ (51)|
|IPA diacritic||á||ǎ||à, a᷉||â|
- First tone, or high-level tone (陰平/阴平 yīnpíng, literal meaning: dark level):
- a steady high sound, as if it were being sung instead of spoken.
- Second tone, or rising tone (陽平/阳平 yángpíng, literal meaning: light level), or more specifically, high-rising:
- rises from mid to high pitch (e.g. English "What?!")
- Third tone, low or dipping tone (上 shǎng, literal meaning: "rising"):
- has a mid-low to low descent; if at the end of a sentence or before a pause, it is then followed by a rising pitch. Between other tones it may simply be low.
- Fourth tone, falling tone, or high-falling (去 qù, literal meaning: "departing"):
- features a sharp fall from high to low and is short, like a curt command (e.g. English "Stop!")
Neutral tone 
Also called fifth tone or zeroth tone (in Chinese: 輕聲/轻声 qīng shēng, literal meaning: "light tone"), neutral tone is sometimes thought of as a lack of tone. It usually comes at the end of a word or phrase, and is pronounced in a light and short manner. Because of this characteristic, and because there is no standard rule for whether a syllable has a neutral tone, it is considered analogous to an unstressed syllable. The neutral tone has a large number of allophones: its pitch depends almost entirely on the tone of the preceding syllable. The situation is further complicated by the amount of dialectal variation associated with it; in some regions, notably Taiwan, the neutral tone is relatively uncommon.
Despite many examples of minimal pairs (for example, 要是 yàoshì "if" and 钥匙 yàoshi "key"), the neutral tone is sometimes described as something other than a full-fledged tone for technical reasons: some linguists feel that it results from a "spreading out" of the tone on the preceding syllable. This idea is intuitively appealing because without it, the neutral tone needs relatively complex tone sandhi rules to be made sense of; indeed, it would have to have four allotones, one for each of the four tones that could precede it. However, the "spreading" theory characterizes the neutral tone only incompletely, especially in sequences of two or more adjacent neutral-tone syllables.
|Tone of first syllable||Pitch of neutral tone||Example||Pinyin||English meaning|
|1 ˥||˨ (2)||玻璃 (˥.˨)||bōli||glass|
|2 ˧˥||˧ (3)||伯伯 (˧˥.˧)||bóbo||uncle|
|3 ˨˩||˦ (4)||喇叭 (˨˩.˦)||lǎba||horn|
|4 ˥˩||˩ (1)||兔子 (˥˩.˩)||tùzi||rabbit|
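Because the pitch of the neutral tone is predictable from the preceding tone, the table above can be read as a lookup. The sketch below encodes only those four rows, using Chao tone numbers (1 lowest, 5 highest) converted to IPA tone letters; it ignores the dialectal variation and the sequences of several neutral-tone syllables discussed earlier, and the function name is illustrative.

```python
# Chao tone numbers mapped to IPA tone letters ˩ ˨ ˧ ˦ ˥.
CHAO = str.maketrans("12345", "\u02e9\u02e8\u02e7\u02e6\u02e5")

# Values from the table: contour of the first syllable, and the pitch
# that the following neutral-tone syllable takes after it.
PRECEDING = {1: "5", 2: "35", 3: "21", 4: "51"}
NEUTRAL_PITCH = {1: "2", 2: "3", 3: "4", 4: "1"}


def neutral_sequence(tone: int) -> str:
    """Tone letters for a full-tone syllable followed by a neutral-tone one."""
    return f"{PRECEDING[tone]}.{NEUTRAL_PITCH[tone]}".translate(CHAO)


for tone in (1, 2, 3, 4):
    print(tone, neutral_sequence(tone))   # e.g. 玻璃, 伯伯, 喇叭, 兔子
```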
Most romanizations, such as Hanyu Pinyin, Mandarin Phonetic Symbols II and Tongyong Pinyin, represent the tones as diacritics on the vowels. Zhuyin uses tone marks as well. Others, like Wade-Giles, use superscript numbers at the end of each syllable. The tone marks and numbers are rarely used outside of language textbooks; in particular, they are usually absent from public signs, company logos, and so forth. Gwoyeu Romatzyh is a rare example in which tones are represented not by special symbols but by ordinary letters of the alphabet (although without a one-to-one correspondence).
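Placing a tone diacritic requires choosing which vowel carries it. The sketch below assumes numbered input such as hao3 (a common typing convention, with 5, 0 or no digit for the neutral tone) and applies the usual Hanyu Pinyin placement convention: the mark goes on a or e if present, on the o of ou, and otherwise on the last vowel. It handles one syllable at a time; `to_diacritic` is a hypothetical name.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiouü"


def to_diacritic(syllable: str) -> str:
    """Convert one numbered syllable such as "hao3" to "hǎo"."""
    m = re.fullmatch(r"([a-zü]+)([0-5]?)", syllable)
    if m is None:
        raise ValueError(f"not a numbered Pinyin syllable: {syllable!r}")
    letters, tone = m.groups()
    if tone in ("", "0", "5"):
        return letters                     # neutral tone: no mark
    if "a" in letters:
        pos = letters.index("a")
    elif "e" in letters:
        pos = letters.index("e")
    elif "ou" in letters:
        pos = letters.index("o")
    else:
        vowel_positions = [k for k, ch in enumerate(letters) if ch in VOWELS]
        if not vowel_positions:
            raise ValueError(f"no vowel to mark in {syllable!r}")
        pos = vowel_positions[-1]          # last vowel, e.g. guì, liú
    marked = letters[: pos + 1] + MARKS[tone] + letters[pos + 1:]
    return unicodedata.normalize("NFC", marked)   # compose, e.g. ü + caron -> ǚ
```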
To listen to the tones, see http://www.wku.edu/~shizhen.gao/Chinese101/pinyin/tones.htm (click on the blue-red yin yang symbol).
Tone sandhi 
Pronunciation also varies with context according to the rules of tone sandhi. The most prominent phenomenon of this kind occurs when two third tones appear in immediate sequence: the first of them changes to a rising tone, i.e. the second tone. In the literature, this rising contour is often called a "two-thirds tone", though in Standard Chinese it is generally the same as the second tone. If there are three third tones in a row, the tone sandhi rules become more complex, and depend on word boundaries, stress, and dialectal variation.
Basic rules of tone sandhi 
- When there are two 3rd tones (˨˩˦) in a row, the first syllable becomes 2nd tone (˧˥), and the second syllable becomes a half-3rd tone (˨˩). The half-3rd tone is a tone that only falls but does not rise.
- ex: 老鼠 (lǎoshǔ) becomes [lɑʊ̯˧˥ʂu˨˩]
- When there are three 3rd tones in a row, the result depends on how the syllables group into words.
- If the first word is two syllables, and the second word is one syllable, the first two syllables become 2nd tones, and the last syllable stays 3rd tone.
- If the first word is one syllable, and the second word is two syllables, the first syllable becomes half-3rd tone (˨˩), the second syllable becomes 2nd tone, and the last syllable stays 3rd tone.
- When a 3rd tone is followed by a first, second or fourth tone, or most neutral tone syllables, it usually becomes a half-3rd tone.
- ex: 美妙 (měimiào) becomes [mei̯˨˩mi̯ɑʊ̯˥˩]
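The basic third-tone rules above can be sketched as a function over a phrase divided into words. This is a simplified illustration, not a complete model: it treats every neutral-tone syllable as triggering the half-3rd tone (the rules say "most"), follows the rules literally in leaving the final syllable of a three-syllable run as a full 3rd tone, and raises an error for longer runs or groupings that the rules do not cover. The label "3h" for the half-3rd tone is our own.

```python
HALF_THIRD = "3h"  # our label for the half-3rd tone ˨˩ (falls without rising)


def third_tone_sandhi(words: list[list[int]]) -> list[str]:
    """Apply the basic third-tone rules to a phrase.

    `words` lists each word's citation tones (1-4, 0 = neutral).  Returns one
    label per syllable: "1", "2", "3", "4", "0" or HALF_THIRD.
    """
    tones = [t for word in words for t in word]
    # starts[k] is True when syllable k begins a word
    starts = [k == 0 for word in words for k in range(len(word))]
    out = [str(t) for t in tones]
    i = 0
    while i < len(tones):
        if tones[i] != 3:
            i += 1
            continue
        j = i
        while j < len(tones) and tones[j] == 3:
            j += 1                               # j ends the run of 3rd tones
        run = j - i
        if run == 1:
            if j < len(tones):                   # before tone 1, 2, 4 or neutral
                out[i] = HALF_THIRD
        elif run == 2:
            out[i:j] = ["2", HALF_THIRD]         # 老鼠 lǎoshǔ
        elif run == 3 and starts[i + 2] and not starts[i + 1]:
            out[i:j] = ["2", "2", "3"]           # two-syllable word + one
        elif run == 3 and starts[i + 1] and not starts[i + 2]:
            out[i:j] = [HALF_THIRD, "2", "3"]    # one-syllable word + two
        else:
            raise NotImplementedError("run of 3rd tones not covered by the basic rules")
        i = j
    return out
```

For the examples above, `third_tone_sandhi([[3, 3]])` (老鼠) returns ['2', '3h'] and `third_tone_sandhi([[3, 4]])` (美妙) returns ['3h', '4'].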
Rules for "一" and "不" 
- When in front of a 4th tone syllable, "一" becomes 2nd tone.
- ex: 一定 (yīdìng) becomes yídìng [i˧˥tiŋ˥˩]
- When in front of a non-4th tone syllable, "一" becomes 4th tone.
- When "一" falls between two words, it becomes neutral tone.
- ex: 看一看 (kànyīkàn) becomes kànyikàn
- When counting sequentially, and for all other situations "一" retains its root tone value of 1st tone. This includes when 一 is used at the end of a multi-syllable word (regardless of the first tone of the next word), and when 一 is immediately followed by any digit, including another 一; hence 一 also retains its root tone value of 1st tone in both syllables of the word "一一". For instance, 一一对应 is pronounced as yīyīduìyìng.
- When 一 is part of a cardinal number, it is pronounced as 4th tone when before 千 or 百, but in an ordinal number it is pronounced as 1st tone in these contexts.
- "不" becomes 2nd tone only when followed by a 4th tone syllable.
- ex: 不是 (bùshì) becomes búshì [pu˧˥ʂɨ˥˩]
- When "不" comes between two words in a yes-no question, it loses its tone (becomes neutral in tone).
- ex: 是不是 (shìbùshì) becomes shìbushì
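The rules for 一 and 不 can likewise be sketched as two small functions. The context the rules depend on (counting, a reduplicated verb, an A-not-A question) cannot be recovered from tones alone, so it is passed in as keyword flags; the function names and flags are hypothetical, and tone 0 stands for the neutral tone.

```python
from typing import Optional


def yi_tone(next_tone: Optional[int], *, counting: bool = False,
            between_words: bool = False) -> int:
    """Surface tone of 一 (0 = neutral).

    next_tone is the citation tone of the following syllable (1-4, 0 for
    neutral), or None when 一 ends the word.  counting covers sequential
    counting, ordinals, and 一 followed by another digit.
    """
    if next_tone is None or counting:
        return 1                  # root tone
    if between_words:
        return 0                  # 看一看 kànyikàn
    if next_tone == 4:
        return 2                  # 一定 yídìng
    return 4                      # before any non-4th tone


def bu_tone(next_tone: Optional[int], *, a_not_a: bool = False) -> int:
    """Surface tone of 不 (0 = neutral); a_not_a marks questions like 是不是."""
    if a_not_a:
        return 0                  # 是不是 shìbushì
    if next_tone == 4:
        return 2                  # 不是 búshì
    return 4                      # otherwise keeps its 4th tone
```

Note that the cardinal rule for 千 and 百 falls out of the non-4th-tone case, since 千 is 1st tone and 百 is 3rd tone.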
Relationship between Middle Chinese and modern tones 
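Relationship between Middle Chinese tones (by type of initial consonant) and Standard Chinese tones (with tone contour):
- Ping (平): voiceless initial → Yin Ping (55); voiced initial → Yang Ping (35)
- Shang (上): voiceless or sonorant initial → Shang (214); voiced obstruent initial → Qu (51)
- Qu (去): any initial → Qu (51)
- Ru (入): voiceless initial → distributed among all four tones with no pattern; sonorant initial → to Qu (51); voiced obstruent initial → to Yang Ping (35)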
Word stress 
Standard Chinese distinguishes three degrees of stress: primary stress, normal stress, and no stress. Three stress patterns commonly occur in two-syllable compound words:
- Normal Stress + Primary Stress (\ + /)
- 字画儿 zìhuàr
- 音乐 yīnyuè
- 学校 xuéxiào
- 汽车 qìchē
- Primary Stress + Unstressed (/ + o)
- 父亲 fùqin
- 喜欢 xǐhuan
- 东西 dōngxi
- Primary Stress + Normal Stress (/ + \)
- 农村 nóngcūn
- 社会 shèhuì
- 热情 rèqíng
Syllable reduction 
When a syllable is unstressed, not only does it lose its tone, but tenuis occlusives such as b d g z j become voiced (in pinyin, bb dd gg zz jj) and the vowel is reduced. When the consonant of the unstressed syllable is a nasal or a fricative, the vowel (or entire rime) may be dropped altogether. For example:
|Full form||Reduced form|
|xǐhuan 'to like'||xǐhuə|
|chūqu 'to go out'||chūqə|
Reduction may also involve assimilation, which is seen even in unreduced syllables in quick speech (for example, guǎmbō for guǎngbō 'broadcast'). The most salient example of assimilation is the exclamatory particle ā, which even has different characters for its assimilated forms:
|Preceding sound||Assimilated form|
|-ng, -ɨ||ā 啊|
|-a, -o, -e, -i, -ü||yā 呀 (from ŋā)|
- Norman, Jerry (1988). Chinese. Cambridge University Press. pp. 140–141. ISBN 978-0-521-29653-3.
- Hashimoto, Mantaro (1970), "Notes on Mandarin Phonology", in Jakobson, Roman; Kawamoto, Shigeo, Studies in General and Oriental Linguistics, Tokyo: TEC, pp. 207–220
- Chao (1934) notes that English sway has a consonant sequence, [sweɪ̯], whereas Mandarin sui has a labialized consonant, [sʷeɪ̯]. That is, the s in sui is pronounced with rounded lips.
- Surendran, Dinoj; Levow, Gina-Anne (2004). "The functional load of tone in Mandarin is as high as that of vowels". Proceedings of the International Conference on Speech Prosody 2004, Nara, Japan. pp. 99–102.
- "上聲 - 教育部重編國語辭典修訂本". 中華民國教育部. 1994. Retrieved 2010-05-15.
- 古代汉语大词典大字本. 北京: 商务印书馆. 2002. p. 1369. ISBN 978-7-100-03515-6.
- Chen, Yiya; Xu, Yi. "Pitch Target of Mandarin Neutral Tone" (abstract). Presented at the 8th Conference on Laboratory Phonology.
- Wang, Jialing (2004). The Neutral Tone in Trisyllabic Sequences in Chinese Dialects. Tianjin Normal University.
- Tiee, Henry Hung-Yeh (1986). A Reference Grammar of Chinese Sentences with Exercises. University of Arizona Press. p. XXVI. ISBN 978-0-8165-1166-2.
- Yip, Po-ching (2000). The Chinese Lexicon: A Comprehensive Survey. p. 29.
- The vowel of si, zi, ci, shi, zhi, chi
Further reading 
- Duanmu, San (2007). The Phonology of Standard Chinese (2nd ed.). Oxford University Press. ISBN 978-0-19-921579-9.