Chinese whole characters


A Chinese whole character, or whole Chinese character (Pinyin: hànzì zhěngzì; Traditional Chinese: 漢字整字; Simplified Chinese: 汉字整字), is a complete Chinese character. It is the final level of the stroke-component-character hierarchy of Chinese character composition.[1]

According to their structure, Chinese characters can be divided into undecomposable characters (独体字) and decomposable characters (合体字). An undecomposable character is formed by one primitive component and is also called a single-component character; a decomposable character can be decomposed into two or more components and is also called a multi-component character.[2][3]

Undecomposable characters

Definition

An undecomposable character is formed directly from strokes and cannot be decomposed into smaller components, though it may itself be a component of a decomposable character.[2] For example, 人 is an undecomposable character formed by the strokes ㇓ and ㇏, and it is used to form the character 丛.

Lists of undecomposable characters

The following are some lists of undecomposable characters compiled by different authors.

"Chinese Character Information Dictionary" (漢字信息字典)[4] contains a total of 7,785 standardized characters used in mainland China. By static statistics (counting each distinct character once), 323 of them are undecomposable, accounting for 4.149%. By dynamic statistics (counting every occurrence in a corpus), undecomposable characters account for 25.910% of the corpus, because many single-component characters are frequently used. The list of undecomposable characters is as follows (in Pinyin order):

凹八巴白百办半贝本匕币必卞丙秉卜不步才册叉产长厂车臣辰成丞承尺彳斥赤虫丑川巛串垂匆寸大歹丹刀氐电刁丁丢东斗厾不(重)儿而耳发乏凡方飞非丰夫弗甫钆玍丐干甘戈革个艮更工弓瓜广龟果亥禾乎户互幻奂火丌乩及几己夹甲戋兼柬见孑巾斤今堇井九久韭旧臼巨孓开孔口来乐耒礼里力吏隶良两了〇令龙甪乱马买毛矛么门米芈丏免面灭民皿末母木目乃内年廿乜牛农女丬乓皮匹片氕乒平七妻气千羌且丘求虬曲犬冉人壬刃日乳入卅伞丧山上勺少申身甚升生尸失虱十石史豕士氏世事手书术戍束甩水厶司巳肃太天头凸土兔彖屯乇瓦丸万亡王韦为囗卫未我乌无毋五兀勿戊夕西习系下乡象小心戌血丫牙轧亚严央羊么夭也业页曳一夷乙已以乂义弋亦尹引永用尤尢由酉又于予臾禺雨禹玉聿曰月再扎札乍丈爪兆争正之止豸中重舟州朱竹主专隹子自 [a]
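The difference between the static and the dynamic figures can be illustrated with a small sketch. The sample sentence and the six-character subset of the list above are illustrative assumptions, not data from the dictionary, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative subset of the undecomposable characters listed above
# (assumption: a real study would use the full list and a large corpus).
UNDECOMPOSABLE = set("中人大小山水")


def static_share(text: str) -> float:
    """Share of distinct characters in `text` that are undecomposable."""
    inventory = set(text)
    return len(inventory & UNDECOMPOSABLE) / len(inventory)


def dynamic_share(text: str) -> float:
    """Share of character occurrences in `text` that are undecomposable."""
    counts = Counter(text)
    hits = sum(n for ch, n in counts.items() if ch in UNDECOMPOSABLE)
    return hits / sum(counts.values())


sample = "中国人说中国话"  # toy corpus: 7 characters, 5 of them distinct
print(f"static:  {static_share(sample):.1%}")
print(f"dynamic: {dynamic_share(sample):.1%}")
```

In this toy sample the dynamic share is higher than the static share because 中 occurs twice, which mirrors the reason given in the text: frequently used characters are weighted by their occurrences.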

Su[5] segmented 7,000 commonly used characters and obtained 233 undecomposable characters, accounting for 3.4%. The list of undecomposable characters is as follows (in stroke-based order):

一乙二十丁厂七卜八人入乂儿九匕几刁了乃刀力又乜三干亍于亏士土工才下寸丈大兀万弋上小口山巾千川彳个么久丸夕凡及广亡门丫义之尸已巳弓己卫孑子孓也女飞刃习叉乡么丰王井开夫天无韦专丐廿木五卅不太犬歹尤车牙屯戈互瓦止少曰日中内水手牛毛气壬升夭长片币爪乏月氏勿丹火为户心尹尺夬丑爿巴书毋玉未末戋正甘世本术丙龙戊平灭东凸业目且甲申电田由央史冉皿凹四生失矢禾丘白斥瓜乎用甩氐乐册主半必永弗出母耒耳亚臣吏再西夹夷曳虫曲朱丢乒乓臼自血甪舟米州聿严甫更束两酉来里串我身系事雨果垂秉臾柬韭禺重禹

The "Specification of the Undecomposable Characters Commonly Used in the Modern Chinese" (现代常用独体字规范) identifies 256 commonly used undecomposable characters among modern Chinese characters, which form the "List of Modern Commonly Used Undecomposable Characters".[6] The list of undecomposable characters is as follows (in stroke-based order):

一乙二十丁厂七卜八人入儿匕几九刁了刀力乃又三干于工土士才下寸大丈与万上小口山巾千川个歹久么凡丸及广亡门丫义之尸己已巳弓子卫也女刃飞习叉马乡丰王开井天夫无云专丐木五不犬太歹尤车巨牙屯戈互瓦止少曰日中贝内水见午牛手气毛壬升夭长片斤爪父月氏勿丹鸟六文方火为斗户心尺丑巴办予书玉未末击正甘世本术丙石戊龙平东卡凸业木且甲申电田由史央冉皿凹四生矢失乍禾丘白斥瓜乎用甩乐匆册鸟主立半头必永民弗出矛母耳亚臣吏再西百而页夹夷虫曲肉年朱臼自血卤舟亦衣产亥羊米州农严求甫更束两酉来卤里串我身囱言羌弟事雨果垂秉肃隶承革柬面重鬼禹首兼象鼠

The "List of Commonly Used Character Components" in the "Specification of Common Modern Chinese Character Components and Component Names" (现代常用字部件及部件名称规范) includes a total of 311 commonly used character components that are themselves characters, i.e., undecomposable characters.[7] The list of undecomposable characters is as follows (in Pinyin order):

凹八巴白百办半卑贝本匕必丙秉卜不才册叉产长厂车臣辰承尺斥赤虫丑出川串垂匆囱寸大歹丹单刀弟电刁丁鼎东兜斗豆儿而耳二发凡方飞非丰凤夫弗甫父丐干甘高戈革个更工弓谷瓜广龟鬼果亥禾黑后乎互户黄火击及几己夹甲兼柬见角巾斤今金京井九久韭臼巨具卡开口来老乐里力立吏丽隶两了六龙卤鹿卵仑马毛矛卯么门米免面民皿末母木目乃南内年鸟牛农女乓皮片乒平七妻其气千欠且丘求曲去犬冉人壬刃日肉入三伞色山上勺少舌申身升生尸失十石食史矢士氏示世事手首书鼠术束甩永司丝巳四肃太天田头凸土屯瓦丸万亡王卫为未文我乌无五午勿戊夕西习下乡向象小心辛戌穴血熏丫牙亚严言央羊夭也业页一衣夷乙已义亦庸永用尤由酉又于鱼与予雨禹玉曰月云再乍丈爪兆争正之直止至中重舟州朱竹主专子自

Based on the above results, undecomposable characters are estimated to account for approximately 4% of modern Chinese characters.[8]

Because each author or group has a slightly different understanding of components and often works with a different character set, the number of undecomposable characters obtained differs. Generally speaking, though, the results are quite similar, ranging from just over 200 to just over 300.
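How two such lists differ can be checked with ordinary set operations. The sketch below compares only the opening segments (the one- and two-stroke characters) of the two stroke-ordered lists above; the variable and function names are hypothetical, and a full comparison would use the complete lists.

```python
# Opening segments (one- and two-stroke characters) of two stroke-ordered lists above.
SU_EXCERPT = set("一乙二十丁厂七卜八人入乂儿九匕几刁了乃刀力又乜")   # Su's list
SPEC_EXCERPT = set("一乙二十丁厂七卜八人入儿匕几九刁了刀力乃又")     # the 256-character list


def compare(a: set, b: set) -> dict:
    """Report characters unique to each set and the Jaccard similarity."""
    return {
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": len(a & b) / len(a | b),
    }


result = compare(SU_EXCERPT, SPEC_EXCERPT)
print("only in Su's excerpt:", sorted(result["only_a"]))
print("only in the specification excerpt:", sorted(result["only_b"]))
print("Jaccard similarity:", round(result["jaccard"], 3))
```

For these excerpts, the only disagreement is that Su's list includes 乂 and 乜 while the specification list does not.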

Decomposable characters

Definition

A decomposable character can be decomposed into more than one component. For example, "字" (character) is formed by two components (宀+子).

There are two frequently-used modes of component combination in the study of Chinese character structures: first-level component combination and primitive component combination.[9]

First-level component combination

The first-level component combination mode, or pattern, is what is usually called the structure of a Chinese character. According to this analysis, the structures of decomposable characters can be divided into 4 major categories and 13 subcategories:[10][11]

Left to right structure

  • Left to right (⿰, 2FF0 [b]), for example: 部, 件, 結 and 構.
  • Left to middle and right (⿲, 2FF2): 衡, 班 and 辯.

Above to below structure

  • Above to below (⿱, 2FF1): 要, 思 and 想.
  • Above to middle and below (⿳, 2FF3): 鼻, 曼 and 率.

Surrounding structure

Full surround:

  • Surrounded from four sides (⿴, 2FF4): 圍, 國 and 囪.

Surrounded from three sides:

  • Surround from above (⿵, 2FF5): 問, 同 and 風.
  • Surround from below (⿶, 2FF6): 凶, 画 and 函.
  • Surround from left (⿷, 2FF7): 匡, 匠 and 匣.

Surrounded from two sides:

  • Surround from upper left (⿸, 2FF8): 廣, 居 and 病.
  • Surround from upper right (⿹, 2FF9): 句, 可 and 氧.
  • Surround from lower left (⿺, 2FFA): 這, 建 and 題.
  • Surround from lower right (N/A): 斗 and 头.

Overlaid structure

  • Overlaid (⿻, 2FFB): 巫, 爽 and 承.
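The code points cited in the lists above are Unicode ideographic description characters (IDCs), which are written as prefix operators in an ideographic description sequence (IDS). As a minimal sketch, the first character of an IDS can be mapped back to the four major categories. The `CATEGORY` table and the function name are hypothetical, and the IDS used for 字 is an assumption based on its components 宀 and 子.

```python
import unicodedata

# Map each IDC cited in the lists above to the article's major category.
CATEGORY = {
    "\u2ff0": "Left to right", "\u2ff2": "Left to right",
    "\u2ff1": "Above to below", "\u2ff3": "Above to below",
    "\u2ffb": "Overlaid",
}
# U+2FF4 through U+2FFA are the surrounding structures.
CATEGORY.update({chr(cp): "Surrounding" for cp in range(0x2FF4, 0x2FFB)})


def top_level_structure(ids: str) -> str:
    """Return the major category named by the first character of an IDS."""
    if not ids or ids[0] not in CATEGORY:
        raise ValueError(f"not an IDS starting with a known IDC: {ids!r}")
    return CATEGORY[ids[0]]


# Print the official Unicode name of each operator next to its category.
for idc in sorted(CATEGORY):
    print(f"U+{ord(idc):04X} {unicodedata.name(idc)} -> {CATEGORY[idc]}")

print(top_level_structure("\u2ff1宀子"))  # assumed IDS for 字 (宀 above 子)
```

The "surround from lower right" pattern has no entry in the table because the list above gives no code point for it.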

Chinese character distribution by structure

The following data are excerpted from "Chinese Character Information Dictionary", which covers 7,785 standard characters used in mainland China.[4]

Chinese character distribution by structure
Structure | Characters | Share of characters (%) | Character occurrences [c] | Share of occurrences (%)
Undecomposable | 323 | 4.149 | 5611317 | 25.910
Upper to lower | 1643 | 21.105 | 4189687 | 19.346
Left to right | 5055 | 64.933 | 8682108 | 40.091
Surround | 715 | 9.184 | 2882097 | 13.308
Overlaid | 49 | 0.629 | 291369 | 1.345
Total | 7785 | 100 | 21656578 | 100
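The percentage columns follow from the two count columns. The sketch below recomputes them, with the counts copied from the table; the names `COUNTS` and `shares` are hypothetical.

```python
# Counts copied from the table above: structure -> (characters, occurrences).
COUNTS = {
    "Undecomposable": (323, 5611317),
    "Upper to lower": (1643, 4189687),
    "Left to right": (5055, 8682108),
    "Surround": (715, 2882097),
    "Overlaid": (49, 291369),
}

total_chars = sum(chars for chars, _ in COUNTS.values())
total_occ = sum(occ for _, occ in COUNTS.values())


def shares(name: str) -> tuple:
    """Return (static %, dynamic %) for one structure, rounded to 3 decimals."""
    chars, occ = COUNTS[name]
    return round(100 * chars / total_chars, 3), round(100 * occ / total_occ, 3)


for name in COUNTS:
    static_pct, dynamic_pct = shares(name)
    print(f"{name:<15} {static_pct:7.3f} {dynamic_pct:7.3f}")
```

The character counts sum to 7,785 and the occurrence counts to 21,656,578, so each share is a simple ratio against those totals.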

List of characters in nested structure

According to statistics from the "Chinese Character Information Dictionary",[12] a total of 49 of the 7,785 mainland standard characters in the dictionary have an overlaid (or nested, including fully surrounded) structure:

哀褒乘囱囤固国裹回困圃囚圈衰爽四田图团围巫因幽园圆衷噩囟胤兖袤亵裒囝囡囵囫囹囿圄圊圉圜豳囮囷奭圐㘥

If the fully surrounded characters are moved to the surrounding category, there will be even fewer overlaid characters.

Primitive component combination

According to the planar analysis by primitive components, Chinese character structures include the following modes or patterns:[13]

  • A. For characters composed of two primitive components, there are 9 different structures, as shown by the following example characters: 吕认压达勾问区凶团.
  • B. For characters composed of three components, there are 21 different structures, such as: 荣花型培树缠抛挺润抠捆部庶厢逞逊闾圄幽乖巫.
  • C. For characters composed of four components, there are 20 different structures, such as: 营蕊蓝寤嫠筐辔椁摄燃游榧额韶欧剩腐遮阔匿.
  • D. For characters composed of five components, there are 20 different structures, such as: 赢蒿膏寝蘧嚣篮樊搞澡缀渤漉髂齁敲酃戳魔噩.
  • E. For characters composed of six components, there are 10 different structures, such as: 臀翳麓瀛灌骥歌豁豌衢.
  • F. For characters composed of seven components, there are 3 different structures, such as: 戆麟饕.
  • G. For characters composed of eight components, there is 1 structure, as in: 齉.
  • H. For characters composed of nine components, there is 1 structure, as in: 懿.
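Assuming each example string in the list above gives exactly one character per structure, the stated structure counts can be cross-checked against the lengths of the example strings. A quick sketch (the dictionary name is hypothetical):

```python
# Number of components -> (number of structures stated, example characters given).
EXAMPLES = {
    2: (9, "吕认压达勾问区凶团"),
    3: (21, "荣花型培树缠抛挺润抠捆部庶厢逞逊闾圄幽乖巫"),
    4: (20, "营蕊蓝寤嫠筐辔椁摄燃游榧额韶欧剩腐遮阔匿"),
    5: (20, "赢蒿膏寝蘧嚣篮樊搞澡缀渤漉髂齁敲酃戳魔噩"),
    6: (10, "臀翳麓瀛灌骥歌豁豌衢"),
    7: (3, "戆麟饕"),
    8: (1, "齉"),
    9: (1, "懿"),
}

# Component counts whose example string length disagrees with the stated number.
mismatches = [n for n, (stated, chars) in EXAMPLES.items() if len(chars) != stated]
total_structures = sum(stated for stated, _ in EXAMPLES.values())

print("mismatches:", mismatches)
print("total structures:", total_structures)
```

Under that assumption every item is consistent, and the eight items together describe 85 structures.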

The level to which Chinese character components should be divided must be determined based on specific needs. For example, Chinese character teaching often uses a coarser level of analysis in order to be concise, while component-encoding Chinese character input methods often use relatively detailed analysis in order to reduce coding elements. [14]
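The two levels of analysis can be sketched with a small parser for ideographic description sequences (IDS), in which each description character is a prefix operator taking two or three operands. The IDS string for 部 below is an assumption made for illustration, and `parse_ids` and `leaves` are hypothetical helper names.

```python
TERNARY = {"\u2ff2", "\u2ff3"}  # ⿲ and ⿳ take three operands
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY  # the rest take two


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into a nested tuple (operator, child, ...)."""

    def walk(pos: int):
        if pos >= len(ids):
            raise ValueError("IDS ended before all operands were read")
        ch = ids[pos]
        if ch in TERNARY or ch in BINARY:
            arity = 3 if ch in TERNARY else 2
            children = []
            pos += 1
            for _ in range(arity):
                child, pos = walk(pos)
                children.append(child)
            return (ch, *children), pos
        return ch, pos + 1  # any other character is a leaf component

    tree, end = walk(0)
    if end != len(ids):
        raise ValueError("unexpected characters after a complete IDS")
    return tree


def leaves(tree) -> list:
    """Flatten a parsed tree into its leaf components, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


tree = parse_ids("\u2ff0\u2ff1立口阝")  # assumed IDS for 部
print("first-level parts:", len(tree) - 1)
print("leaf components:", leaves(tree))
```

A coarse analysis stops at the two first-level parts of the tree, while a finer analysis continues down to the three leaves 立, 口 and 阝.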

Chinese character distribution by numbers of components

The following data is excerpted from "Chinese Character Information Dictionary". The components here refer to primitive components. [15]

Chinese character distribution by numbers of components
Components | Characters | Share of characters (%) | Character occurrences | Share of occurrences (%)
1 | 323 | 4.149 | 5611317 | 25.910
2 | 2650 | 34.040 | 10191803 | 47.061
3 | 3139 | 40.321 | 4652330 | 21.482
4 | 1276 | 16.391 | 1046913 | 4.834
5 | 323 | 4.149 | 142005 | 0.656
6 | 70 | 0.899 | 11192 | 0.052
7 | 3 | 0.038 | 1017 | 0.005
8 | 1 | 0.013 | 1 | 0.000
Total | 7785 | 100 | 21656578 | 100

The static distribution is concentrated on characters with 2, 3, or 4 components; the dynamic distribution is concentrated on characters with 1, 2, or 3 components. In both the static and the dynamic statistics, more than 99% of the characters have no more than 5 primitive components. Note that the static counts for 1 and 5 components are the same (323 characters each), but their dynamic counts are very different.
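The cumulative shares behind these observations can be recomputed from the table counts. The sketch below does so; the names `COMPONENT_COUNTS` and `cumulative` are hypothetical.

```python
from itertools import accumulate

# Counts copied from the table above: components -> (characters, occurrences).
COMPONENT_COUNTS = {
    1: (323, 5611317),
    2: (2650, 10191803),
    3: (3139, 4652330),
    4: (1276, 1046913),
    5: (323, 142005),
    6: (70, 11192),
    7: (3, 1017),
    8: (1, 1),
}


def cumulative(index: int) -> dict:
    """Cumulative share (0..1) of entries with at most k components.

    index 0 selects character counts (static), index 1 occurrences (dynamic).
    """
    keys = sorted(COMPONENT_COUNTS)
    values = [COMPONENT_COUNTS[k][index] for k in keys]
    total = sum(values)
    return {k: running / total for k, running in zip(keys, accumulate(values))}


static_cum = cumulative(0)
dynamic_cum = cumulative(1)
for k in sorted(COMPONENT_COUNTS):
    print(f"<= {k}: static {static_cum[k]:.3%}, dynamic {dynamic_cum[k]:.3%}")
```

With these counts, characters with at most 4 components make up about 94.9% of the inventory and about 99.3% of occurrences; the 99% mark is passed in both views at 5 components.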

Chinese character fonts

Fonts

The first four characters of Thousand Character Classic in different type and script styles. From right to left: seal script, clerical script, regular script, Ming, and sans-serif.

The popular fonts of modern Chinese characters include Song or Ming (宋體, 明體), FangSong (仿宋體), Kai (regular, 楷體), Li (clerical, 隸體), Hei (black, sans-serif, 黑體) and Wei (魏體).[16]

The official standard fonts include

Font sizes

Internationally, font sizes are generally measured in points. In China, in addition to the point system, a unique "number" system is also used for Chinese characters. For example, the Simplified Chinese version of MS Word allows font sizes to be set either in points or in numbers.[21]

The point system

After nearly three hundred years of development and improvement, the most influential typographic point standards in the world are the Didot point system of continental Europe (one point is approximately 0.3759 mm) and the Anglo-American point system (one point is approximately 0.3515 mm). China uses the latter. The point values available in MS Word are all multiples of 0.5 between 1 point and 1638 points, that is, the set {1, 1.5, 2, 2.5, ..., 1637, 1637.5, 1638}. This can be verified directly on a computer.[21]
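The conversions and the set of permitted sizes described above can be sketched as follows. The constants are the approximate values quoted in the text, and the function names are hypothetical; this does not call any MS Word API.

```python
DIDOT_MM = 0.3759           # mm per Didot point, as quoted in the text
ANGLO_AMERICAN_MM = 0.3515  # mm per Anglo-American point, as quoted in the text


def points_to_mm(points: float, mm_per_point: float = ANGLO_AMERICAN_MM) -> float:
    """Convert a size in points to millimetres under the given point system."""
    return points * mm_per_point


def is_word_point_size(points: float) -> bool:
    """True if `points` is a multiple of 0.5 in the range 1..1638 described above."""
    return 1 <= points <= 1638 and float(points * 2).is_integer()


# Enumerate the whole set {1, 1.5, 2, ..., 1637.5, 1638}.
valid_sizes = [half / 2 for half in range(2, 2 * 1638 + 1)]
print(len(valid_sizes), valid_sizes[:4], valid_sizes[-1])

# The same nominal size is slightly larger under the Didot system.
print(points_to_mm(10.5), points_to_mm(10.5, DIDOT_MM))
```

Enumerating in half-point steps avoids accumulating floating-point error, since every multiple of 0.5 in this range is exactly representable.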

The number system

The font size options provided by the Simplified Chinese versions of Windows and Word are, in ascending order of size:

No. 8 (八号), No. 7 (七号), Small No. 6 (小六号), No. 6 (六号), Small No. 5 (小五号), No. 5 (五号), Small No. 4 (小四号), No. 4 (四号), Small No. 3 (小三号), No. 3 (三号), Small No. 2 (小二号), No. 2 (二号), Small No. 1 (小一号), No. 1 (一号), Small initial number (小初号), Initial number (初号).

"No. 8" (or size 8) is the smallest, equivalent to 5 points (Anglo-American system), with a font height of about 1.757 mm; "Initial number" (or size A) is the largest, equivalent to 42 points, with a font height of about 14.761 mm.

Number-point correspondence

The following is a "number-point" correspondence table for Chinese character font sizes, created by Dr. Zhang.[21]

Chinese character font size "number-point" correspondence table
Size number | Chinese name | Points
8 | 八号 | 5
7 | 七号 | 5.5
Small 6 | 小六号 | 6.5
6 | 六号 | 7.5
Small 5 | 小五号 | 9
5 | 五号 | 10.5
Small 4 | 小四号 | 12
4 | 四号 | 14
Small 3 | 小三号 | 15
3 | 三号 | 16
Small 2 | 小二号 | 18
2 | 二号 | 22
Small 1 | 小一号 | 24
1 | 一号 | 26
Small A | 小初号 | 36
A | 初号 | 42
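The table lends itself to a simple lookup. The sketch below stores the correspondence as a dictionary, converts a size name to an approximate height in millimetres using the Anglo-American value quoted earlier, and finds the named size closest to a given point value. The function names are hypothetical.

```python
MM_PER_POINT = 0.3515  # approximate Anglo-American point, as quoted in the text

# Size name -> points, copied from the table above (ascending order).
SIZE_POINTS = {
    "八号": 5, "七号": 5.5, "小六号": 6.5, "六号": 7.5,
    "小五号": 9, "五号": 10.5, "小四号": 12, "四号": 14,
    "小三号": 15, "三号": 16, "小二号": 18, "二号": 22,
    "小一号": 24, "一号": 26, "小初号": 36, "初号": 42,
}


def size_to_mm(name: str) -> float:
    """Approximate height in millimetres of a named size."""
    return SIZE_POINTS[name] * MM_PER_POINT


def nearest_size(points: float) -> str:
    """Named size whose point value is closest to `points` (first match on ties)."""
    return min(SIZE_POINTS, key=lambda name: abs(SIZE_POINTS[name] - points))


for name, points in SIZE_POINTS.items():
    print(f"{name}: {points} pt, about {size_to_mm(name):.3f} mm")

print(nearest_size(11))
```

Because the conversion factor is itself an approximation, the computed heights can differ from the figures quoted in the text in the last decimal place.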


References

  1. Su 2014, p. 74.
  2. National Language Commission 2009a, p. 1.
  3. Peking University 2004, p. 148.
  4. Li 1988, p. 1071.
  5. Su 2014, pp. 95–96.
  6. National Language Commission 2009a, pp. 2–3.
  7. National Language Commission 2009b.
  8. Su 2014, p. 96.
  9. Su 2014, p. 98.
  10. Su 2014, pp. 98–99.
  11. https://www.unicode.org/charts/PDF/U2FF0.pdf
  12. Li 1988, p. 1072.
  13. Fu 1999, pp. 39–41.
  14. Su 2014, p. 89.
  15. Li 1988, p. 1010.
  16. Li 2013, p. 62.
  17. 国务院关于公布《通用规范汉字表》的通知. Gov.cn (in Chinese). State Council of the People's Republic of China. 5 June 2013.
  18. https://zh.wikipedia.org/w/index.php?title=常用國字標準字體表&variant=zh-cn
  19. http://www.edbchinese.hk/lexlist_ch/
  20. https://www.unicode.org/charts/PDF/U4E00.pdf
  21. Zhang 2006.

Works cited

  • Fu, Yonghe (傅永和) (1999). 中文信息处理 (Chinese Information Processing) (in Chinese) (3rd ed.). Guangzhou: 广东教育出版社 (Guangdong Education Press). p. 84. ISBN 9-787540-640804.
  • Li, Dasui (李大遂) (2013). 简明实用汉字学 (Concise and Practical Chinese Characters) (in Chinese) (3rd ed.). Beijing: Peking University Press. ISBN 978-7-301-21958-4.
  • Li, Gongyi; Liu, Rushui (李公宜, 劉如水), eds. (1988). 漢字信息字典 (Chinese Character Information Dictionary) (in Chinese). Beijing: 科学出版社 (Science Press). ISBN 7-03-000862-6.
  • National Language Commission, Ministry of Education, China (2009a). Specification of the Undecomposable Characters Commonly Used in the Modern Chinese (现代常用独体字规范) (PDF). Beijing: National Language Commission. Retrieved 8 September 2023.
  • National Language Commission, Ministry of Education, China (2009b). Specification of Common Modern Chinese Character Components and Component Names (现代常用字部件及部件名称规范) (PDF). Beijing: National Language Commission. Retrieved 3 September 2023.
  • Peking University, Modern Chinese Language Teaching and Research Office (2004). Modern Chinese (现代汉语) (in Chinese). Beijing: Commercial Press. ISBN 7-100-00940-5.
  • Su, Peicheng (苏培成) (2014). 现代汉字学纲要 (Essentials of Modern Chinese Characters) (in Chinese) (3rd ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-10440-1.
  • Zhang, Xiaoheng (张小衡) (2006). "The Number, Point and Metric Systems of Font Size (字形的"号制""点制"与"米制")". Computer Engineering and Applications (计算机工程与应用). 42 (10): 175–177, 215.

Notes

  a. Here it is stipulated that a component in a multi-component character has more than one stroke.
  b. Unicode 2FF0, IDC (ideographic description character) LEFT TO RIGHT.
  c. In a corpus of 21,656,578 characters.
