CNS 11643
Alias(es) | CSIC (Chinese Standard Interchange Code) |
---|---|
Language(s) | Traditional Chinese |
Standard | CNS 11643 |
Classification | ISO 2022, DBCS, CJK encoding |
Encoding formats | EUC-TW, ISO-2022-CN, ISO-2022-CN-EXT |
Other related encoding(s) | Big5, CCCII |
The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Standard Interchange Code or CSIC[1] (Chinese: 中文標準交換碼), is the official standard character set of the Republic of China. In practice, variants of the related Big5 character set are the de facto standard.
CNS 11643 is designed to conform to ISO 2022. It contains 16 planes of 94 rows by 94 cells each, so the maximum possible number of encodable characters is 16×94×94 = 141376. The standard defines planes 1 through 7 and, since 2007, planes 10 through 15 as well.[2]: 115–122 Prior to this, planes 12 to 15 (35344 code points) were specifically designated for user-defined characters.[citation needed] Unlike in CCCII, variant forms of a character are not assigned related code points in CNS 11643.
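The code-space figures above can be checked with a few lines of Python. This is only an arithmetic sketch; the constant and function names are illustrative and not part of the standard.

```python
# Arithmetic sketch of the CNS 11643 code space.
# Names below are illustrative, not taken from the standard.
ROWS_PER_PLANE = 94   # rows in an ISO 2022 94x94 set
CELLS_PER_ROW = 94    # cells in each row
PLANES = 16           # planes in CNS 11643


def plane_capacity() -> int:
    """Number of code points in a single plane."""
    return ROWS_PER_PLANE * CELLS_PER_ROW


def total_capacity(planes: int = PLANES) -> int:
    """Number of code points across the given number of planes."""
    return planes * plane_capacity()


print(plane_capacity())    # 8836
print(total_capacity())    # 141376
print(total_capacity(4))   # 35344, i.e. planes 12 to 15 together
```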
EUC-TW is an encoded representation of CNS 11643 and ASCII in Extended Unix Code (EUC) form. Other encodings capable of representing certain CSIC planes include ISO-2022-CN (planes 1 and 2) and ISO-2022-CN-EXT (planes 1 through 7).
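As a rough illustration of the EUC form, the sketch below turns a (plane, row, cell) position into EUC-TW bytes. It assumes the commonly documented EUC-TW layout, which is not spelled out in this article: plane 1 as a two-byte code with both bytes in the range 0xA1 to 0xFE, and any plane as a four-byte code introduced by the single-shift byte 0x8E followed by a plane byte of 0xA0 plus the plane number. The function name `euc_tw_encode` is hypothetical.

```python
# Minimal sketch of EUC-TW byte construction (assumed layout, see lead-in).
SS2 = 0x8E  # single-shift 2, introduces the four-byte form


def euc_tw_encode(plane: int, row: int, cell: int) -> bytes:
    """Encode a CNS 11643 plane/row/cell position as EUC-TW bytes.

    Plane 1 uses the short two-byte form; other planes use the
    four-byte form SS2, plane byte, row byte, cell byte.
    """
    if not (1 <= plane <= 16 and 1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("plane must be 1..16, row and cell 1..94")
    if plane == 1:
        return bytes([0xA0 + row, 0xA0 + cell])
    return bytes([SS2, 0xA0 + plane, 0xA0 + row, 0xA0 + cell])


print(euc_tw_encode(1, 1, 1).hex())   # a1a1
print(euc_tw_encode(2, 1, 1).hex())   # 8ea2a1a1
```

Under the same assumption, plane 1 could also be written in the four-byte form; the sketch always prefers the shorter encoding.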
History
The first edition of the standard was published in 1986, and included planes 1 and 2, deriving from levels 1 and 2 of Big5, with some re-ordering due to corrected stroke counts, two duplicate characters being omitted, and the addition of 213 classical radicals. Extensions to the standard were subsequently published in 1988 (6319 characters, occupying plane 14) and 1990 (7169 characters, occupying plane 15).[2]: 115–122
When the Unicode CJK Unified Ideographs set was being compiled for Unicode 1.1, the national bodies submitted character sets to the CJK Joint Research Group for inclusion. The version of CNS 11643 submitted included the plane 14 extension, in addition to further desired characters appended to plane 14 (after 68-21, the last used code point in the standard version of the extension).[2]: 179–180
In the second edition of the standard, published in 1992, a much larger collection of hanzi was defined across seven planes. A subset of the 1988 plane 14 extension, comprising the 6148 code points 01-01 through 66-38, became plane 3 (with the remaining 171 characters, code points 66-39 through 68-21, being instead distributed within plane 4). The plane 15 extension was not included, although 338 of its characters were included amongst planes 4 through 7.[2]: 115–122
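The counts quoted for the plane 14 split follow from row-cell arithmetic, assuming the extension filled its code points contiguously from 01-01. A small sketch, with the helper name `ordinal` chosen purely for illustration:

```python
# Sketch: convert a row-cell code point to its 1-based position in a plane,
# assuming 94 cells per row and contiguous assignment from 01-01.
CELLS_PER_ROW = 94


def ordinal(row: int, cell: int) -> int:
    """1-based position of row-cell within a 94x94 plane."""
    return (row - 1) * CELLS_PER_ROW + cell


moved_to_plane_3 = ordinal(66, 38)            # code points 01-01 .. 66-38
whole_extension = ordinal(68, 21)             # code points 01-01 .. 68-21
remainder = whole_extension - moved_to_plane_3

print(moved_to_plane_3)   # 6148
print(whole_extension)    # 6319
print(remainder)          # 171
```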
The third edition of the standard, published in 2007, added the Euro sign, ideographic zero, kana and extensions to the existing bopomofo and Roman alphabet support to plane 1. It introduced planes 10 through 14, containing additional hanzi, and incorporated the existing plane 15 extension into the standard itself (with gaps left where the characters already existed in planes 4 through 7). It also added 128 further hanzi to plane 3, starting at code point 68-40.[2]: 115–122
As of 2017, there are several thousand CNS 11643 characters with no corresponding Unicode character, mostly in planes 10 through 14; these are mapped to the Unicode Supplementary Private Use Area.[3]
Relationship to Big5
Levels 1 and 2 of the Big5 encoding correspond mostly to CNS 11643 planes 1 and 2, respectively, with occasional differences in order. They can be mapped using a list of ranges.[4][5]
The Big5-2003 variant of Big5 is defined as a partial encoding of CNS 11643.
References
- This page is based on the information on the CNS official web site.
1. ECMA (1993-01-21). Chinese Standard Interchange Code (CSIC) - Set 1 (PDF). ITSCJ/IPSJ. ISO-IR-171.
2. Lunde, Ken (2008). "3. Character Set Standards". CJKV Information Processing (2nd ed.). O'Reilly Media. ISBN 9780596514471.
3. "CNS 11643 in Unicode's Supplementary Private Use Area". [chinese mac]. Council on East Asian Studies at Yale University.
4. Lunde, Ken (1995-12-18). "4.3: CJK Character Set Compatibility Issues - Chinese (Taiwan)". CJK.INF Version 1.9.
5. Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "RFC 1922: Chinese Character Encoding for Internet Messages". Requests for Comments. IETF.
External links
- CNS 11643 official web site
- Current CNS 11643 open data, including mapping data
- Unicode mappings for other CNS 11643 versions/editions/extensions:
  - Unicode consortium mappings for CNS 11643-1986: planes 1 and 2, plus the 1988 plane 14 with extensions. Uses a single prefixed hex digit to indicate plane.
  - CNS-11643-1992 in International Components for Unicode (ICU); uses prefixed 0x81 through 0x8F to indicate plane:
    - Older version: planes 1 through 7, plus the plane 15 extension as plane 9.
    - Intermediate version: planes 1 through 7, for internal use by the ISO-2022-CN-EXT codec.
    - Current version: includes only planes 1 and 2, for internal use by the ISO-2022-CN codec.
  - EUC-TW-2014 in ICU: standard assignments for planes 1 through 7 and 15, and IBM corporate assignments in planes 12 and 13
- ISO-IR registered CNS-11643 code charts: