CJK characters
Revision as of 08:21, 3 June 2013
CJK is a collective term for the Chinese, Japanese, and Korean languages, used in the field of software and communications internationalization.
The related term CJKV adds Vietnamese; together, these four make up the main East Asian languages.
Characteristics
These languages share one characteristic: their writing systems use Chinese characters, either completely or in part. The characters are called hànzì in Chinese, kanji in Japanese, hanja in Korean, and chữ Hán in Vietnamese. Chinese is written in Chinese characters only. About 4,000 characters are needed for general literacy, while reasonably complete coverage requires up to 40,000.
Japanese uses fewer Chinese characters, together with two syllabaries, hiragana and katakana. General literacy in Japan can be expected with about 2,000 characters, although an educated Japanese reader should know about 9,000.
In Korea, Chinese characters are becoming increasingly rare, although their idiosyncratic use in proper names still requires knowledge (and therefore availability) of a large number of characters. Korean is now written mainly in Hangul, its native alphabet.
Until the early 20th century, Literary Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in chữ Nôm, a script consisting of borrowed Chinese characters together with many characters created locally. By the end of the 1920s, both scripts had been replaced by Vietnamese written in the Latin-based Vietnamese alphabet.
Encoding
Complete coverage of all these languages requires far more characters than fit in the 256-character code space of an 8-bit encoding, so CJK text needs at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding. Fixed-width 16-bit encodings, such as UCS-2, the original 16-bit form of Unicode, are now deprecated for two reasons: more characters must be encoded than 16 bits can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires software in China to support the GB18030 character set.
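A quick way to see why 8 bits are not enough, and why a fixed 16-bit unit is not enough either, is to compare encoded lengths. The following is a minimal sketch using Python's standard codecs (`utf-8`, `utf-16-be`, `gb18030`, `latin-1`); the helper names `encoded_lengths` and `fits_in_8_bits` are illustrative assumptions, not part of any library. A character in the Basic Multilingual Plane such as 中 (U+4E2D) fits in one 16-bit unit, while U+20000, from CJK Unified Ideographs Extension B, needs a UTF-16 surrogate pair.

```python
# Sketch: compare how many bytes CJK characters need in several encodings.
# `encoded_lengths` and `fits_in_8_bits` are illustrative helper names.

def encoded_lengths(ch: str) -> dict:
    """Return the length in bytes of `ch` under a few standard Python codecs."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "gb18030")}


def fits_in_8_bits(ch: str) -> bool:
    """Return True if `ch` can be encoded in Latin-1, a 256-character code space."""
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


bmp_char = "\u4e2d"        # 中 (U+4E2D), inside the Basic Multilingual Plane
ext_b_char = "\U00020000"  # U+20000, first ideograph of CJK Extension B

print(encoded_lengths(bmp_char))    # UTF-16 needs one 16-bit unit (2 bytes)
print(encoded_lengths(ext_b_char))  # UTF-16 needs a surrogate pair (4 bytes)
print(ext_b_char.encode("utf-16-be").hex())  # high surrogate D840, low surrogate DC00
print(fits_in_8_bits(bmp_char))     # an ideograph does not fit in an 8-bit code space
```

UTF-8 and GB18030 are both variable-length, so they handle characters outside the Basic Multilingual Plane without a separate surrogate mechanism.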
Although the CJK languages share many characters, the encodings commonly used to represent them were developed separately by different East Asian governments and software companies, and they are mutually incompatible. Unicode has attempted, with some controversy, to unify these character sets in a process known as Han unification.
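The incompatibility is easy to observe: the same character gets different bytes in each national encoding, and bytes read with the wrong decoder come out garbled. The sketch below assumes Python's standard `gb2312`, `big5`, `shift_jis`, and `euc_kr` codecs; transcoding through Unicode as a common pivot is shown as one possible way to move text between them, not as the only approach.

```python
# Sketch: one character, four independently designed national encodings.
text = "\u4e2d"  # 中 (U+4E2D), present in all four character sets below

for codec in ("gb2312", "big5", "shift_jis", "euc_kr"):
    print(f"{codec:9} {text.encode(codec).hex(' ')}")

# Reading GB2312 bytes with a Big5 decoder does not recover the original text.
gb_bytes = text.encode("gb2312")
misread = gb_bytes.decode("big5", errors="replace")
print(misread == text)  # False

# Transcoding through Unicode works when both character sets contain the character.
big5_bytes = gb_bytes.decode("gb2312").encode("big5")
print(big5_bytes.hex(" "))
```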
At a minimum, a CJK character encoding should include Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana, and hangul.
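To check which of these repertoires a character belongs to, one rough approach is to inspect its Unicode character name with Python's `unicodedata` module. Matching on name prefixes, and the helper `script_of`, are simplifications assumed here for illustration; a production system would normally use the Unicode Script property (for example through ICU), which the Python standard library does not expose. Pinyin is written in Latin letters with tone marks, so it shows up under `LATIN`.

```python
import unicodedata

# Name prefixes checked in order. Prefix matching on character names is a
# simplification for illustration, not the Unicode Script property.
PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL", "BOPOMOFO", "LATIN")


def script_of(ch: str) -> str:
    """Return the first matching name prefix for `ch`, or 'OTHER'."""
    name = unicodedata.name(ch, "")
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return prefix
    return "OTHER"


# Han, hiragana, katakana, hangul, bopomofo, and a pinyin vowel with a tone mark.
for ch in "中あア한ㄅā":
    print(f"U+{ord(ch):04X} {ch} {script_of(ch)}")
```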
CJK character encodings include:
- Big5
- EUC-JP
- EUC-KR
- GB18030 (mandated standard in the People's Republic of China)
- GB2312 (subset and predecessor of GB18030)
- ISO-2022-JP
- KS C 5861
- Shift-JIS
- Unicode encodings
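Several encodings in this list cover the same Japanese character set (JIS X 0208) but lay out the bytes differently. The sketch below assumes Python's standard `shift_jis`, `euc_jp`, and `iso2022_jp` codecs. ISO-2022-JP is a stateful 7-bit encoding that switches character sets with escape sequences, whereas Shift JIS and EUC-JP mark double-byte characters with lead bytes above 0x7F.

```python
# Sketch: one Japanese string in three encodings of the same character set.
text = "日本"

for codec in ("shift_jis", "euc_jp", "iso2022_jp"):
    encoded = text.encode(codec)
    print(f"{codec:10} {len(encoded):2} bytes  {encoded!r}")

# ISO-2022-JP is stateful and 7-bit: ESC $ B switches to JIS X 0208,
# and ESC ( B switches back to ASCII at the end of the run.
iso = text.encode("iso2022_jp")
print(iso.startswith(b"\x1b$B"), iso.endswith(b"\x1b(B"))
print(all(b < 0x80 for b in iso))  # every byte stays in the 7-bit range
```

The escape sequences make ISO-2022-JP safe for 7-bit channels such as early email, at the cost of requiring the decoder to track state.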
CJK characters take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts on Chinese characters about the desirability and technical merit of Han unification, the process used to map multiple Chinese and Japanese character sets into a single set of unified characters.[citation needed]
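Under Han unification, a character whose preferred glyph differs between regions still receives a single code point, so text decoded from different national encodings converges on the same Unicode value, and the regional glyph is left to fonts and language tagging. A minimal check, assuming Python's standard codecs and using 直 (U+76F4), a character commonly cited because its Chinese and Japanese glyphs differ:

```python
# Sketch: bytes from several national encodings all decode to one unified code point.
char = "\u76f4"  # 直 (U+76F4); its preferred glyph differs between Chinese and Japanese

encoded = {codec: char.encode(codec) for codec in ("gb2312", "big5", "shift_jis", "euc_kr")}

for codec, raw in encoded.items():
    decoded = raw.decode(codec)
    print(f"{codec:9} {raw.hex(' ')} -> U+{ord(decoded):04X}")
```

The byte sequences differ per encoding, but every one maps to U+76F4; nothing in the decoded text records which regional glyph the source intended.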
Chinese, Japanese, and Korean can all be written both horizontally (left to right) and vertically (top to bottom), but they are usually treated as left-to-right scripts when discussing encoding issues.
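In Unicode terms, this shows up in the Bidirectional Character Type: Han ideographs, kana, and hangul are all classed as strong left-to-right (`L`), while vertical layout is handled by the rendering system rather than by the encoding. A quick check with Python's `unicodedata.bidirectional`, using an Arabic letter (class `AL`) as a contrasting example:

```python
import unicodedata

# Bidi class "L" means strong left-to-right; Arabic letters are "AL" (right-to-left).
for ch in "中あア한\u0627":
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```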
See also
- East Asian cultural sphere
- Chinese character encoding
- Han unification
- Chinese input methods for computers
- Japanese language and computers
- Korean language and computers
- Input method editor
- Variable-width encoding
- Complex Text Layout languages (CTL)
- CJK strokes
- Horizontal and vertical writing in East Asian scripts
- Graphics tablet
- List of CJK fonts
- Sinoxenic
References
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
- DeFrancis, John. The Chinese Language: Fact and Fantasy. Honolulu: University of Hawaii Press, 1990. ISBN 0-8248-1068-6.
- Hannas, William C. Asia's Orthographic Dilemma. Honolulu: University of Hawaii Press, 1997. ISBN 0-8248-1892-X (paperback); ISBN 0-8248-1842-3 (hardcover).
- Lemberg, Werner. "The CJK package for LaTeX2ε: Multilingual support beyond babel." TUGboat, Volume 18 (1997), No. 3, Proceedings of the 1997 Annual Meeting.
- Lunde, Ken. CJKV Information Processing. Sebastopol, Calif.: O'Reilly & Associates, 1998. ISBN 1-56592-224-7.