CJK Unified Ideographs Extension B

From Wikipedia, the free encyclopedia
CJK Unified Ideographs Extension B
Range: U+20000..U+2A6DF (42,720 code points)
Plane: SIP (Supplementary Ideographic Plane)
Scripts: Han
Assigned: 42,711 code points
Unused: 9 reserved code points
Unicode version history: 3.1: 42,711 (+42,711)
Note: [1][2]

CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese.
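As a quick check on the infobox figures, the Python sketch below tests whether a character falls in the block and recomputes the block size. The constant and function names are hypothetical. The boundary values come from the infobox and the History table, so the reserved count reflects the Unicode 3.1 assignments recorded there.

```python
# Block boundaries as given in the infobox.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF
# Last code point assigned in Unicode 3.1, per the History table (U+20000..2A6D6).
EXT_B_LAST_ASSIGNED_3_1 = 0x2A6D6


def in_ext_b(ch: str) -> bool:
    """Return True if the single character ch lies in the Extension B block."""
    return EXT_B_FIRST <= ord(ch) <= EXT_B_LAST


def plane(ch: str) -> int:
    """Unicode plane number of ch: the code point divided by 0x10000 (plane 2 is the SIP)."""
    return ord(ch) >> 16


block_size = EXT_B_LAST - EXT_B_FIRST + 1                  # whole block
assigned_3_1 = EXT_B_LAST_ASSIGNED_3_1 - EXT_B_FIRST + 1   # assigned in 3.1
reserved = block_size - assigned_3_1                       # left unassigned

print(block_size, assigned_3_1, reserved)            # 42720 42711 9
print(in_ext_b("\U0002017B"), plane("\U0002017B"))   # True 2
```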

Dozens of standardized variation sequences are defined for characters in this block.[3]

Thousands of ideographic variation sequences for characters in the block are also registered in the Unicode Ideographic Variation Database (IVD).[4][5] Each such sequence pairs a base character with a variation selector to request a specific glyph variant of that character.
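A variation sequence is a base character immediately followed by a variation selector. The sketch below assumes the two selector ranges involved: VS1..VS16 (U+FE00..U+FE0F), which standardized variation sequences for CJK ideographs use, and VS17..VS256 (U+E0100..U+E01EF), the range UTS #37 specifies for ideographic variation sequences. The helper names are hypothetical, the classifier checks code point ranges only, and the sample sequence is built purely to show the encoding; whether any particular sequence is registered has to be checked against the IVD itself.

```python
from typing import List, Optional, Tuple


def classify_selector(cp: int) -> Optional[str]:
    """Classify a variation selector by range, or return None if cp is not one."""
    if 0xFE00 <= cp <= 0xFE0F:       # VS1..VS16
        return "standardized"
    if 0xE0100 <= cp <= 0xE01EF:     # VS17..VS256
        return "ideographic"
    return None


def make_ivs(base: str, vs_number: int) -> str:
    """Append VS<vs_number> (17..256) to base, forming an IVD-style sequence."""
    if not 17 <= vs_number <= 256:
        raise ValueError("ideographic variation sequences use VS17..VS256")
    return base + chr(0xE0100 + vs_number - 17)


def split_sequences(text: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Split text into (base, selector, kind) triples; selector and kind are None if absent."""
    out = []
    i = 0
    while i < len(text):
        base = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        kind = classify_selector(ord(nxt)) if nxt is not None else None
        if kind is not None:
            out.append((base, nxt, kind))
            i += 2                   # consume base + selector
        else:
            out.append((base, None, None))
            i += 1                   # plain character, no selector
    return out


seq = make_ivs("\U0002017B", 17)     # U+2017B followed by U+E0100 (VS17)
print(split_sequences(seq))
```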

Known Issues

Three incorrectly unified characters in Extension B

In CJK Unified Ideographs Extension B, three characters unify source glyphs that should have been kept apart: U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify their Chinese Mainland and Vietnamese source glyphs, while the last wrongly unifies its Chinese Mainland and Taiwanese source glyphs.[6]
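Each source glyph corresponds to an IRG source reference for the character. In the Unihan database these are recorded in fields such as kIRG_GSource (Chinese Mainland), kIRG_TSource (Taiwan) and kIRG_VSource (Vietnam), as tab-separated lines of the form `U+XXXX<TAB>field<TAB>value`. The sketch below, with a hypothetical function name, collects which sources each character carries, which is the information at stake when two sources are wrongly unified. The sample lines are illustrative only: the choice of fields follows the description of these characters above, but the values are placeholders, not real Unihan data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def irg_sources(lines: Iterable[str]) -> Dict[int, Set[str]]:
    """Map code point -> set of IRG source prefixes (e.g. {'G', 'V'}) from Unihan-style lines."""
    sources: Dict[int, Set[str]] = defaultdict(set)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue                        # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) < 3:
            continue                        # not a code point / field / value line
        cp_field, key = parts[0], parts[1]
        if key.startswith("kIRG_") and key.endswith("Source"):
            prefix = key[len("kIRG_"):-len("Source")]   # 'G', 'T', 'V', 'KP', ...
            sources[int(cp_field[2:], 16)].add(prefix)  # strip the 'U+' prefix
    return dict(sources)


# Hypothetical excerpt: the values are placeholders, not real Unihan data.
SAMPLE_LINES = [
    "# placeholder data for illustration",
    "U+2017B\tkIRG_GSource\tG-PLACEHOLDER",
    "U+2017B\tkIRG_VSource\tV-PLACEHOLDER",
    "U+24CB2\tkIRG_GSource\tG-PLACEHOLDER",
    "U+24CB2\tkIRG_TSource\tT-PLACEHOLDER",
]

src = irg_sources(SAMPLE_LINES)
print(sorted(src[0x2017B]))   # ['G', 'V']
```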

Unifiable variants and exact duplicates in Extension B

Hundreds of unifiable glyph variants were also encoded as separate characters in CJK Unified Ideographs Extension B.[7] Beyond this deliberate encoding of close glyph variants, six exact duplicates (where the same character was inadvertently encoded twice) and two semi-duplicates (where the CJK-B character amounts to a de facto disunification of two glyph forms unified under the corresponding BMP character) were encoded by mistake:[8]

  • U+34A8 㒨 = U+20457 𠑗 : U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
  • U+3DB7 㶷 = U+2420E 𤈎 : same glyph shapes
  • U+8641 虁 = U+27144 𧅄 : U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
  • U+204F2 𠓲 = U+23515 𣔕 : same glyph shapes, but ordered under different radicals
  • U+249BC 𤦼 = U+249E9 𤧩 : same glyph shapes
  • U+24BD2 𤯒 = U+2A415 𪐕 : same glyph shapes, but ordered under different radicals
  • U+26842 𦡂 = U+26866 𦡦 : same glyph shapes
  • U+FA23 﨣 = U+27EAF 𧺯 : same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name "CJK COMPATIBILITY IDEOGRAPH-FA23")
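The pairs above can be captured as data. None of these characters has a canonical or compatibility decomposition, so Unicode normalization (NFC or NFKC) never folds one member of a pair into the other. An application that wants to treat them as equivalent, for example in search, has to do so itself. The folding table below is a hypothetical application-level choice and is not defined by Unicode. It maps the higher code point of each exact duplicate onto the lower one and, unless asked, leaves the two semi-duplicates alone, since their BMP partners also cover a differing source glyph.

```python
import unicodedata
from typing import Dict, List, Tuple

# (first, second, exact) for each pair listed above. exact=False marks the two
# semi-duplicates, whose BMP character also covers a differing source glyph.
DUPLICATE_PAIRS: List[Tuple[int, int, bool]] = [
    (0x34A8, 0x20457, False),
    (0x3DB7, 0x2420E, True),
    (0x8641, 0x27144, False),
    (0x204F2, 0x23515, True),
    (0x249BC, 0x249E9, True),
    (0x24BD2, 0x2A415, True),
    (0x26842, 0x26866, True),
    (0xFA23, 0x27EAF, True),
]


def build_fold_table(include_semi: bool = False) -> Dict[int, int]:
    """Hypothetical folding table: second (higher) code point -> first (lower) one."""
    return {
        second: first
        for first, second, exact in DUPLICATE_PAIRS
        if exact or include_semi
    }


def fold(text: str, table: Dict[int, int]) -> str:
    """Apply the folding table; str.translate accepts an int -> int mapping."""
    return text.translate(table)


def normalization_keeps_apart(first: int, second: int) -> bool:
    """True if NFKC leaves the two characters distinct (it does for every pair here)."""
    a = unicodedata.normalize("NFKC", chr(first))
    b = unicodedata.normalize("NFKC", chr(second))
    return a != b


table = build_fold_table()
print(len(table))                                                         # 6
print(all(normalization_keeps_apart(a, b) for a, b, _ in DUPLICATE_PAIRS))  # True
print(fold("\U00027EAF", table) == "\uFA23")                              # True
```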

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension B block:

Version: 3.1; final code points[a]: U+20000..2A6D6; count: 42,711
L2 ID, WG2 ID, IRG ID, Document:
L2/99-239 Addition of three hundred and fourteen KANJIs (from JIS X0213), 1999-07-15
L2/99-310 Addition of three hundred and thirteen KANJIs (from JIS X0213), 1999-08-23 
L2/99-335 N2109 N674 Zhoucai, Zhang (1999-09-03), SuperCJK, version 9.0 with Kangxi and HYD data 
L2/99-336 N2105 N675 CJK Unified Ideographs Extension B WD 6.0, 1999-09-03 
L2/99-316 Whistler, Ken (1999-09-13), Comments on JCS proposal 
L2/99-312 excerpt of usages and sources of proposed KANJIs in contemporary Japanese, 1999-10-06 
L2/99-366 Suignard, Michel (1999-11-24), Text for CD ballot of ISO/IEC 10646 part 2 
L2/99-366.1 Cover page for N3393, 1999-11-24 
L2/99-366.2 Suignard, Michel (1999-11-24), Text of CD 10646-2 
L2/99-366.3 Suignard, Michel (1999-11-24), CJK Ext. B pages 001-100 
L2/99-366.4 Suignard, Michel (1999-11-24), CJK Ext. B pages 101-200 
L2/99-366.5 Suignard, Michel (1999-11-24), CJK Ext. B pages 201-300 
L2/99-366.6 Suignard, Michel (1999-11-24), CJK Ext. B pages 301-335 
L2/99-366.7 Suignard, Michel (1999-11-24), Special Purpose Plane and Annexes 
L2/99-366.8 Suignard, Michel (1999-11-24), Mapping of CJK Ext. B characters 
L2/99-385 N2144 N713R Jenkins, John (1999-12-08), Clarification of the Non-Cognate Rule 
L2/00-021R ISO CD 10646 Part-2 vote -- A proposal to move JIS X 0213 Kanji characters on Extension-B into BMP, 2000-01-21 
L2/00-030 Enomoto, Yoshi (2000-01-31), Background of the proposal (for encoding of 302 ideographs from JIS X 0213) 
L2/00-036 Umamaheswaran, V. S.; Sargent, Murray (2000-02-03), Expert contribution on the placement of additional unified ideographs from JIS X0213, HK, and Korea 
L2/01-026 N2298 N758 CJK Unified Ideographs Extension B, PreDIS R1 For ISO/IEC DIS 10646-2:2000, 2000-11-21 
L2/01-027 N2299 N759 Zhoucai, Zhang (2000-11-21), SuperCJK 11.1, A Super Set of Unified CJK Ideographs and Its Extension A & B 
L2/01-136 N2334 Sato, T. K. (2001-03-28), Notification of an error and request for a correction regarding mapping information for a particular JIS X 0213 character in CJK UNIFIED IDEOGRAPHS EXTENSION-B 
L2/01-163 N2347 N785 CJK Unified Ideographs Extension B PreIS For ISO/IEC 10646-2:2000, 2001-03-30 
L2/01-162 N2349 N787 Zhoucai, Zhang (2001-04-02), Clarification On Versions of CJK Unified Ideographs Extension B As Well As SuperCJK 
L2/02-122 N2427 Ksar, Mike (2002-03-18), Proposal to add 1 Hanja code of D P R of Korea into 10646-2:2001 
L2/02-156 N2427 Proposal to add 1 Hanja code of D P R of Korea into ISO/IEC 10646-2:2001 [duplicate of L2/02-122], 2002-03-18 
L2/02-201 N2448 N924 Error Correction, 2002-05-08 
L2/02-416 N2518 Proposal to add 2 hanja codes of D P R of Korea into 10646-2:2001, 2002-11-01 
L2/03-017 Late DPRK Comments on SC 2 N 3625, 10646-2: 2001/FPDAM 1, 2002-12-09 
L2/03-287 Cook, Richard (2003-08-24), 16 UniHan.txt errors 
L2/03-301 Cook, Richard (2003-08-27), 24 more UniHan.txt errors 
L2/03-311 West, Andrew (2003-09-17), Unicode 4.0.1 Beta Review, comments from Andrew C. West 
L2/03-399 Fok, Anthony (2003-10-13), Unihan reported errors / changes re kHKSCS entries 
L2/03-398 Nguyen, D. (2003-10-29), Unihan reported errors / changes re kCowles 
L2/03-453 Minutes of the Editorial Group Ad Hoc Discussion, 2003-12-17 
L2/04-008 N2695 N1026 China's confirmation on fonts for CJK_B 21E2D and 21E45, 2004-01-05 
L2/04-208 N2774R N1064 Proposal to add 6 KP source references to existing CJK Unified Ideographs, 2004-05-25 
L2/04-281 N2830 Suignard, Michel (2004-06-23), CJK Ideograph source visual references information 
L2/04-417 Cook, Richard (2004-11-18), Extension B font versioning: preliminary work 
L2/05-022 Cook, Richard (2005-01-25), Extension B font versioning: follow-up report, part 1 [text] 
L2/05-023 Cook, Richard (2005-01-25), Extension B font versioning: follow-up report, part 2 [tables] 
L2/07-208 N3285 Proposal to replace 11 KP source references to existing ISO/IEC 10646:2003, 2007-07-18 
L2/08-234 N1406 Cook, Richard; Bishop, Thomas; Lunde, Ken (2008-06-06), Han Unification Issues 
L2/08-310 Cook, Richard (2008-08-12), Fonts for Extension B and C and IRG 
L2/10-215 Lunde, Ken (2010-06-22), "Hanyo-Denshi" IVD Collection (PRI 167) to Adobe-Japan1-6 Mapping Table 
L2/11-243 N4111 Sources for Orphaned CJK Ideographs, 2011-06-14 
L2/11-254 Constable, Peter (2011-06-20), UTC Liaison Report from WG2 
L2/14-260 Suignard, Michel (2014-10-23), CJK chart and source references update 
  a. ^ Proposed code points and character names may differ from final code points and names.

References

  1. ^ "Unicode character database". The Unicode Standard. Retrieved 2016-07-09. 
  2. ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09. 
  3. ^ "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium. 
  4. ^ "Ideographic Variation Database". The Unicode Consortium.
  5. ^ "UTS #37, Unicode Ideographic Variation Database". The Unicode Consortium.
  6. ^ Chan, Eiso (陈永聪). "Comments on four error glyphs on CJK Unified Ideographs Ext B & E".
  7. ^ unifiable glyph variants
  8. ^ Cook, Richard (6 October 2003). "Defect Report on Duplicate Encoded CJK Forms" (PDF). ISO/IEC JTC1/SC2/WG2. Retrieved 2012-03-28.