
CJK Unified Ideographs Extension B

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 139.193.106.224 at 08:01, 14 March 2020 (→ Known issues). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

CJK Unified Ideographs Extension B
Range: U+20000..U+2A6DF (42,720 code points)
Plane: SIP
Scripts: Han
Assigned: 42,718 code points
Unused: 2 reserved code points
Unicode version history:
  • 3.1 (2001): 42,711 (+42,711)
  • 13.0 (2020): 42,718 (+7)
Unicode documentation: Code chart | Web page
Note: [1][2]

CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese.
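The range arithmetic in the infobox can be checked with a short Python sketch. It treats the block as the fixed range U+20000..U+2A6DF and takes the last assigned code point for each version from the version history; the constant and function names are illustrative only and do not come from any library. Because the block lies in the Supplementary Ideographic Plane, every character in it takes four bytes in UTF-8 and a surrogate pair in UTF-16, which the last lines show for U+20457.

```python
# Minimal sketch, assuming the block range and per-version counts from the
# infobox. All names here are hypothetical helpers, not a library API.

EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF  # last code point of the block (includes 2 reserved)

# Last assigned code point per Unicode version; assignments are contiguous
# from U+20000 in this block.
ASSIGNED_UP_TO = {
    "3.1": 0x2A6D6,   # 42,711 ideographs
    "13.0": 0x2A6DD,  # +7 ideographs, 42,718 in total
}

def in_ext_b_block(cp: int) -> bool:
    """True if cp falls anywhere inside the block's range."""
    return EXT_B_FIRST <= cp <= EXT_B_LAST

def assigned_in(cp: int, version: str) -> bool:
    """True if cp is an assigned Extension B ideograph as of `version`."""
    return EXT_B_FIRST <= cp <= ASSIGNED_UP_TO[version]

def utf16_units(ch: str) -> list:
    """Return the UTF-16 code units of one character as integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

if __name__ == "__main__":
    print(EXT_B_LAST - EXT_B_FIRST + 1)              # 42720
    print(ASSIGNED_UP_TO["13.0"] - EXT_B_FIRST + 1)  # 42718
    ch = "\U00020457"  # one of the characters in the duplicates list
    print(in_ext_b_block(ord(ch)), assigned_in(0x2A6D7, "3.1"))  # True False
    print([hex(u) for u in utf16_units(ch)])         # ['0xd841', '0xdc57']
    print(len(ch.encode("utf-8")))                   # 4
```

The version table only works because each version extended the block contiguously; a block with gaps would need an explicit list of assigned ranges instead.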

The block has dozens of variation sequences defined for standardized variants.[3]

It also has thousands of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5] These sequences specify the desired glyph variant for a given Unicode character.
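A variation sequence is an ordinary sequence of code points: a base character followed by a variation selector. The sketch below assumes the IVD convention that an ideographic variation sequence uses one of the selectors VS17 to VS256 (U+E0100..U+E01EF). The helper names are hypothetical, and the sequence built in the example is only an illustration, not a claim that it is registered in the IVD.

```python
from typing import List, Optional, Tuple

VS17 = 0xE0100   # first ideographic variation selector
VS256 = 0xE01EF  # last ideographic variation selector

def is_ivs_selector(cp: int) -> bool:
    """True for VS17..VS256, the selectors used by IVD sequences."""
    return VS17 <= cp <= VS256

def make_ivs(base: str, vs_number: int) -> str:
    """Append VS<vs_number> (17..256) to a single base character."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    if not 17 <= vs_number <= 256:
        raise ValueError("ideographic variation selectors are VS17..VS256")
    return base + chr(VS17 + vs_number - 17)

def split_ivs(text: str) -> List[Tuple[str, Optional[int]]]:
    """Pair each character with the VS number that follows it, or None.

    A selector with no preceding base (or following another selector)
    is kept as its own entry so that no input is silently dropped.
    """
    pairs: List[Tuple[str, Optional[int]]] = []
    for ch in text:
        cp = ord(ch)
        attach = (is_ivs_selector(cp) and pairs and pairs[-1][1] is None
                  and not is_ivs_selector(ord(pairs[-1][0])))
        if attach:
            pairs[-1] = (pairs[-1][0], cp - VS17 + 17)
        else:
            pairs.append((ch, None))
    return pairs

# Hypothetical example: U+20457 followed by VS17, then a plain "a".
seq = make_ivs("\U00020457", 17)
print(len(seq))  # 2 (code points)
print(split_ivs(seq + "a"))
```

Since the selector is a separate code point, software that does not recognize the sequence still displays the base character; the selector only asks for a particular glyph when the font supports it.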

It is the only CJK Unified Ideographs Extension block with a UCS2003 source identifier. Because Extension B contained so many characters, its original code charts were produced with a single glyph per character for all regions. These glyphs were designed by Beijing Zhongyi Electronic Ltd. After multi-column code charts were introduced, the original glyphs were retained under the UCS2003 source identifier. They are packaged in the "SimSun-ExtB" font distributed with the Simplified Chinese versions of Windows, and do not follow the glyphs of the Mainland China region.

Known issues

Three incorrectly unified characters in Extension B

In CJK Unified Ideographs Extension B, some characters unify regional source glyphs that should have been kept apart: U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). In the first two, the Chinese Mainland and Vietnamese source glyphs were wrongly unified; in the last, the Chinese Mainland and Taiwanese source glyphs were.[6]

Unifiable variants and exact duplicates in Extension B

CJK Unified Ideographs Extension B also encodes hundreds of unifiable glyph variants.[7] In addition to these deliberately encoded close glyph variants, six exact duplicates (where the same character was inadvertently encoded twice) and two semi-duplicates (where the Extension B character represents a de facto disunification of two glyph forms unified under the corresponding BMP character) were encoded by mistake:[8]

  • U+34A8 㒨 = U+20457 𠑗 : U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
  • U+3DB7 㶷 = U+2420E 𤈎 : same glyph shapes
  • U+8641 虁 = U+27144 𧅄 : U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
  • U+204F2 𠓲 = U+23515 𣔕 : same glyph shapes, but ordered under different radicals
  • U+249BC 𤦼 = U+249E9 𤧩 : same glyph shapes
  • U+24BD2 𤯒 = U+2A415 𪐕 : same glyph shapes, but ordered under different radicals
  • U+26842 𦡂 = U+26866 𦡦 : same glyph shapes
  • U+FA23 﨣 = U+27EAF 𧺯 : same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name "CJK COMPATIBILITY IDEOGRAPH-FA23")
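Because every character in the list above is a separate unified ideograph with no decomposition mapping, Unicode normalization does not merge the duplicates. The Python sketch below checks this with the standard `unicodedata` module and then applies a hypothetical folding table that maps the second code point of each pair onto the first. The folding direction is an arbitrary assumption; Unicode defines no such equivalence, and folding a semi-duplicate also discards the regional glyph distinction described above.

```python
import unicodedata

# (first, second) code points of each pair in the list above.
DUPLICATE_PAIRS = [
    (0x34A8, 0x20457), (0x3DB7, 0x2420E), (0x8641, 0x27144),
    (0x204F2, 0x23515), (0x249BC, 0x249E9), (0x24BD2, 0x2A415),
    (0x26842, 0x26866), (0xFA23, 0x27EAF),
]

def unchanged_by_normalization(cp: int) -> bool:
    """True if NFC, NFD, NFKC and NFKD all leave the character as-is."""
    ch = chr(cp)
    return all(unicodedata.normalize(form, ch) == ch
               for form in ("NFC", "NFD", "NFKC", "NFKD"))

# Hypothetical folding table: second code point -> first code point.
FOLD = {second: first for first, second in DUPLICATE_PAIRS}

def fold_duplicates(text: str) -> str:
    """Replace each listed duplicate with its partner (str.translate maps ordinals)."""
    return text.translate(FOLD)

if __name__ == "__main__":
    print(all(unchanged_by_normalization(cp)
              for pair in DUPLICATE_PAIRS for cp in pair))  # True
    print(unicodedata.name("\uFA23"))  # CJK COMPATIBILITY IDEOGRAPH-FA23
    print(fold_duplicates("\U00027EAF") == "\uFA23")       # True
```

An application that needs to treat these pairs as equal, for example in search, therefore has to carry its own mapping like `FOLD`; normalizing the text is not enough.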

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension B block:

Version Final code points[a] Count L2 ID WG2 ID IRG ID Document
3.1 U+20000..2A6D6 42,711 L2/98-260 Ng, Nelson; Kung, Michael (1998-05-26), "CJK UNIFIED IDEOGRAPHS EXTENSION B", Report on IRG meeting #11
L2/99-239 Addition of three hundred and fourteen KANJIs (from JIS X0213), 1999-07-15
L2/99-310 Addition of three hundred and thirteen KANJIs (from JIS X0213), 1999-08-23
L2/99-335 N2109 N674 Zhang, Zhoucai (1999-09-03), SuperCJK, version 9.0 with Kangxi and HYD data
L2/99-336 N2105 N675 CJK Unified Ideographs Extension B WD 6.0, 1999-09-03
L2/99-316 Whistler, Ken (1999-09-13), Comments on JCS proposal
L2/99-312 excerpt of usages and sources of proposed KANJIs in contemporary Japanese, 1999-10-06
L2/99-366 Suignard, Michel (1999-11-24), Text for CD ballot of ISO/IEC 10646 part 2
L2/99-366.1 Cover page for N3393, 1999-11-24
L2/99-366.2 Suignard, Michel (1999-11-24), Text of CD 10646-2
L2/99-366.3 Suignard, Michel (1999-11-24), CJK Ext. B pages 001-100
L2/99-366.4 Suignard, Michel (1999-11-24), CJK Ext. B pages 101-200
L2/99-366.5 Suignard, Michel (1999-11-24), CJK Ext. B pages 201-300
L2/99-366.6 Suignard, Michel (1999-11-24), CJK Ext. B pages 301-335
L2/99-366.7 Suignard, Michel (1999-11-24), Special Purpose Plane and Annexes
L2/99-366.8 Suignard, Michel (1999-11-24), Mapping of CJK Ext. B characters
L2/99-385 N2144 N713R Jenkins, John (1999-12-08), Clarification of the Non-Cognate Rule
L2/00-010 N2103 Umamaheswaran, V. S. (2000-01-05), "10.3", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16
L2/00-021R (pdf, rtf) ISO CD 10646 Part-2 vote -- A proposal to move JIS X 0213 Kanji characters on Extension-B into BMP, 2000-01-21
L2/00-030 Enomoto, Yoshi (2000-01-31), Background of the proposal (for encoding of 302 ideographs from JIS X 0213)
L2/00-036 Umamaheswaran, V. S.; Sargent, Murray (2000-02-03), Expert contribution on the placement of additional unified ideographs from JIS X0213, HK, and Korea
L2/01-026 (pdf, doc) N2298 N758 CJK Unified Ideographs Extension B, PreDIS R1 For ISO/IEC DIS 10646-2:2000, 2000-11-21
L2/01-136 N2334 (pdf, doc) Sato, T. K. (2001-03-28), Notification of an error and request for a correction regarding mapping information for a particular JIS X 0213 character in CJK UNIFIED IDEOGRAPHS EXTENSION-B
L2/01-163 N2347 N785 CJK Unified Ideographs Extension B PreIS For ISO/IEC 10646-2:2000, 2001-03-30
L2/01-162 N2349 (pdf, doc) N787 Zhang, Zhoucai (2001-04-02), Clarification On Versions of CJK Unified Ideographs Extension B As Well As SuperCJK
L2/02-122 N2427 Ksar, Mike (2002-03-18), Proposal to add 1 Hanja code of D P R of Korea into 10646-2:2001
L2/02-201 N2448 N924 Error Correction, 2002-05-08
L2/02-416 N2518 Proposal to add 2 hanja codes of D P R of Korea into 10646-2:2001, 2002-11-01
L2/03-017 Late DPRK Comments on SC 2 N 3625, 10646-2: 2001/FPDAM 1, 2002-12-09
L2/03-287 Cook, Richard (2003-08-24), 16 UniHan.txt errors
L2/03-301 Cook, Richard (2003-08-27), 24 more UniHan.txt errors
L2/03-311 West, Andrew (2003-09-17), Unicode 4.0.1 Beta Review, comments from Andrew C. West
L2/03-399 Fok, Anthony (2003-10-13), Unihan reported errors / changes re kHKSCS entries
L2/03-398 Nguyen, D. (2003-10-29), Unihan reported errors / changes re kCowles
L2/03-453 Minutes of the Editorial Group Ad Hoc Discussion, 2003-12-17
L2/04-008 N2695 N1026 China's confirmation on fonts for CJK_B 21E2D and 21E45, 2004-01-05
L2/04-208 N2774R N1064 Proposal to add 6 KP source references to existing CJK Unified Ideographs, 2004-05-25
L2/04-281 N2830 Suignard, Michel (2004-06-23), CJK Ideograph source visual references information
L2/04-417 Cook, Richard (2004-11-18), Extension B font versioning: preliminary work
L2/05-022 Cook, Richard (2005-01-25), Extension B font versioning: follow-up report, part 1 [text]
L2/05-023 Cook, Richard (2005-01-25), Extension B font versioning: follow-up report, part 2 [tables]
N3353 (pdf, doc) Umamaheswaran, V. S. (2007-10-10), "M51.9", Unconfirmed minutes of WG 2 meeting 51 Hanzhou, China; 2007-04-24/27
L2/07-208 N3285 Proposal to replace 11 KP source references to existing ISO/IEC 10646:2003, 2007-07-18
L2/08-234 N1406 Cook, Richard; Bishop, Thomas; Lunde, Ken (2008-06-06), Han Unification Issues
L2/08-310 Cook, Richard (2008-08-12), Fonts for Extension B and C and IRG
L2/10-215 Lunde, Ken (2010-06-22), "Hanyo-Denshi" IVD Collection (PRI 167) to Adobe-Japan1-6 Mapping Table
N3903 (pdf, doc) "M57.07 (CJK Ext. B glyphs from 2nd edition)", Unconfirmed minutes of WG2 meeting 57, 2011-03-31
L2/11-243 N4111 Sources for Orphaned CJK Ideographs, 2011-06-14
L2/11-254 Constable, Peter (2011-06-20), "Update to UTR #45 U-Source Ideographs requested", UTC Liaison Report from WG2
N4103 "Resolution 58.05", Unconfirmed minutes of WG 2 meeting 58, 2012-01-03
L2/14-260 N4621 Suignard, Michel (2014-10-23), CJK chart and source references update
L2/16-052 N4603 (pdf, doc) Umamaheswaran, V. S. (2015-09-01), "M63.05", Unconfirmed minutes of WG 2 meeting 63
L2/17-180 N2202 Chan, Eiso (2017-06-02), Request for consideration to add kIRG_GSource values to thirteen ideographs and change two G-source glyphs for the Table of General Standard Chinese Characters [Affects 20164]
L2/17-362 Moore, Lisa (2018-02-02), "Consensus 153-C16", UTC #153 Minutes
N4974 N2301 Request of TCA’s Horizontal Extension for Chemical Terminology [Affects U+20BBF, U+20C02, U+20CED, U+26B4C, U+26CBE, U+26E3D, U+28834, U+289A1, U+289C0, U+28A0F, and U+28B46], 2018-06-12
N4987 Proposal on China’s Horizontal Extension for 14 CJK Ideographs [Affects U+37C3, 3FE0, 9FD4, 20164, 24A7D, 25ED7, 2677C, 26C21, 2A917, 2AA30, 2BD77, 2C494, 2C72F, and 2CB38], 2018-06-13
N4988 Proposal on Updating 11 G glyphs of CJK Unified Ideographs to ISO/IEC 10646 [Affects U+3B9D, 3CFD, 4A76, 6FF9, 809E, 891D, 21D4C, 2278B, 23AB8, 2459B, and 2A8FB], 2018-06-13
N2336 Modify the G glyph for U+23517, 2018-09-10
N5016 N2349 Shin, Sanghyun; Cho, Sungduk; Pyo, Seungju; Kim, Kyongsok (2018-12-13), Request to move character K6-1022 in Horizontal Extension of KS X 1027-5 from U+3EAC to U+248F2
N5020 (pdf, doc) Umamaheswaran, V. S. (2019-01-11), "10.4.6, 10.4.8, and 10.4.9", Unconfirmed minutes of WG 2 meeting 67
N2369 Chan, Eiso (2019-05-06), Feedback on IRGN2369 [Affects U+20219, U+21249, U+21827, U+22C3A, U+2327B, U+2363B, U+23839, U+23FD5, U+24261, U+2548E, and U+26C9E]
N5086 N2379 Proposal of China’s horizontal extension for technical used characters [Affects U+23496, U+2355E, U+236ED, U+24726, U+26FE1, U+27334, and U+2A38C], 2019-05-10
L2/19-237 N5068 Editorial Report on Miscellaneous Issues (meeting IRG#52) [Affects U+23517, U+248F2, and U+26657], 2019-05-17
L2/19-244 N5107 TCA's UNC Proposal for WG2 submission [Affects U+27C0E], 2019-05-24
L2/19-241 N5083 N2391 Errata report for WG2 submission_TCA [Affects U+26657], 2019-05-31
N5082 N2391 Updated G Font of U+23517, 2019-05-31
13.0 U+2A6D7..2A6DD 7 L2/17-087 Chan, Eiso; Wang, Xiaolei; Le, Hou; You, Jerry (2017-04-03), Proposal to encode characters for Gongche Notation
L2/17-103 Moore, Lisa (2017-05-18), "E.5", UTC #151 Minutes
N2299 Chan, Eiso (2018-04-22), Request to discuss how to handle seven unencoded Gongche characters for Kunqu Opera
L2/18-245 N4967 Chan, Eiso; You, Jerry; Wang, Xiaolei; Le, Hou (2018-06-01), Updated proposal on Gongche characters for Kunqu Opera
L2/18-241 Anderson, Deborah; et al. (2018-07-25), "17", Recommendations to UTC # 156 July 2018 on Script Proposals
L2/18-183 Moore, Lisa (2018-11-20), "B.4.1", UTC #156 Minutes
N5020 (pdf, doc) Umamaheswaran, V. S. (2019-01-11), "10.2.3", Unconfirmed minutes of WG 2 meeting 67
N5122 "M68.01", Unconfirmed minutes of WG 2 meeting 68, 2019-12-31
L2/19-243 N5106 Suignard, Michel (2019-06-20), "Gongche", Disposition of comments on ISO/IEC CD.2 10646 6th edition
L2/19-270 Moore, Lisa (2019-08-02), "Consensus 160-C9", UTC #160 Minutes
  a. ^ Proposed code points and character names may differ from final code points and names


References

  1. ^ "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. ^ "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium.
  4. ^ "Ideographic Variation Database". Unicode Consortium.
  5. ^ "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
  6. ^ Chan, Eiso (陈永聪). "Comments on four error glyphs on CJK Unified Ideographs Ext B & E".
  7. ^ "unifiable glyph variants" (PDF). Archived from the original (PDF) on 2006-05-15. Retrieved 2017-12-01.
  8. ^ Cook, Richard (6 October 2003). "Defect Report on Duplicate Encoded CJK Forms" (PDF). ISO/IEC JTC1/SC2/WG2. Retrieved 2012-03-28.