Code page 932 (Microsoft Windows)


Microsoft Windows code page 932 (Windows-932 or ambiguously CP932), known by the IANA as Windows-31J,[1] also called MS-Kanji,[2] is Microsoft's extended variant of Shift JIS. Bytes with the high bit clear are the standard 7-bit ASCII codes; Japanese characters begin with a byte whose high bit is set to 1. Some of these, the half-width katakana, are complete in a single byte, while the rest require a second byte, so each character is encoded in either 8 or 16 bits.
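As a rough illustration of the 8-bit versus 16-bit split, the sketch below uses Python's built-in `cp932` codec, which the Python documentation lists under the alias `ms-kanji`. The sample characters and the helper name `byte_length` are chosen for illustration only and do not come from any specification.

```python
import codecs


def byte_length(ch: str) -> int:
    """Return how many bytes one character occupies in code page 932."""
    return len(ch.encode("cp932"))


# ASCII letter: one byte, high bit clear.
ascii_bytes = "A".encode("cp932")
# Half-width katakana: one byte, high bit set.
kana_bytes = "ｱ".encode("cp932")
# Hiragana: two bytes, and the lead byte has the high bit set.
hira_bytes = "あ".encode("cp932")

for label, raw in [
    ("ASCII A", ascii_bytes),
    ("half-width katakana", kana_bytes),
    ("hiragana", hira_bytes),
]:
    print(label, raw.hex(), len(raw) * 8, "bits")

# Python resolves the MS-Kanji alias to the same codec.
codec_name = codecs.lookup("ms-kanji").name
```

A decoder therefore cannot split a byte string into characters by position alone; it has to read each lead byte to learn whether a trail byte follows.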

IBM offers the same extended double-byte codes in its code page 943 (IBM-943 or CP943),[3] which is a combination of Code page 897 and Code page 941.[4]

The "Windows-31J" name is IANA's and not recognized by Microsoft, which has historically used "shift_jis" instead. In Japanese editions of Windows, this code page is referred to as "ANSI", since it is the operating system's default 8-bit encoding, even though ANSI was not involved in its definition.

Differences from standard Shift JIS

Windows-31J is often mistaken for standard Shift JIS. The two are similar, but the differences matter to programmers who want to avoid mojibake (text garbled by decoding it with the wrong encoding).

Double-byte character differences

In addition to the standard JIS X 0201:1997 and JIS X 0208:1997 characters, Windows-31J includes several JIS X 0208 extensions, namely "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)".[1] This also differs from IBM-932, which does not include the NEC extensions or NEC selection.[3]
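A short sketch of how these extensions surface in practice, assuming Python's built-in `cp932` and `shift_jis` codecs as stand-ins for Windows-31J and Shift JIS without vendor extensions. Python's `shift_jis` codec may not follow the JIS standard in every detail, so treat the comparison as indicative rather than normative. The sample characters are illustrative choices.

```python
# ① (U+2460) is one of the NEC special characters in row 13.
circled_one = "①"
cp932_bytes = circled_one.encode("cp932")

# The same character is expected to be rejected by the plain Shift JIS codec.
try:
    circled_one.encode("shift_jis")
    in_plain_shift_jis = True
except UnicodeEncodeError:
    in_plain_shift_jis = False

# 纊 (U+7E8A) appears both in the NEC selection of IBM extensions (row 89)
# and in the IBM extensions (row 115); both byte sequences decode to it.
nec_selected = b"\xed\x40".decode("cp932")
ibm_extension = b"\xfa\x5c".decode("cp932")

print(cp932_bytes.hex(), in_plain_shift_jis, nec_selected, ibm_extension)
```

Because the same character can be reached from two byte sequences, decoding and re-encoding Windows-31J data is not guaranteed to reproduce the original bytes.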

Such "formerly proprietary extensions from IBM and NEC", while not part of the JIS standards, are included in the W3C/WHATWG encoding standard used by HTML5,[5] which also treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content".[6]

Some of these rows were subsequently used differently by JIS X 0213. For example, compare row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)[7] to row 89 as used by JIS X 0208 with IBM/NEC extensions (beginning 纊, 褜, 鍈…).[8] Consequently, Shift_JISx0213 is not compatible with Windows-31J.
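The incompatibility can be observed by decoding the same two bytes under both encodings. This is a minimal sketch that assumes Python's built-in `cp932` and `shift_jisx0213` codecs are adequate stand-ins for Windows-31J and Shift_JISx0213; the byte pair 0xED 0x40 is the first cell of row 89.

```python
# First cell of row 89 when expressed as a Shift JIS style byte pair.
raw = b"\xed\x40"

as_windows_31j = raw.decode("cp932")
as_sjis_x0213 = raw.decode("shift_jisx0213")

# The two decoders disagree about which character these bytes represent.
incompatible = as_windows_31j != as_sjis_x0213
print(as_windows_31j, as_sjis_x0213, incompatible)
```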

Single-byte character differences

Windows-932 includes standard 7-bit ASCII codes for single-byte sequences with the high bit set to 0. Hence, codes 0x5C and 0x7E are mapped to Unicode as U+005C REVERSE SOLIDUS (\) and U+007E TILDE (~) respectively,[9][10] as they are in ASCII (ISO-646-US). However, 0x5C is mapped to U+00A5 YEN SIGN (¥) in ISO-646-JP and consequently JIS X 0201, of which standard Shift JIS is an extension.

Adding to the confusion, in many Japanese fonts, U+005C is displayed as a Yen symbol, which would normally be represented as U+00A5, rather than as a backslash per Unicode's suggested rendering. However, code 0x5C in Windows-932 behaves as a reverse solidus (backslash) in all respects (e.g. in file paths on Windows systems) other than how it is displayed by some fonts.
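The single-byte mapping can be checked with a few lines of Python, again assuming the built-in `cp932` codec reflects the Windows-932 mapping. The path string is a made-up example.

```python
import unicodedata

# 0x5C and 0x7E decode to the ASCII characters, not to yen sign and overline.
decoded = b"\x5c\x7e".decode("cp932")
names = [unicodedata.name(ch) for ch in decoded]
print(names)

# A Windows-style path survives a round trip because 0x5C stays a backslash,
# whatever glyph a Japanese font draws for it.
path = r"C:\Users\太郎"
round_tripped = path.encode("cp932").decode("cp932")

# The full-width yen sign (U+FFE5) is a separate double-byte character.
fullwidth_yen = "￥".encode("cp932")
print(round_tripped, fullwidth_yen.hex())
```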

IBM-943, however, like IBM-932,[3] is a superset of IBM-897,[4] which assigns 0x5C to the Yen symbol (¥) and 0x7E to the overline (¯),[11] as in JIS X 0201.

References

  1. "Character Sets". IANA.
  2. "7.2.3. Standard Encodings". Python 3.6 Documentation. Python Software Foundation. Retrieved 19 September 2017.
  3. "IBM-943 and IBM-932". IBM Knowledge Center. IBM.
  4. "Code Page 943". IBM.
  5. "5. Indexes (§ Index jis0208)". Encoding Standard. WHATWG.
  6. "4.2. Names and labels". Encoding Standard. WHATWG.
  7. "233: Japanese Graphic Character Set for Information Interchange, Plane 1" (PDF). IPSJ.
  8. "Index jis0208 visualization". Encoding Standard. WHATWG.
  9. "CP932.TXT". Unicode Consortium.
  10. "Lead byte NULL — Code page 932". Microsoft.
  11. "CP00897.txt". IBM.
