Half-width kana

From Wikipedia, the free encyclopedia

Half-width kana (半角カナ) refers to the katakana character set specified by JIS X 0201. Although the set is officially part of JIS X 0201 rather than a separate code set, its characters are more commonly referred to as "half-width kana."

History

ASCII is a 7-bit character set with room for 128 characters. Because the standard was designed for the United States, it lacks characters and symbols needed elsewhere, such as the yen sign (¥) used for Japanese currency.

JIS X 0201 was developed in 1969, at a time when computers were generally incapable, through limits of both software design and hardware resources, of representing the thousands of Chinese-derived kanji characters used in Japanese. As a compromise, half-width kana was created as a small set of characters assigned to the upper half of the 8-bit range (0x80-0xFF), specifically to byte values 0xA1-0xDF. This allowed 8-bit processors to handle Japanese text by rendering it phonetically in katakana, using an unorthodox, narrower form that fit the same width as the monospaced Latin letters the machines could already print and display.

According to Lunde, half-width kana were "...the first Japanese characters encoded on computers because they are used for Japanese telegrams. As single-byte characters..." [1]

To make katakana fit into the space available, some compromises were made. For example, the diacritical marks dakuten and handakuten are encoded as separate characters that follow the base character instead of being part of it. This compromise has led many to consider half-width kana visually unattractive, and it still causes problems for many computer programs today.[citation needed]
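The effect of encoding the marks separately can be seen in Unicode, which keeps the half-width forms as compatibility characters. The following is a minimal sketch, assuming Python's standard `unicodedata` module and NFKC normalization as one common way to fold half-width text into ordinary katakana; the helper name `describe` is purely illustrative.

```python
import unicodedata

# Half-width "ga", "gi", "pa": each base kana is followed by a separate
# dakuten (U+FF9E) or handakuten (U+FF9F) character.
half = "\uff76\uff9e\uff77\uff9e\uff8a\uff9f"

# NFKC maps the half-width forms to ordinary katakana and composes each
# base + mark pair into a single precomposed character.
full = unicodedata.normalize("NFKC", half)


def describe(text):
    """Return (character, code point) pairs for inspection."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text]


print(len(half), describe(half))  # six code points
print(len(full), describe(full))  # three code points
```

A program that counts characters, truncates strings, or searches for a voiced kana without normalizing first will treat the two spellings as different text, which is one source of the problems mentioned above.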

Half-width table

"J" indicates the high four bits of the byte in JIS X 0201 (these do not necessarily indicate half-width; see below) and in other sets such as Shift JIS. "U" indicates the row in Unicode.

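J U 0 1 2 3 4 5 6 7 8 9 A B C D E F
A FF6 (unused) ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ
B FF7 ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ
C FF8 ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ
D FF9 ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ ﾟ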
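The relationship between the "J" and "U" columns is a fixed offset: byte 0xA1 corresponds to U+FF61, and the rest of the range follows in order up to 0xDF and U+FF9F. The sketch below illustrates that arithmetic; the function name `jis_x0201_kana_to_unicode` and the constants are hypothetical names chosen for this example, not part of any standard library.

```python
KANA_FIRST, KANA_LAST = 0xA1, 0xDF  # JIS X 0201 katakana byte range
OFFSET = 0xFF61 - 0xA1              # distance to the Unicode half-width block


def jis_x0201_kana_to_unicode(byte):
    """Map one JIS X 0201 katakana byte to its Unicode character."""
    if not KANA_FIRST <= byte <= KANA_LAST:
        raise ValueError(f"0x{byte:02X} is outside the katakana range")
    return chr(byte + OFFSET)


# Print the table one row (high four bits) at a time.
for high in range(0xA, 0xE):
    cells = []
    for low in range(16):
        byte = (high << 4) | low
        # 0xA0 is not assigned, so that cell is left empty.
        cells.append(jis_x0201_kana_to_unicode(byte) if byte >= KANA_FIRST else "")
    print(f"{high:X}", f"FF{high - 4:X}", " ".join(cells))
```

Because Shift JIS keeps these single-byte values unchanged, decoding the same byte with Python's `shift_jis` codec should give the same character as the arithmetic above.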

Half-width kana on the Internet

E-mail

Because the SMTP and NNTP protocols (used to deliver e-mail and Usenet news, respectively) were formerly able to transmit only 7-bit data, the convention was to use ISO-2022-JP for sending e-mail in Japanese.

Because half-width kana are not contained in ISO-2022-JP, they could not be included in a message; if they were included accidentally, the message could become garbled during transmission.
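One way for software to avoid the problem is to convert half-width kana to ordinary katakana before encoding a message as ISO-2022-JP. The following is a minimal sketch of that idea, assuming Python's `iso2022_jp` codec and NFKC normalization applied only to runs of half-width characters; the function names are hypothetical, and a stray voicing mark with no base kana before it is not handled.

```python
import re
import unicodedata

# Runs of characters from the Unicode half-width katakana range.
HALF_WIDTH_KANA = re.compile("[\uff61-\uff9f]+")


def widen_half_width_kana(text):
    """Replace half-width kana with ordinary katakana, leaving other text alone."""
    return HALF_WIDTH_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


def encode_for_7bit_mail(text):
    """Encode text as ISO-2022-JP after widening any half-width kana."""
    return widen_half_width_kana(text).encode("iso2022_jp")


# Half-width "tesuto" followed by ASCII text.
body = encode_for_7bit_mail("\uff83\uff7d\uff84 ABC")
print(body)                       # escape sequences plus 7-bit bytes only
print(body.decode("iso2022_jp"))  # ordinary katakana, ASCII unchanged
```

Restricting the normalization to the half-width range keeps the rest of the message untouched, since NFKC applied to the whole string would also alter other compatibility characters.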

This is much less of a problem today, since most e-mail servers support ESMTP with its 8BITMIME extension and therefore accept 8-bit characters. Alternatively, an encoding such as Base64 can be applied and declared in the message using MIME.
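As an illustration of the Base64 approach, the sketch below builds a MIME text part with Python's standard `email` package. It assumes UTF-8 as the declared character set, which is a choice made for this example rather than something the conventions above require; with that charset the package selects Base64 as the transfer encoding.

```python
from email.mime.text import MIMEText

# "hankaku kana" written in half-width katakana.
text = "\uff8a\uff9d\uff76\uff78\uff76\uff85"

# Declaring UTF-8 makes the library Base64-encode the body, so the
# serialized message contains only 7-bit characters.
msg = MIMEText(text, "plain", "utf-8")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.as_string())

# The receiving side reverses the transfer encoding, then the charset.
decoded = msg.get_payload(decode=True).decode("utf-8")
print(decoded == text)
```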

Web pages

The problem that exists in e-mail does not arise with Web pages, since HTTP accepts 8-bit characters.

A problem that does exist is that programs have difficulty determining whether a page's bytes should be treated as Shift JIS, EUC-JP, or UTF-8. The character encoding should therefore be specified in an HTTP response header or a meta tag.
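The ambiguity can be illustrated with two bytes that spell half-width "ka na" in Shift JIS but mean something else, or nothing valid, under other encodings. The sketch below assumes Python's standard codecs and uses `email.message.Message` only as a convenient parser for a `Content-Type` header value; the helper `charset_from_content_type` and its UTF-8 default are hypothetical choices for this example.

```python
from email.message import Message

raw = b"\xb6\xc5"  # half-width "ka na" as encoded in Shift JIS

# Without a declared encoding, a program can only guess.
for guess in ("shift_jis", "euc_jp", "utf-8"):
    print(guess, repr(raw.decode(guess, errors="replace")))


def charset_from_content_type(value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset() or default


# With the header present, the bytes can be decoded unambiguously.
charset = charset_from_content_type("text/html; charset=Shift_JIS")
print(charset, raw.decode(charset))
```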

Misunderstanding of JIS X 0201

Strictly speaking, JIS X 0201 katakana are not half-width katakana. The standard does not define character widths; it defines only the code representation of the katakana characters. The term "half-width" is a vestige of older devices that displayed single-byte characters at half the width of double-byte characters. In the JIS X 0201 standard itself, the katakana characters are printed at normal (full) width, not half-width.

However, the misunderstanding that the standard defines "half-width" characters is widespread. People who know the standard[who?] might say "so-called half-width kana."

References

  1. Lunde, Ken. CJKV Information Processing. 1st ed. O'Reilly, 1999, pp. 144-145.