Half-width kana

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Random832 at 13:11, 24 March 2009. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Half-width kana (半角カナ, hankaku kana) are katakana characters displayed at half the width of the standard full-width forms. The term refers to the katakana portion of the character set specified by JIS X 0201.

Although the official name is "JIS X 0201 katakana", "half-width kana" is the commonly used term, and it is the one used in this article.

History

ASCII is defined as a 7-bit character set and has room for 128 characters. However, because it was designed for use in the United States, it lacks characters and symbols needed to represent Japanese (for example, the ¥ yen currency symbol).

JIS X 0201 was developed in 1969. Computers of that era lacked the processing power and memory needed to handle the thousands of kanji (characters of Chinese origin) used in written Japanese, so, as a simplification, text that would normally be written in kanji was written in katakana instead.

According to Ken Lunde, half-width katakana were "...the first Japanese characters encoded on computers because they are used for Japanese telegrams. As single-byte characters..." [1]

To make katakana fit into the space available (63 code points, 0xA1 to 0xDF), some compromises were made: the diacritical marks dakuten (ﾞ) and handakuten (ﾟ) are encoded as separate characters rather than as part of the preceding kana, so that, for example, ga is stored as ｶ followed by ﾞ. The resulting character set is the so-called "half-width kana". These compromises still cause problems for computer programs today, and half-width kana are also frequently considered visually unattractive.
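Because the voicing marks are separate code points, a program that counts or compares characters can get different answers for what a reader sees as the same word. A minimal Python sketch using the standard `unicodedata` module: NFKC normalization maps each half-width kana to its full-width counterpart and merges a following half-width dakuten or handakuten into one precomposed character. The helper name `widen_kana` is hypothetical, chosen for illustration.

```python
import unicodedata

def widen_kana(text: str) -> str:
    """Convert half-width kana to full-width, composing voicing marks.

    NFKC maps U+FF76 (ｶ) to U+30AB (カ) and U+FF9E (ﾞ) to the combining
    mark U+3099; canonical composition then merges the pair into U+30AC (ガ).
    Note: NFKC also folds other compatibility characters (e.g. full-width
    Latin letters), which may not be wanted in every application.
    """
    return unicodedata.normalize("NFKC", text)

half = "ｶﾞｲﾄﾞ"           # "guide": 5 code points, dakuten stored separately
full = widen_kana(half)   # "ガイド": 3 code points

print(len(half), len(full))              # 5 3
print(half == "ガイド", full == "ガイド")  # False True
```

A naive string comparison treats the two spellings as different text, which is one concrete form of the problems mentioned above.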

Half-width table

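"J" gives the high four bits (first hexadecimal digit) of the code in JIS X 0201 and in other character sets that include it, such as CP932 (as explained below, these codes do not in themselves mean half-width); "U" gives the corresponding row in Unicode. The column headings give the final hexadecimal digit.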

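J  U     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
A  FF6      ｡  ｢  ｣  ､  ･  ｦ  ｧ  ｨ  ｩ  ｪ  ｫ  ｬ  ｭ  ｮ  ｯ
B  FF7   ｰ  ｱ  ｲ  ｳ  ｴ  ｵ  ｶ  ｷ  ｸ  ｹ  ｺ  ｻ  ｼ  ｽ  ｾ  ｿ
C  FF8   ﾀ  ﾁ  ﾂ  ﾃ  ﾄ  ﾅ  ﾆ  ﾇ  ﾈ  ﾉ  ﾊ  ﾋ  ﾌ  ﾍ  ﾎ  ﾏ
D  FF9   ﾐ  ﾑ  ﾒ  ﾓ  ﾔ  ﾕ  ﾖ  ﾗ  ﾘ  ﾙ  ﾚ  ﾛ  ﾜ  ﾝ  ﾞ  ﾟ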
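Because both standards list these characters in the same order, the JIS X 0201 code and the Unicode code point differ by a constant: 0xA1 (｡) corresponds to U+FF61 and 0xDF (ﾟ) to U+FF9F, an offset of 0xFEC0. A minimal Python sketch of that mapping follows, cross-checked against Python's built-in `shift_jis` codec, in which the single bytes 0xA1 to 0xDF are the same katakana. The function name is hypothetical.

```python
def jisx0201_to_unicode(code: int) -> str:
    """Return the Unicode half-width form of a JIS X 0201 katakana code.

    The katakana block occupies 0xA1-0xDF; Unicode places the same
    characters, in the same order, at U+FF61-U+FF9F.
    """
    if not 0xA1 <= code <= 0xDF:
        raise ValueError(f"0x{code:02X} is not a JIS X 0201 katakana code")
    return chr(code + 0xFEC0)  # 0xA1 + 0xFEC0 == 0xFF61

# Cross-check every code against the shift_jis codec, where the same
# single bytes decode to half-width katakana.
for code in range(0xA1, 0xE0):
    assert jisx0201_to_unicode(code) == bytes([code]).decode("shift_jis")

# Row B of the table: JIS 0xB0-0xBF, Unicode U+FF70-U+FF7F.
print("".join(jisx0201_to_unicode(code) for code in range(0xB0, 0xC0)))
# ｰｱｲｳｴｵｶｷｸｹｺｻｼｽｾｿ
```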

Half-width kana on the Internet

E-mail

Because the SMTP and NNTP protocols (used to deliver e-mail and Usenet news, respectively) could originally transmit only 7-bit data, the convention was to send Japanese e-mail in ISO-2022-JP, a 7-bit encoding that switches between character sets using escape sequences.

Half-width kana are not included in ISO-2022-JP, so they cannot legitimately appear in such a message; when they are included by accident, they can be garbled in transit.
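In Python's standard codecs, the plain `iso2022_jp` codec covers ASCII, JIS X 0201 Roman and JIS X 0208 but not JIS X 0201 katakana, so encoding half-width kana fails outright. One common workaround, sketched below as an illustration rather than a prescribed method, is to convert runs of half-width kana to their full-width JIS X 0208 equivalents before encoding; the output is then pure 7-bit data. The names `encode_iso2022jp` and `_widen` are hypothetical.

```python
import re
import unicodedata

# U+FF61 to U+FF9F is the half-width katakana block (JIS X 0201 0xA1 to 0xDF).
HALFWIDTH_KANA = re.compile("[\uff61-\uff9f]+")

def _widen(match: re.Match) -> str:
    # NFKC turns each half-width kana into its full-width form and merges a
    # following ﾞ/ﾟ into the preceding kana (ｶﾞ becomes ガ).
    widened = unicodedata.normalize("NFKC", match.group())
    # A mark with nothing to combine with stays a combining character
    # (U+3099/U+309A), which ISO-2022-JP cannot encode; use the spacing forms.
    return widened.replace("\u3099", "\u309b").replace("\u309a", "\u309c")

def encode_iso2022jp(text: str) -> bytes:
    """Encode text as ISO-2022-JP, converting half-width kana to full-width first."""
    return HALFWIDTH_KANA.sub(_widen, text).encode("iso2022_jp")

try:
    "ｶﾞｲﾄﾞ".encode("iso2022_jp")
except UnicodeEncodeError as exc:
    print("plain encode fails:", exc.reason)

data = encode_iso2022jp("ｶﾞｲﾄﾞ")
print(all(byte < 0x80 for byte in data))  # True: the output is 7-bit clean
print(data.decode("iso2022_jp"))          # ガイド
```

Only the half-width runs are normalized, so other characters in the message pass through unchanged.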

This is less of a problem today, since most mail servers support ESMTP's 8BITMIME extension and therefore accept 8-bit data. Alternatively, a transfer encoding such as Base64 can be applied to the message body and declared in the message headers using MIME.
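A minimal sketch of the MIME approach using Python's standard `email` package: the body text is encoded with a charset and then wrapped in Base64, so the message that travels over SMTP contains only 7-bit ASCII bytes, while the `Content-Type` header records the charset needed to decode it. Using UTF-8 as the charset is an assumption made for this example; another charset could be declared the same way.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "test"
# set_content() encodes the text with the given charset, applies the
# requested Content-Transfer-Encoding, and writes the MIME headers.
msg.set_content("ｶﾞｲﾄﾞ", charset="utf-8", cte="base64")

raw = msg.as_bytes()
print(all(byte < 0x80 for byte in raw))  # True: safe for 7-bit transports
print(msg["Content-Type"])               # declares the charset
print(msg["Content-Transfer-Encoding"])  # base64
print(msg.get_content())                 # decodes back to the original text
```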

Web pages

The problems that exist in e-mail do not arise with Web pages, since HTTP transfers 8-bit data without restriction.

The remaining problem is that software may have difficulty determining whether text is encoded in Shift JIS, EUC-JP, or UTF-8, so the character encoding should be declared in the HTTP Content-Type response header or in a meta tag.
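A short Python sketch of why the declaration matters: the same two bytes are two half-width kana in Shift JIS, a single kanji in EUC-JP, and invalid UTF-8, and the same kana is encoded differently in each. The helper `decode_with_declared_charset` is hypothetical; it borrows the header parser from the standard `email.message.Message` class to read the `charset` parameter of a Content-Type value.

```python
from email.message import Message

# Two bytes as they might arrive from a server that declared no charset.
data = b"\xb1\xb2"

print(data.decode("shift_jis"))    # ｱｲ (two half-width kana)
print(len(data.decode("euc_jp")))  # 1: read as a single JIS X 0208 kanji
try:
    data.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# The same half-width kana in each encoding.
for charset in ("shift_jis", "euc_jp", "utf-8"):
    print(charset, "ｱ".encode(charset).hex(" "))
# shift_jis b1 / euc_jp 8e b1 / utf-8 ef bd b1

def decode_with_declared_charset(body: bytes, content_type: str) -> str:
    """Decode a body using the charset parameter of a Content-Type value."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()  # lower-cased, e.g. "shift_jis"
    if charset is None:
        raise ValueError("no charset declared; the encoding would have to be guessed")
    return body.decode(charset)

print(decode_with_declared_charset(data, "text/html; charset=Shift_JIS"))  # ｱｲ
```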

Misunderstanding of JIS X 0201

Strictly speaking, JIS X 0201 katakana are not half-width katakana. The standard does not define character widths; it defines only the code representation of the katakana characters. The term "half-width" is a holdover from older devices that displayed single-byte characters at half the width of double-byte characters. In the JIS X 0201 standard itself, the katakana in the code chart are printed at normal width, not half-width.
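The width convention survives in Unicode as a character property rather than in JIS X 0201 itself: the forms at U+FF61 to U+FF9F carry the East Asian Width value "H" (halfwidth) and a `<narrow>` compatibility decomposition to the ordinary katakana, which are "W" (wide). A minimal Python sketch of how a terminal-style program might count display columns from that property; treating every non-wide character as one column is a simplifying assumption (it ignores combining marks and ambiguous-width characters), and the function name is hypothetical.

```python
import unicodedata

def display_columns(text: str) -> int:
    """Estimate monospaced display width: wide (W) and fullwidth (F) take 2 columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

print(unicodedata.east_asian_width("ｱ"), unicodedata.decomposition("ｱ"))  # H <narrow> 30A2
print(unicodedata.east_asian_width("ア"))                                  # W
print(display_columns("ｱｲｳ"), display_columns("アイウ"))                   # 3 6
```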

However, the misunderstanding that the standard defines "half-width" characters is widespread. People who know the standard will often say "so-called half-width kana."


References

  1. Lunde, Ken. CJKV Information Processing. 1st ed. O'Reilly, 1999. pp. 144-145.