
Half-width kana: Difference between revisions

From Wikipedia, the free encyclopedia
Azukimonaka (talk | contribs)
Revision as of 20:54, 2 March 2008

Half-width kana (半角カナ) are katakana characters displayed at half the width of the usual full-width form. The term refers to the katakana portion of the character set specified by JIS X 0201.

Although the official name is JIS X 0201 katakana, half-width kana is the commonly used name, and that term is used in this article.

History

ASCII is defined as a 7-bit character set and has room for 128 characters. However, since this standard was designed for the United States, it is Americentric in nature and does not contain characters and symbols (such as the ¥ yen currency symbol) needed for representation of Japanese.

JIS X 0201 was developed in 1969. Computers at that time did not have the computational power needed to process the thousands of kanji characters that exist in Japanese, so, as a compromise, kanji were represented by katakana.

Half-width kana were developed as "...the first Japanese characters encoded on computers because they are used for Japanese telegrams. As single-byte characters..." [1]

To make katakana fit into the space available, some compromises were made: the diacritical marks dakuten and handakuten are encoded as separate characters instead of being part of the preceding character. This led to the so-called "half-width kana". These compromises still cause problems for computer programs today, and the resulting text is visually unattractive.
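Because the voiced-sound marks are separate characters, a voiced syllable such as ガ takes two half-width characters. The following is a minimal sketch, not part of the original article, that uses Python's standard `unicodedata` module to show this and to fold the pair back into one full-width character with Unicode NFKC normalization. NFKC also rewrites other compatibility characters, so treating it as a safe conversion is an assumption that depends on the text at hand.

```python
import unicodedata

# Half-width KA (U+FF76) followed by the separate half-width dakuten (U+FF9E).
half = "\uff76\uff9e"

# NFKC maps both to their full-width equivalents and composes them.
full = unicodedata.normalize("NFKC", half)

print(len(half), len(full))   # 2 1
print(full == "\u30ac")       # True: full-width GA as a single code point
```

A program that counts characters, truncates strings or compares text will therefore see the half-width and full-width spellings as different unless it normalizes them first.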

Half-width table

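\Trailing 4 bits→
↓Leading 4 bits
  0 1 2 3 4 5 6 7 8 9 a b c d e f
0
1
2
3
4
5
6
7
8
9
a   ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ
b ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ
c ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ
d ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ ﾟ
e
f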
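Each cell of the table corresponds to one byte whose leading 4 bits select the row and whose trailing 4 bits select the column. The sketch below illustrates that arithmetic. The function name `jis_x0201_kana` is hypothetical, and the sketch relies on Python's `shift_jis` codec, which keeps the JIS X 0201 katakana at the same single-byte values (0xA1 to 0xDF), rather than on a dedicated JIS X 0201 codec.

```python
def jis_x0201_kana(lead: int, trail: int) -> str:
    """Return the half-width kana at row `lead`, column `trail` of the table."""
    byte = (lead << 4) | trail          # combine the two 4-bit halves
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError(f"0x{byte:02x} is outside the katakana area")
    # Assumption: Shift JIS single bytes 0xA1-0xDF are the JIS X 0201 katakana.
    return bytes([byte]).decode("shift_jis")

print(jis_x0201_kana(0xB, 0x6))   # row b, column 6
print(jis_x0201_kana(0xD, 0xE))   # row d, column e: the separate dakuten mark
```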

Half-width kana on the Internet

Email

Because SMTP and NNTP, the protocols used to deliver e-mail and Usenet respectively, could originally transmit only 7-bit data, the convention was to use ISO-2022-JP for sending e-mail in Japanese.

ISO-2022-JP does not contain half-width kana, so they cannot legitimately be included in a message encoded this way. If half-width kana are included anyway, the text can become garbled during transmission.
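As an illustration of this limitation, Python's `iso2022_jp` codec refuses to encode half-width kana. The sketch below shows one possible workaround, which is an assumption and not something the article prescribes: fold the text to full-width kana with NFKC before encoding. The helper name `encode_for_iso2022jp` is hypothetical.

```python
import unicodedata

def encode_for_iso2022jp(text: str) -> bytes:
    """Encode text as ISO-2022-JP, folding half-width kana to full width if needed."""
    try:
        return text.encode("iso2022_jp")
    except UnicodeEncodeError:
        # Assumption: NFKC folding is acceptable for this text. It also changes
        # other compatibility characters, not only half-width kana.
        folded = unicodedata.normalize("NFKC", text)
        return folded.encode("iso2022_jp")

wire = encode_for_iso2022jp("\uff76\uff85")   # half-width KA, NA
print(all(b < 0x80 for b in wire))            # True: every byte fits in 7 bits
```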

This is less of a problem today, since most e-mail servers use ESMTP and can therefore accept 8-bit characters. Alternatively, the message body can be encoded with a scheme such as Base64, with the encoding declared in the message using MIME.
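The Base64 route can be sketched with Python's standard `email` package. The choice of UTF-8 as the character set is an assumption made for this example, since the article does not say which 8-bit encoding a sender would pick. For a UTF-8 body, `MIMEText` applies Base64 and declares it in the headers.

```python
from email.mime.text import MIMEText

body = "\uff8a\uff9d\uff76\uff78"          # HA, N, KA, KU in half-width kana
msg = MIMEText(body, "plain", "utf-8")     # assumption: UTF-8 as the charset

wire = msg.as_string()                     # what would be handed to the mail server
print(msg["Content-Transfer-Encoding"])    # base64
print(all(ord(c) < 128 for c in wire))     # True: safe for a 7-bit transport

# The receiver reverses the transfer encoding and then the charset.
print(msg.get_payload(decode=True).decode("utf-8") == body)   # True
```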

Web pages

The problems that exist in e-mail do not exist with web pages, since HTTP accepts 8-bit characters.

The one problem that does exist is that programs can have difficulty determining whether a page is encoded as Shift JIS, EUC-JP or UTF-7. The character encoding should therefore be specified in an HTTP response header or a meta tag.
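The ambiguity is easy to reproduce: the same two bytes are valid in more than one Japanese encoding but mean different things. The sketch below, with a hypothetical helper `decode_body`, decodes a response body using the charset declared in a Content-Type header value instead of guessing. It uses `email.message.Message` only as a convenient header parser.

```python
from email.message import Message

def decode_body(raw: bytes, content_type: str) -> str:
    """Decode raw bytes using the charset declared in a Content-Type value."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()
    if charset is None:
        raise ValueError("no charset declared; the encoding would have to be guessed")
    return raw.decode(charset)

raw = "\uff76\uff85".encode("shift_jis")   # half-width KA, NA as Shift JIS bytes

as_sjis = decode_body(raw, "text/html; charset=Shift_JIS")
as_euc = decode_body(raw, "text/html; charset=EUC-JP")
print(as_sjis == "\uff76\uff85")   # True: the declared charset matches the bytes
print(as_euc == as_sjis)           # False: the same bytes read as EUC-JP differ
```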

Misunderstanding of JIS X 0201

Strictly speaking, JIS X 0201 katakana is not half-width katakana. The standard does not define the width of characters; it defines only the code representation of the katakana characters. The term "half-width" is a remnant of older devices that displayed single-byte characters at half the width of double-byte ones. In the JIS X 0201 standard itself, the katakana characters in the code chart are printed at normal width, not half width.

However, the misunderstanding that the standard defines "half-width" characters is widespread. People who know the standard will often say "so-called half-width kana."

References

  1. Lunde, Ken. CJKV Information Processing. 1st ed. O'Reilly, 1999. pp. 144-145.