ISO/IEC 646
ISO/IEC 646:1991, Information technology — ISO 7-bit coded character set for information interchange, is an ISO standard that since its first edition in 1972 has specified a 7-bit character code from which several national standards are derived.
Because the portion of ISO/IEC 646 shared by all countries (the "invariant set") specifies only the letters of the basic modern Latin alphabet, countries whose languages use the Latin alphabet with extensions needed to create national variants of ISO 646 in order to write their native languages. Because the 8-bit byte was not yet universally accepted at that time, the national characters had to fit within 7 bits, so some characters that appear in ASCII do not appear in the national variants of ISO 646.
ISO/IEC 646 was also ratified by ECMA as ECMA-6.
History
ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorse existing practice regarding character encodings in the telecommunications industry.
During the 1960s, there was debate about whether character encoding standards (at either the national or international levels) for computers should follow 1) existing practice in the telecommunications industry (which was largely paper-tape based, but which was commonly transmitted on-line digitally over wires), or conversely, 2) existing practice in the punched-card portion of the computer industry, whose heritage was especially the off-line storage of World War II-era electro-mechanical punched-card machines predating electronic computers. For corporate-history reasons regarding Hollerith punched cards, IBM sided with the punched-card character encodings, embodied by EBCDIC, whereas many other computer manufacturers sided with the telecommunications industry's character encodings.
Due to the incompatibility of the various national variants, an International Reference Version (IRV) of ISO/IEC 646 was introduced. The original version (ISO 646 IRV) differed from ASCII only in that in code point 0024, ASCII's dollar sign ($) was replaced by the international currency symbol (¤). The final 1991 version of the code is identical to ASCII.[1]
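The single difference between the original IRV and ASCII described above can be illustrated with a short Python sketch. The function name `decode_original_irv` is hypothetical and chosen only for this example; the sketch assumes the input is plain 7-bit data and simply reinterprets code 0x24.

```python
def decode_original_irv(data: bytes) -> str:
    """Decode 7-bit data under the original ISO 646 IRV.

    Assumption for illustration: every code except 0x24 reads exactly
    as in ASCII, and 0x24 is the international currency symbol (U+00A4)
    instead of the dollar sign.
    """
    # bytes.decode("ascii") raises UnicodeDecodeError for any byte above 0x7F,
    # which enforces the 7-bit constraint.
    text = data.decode("ascii")
    return text.replace("$", "\u00a4")


if __name__ == "__main__":
    print(decode_original_irv(b"Price: $10"))
```

Under the final 1991 version, the same bytes decode as plain ASCII, so no substitution is needed.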
The ISO 8859 series of standards, which governs 8-bit character encodings, supersedes the ISO 646 international standard and its national variants. The ISO 10646 standard, directly related to Unicode, in turn supersedes all of the ISO 646 and ISO 8859 national-variant character encodings with what is arguably a single unified set of character encodings.
Codepage layout
The following table shows the ISO/IEC 646 character set. Each cell gives the character, its Unicode code point in hexadecimal, its decimal code, and its octal code. Cells without a character mark code points whose glyphs vary from region to region. These are discussed in detail below.
| ISO/IEC 646 | —0 | —1 | —2 | —3 | —4 | —5 | —6 | —7 | —8 | —9 | —A | —B | —C | —D | —E | —F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0− | NUL 0000 0 000 | SOH 0001 1 001 | STX 0002 2 002 | ETX 0003 3 003 | EOT 0004 4 004 | ENQ 0005 5 005 | ACK 0006 6 006 | BEL 0007 7 007 | BS 0008 8 010 | HT 0009 9 011 | LF 000A 10 012 | VT 000B 11 013 | FF 000C 12 014 | CR 000D 13 015 | SO 000E 14 016 | SI 000F 15 017 |
| 1− | DLE 0010 16 020 | DC1 0011 17 021 | DC2 0012 18 022 | DC3 0013 19 023 | DC4 0014 20 024 | NAK 0015 21 025 | SYN 0016 22 026 | ETB 0017 23 027 | CAN 0018 24 030 | EM 0019 25 031 | SUB 001A 26 032 | ESC 001B 27 033 | FS 001C 28 034 | GS 001D 29 035 | RS 001E 30 036 | US 001F 31 037 |
| 2− | SP 0020 32 040 | ! 0021 33 041 | " 0022 34 042 | 0023 35 043 | 0024 36 044 | % 0025 37 045 | & 0026 38 046 | ' 0027 39 047 | ( 0028 40 050 | ) 0029 41 051 | * 002A 42 052 | + 002B 43 053 | , 002C 44 054 | - 002D 45 055 | . 002E 46 056 | / 002F 47 057 |
| 3− | 0 0030 48 060 | 1 0031 49 061 | 2 0032 50 062 | 3 0033 51 063 | 4 0034 52 064 | 5 0035 53 065 | 6 0036 54 066 | 7 0037 55 067 | 8 0038 56 070 | 9 0039 57 071 | : 003A 58 072 | ; 003B 59 073 | < 003C 60 074 | = 003D 61 075 | > 003E 62 076 | ? 003F 63 077 |
| 4− | 0040 64 100 | A 0041 65 101 | B 0042 66 102 | C 0043 67 103 | D 0044 68 104 | E 0045 69 105 | F 0046 70 106 | G 0047 71 107 | H 0048 72 110 | I 0049 73 111 | J 004A 74 112 | K 004B 75 113 | L 004C 76 114 | M 004D 77 115 | N 004E 78 116 | O 004F 79 117 |
| 5− | P 0050 80 120 | Q 0051 81 121 | R 0052 82 122 | S 0053 83 123 | T 0054 84 124 | U 0055 85 125 | V 0056 86 126 | W 0057 87 127 | X 0058 88 130 | Y 0059 89 131 | Z 005A 90 132 | 005B 91 133 | 005C 92 134 | 005D 93 135 | 005E 94 136 | _ 005F 95 137 |
| 6− | 0060 96 140 | a 0061 97 141 | b 0062 98 142 | c 0063 99 143 | d 0064 100 144 | e 0065 101 145 | f 0066 102 146 | g 0067 103 147 | h 0068 104 150 | i 0069 105 151 | j 006A 106 152 | k 006B 107 153 | l 006C 108 154 | m 006D 109 155 | n 006E 110 156 | o 006F 111 157 |
| 7− | p 0070 112 160 | q 0071 113 161 | r 0072 114 162 | s 0073 115 163 | t 0074 116 164 | u 0075 117 165 | v 0076 118 166 | w 0077 119 167 | x 0078 120 170 | y 0079 121 171 | z 007A 122 172 | 007B 123 173 | 007C 124 174 | 007D 125 175 | 007E 126 176 | DEL 007F 127 177 |
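As a rough illustration of how the chart can be used, the following Python sketch formats a code the way each cell does (hexadecimal, decimal, octal) and tests whether a string stays within the invariant set. The names `VARIANT_POSITIONS`, `chart_cell`, and `is_invariant` are hypothetical, and the set of region-dependent positions is taken from the twelve cells that the chart leaves without a character.

```python
# Code points shown without a character in the chart above,
# i.e. positions whose glyphs vary from region to region.
VARIANT_POSITIONS = frozenset(
    [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
)


def chart_cell(code: int) -> str:
    """Format a 7-bit code as in the chart: 4-digit hex, decimal, 3-digit octal."""
    if not 0 <= code <= 0x7F:
        raise ValueError("ISO/IEC 646 codes are 7-bit (0..127)")
    return f"{code:04X} {code} {code:03o}"


def is_invariant(text: str) -> bool:
    """Return True if every character is 7-bit and avoids the variable positions."""
    return all(
        ord(ch) <= 0x7F and ord(ch) not in VARIANT_POSITIONS for ch in text
    )


if __name__ == "__main__":
    print(chart_cell(0x0A))               # the LF cell
    print(is_invariant("Hello, world!"))  # uses only invariant positions
    print(is_invariant("a[0]"))           # brackets are region-dependent
```

Text that passes `is_invariant` should read the same under every national variant, which is why such a check can be handy before exchanging data between systems that use different variants.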
National variants
Some national variants of ISO 646 are:
Other proprietary standards approved later for international use by some standard committees:
The specifics of the changes for some of these variants are given in this table:
| Codes | Characters for each ISO 646 compatible charset | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| binary | decimal | hexa | INV | T.61 | US | JA | JA-O | KR | CN | TW | IRV | GB | DK | NO | NO-2 | SE | SE-C | DE | HU | FR | FR-0 | CA-1 | CA-2 | IE | IS | ita | por | PT | esp | ES | CU | MT | YU |
| 010 0010 | 34 | 22 | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " |
| 010 0011 | 35 | 23 | # | # | # | # | # | # | # | # | £ | # | # | § | # | # | # | # | £ | £ | # | # | £ | # | £ | # | £ | # | # | # | # | # | |
| 010 0100 | 36 | 24 | ¤ | $ | $ | $ | $ | ¥ | $ | $ | $ | $ | $ | $ | ¤ | ¤ | $ | ¤ | $ | $ | $ | $ | $ | $ | $ | $ | $ | $ | $ | ¤ | $ | $ | |
| 010 0111 | 39 | 27 | ' | ' | ' | ' | ' | ' | ' | ' | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ' | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ |
| 010 1100 | 44 | 2C | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , |
| 010 1101 | 45 | 2D | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| 010 1111 | 47 | 2F | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
| 100 0000 | 64 | 40 | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | É | § | Á | à | à | à | à | Ó | Ð | § | § | ´ | § | · | @ | @ | Ž | |
| 101 1011 | 91 | 5B | [ | [ | [ | [ | [ | [ | [ | [ | [ | Æ | Æ | Æ | Ä | Ä | Ä | É | ° | ° | â | â | É | Þ | ° | Ã | Ã | ¡ | ¡ | ¡ | ġ | Š | |
| 101 1100 | 92 | 5C | \ | ¥ | ¥ | ₩ | \ | \ | \ | \ | Ø | Ø | Ø | Ö | Ö | Ö | Ö | ç | ç | ç | ç | Í | \ | ç | Ç | Ç | Ñ | Ñ | Ñ | ż | Đ | ||
| 101 1101 | 93 | 5D | ] | ] | ] | ] | ] | ] | ] | ] | ] | Å | Å | Å | Å | Å | Ü | Ü | § | § | ê | ê | Ú | Æ | é | Õ | Õ | ¿ | Ç | ] | ħ | Ć | |
| 101 1110 | 94 | 5E | ^ | ^ | ^ | ^ | ^ | ^ | ˆ | ˆ | ˆ | ˆ | ˆ | ˆ | Ü | ˆ | ˆ | ^ | ˆ | î | É | Á | Ö | ˆ | ˆ | ˆ | ˆ | ¿ | ¿ | ˆ | Č | ||
| 101 1111 | 95 | 5F | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ |
| 110 0000 | 96 | 60 | ` | ` | ` | ` | ` | ` | ` | ` | ` | ` | ` | é | ` | á | µ | µ | ô | ô | ó | ð | ù | ` | ` | ` | ` | ` | ċ | ž | |||
| 111 1011 | 123 | 7B | { | { | { | { | { | { | { | { | æ | æ | æ | ä | ä | ä | é | é | é | é | é | é | þ | à | ã | ã | ° | ´ | ´ | Ġ | š | ||
| 111 1100 | 124 | 7C | | | | | | | | | | | | | | | | | | | ø | ø | ø | ö | ö | ö | ö | ù | ù | ù | ù | í | | | ò | ç | ç | ñ | ñ | ñ | Ż | đ | |
| 111 1101 | 125 | 7D | } | } | } | } | } | } | } | } | å | å | å | å | å | ü | ü | è | è | è | è | ú | æ | è | õ | õ | ç | ç | [ | Ħ | ć | ||
| 111 1110 | 126 | 7E | ~ | ‾ | ‾ | ‾ | ‾ | ˜ | ˜ | ˜ | ¯ | | | ˜ | ü | ß | ˝ | ¨ | ¨ | û | û | á | ö | ì | ° | ˜ | ˜ | ¨ | ¨ | Ċ | č | |||
In the table above, the cells with non-white background emphasize the differences from the US variant used in the Basic Latin subset of ISO/IEC 10646 and Unicode.
The characters displayed in cells with red background could be used as combining diacritics, when preceded or followed with a backspace C0 control (this encoding method is deprecated or is not recommended as it was part of some withdrawn national standards). Without such complex encoding, they are no different from the symbols used in the US variant (although glyph variants are still possible, especially on the quotation marks, and circumflex or tilde symbols).
Later, when 8-bit character sets gained more acceptance, ISO 8859-1, ISO 8859-2, and ISO 8859-3 became the preferred method of coding most of these variants.
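The substitutions listed in the table can be applied mechanically when decoding. The Python sketch below assumes the German (DE) variant, with the eight replacement characters taken from the DE column; the names `DE_VARIANT` and `decode_646` are hypothetical and exist only for this illustration.

```python
from typing import Mapping

# Replacements for the German (DE) variant, keyed by 7-bit code.
# Every code not listed here is assumed to read as in the US variant.
DE_VARIANT: Mapping[int, str] = {
    0x40: "§",
    0x5B: "Ä",
    0x5C: "Ö",
    0x5D: "Ü",
    0x7B: "ä",
    0x7C: "ö",
    0x7D: "ü",
    0x7E: "ß",
}


def decode_646(data: bytes, variant: Mapping[int, str]) -> str:
    """Decode 7-bit data, substituting the variant's national characters."""
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"not a 7-bit code: 0x{byte:02X}")
        # Fall back to the US (ASCII) character when the variant
        # does not redefine this position.
        chars.append(variant.get(byte, chr(byte)))
    return "".join(chars)


if __name__ == "__main__":
    raw = b"Gr|~e"
    print(raw.decode("ascii"))          # how a US-variant system shows the bytes
    print(decode_646(raw, DE_VARIANT))  # how a DE-variant system shows them
```

The same byte sequence therefore displays differently depending on which variant the receiving system assumes, which is the incompatibility that the later 8-bit and Unicode encodings removed.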
Variants of ASCII that are not ISO 646
There are also some 7-bit character sets that are not officially part of the ISO 646 standard. Examples include:
- 7-bit Greek, ELOT 927. The Greek alphabet is mapped to positions 0x61–0x71 and 0x73–0x79, on top of the Latin lowercase letters.
- 7-bit Cyrillic, KOI-7 or Short KOI. The Cyrillic characters are mapped to positions 0x60–0x7E, on top of the Latin lowercase letters. Superseded by the KOI-8 variants.
- 7-bit Hebrew, SI 960. The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew was always stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is ISO 8859-8.
- 7-bit Arabic, ASMO 449. The Arabic alphabet is mapped to positions 0x41–0x5A and 0x60–0x6A, on top of both uppercase and lowercase Latin letters. This mapping with the high bit set is ISO 8859-6.
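The relationship between the 7-bit Hebrew layout and ISO 8859-8 described in the list above amounts to setting the high bit on the letter positions. The following Python sketch shows this under the assumption that only positions 0x60–0x7A carry Hebrew letters and that all other codes are left unchanged; the function name `hebrew7_to_iso8859_8` is hypothetical.

```python
def hebrew7_to_iso8859_8(data: bytes) -> bytes:
    """Move 7-bit Hebrew letters (0x60-0x7A) to 0xE0-0xFA by setting the high bit."""
    return bytes(b | 0x80 if 0x60 <= b <= 0x7A else b for b in data)


if __name__ == "__main__":
    seven_bit = b"\x60\x61\x62"  # first three letter positions
    eight_bit = hebrew7_to_iso8859_8(seven_bit)
    # Python's standard "iso8859_8" codec maps 0xE0-0xFA to the Hebrew letters.
    print(eight_bit.decode("iso8859_8"))
```

Because 7-bit Hebrew was stored in visual order, a real conversion may also need to deal with text direction; this sketch only covers the byte mapping.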
External links
- Zeichensatz nach ISO 646 (ASCII) (in German)
- History at GNU Aspell website
- Character Tables by Koichi Yasuoka (see Domestic ISO646 Character Tables and Quasi-ISO646 Character Tables)
- Turkish Text Deasciifier, a tool (based on statistical pentagram analysis of the Turkish language) that restores ASCII-fied Turkish text by determining the appropriate (but ambiguous) diacritics that Turkish normally needs but that are missing from the US-ASCII set.