ISO/IEC 646
ISO 646 is an ISO standard that since 1972 has specified a 7-bit character code from which several national standards are derived.
Because the portion of ISO 646 shared by all countries (the "invariant set") specifies only the letters of the basic modern Latin alphabet, countries whose languages use the Latin alphabet with extensions had to create national variants of ISO 646 in order to write their native languages. Because the 8-bit byte was not yet universally accepted at that time, the national characters had to fit within 7 bits, so some characters that appear in ASCII do not appear in other national variants of ISO 646.
ISO/IEC 646 was also ratified by ECMA as ECMA-6.
History
ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorse existing practice regarding character encodings in the telecommunications industry.
During the 1960s, there was debate about whether character encoding standards (at either the national or international levels) for computers should follow 1) existing practice in the telecommunications industry (which was largely paper-tape based, but which was commonly transmitted on-line digitally over wires), or conversely, 2) existing practice in the punched-card portion of the computer industry, whose heritage was especially the off-line storage of World War II-era electro-mechanical punched-card machines predating electronic computers. For corporate-history reasons regarding Hollerith punched cards, IBM sided with the punched-card character encodings, embodied by EBCDIC, whereas many other computer manufacturers sided with the telecommunications industry's character encodings.
Due to the incompatibility of the various national variants, an International Reference Version (IRV) of ISO/IEC 646 was introduced. The original version (ISO 646 IRV) differed from ASCII only in that in code point 0024, ASCII's dollar sign ($) was replaced by the international currency symbol (¤). The final 1991 version of the code is identical to ASCII.[1]
The ISO 8859 series of standards governing 8-bit character encodings supersedes the ISO 646 international standard and its national variants. The ISO 10646 standard, directly related to Unicode, in turn supersedes all of the ISO 646 and ISO 8859 sets of national-variant character encodings with what is arguably one unified set of character encodings.
Codepage layout
The following table shows the ISO/IEC 646 character set. Each character is shown with its decimal code and its Unicode equivalent. Grey shaded cells indicate code points with character glyphs that vary from region to region. These are discussed in detail below.
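As an illustration of the region-dependent code points, the following Python sketch reports which bytes of a 7-bit text fall outside the invariant set. The list of twelve positions is the commonly cited one and is an assumption here, as are the names `VARIABLE_POSITIONS`, `variable_positions_used`, and `is_invariant`; check the list against the standard before relying on it.

```python
# Code points that ISO/IEC 646 leaves open to national or application use.
# Assumption: this is the commonly cited list of 12 positions
# (# $ @ [ \ ] ^ ` { | } ~ in the US variant).
VARIABLE_POSITIONS = frozenset(
    [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
)


def variable_positions_used(data: bytes) -> list[int]:
    """Return the sorted variant-dependent code points that occur in data."""
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit code: 0x{b:02X}")
    return sorted(set(data) & VARIABLE_POSITIONS)


def is_invariant(data: bytes) -> bool:
    """True if data reads the same under every national variant."""
    return not variable_positions_used(data)
```

For example, `is_invariant(b"Hello, world!")` holds, whereas `variable_positions_used(b"a[i] = {x}")` reports the four bracket and brace positions, which a terminal set to a different national variant would show as accented letters.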
National variants
Some national variants of ISO 646 are:
Other proprietary standards approved later for international use by some standard committees:
The specifics of the changes for some of these variants are given in this table:
Codes | Characters for each ISO 646 compatible charset | |||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
binary | decimal | hexa | INV | US | T.61 | JA | JA-O | KR | CN | IRV | GB | DK | NO | NO-2 | SE | SE-C | DE | HU | FR | FR-0 | CA-1 | CA-2 | IE | IS | ita | por | PT | esp | ES | CU | MT | YU |
010 0010 | 34 | 22 | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " | " |
010 0011 | 35 | 23 | # | # | # | # | # | # | # | £ | # | # | § | # | # | # | # | £ | £ | # | # | £ | # | £ | # | £ | # | # | # | # | # | |
010 0100 | 36 | 24 | $ | ¤ | $ | $ | $ | ¥ | $ | $ | $ | $ | $ | ¤ | ¤ | $ | ¤ | $ | $ | $ | $ | $ | $ | $ | $ | $ | $ | $ | ¤ | $ | $ | |
010 0111 | 39 | 27 | ' | ' | ' | ' | ' | ' | ' | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ' | ’ | ’ | ’ | ’ | ’ | ’ | ’ | ’ |
010 1100 | 44 | 2C | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , | , |
010 1101 | 45 | 2D | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
010 1111 | 47 | 2F | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
100 0000 | 64 | 40 | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | @ | É | § | Á | à | à | à | à | Ó | Ð | § | § | ´ | § | · | @ | @ | Ž | |
101 1011 | 91 | 5B | [ | [ | [ | [ | [ | [ | [ | [ | Æ | Æ | Æ | Ä | Ä | Ä | É | ° | ° | â | â | É | Þ | ° | Ã | Ã | ¡ | ¡ | ¡ | ġ | Š | |
101 1100 | 92 | 5C | \ | ¥ | ¥ | ₩ | \ | \ | \ | Ø | Ø | Ø | Ö | Ö | Ö | Ö | ç | ç | ç | ç | Í | \ | ç | Ç | Ç | Ñ | Ñ | Ñ | ż | Đ | ||
101 1101 | 93 | 5D | ] | ] | ] | ] | ] | ] | ] | ] | Å | Å | Å | Å | Å | Ü | Ü | § | § | ê | ê | Ú | Æ | é | Õ | Õ | ¿ | Ç | ] | ħ | Ć | |
101 1110 | 94 | 5E | ^ | ^ | ^ | ^ | ^ | ˆ | ˆ | ˆ | ˆ | ˆ | ˆ | Ü | ˆ | ˆ | ^ | ˆ | î | É | Á | Ö | ˆ | ˆ | ˆ | ˆ | ¿ | ¿ | ˆ | Č | ||
101 1111 | 95 | 5F | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ | _ |
110 0000 | 96 | 60 | ` | ` | ` | ` | ` | ` | ` | ` | ` | ` | é | ` | á | µ | µ | ô | ô | ó | ð | ù | ` | ` | ` | ` | ` | ċ | ž | |||
111 1011 | 123 | 7B | { | { | { | { | { | { | { | æ | æ | æ | ä | ä | ä | é | é | é | é | é | é | þ | à | ã | ã | ° | ´ | ´ | Ġ | š | ||
111 1100 | 124 | 7C | | | | | | | | | | | | | | | | | ø | ø | ø | ö | ö | ö | ö | ù | ù | ù | ù | í | | | ò | ç | ç | ñ | ñ | ñ | Ż | đ | |
111 1101 | 125 | 7D | } | } | } | } | } | } | } | å | å | å | å | å | ü | ü | è | è | è | è | ú | æ | è | õ | õ | ç | ç | [ | Ħ | ć | ||
111 1110 | 126 | 7E | ~ | ‾ | ‾ | ~ | ˜ | ˜ | ˜ | ¯ | | | ˜ | ü | ß | ˝ | ¨ | ¨ | û | û | á | ö | ì | ° | ˜ | ˜ | ¨ | ¨ | Ċ | č |
In the table above, the cells with non-white background emphasize the differences from the US variant used in the Basic Latin subset of ISO/IEC 10646 and Unicode.
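To make the effect of such replacements concrete, here is a minimal Python sketch that reads 7-bit data under the German variant (the DE column, DIN 66003). The eight replacements are taken from the commonly published DIN 66003 assignments and are assumed to be the complete set; the names `ISO646_DE` and `decode_iso646_de` are hypothetical.

```python
# National-use positions of the German variant, written as
# US-variant character -> German character.
# Assumption: these eight replacements are the complete set for DIN 66003.
ISO646_DE = str.maketrans({
    "@": "§",
    "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü",
    "~": "ß",
})


def decode_iso646_de(data: bytes) -> str:
    """Interpret 7-bit data under the German variant and return Unicode text."""
    text = data.decode("ascii")  # raises UnicodeDecodeError for bytes > 0x7F
    return text.translate(ISO646_DE)
```

With this mapping, `decode_iso646_de(b"Gr}~e aus K|ln")` yields `"Grüße aus Köln"`, while the same bytes shown on a US-variant terminal appear as `Gr}~e aus K|ln`. Other variants can be handled the same way with a different translation table.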
The characters displayed in cells with red background could be used as combining diacritics when preceded or followed by a backspace (a C0 control character). This encoding method is deprecated or not recommended, as it was part of some withdrawn national standards. Without such complex encoding, these characters are no different from the symbols used in the US variant, although glyph variants are still possible, especially for the quotation marks and the circumflex and tilde symbols.
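The backspace technique can be sketched in Python as follows. This is only an illustration under stated assumptions: the assignment of spacing symbols to Unicode combining marks in `COMBINING` is one plausible choice rather than something the standard prescribes, only six symbols are covered, and the function name `resolve_overstrikes` is hypothetical.

```python
import unicodedata

BS = "\b"  # C0 control BACKSPACE, code 0x08

# Assumption: one plausible spacing-symbol -> combining-mark assignment.
COMBINING = {
    "'": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    "^": "\u0302",  # combining circumflex accent
    "~": "\u0303",  # combining tilde
    '"': "\u0308",  # combining diaeresis
    ",": "\u0327",  # combining cedilla
}


def resolve_overstrikes(text: str) -> str:
    """Turn 'letter BS symbol' or 'symbol BS letter' into an accented letter."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            first, second = text[i], text[i + 2]
            if first.isalpha() and second in COMBINING:
                out.append(first + COMBINING[second])
                i += 3
                continue
            if first in COMBINING and second.isalpha():
                out.append(second + COMBINING[first])
                i += 3
                continue
        out.append(text[i])
        i += 1
    # Compose base letter + combining mark into a single code point if one exists.
    return unicodedata.normalize("NFC", "".join(out))
```

For instance, `resolve_overstrikes("cafe\b'")` gives `"café"`, and `resolve_overstrikes("gar,\bcon")` gives `"garçon"`. Symbols that are not adjacent to a backspace are left untouched.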
Later, when 8-bit character sets gained more acceptance, ISO 8859-1, ISO 8859-2, and ISO 8859-3 became the preferred method of coding most of these variants.
Variants of ASCII that are not ISO 646
There are also some 7-bit character sets that are not officially part of the ISO 646 standard. Examples include:
- 7-bit Greek, ELOT 927. The Greek alphabet is mapped to positions 0x61–0x71 and 0x73–0x79, on top of the Latin lowercase letters. This mapping with the high bit set is ISO 8859-7.
- 7-bit Cyrillic, KOI-7 or Short KOI. The Cyrillic characters are mapped to positions 0x60–0x7E, on top of the Latin lowercase letters. Superseded by the KOI-8 variants.
- 7-bit Hebrew, SI 960. The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew was always stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is ISO 8859-8.
- 7-bit Arabic, ASMO 449. The Arabic alphabet is mapped to positions 0x41–0x5A and 0x60–0x6A, on top of both uppercase and lowercase Latin letters. This mapping with the high bit set is ISO 8859-6.
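The "high bit set" relationship described for 7-bit Hebrew can be sketched in Python as below. The sketch assumes that every position outside 0x60–0x7A in SI 960 matches ASCII, which the text above does not state, and the function names are hypothetical. It uses Python's standard `iso8859_8` codec and does not reorder the text, so visually ordered input stays in visual order.

```python
def si960_to_iso8859_8(data: bytes) -> bytes:
    """Move the 7-bit Hebrew letters (0x60-0x7A) up to 0xE0-0xFA."""
    out = bytearray()
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit code: 0x{b:02X}")
        # Setting the high bit maps 0x60-0x7A onto 0xE0-0xFA.
        # Assumption: all other positions are identical to ASCII.
        out.append(b | 0x80 if 0x60 <= b <= 0x7A else b)
    return bytes(out)


def decode_7bit_hebrew(data: bytes) -> str:
    """Decode 7-bit Hebrew by way of ISO 8859-8."""
    return si960_to_iso8859_8(data).decode("iso8859_8")
```

Under these assumptions, byte 0x60 decodes to aleph (U+05D0) and byte 0x7A to tav (U+05EA), while digits and uppercase Latin letters pass through unchanged.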
References
External links
- Zeichensatz nach ISO 646 (ASCII) (in German)
- History at GNU Aspell website
- Character Tables by Koichi Yasuoka (see Domestic ISO646 Character Tables and Quasi-ISO646 Character Tables)
- Turkish Text Deasciifier, a tool (based on statistical pentagram analysis of the Turkish language) that restores an ASCII'fied Turkish text by determining the appropriate (but ambiguous) diacritics that Turkish normally needs but that are missing from the US-ASCII set.