ISO/IEC 8859-1

From Wikipedia, the free encyclopedia

Revision as of 22:49, 6 October 2010

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin-1. It is generally intended for “Western European” languages (see below for a list).

ISO-8859-1 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered for ISO-8859-1: ISO_8859-1, iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819.
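As an illustration of how such aliases are used in practice, the sketch below asks Python's standard codec registry to resolve each label from the paragraph above. It assumes CPython's built-in alias table, which is maintained separately from the IANA registry, so agreement between the two is a property of that implementation rather than of the standard. The helper name `canonical_names` is made up for this example.

```python
import codecs

# Labels taken from the paragraph above (preferred name plus aliases).
ALIASES = ["ISO-8859-1", "ISO_8859-1", "iso-ir-100", "csISOLatin1",
           "latin1", "l1", "IBM819", "CP819"]

def canonical_names(labels):
    """Map each label to the codec name Python resolves it to."""
    return {label: codecs.lookup(label).name for label in labels}

resolved = canonical_names(ALIASES)
for label, name in resolved.items():
    print(f"{label:12} -> {name}")
```

Lookups are case-insensitive, and an unknown label raises `LookupError`.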

The Windows-1252 codepage coincides with ISO-8859-1 for all codes except the range 80 to 9F (where the little-used C1 controls are replaced with additional characters).
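The extent of the difference can be checked mechanically. The following sketch compares Python's `latin-1` and `cp1252` codecs byte by byte; it assumes Python's own `cp1252` table, which leaves a handful of positions unassigned, and counts those as differing too. The function name `differing_bytes` is illustrative only.

```python
def differing_bytes():
    """Return the byte values that decode differently in latin-1 and cp1252."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = None  # position unassigned in Python's cp1252 table
        if latin1_char != cp1252_char:
            diffs.append(value)
    return diffs

diffs = differing_bytes()
print(f"{min(diffs):02X}-{max(diffs):02X}, {len(diffs)} positions")
```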

Coverage

ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East Asian languages.

Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):

Modern languages with complete coverage of their alphabet
Languages commonly supported with nearly complete coverage of their alphabet
  • Danish – missing Ǿ and ǿ (but these can always be replaced with ordinary Ø/ø at the cost of ambiguity)
  • Dutch – missing the typographic ligatures IJ and ij (but these should always be represented as digraphs IJ, ij in electronic form)
  • Estonian and Finnish – missing Š, š, Ž, ž, which can only occur in loanwords (the missing characters are often replaced by digraphs Sh, sh, Zh, zh) (Windows-1252 and ISO-8859-15 do contain these)
  • French – missing the ligatures Œ and œ as well as the very rare Ÿ (they are generally replaced by digraphs OE and oe, and Y without the diaeresis) (Windows-1252 and ISO-8859-15 do contain these)
  • Hungarian – missing Ő, ő, Ű, ű
  • Latin (classical language written using macrons) – missing Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū
  • Welsh – missing Ŵ, ŵ, Ŷ, ŷ (ISO-8859-14 does contain these)
  • Māori – missing Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū
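A quick way to see which characters fall outside the repertoire is to try encoding them. The sketch below uses Python's `latin-1` and `iso8859-15` codecs as stand-ins for the standards; `missing_characters` is a hypothetical helper, and the sample strings are chosen only for illustration.

```python
def missing_characters(text, encoding="latin-1"):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

# Text inside the repertoire takes exactly one byte per character.
covered = "\u00c6r\u00f8, d\u00e9j\u00e0 vu, Espa\u00f1a"   # Ærø, déjà vu, España
encoded = covered.encode("latin-1")
print(len(covered), len(encoded))

# Characters named in the list above: Š š Ž ž Œ œ Ÿ Ő ő Ŵ ŵ Ā ā
listed = "\u0160\u0161\u017d\u017e\u0152\u0153\u0178\u0150\u0151\u0174\u0175\u0100\u0101"
print(missing_characters(listed))                 # none are encodable
print(missing_characters(listed, "iso8859-15"))   # fewer are missing here
```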
Coverage of punctuation signs and apostrophes

For some languages listed above the correct typographical quotation marks are missing, as only « », " ", and ' ' are included.

Also, this encoding scheme does not provide the correct character for the apostrophe and oriented single high quotation marks. Some texts instead use the spacing grave accent and spacing acute accent, both of which are part of ISO 8859-1, in place of the 6-shaped/9-shaped quotation marks or apostrophes (this works reliably with some font styles where all these characters are displayed as slanted wedge glyphs).

See also: Alphabets derived from the Latin

History

ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

In 1985, Commodore officially adopted the ANSI/ISO 8859-1 layout as the codepage for its new AmigaOS operating system and for all internal operations, preferring an internationally approved standard to the proprietary character sets then used by MS-DOS and Mac OS. The standard was therefore also used for the keyboard layout of the Amiga 1000 computer, launched in July 1985. All versions of AmigaOS up to 3.1 used ISO 8859-1. After the demise of Commodore International in 1994, the later versions of AmigaOS (3.5, 3.9) kept the ISO 8859-1 codepage, extended with the euro currency sign. Without a leading firm able to impose official standards, however, neither Amiga nor its clone variants (MorphOS, AROS) officially moved to ISO 8859-15 or followed a common approach when the euro sign was introduced in 2001. MorphOS 2.0 and later versions are Unicode (UTF-8) compliant.

In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the code values 00–1F, 7F, and 80–9F. It thus provides for 256 characters via every possible 8-bit value.
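The arithmetic behind these figures can be sketched in a few lines. The example assumes Python's `latin-1` codec implements the IANA map (control characters included) and uses the Unicode general category `Cc` to pick out the control codes.

```python
import unicodedata

all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")   # never fails: every byte value is assigned

# Control codes occupy 00-1F, 7F and 80-9F; everything else is graphic.
controls = [ch for ch in text if unicodedata.category(ch) == "Cc"]
graphic = [ch for ch in text if unicodedata.category(ch) != "Cc"]

print(len(text), len(controls), len(graphic))   # 256 65 191

# Decoding and re-encoding is lossless for arbitrary byte strings.
assert text.encode("latin-1") == all_bytes
```

The 191 graphic characters counted here match the size of the repertoire given in the Coverage section.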

ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/". It is the default encoding of the values of certain descriptive HTTP headers, and is the standard encoding used by the X Window System on most Unix machines in locales which use that character set. It was also the basis of the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode). However, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[1]

Codepage layout

ISO/IEC 8859-1
    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_  --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
1_  --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    --
8_  --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
9_  --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
A_  NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B_  °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C_  À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D_  Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E_  à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F_  ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

The code value of each character is its row digit followed by its column digit, in hexadecimal; the Unicode code point has the same value. For example, É at C9 is U+00C9 (decimal 201, octal 311). Positions marked -- (00 to 1F and 7F to 9F) are undefined. SP is the space, NBSP the no-break space, and SHY the soft hyphen.
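The values shown for each position in the table (character, Unicode code point, decimal, octal) all follow from the byte value alone. A minimal sketch, assuming Python's `latin-1` codec and a hypothetical helper named `cell`:

```python
def cell(value: int):
    """Return (character, code point, decimal, octal) for one table position."""
    ch = bytes([value]).decode("latin-1")
    return ch, f"{value:04X}", value, f"{value:03o}"

# Row C_, column _9 of the table.
print(cell(0xC9))
```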

Similar character sets

ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
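One consequence is that decoding ISO-8859-1 to Unicode needs no lookup table: each byte value is already the code point. The sketch below spells this out by hand and compares the result with Python's built-in `latin-1` codec; `latin1_to_unicode` is an illustrative name, not a library function.

```python
def latin1_to_unicode(data: bytes) -> str:
    """Decode ISO-8859-1 by treating each byte value as a Unicode code point."""
    return "".join(chr(b) for b in data)

sample = bytes([0x63, 0x61, 0x66, 0xE9])   # "café" in ISO-8859-1
decoded = latin1_to_unicode(sample)
assert decoded == sample.decode("latin-1")

print(decoded)
# The code points match, but the UTF-8 byte sequence differs above 7F.
print(decoded.encode("utf-8"))
```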

The lower range 20 to 7E (the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range A0 to FF (the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".

ISO/IEC 8859-1 is missing some characters for French and Finnish text and the euro sign. In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently-used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
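The positions that changed can be listed by comparing the two code tables directly. This sketch relies on Python's `iso8859-1` and `iso8859-15` codecs as implementations of the two standards; the function name `replaced_in_8859_15` is made up for the example.

```python
def replaced_in_8859_15():
    """Map each ISO-8859-1 character dropped by ISO-8859-15 to its replacement."""
    changes = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        old = raw.decode("iso8859-1")
        new = raw.decode("iso8859-15")
        if old != new:
            changes[old] = new
    return changes

changes = replaced_in_8859_15()
for old, new in changes.items():
    print(f"{ord(old):02X}: {old} -> {new}")
```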

The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely-used C1 controls in the range 80 to 9F. It is very common to mislabel text data with the charset label ISO-8859-1, even though the data is really Windows-1252 encoded. Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but this is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.
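The lenient behaviour described above might be sketched as follows. This is not how any particular browser or mail client is implemented; it is a minimal approximation that assumes Python's `cp1252` codec, and falls back to the C1 control for the few positions that codec leaves unassigned. The function name `decode_leniently` and the sample bytes are invented for the example.

```python
def decode_leniently(data: bytes) -> str:
    """Decode ISO-8859-1-labelled bytes, reading 80-9F as Windows-1252."""
    chars = []
    for value in data:
        raw = bytes([value])
        try:
            chars.append(raw.decode("cp1252"))
        except UnicodeDecodeError:
            # Unassigned in Python's cp1252 table: keep the C1 control code.
            chars.append(raw.decode("latin-1"))
    return "".join(chars)

mislabelled = b"\x93Latin-1\x94 costs 5\x80"
print(repr(mislabelled.decode("latin-1")))   # strict reading: C1 controls
print(decode_leniently(mislabelled))         # curly quotes and a euro sign
```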

The Apple Macintosh computer introduced a character encoding called Mac Roman (or Mac-Roman) in 1984. It was meant to be suitable for Western European desktop publishing. Like ISO-8859-1, it is a superset of ASCII, and it has most of the characters that are in ISO-8859-1, but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, most of the extra characters that Windows-1252 has in the C1 codepoint range are also supported in Mac Roman; the exceptions are Š, š, Ž, and ž.

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
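The repertoire comparisons for code page 850 and Mac Roman can be reproduced with a short check. The sketch assumes Python's `cp850` and `mac_roman` codecs faithfully represent those code pages (Python's `mac_roman` is the later variant with the euro sign); `uncovered` is a hypothetical helper name.

```python
def uncovered(target_encoding):
    """List printable ISO-8859-1 characters (A0-FF) absent from another codec."""
    absent = []
    for value in range(0xA0, 0x100):
        ch = bytes([value]).decode("latin-1")
        try:
            ch.encode(target_encoding)
        except UnicodeEncodeError:
            absent.append(ch)
    return absent

print("cp850:", uncovered("cp850"))
print("mac_roman:", uncovered("mac_roman"))
```

An empty list means the target code page contains the whole A0 to FF range, even if at different byte positions.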

See also

References