
ISO/IEC 8859-1

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by BG19bot (talk | contribs) at 06:12, 3 December 2016 (WP:CHECKWIKI error fix. Syntax fixes. Do general fixes if a problem exists. -). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

ISO/IEC 8859-1:1998
MIME / IANA: ISO-8859-1
Alias(es): iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819
Standard: ISO/IEC 8859

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages. It is the basis for most popular 8-bit character sets, including Windows-1252 and the first block of characters in Unicode.

As of November 2016, 5.7% of all websites claim to use ISO 8859-1.[1][2] However, this figure includes an unknown number of pages that actually use Windows-1252 or UTF-8. Browsers commonly recognize both of these regardless of the declared character set.

ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429 (see below for HTML5 exception). The following other aliases are registered for ISO-8859-1: iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819.

The Windows-1252 code page coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F). In that range the little-used C1 controls are replaced with additional characters, including all the characters that ISO-8859-15 adds. Code page 28591, also known as Windows-28591, is the actual ISO-8859-1 code page.[3]
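You can check the claim that the two code pages differ only in 0x80 to 0x9F with Python's built-in `latin-1` and `cp1252` codecs. This is a minimal sketch. It assumes that Python's `cp1252` codec is a fair stand-in for Windows-1252, although that codec leaves five bytes in the range undefined.

```python
def differing_bytes() -> list:
    """Return every byte value that decodes differently under Latin-1 and cp1252."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        as_latin1 = raw.decode("latin-1")
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined,
        # so errors="replace" turns them into U+FFFD instead of raising.
        as_cp1252 = raw.decode("cp1252", errors="replace")
        if as_latin1 != as_cp1252:
            diffs.append(b)
    return diffs


diffs = differing_bytes()
print(len(diffs), hex(diffs[0]), hex(diffs[-1]))  # 32 0x80 0x9f
# The same byte is a C1 control in Latin-1 but the euro sign in cp1252.
print(repr(b"\x80".decode("latin-1")), repr(b"\x80".decode("cp1252")))  # '\x80' '€'
```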

Coverage

Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):
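Every character occupies exactly one byte, so the encoded length always equals the character count. A variable-width encoding such as UTF-8 does not have this property. The Python sketch below uses a sample string chosen only for illustration. It also shows what happens to a character outside the repertoire.

```python
text = "Grüße aus Köln"

latin1 = text.encode("latin-1")
utf8 = text.encode("utf-8")

# One byte per character in ISO-8859-1; ü, ß and ö take two bytes each in UTF-8.
print(len(text), len(latin1), len(utf8))  # 14 14 17

# Characters outside the 191-character repertoire cannot be encoded at all.
try:
    "€".encode("latin-1")
except UnicodeEncodeError as exc:
    print("not encodable:", exc.reason)
```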

Modern languages with complete coverage

Notes
  1. Complete support except for Ǿ/ǿ, which are missing. Ǿ/ǿ can be replaced with Ø/ø or øe at the cost of increased ambiguity.
  2. US and modern British.
  3. Kurdish Unified Alphabet, based on the Latin character set.
  4. Basic classical orthography.
  5. Rumi script.
  6. Bokmål and Nynorsk.
  7. European and Brazilian.

Languages commonly supported but with incomplete coverage

ISO-8859-1 is commonly used for certain languages, even though it lacks characters used by these languages. In most cases, only a few letters are missing, and they can be replaced satisfactorily with characters that are in ISO-8859-1 using some form of typographic approximation. The following table lists such languages.

Language | Missing characters | Typical workaround | Supported by
Catalan | Ŀ, ŀ (deprecated) | L·, l· |
Dutch | Ĳ, ĳ | digraphs IJ, ij |
Estonian | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252
Finnish | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252
French | Œ, œ, and the very rare Ÿ | digraphs OE, oe, and Y without the diaeresis (or Ý) | ISO-8859-15, ISO-8859-16, Windows-1252
Guarani | Ẽ, ẽ, Ĩ, ĩ, Ũ, ũ, Ỹ, ỹ, G̃, g̃ | E~, e~, I~, i~, U~, u~, Y~, y~, G~, g~ or Ê, ê, Î, î, Û, û, Ý, ÿ |
Hungarian | Ő, ő, Ű, ű | Õ, õ (or Ô, ô; sometimes Ö, ö), Û, û (sometimes Ü, ü) | ISO-8859-2, Windows-1250
Irish (traditional orthography) | Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṡ, ṡ, Ṫ, ṫ | Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Sh, sh, Th, th | ISO-8859-14
Latin with macrons | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | | ISO-8859-13, Windows-1257
Māori | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü | ISO-8859-13, Windows-1257
Romanian | Ă, ă, Ș, ș, Ț, ț and older Ţ, ţ with cedilla | A, a (or Ã, ã), S, s, T, t | ISO-8859-2, Windows-1250 (Ţ, ţ with cedilla)
Turkish | İ, ı, Ğ, ğ, Ş, ş | I, ï, G, g, S, s | ISO-8859-3, ISO-8859-9, Windows-1254
Welsh | Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ŷ, ŷ | Ý, ÿ | ISO-8859-14
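You can apply the typographic approximations in the table mechanically before encoding. The sketch below uses a hypothetical helper: `FALLBACKS` and `to_latin1` are illustrative names, not part of any standard. It covers only a few of the French, Hungarian and Catalan rows. A real converter would need the full table and a policy for anything that still cannot be encoded.

```python
# Hypothetical substitution table built from a few rows of the table above.
FALLBACKS = str.maketrans({
    "Œ": "OE", "œ": "oe", "Ÿ": "Y",            # French
    "Ő": "Õ", "ő": "õ", "Ű": "Û", "ű": "û",    # Hungarian
    "Ŀ": "L·", "ŀ": "l·",                      # Catalan
})


def to_latin1(text: str) -> bytes:
    """Apply the substitutions, then encode; still raises for unmapped characters."""
    return text.translate(FALLBACKS).encode("latin-1")


print(to_latin1("cœur"))  # b'coeur'
print(to_latin1("Győr"))  # b'Gy\xf5r'
```

Characters with no entry in `FALLBACKS` still raise `UnicodeEncodeError`. This keeps any loss of information visible rather than silent.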

The letter ÿ appears only very rarely in French and never at the beginning of a word, and it is included only in lowercase form. The slot corresponding to its uppercase form is occupied by the German letter ß, which traditionally has no uppercase form either.
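The "missing" uppercase slot shows up clearly in the byte values. In the Latin-1 letter blocks, a lowercase letter usually sits 0x20 above its capital. This offset is a property of the layout in the code page table below, not a rule stated by the standard, and × and ÷ also break it. The sketch uses only Python's built-in codec and string methods.

```python
# In the Latin-1 letter blocks, a lowercase letter usually sits 0x20 above its capital.
assert ord("é") - ord("É") == 0x20

# ÿ is at 0xFF, so its capital would belong at 0xDF, which holds ß instead.
print(bytes([0xFF - 0x20]).decode("latin-1"))  # ß

# Unicode uppercases ÿ to Ÿ (U+0178), outside ISO-8859-1, and ß to "SS".
print(hex(ord("ÿ".upper())), "ß".upper())  # 0x178 SS
```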

Quotation marks

For some of the languages listed above, the correct typographical quotation marks are missing. Only « », the straight double quote (") and the straight single quote (') are included. This scheme also provides no oriented (6- or 9-shaped) single or double quotation marks. Some fonts display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks, but this is not considered part of the modern standard.
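The following sketch checks which common quotation marks survive encoding to ISO-8859-1. The helper name `encodable` and the list of marks are illustrative choices. For comparison, it uses Python's `cp1252` codec as a stand-in for Windows-1252, which places the oriented quotes in the former C1 range.

```python
QUOTES = ["«", "»", '"', "'", "\u2018", "\u2019", "\u201c", "\u201d"]


def encodable(ch: str, codec: str) -> bool:
    """True if ch can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


latin1_quotes = [q for q in QUOTES if encodable(q, "latin-1")]
print(latin1_quotes)  # ['«', '»', '"', "'"]

# Windows-1252 places the oriented quotes in the former C1 range.
print("\u201c".encode("cp1252"))  # b'\x93'
```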

History

ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation (DEC) in the popular VT220 terminal in 1983. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94,[4] by which name it is still sometimes known. The second edition of ECMA-94 (June 1986)[5] also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

In 1985, Commodore adopted ECMA-94 for its new AmigaOS operating system. The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding. [citation needed]

In 1992, the IANA registered the character map ISO_8859-1:1987 for use on the Internet. This map is a superset of ISO 8859-1 and is more commonly known by its preferred MIME name, ISO-8859-1 (note the extra hyphen compared with ISO 8859-1). It assigns the C0 and C1 control characters to the otherwise unassigned code values, and thus provides 256 characters, one for every possible 8-bit value.
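Every byte value is assigned, so decoding arbitrary bytes as ISO-8859-1 never fails and always round-trips. The sketch assumes that Python's `latin-1` codec follows this IANA map, including the C0 and C1 controls. It confirms the round trip and counts the control characters with the standard `unicodedata` module. DEL (0x7F) is counted as well, although it is not part of the C0 set proper.

```python
import unicodedata

all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")  # never fails: every byte value is assigned

# Lossless round trip over all 256 values.
assert text.encode("latin-1") == all_bytes

# C0 (0x00-0x1F), DEL (0x7F) and C1 (0x80-0x9F) all carry Unicode category "Cc".
controls = [i for i, ch in enumerate(text) if unicodedata.category(ch) == "Cc"]
print(len(controls))  # 65
```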

According to the HTTP/1.1 specification (RFC 2616) at least, ISO-8859-1 is the default encoding of documents delivered via HTTP with a MIME type beginning with "text/". However, the HTML5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[6] ISO-8859-1 is also the default encoding of the values of certain descriptive HTTP headers. It defines the repertoire of characters allowed in HTML 3.2 documents, whereas HTML 4.0 is based on Unicode. In the absence of locale or other information, text on Unix and Microsoft Windows is often assumed to be encoded in ISO-8859-1 or Windows-1252. Unicode encodings such as UTF-8 or UTF-16 are only gradually replacing that assumption.
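One practical consequence of treating header bytes as ISO-8859-1 is that decoding never fails and can be reversed. A server can therefore carry raw bytes as text and recover them later. Python's WSGI specification (PEP 3333) relies on this "bytes-as-unicode" approach. The header value below is an illustrative assumption: the client is supposed to have sent it as UTF-8.

```python
# Suppose a client sent this header value as UTF-8 bytes (an assumption for illustration).
raw = "attachment; filename=Köln.txt".encode("utf-8")

# Decoding as ISO-8859-1 always succeeds, but multi-byte UTF-8 shows up as mojibake.
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1)  # attachment; filename=KÃ¶ln.txt

# Because each byte maps to exactly one code point, re-encoding recovers the original bytes.
recovered = as_latin1.encode("iso-8859-1").decode("utf-8")
print(recovered)  # attachment; filename=Köln.txt
```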

Codepage layout

The two code points 215 (0xD7, ×) and 247 (0xF7, ÷) were still undefined in the first edition of ECMA-94 (1985).[4]


ISO/IEC 8859-1
    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_  (undefined)
1_  (undefined)
2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    (undefined)
8_  (undefined)
9_  (undefined)
A_  NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B_  °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C_  À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D_  Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E_  à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F_  ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

SP is the space, NBSP the no-break space and SHY the soft hyphen. The cell in row x_ and column _y holds the byte 0xXY, and the character's Unicode code point is U+00XY. For example, é in row E_, column _9 is byte 0xE9 (decimal 233), U+00E9. Rows 0_, 1_, 8_ and 9_, and cell 7F, are not assigned in ISO/IEC 8859-1 itself.

Similar character sets

ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
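ISO-8859-1 occupies the first 256 code points, so each byte value is also the Unicode code point of the character it encodes. Converting Latin-1 to UTF-8 therefore needs no lookup table. The sketch below applies the standard UTF-8 one- and two-byte patterns directly to the byte values. The name `latin1_to_utf8` is illustrative, and the result is checked against Python's own codecs.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8 using byte value == code point."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            # U+0000..U+007F: UTF-8 uses the same single byte.
            out.append(b)
        else:
            # U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


print(latin1_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'
```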

The lower range 32 to 126 (hex 20 to 7E, the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range 160 to 255 (hex A0 to FF, the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".

ISO/IEC 8859-1 is missing some characters for French and Finnish text and the euro sign. In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
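The eight replaced positions can be listed by decoding every byte with Python's `latin-1` and `iso8859_15` codecs and comparing the results. The helper name `latin9_changes` is illustrative.

```python
def latin9_changes() -> list:
    """List (byte, ISO-8859-1 char, ISO-8859-15 char) for every position that differs."""
    changes = []
    for b in range(256):
        old = bytes([b]).decode("latin-1")
        new = bytes([b]).decode("iso8859_15")
        if old != new:
            changes.append((hex(b), old, new))
    return changes


for row in latin9_changes():
    print(*row)  # e.g. "0xa4 ¤ €" on the first line
```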

The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). Text is very commonly mislabeled with the charset label ISO-8859-1 even though it is really Windows-1252 encoded. To accommodate such mislabeling, many web browsers and e-mail clients interpret ISO-8859-1 control codes as Windows-1252 characters, and HTML5 later standardized this behavior.[7] Care should nevertheless be taken to avoid generating these characters in content labeled ISO-8859-1.
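The browser behavior can be sketched as follows. `decode_like_browser` is a hypothetical helper, and its label set contains only the aliases named in this article. The WHATWG Encoding Standard maps these and several other labels to windows-1252. Python's `cp1252` codec leaves five bytes undefined that the WHATWG index maps to the C1 control of the same value, so the sketch falls back to that control for those bytes.

```python
# Labels from this article's alias list; the WHATWG Encoding Standard maps all of
# them (and several more, not listed here) to windows-1252.
WINDOWS_1252_LABELS = {
    "iso-8859-1", "iso-ir-100", "csisolatin1", "latin1", "l1", "ibm819", "cp819",
    "windows-1252",
}


def decode_like_browser(body: bytes, label: str) -> str:
    """Hypothetical helper: decode a body labeled with a windows-1252 alias."""
    if label.strip().lower() not in WINDOWS_1252_LABELS:
        raise ValueError("this sketch only handles windows-1252 labels")
    chars = []
    for b in body:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90, 0x9D: use the C1 control with the same value.
            chars.append(chr(b))
    return "".join(chars)


page = b"Price: \x80 5, \x93quoted\x94"
print(decode_like_browser(page, "ISO-8859-1"))  # Price: € 5, “quoted”
print(page.decode("latin-1") == decode_like_browser(page, "ISO-8859-1"))  # False
```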

The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. Like ISO-8859-1, it is a superset of ASCII, and it has most of the characters of ISO-8859-1, but in a completely different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO-8859-1 but not in Mac Roman often cause trouble when text on websites is edited with older Macintosh browsers, including the last version of Internet Explorer for Mac. However, Mac Roman supports all the extra characters that Windows-1252 has in the C1 code point range.
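The difference in arrangement, and the handful of missing characters, can be explored with Python's `mac_roman` codec. This is a sketch. The exact missing list depends on which Mac Roman mapping the codec implements (for example, ¤ versus € at 0xDB), so its output is printed rather than asserted. `missing_from` is an illustrative name.

```python
def missing_from(codec: str) -> list:
    """Printable ISO-8859-1 characters that the given codec cannot encode."""
    printable = [chr(c) for c in list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))]
    missing = []
    for ch in printable:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Same character, different byte: é is 0xE9 in ISO-8859-1 but 0x8E in Mac Roman.
print("é".encode("latin-1"), "é".encode("mac_roman"))  # b'\xe9' b'\x8e'
print(missing_from("mac_roman"))
```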

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
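The coverage claim for code page 850 can be checked with Python's `cp850` codec, assuming that codec matches the DOS code page. Encoding fails loudly if any printable ISO-8859-1 character is absent, and the é example shows the different arrangement.

```python
# Every printable ISO-8859-1 character: 0x20-0x7E plus 0xA0-0xFF (191 in total).
printable = bytes(list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))).decode("latin-1")

# Strict encoding (the default) raises UnicodeEncodeError if any character is missing.
cp850_bytes = printable.encode("cp850")

# The non-ASCII characters sit at different byte values, e.g. é is 0x82 in code page 850.
print(len(printable), "é".encode("latin-1"), "é".encode("cp850"))  # 191 b'\xe9' b'\x82'
```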

Between 1989[8] and 2015, Hewlett-Packard used another superset of ISO-8859-1 on many of its calculators. This proprietary character set was sometimes also referred to simply as "ECMA-94".[8]


References

  1. http://w3techs.com/technologies/history_overview/character_encoding
  2. http://w3techs.com/faq
  3. "Code Page Identifiers". Microsoft Corporation. Retrieved 2010-12-19.
  4. Standard ECMA-94: 8-bit Single-Byte Coded Graphic Character Set (PDF) (1 ed.). European Computer Manufacturers Association (ECMA). March 1985 [1984-12-14]. Archived from the original (PDF) on 2016-12-01. Retrieved 2016-12-01. "[…] Since 1982 the urgency of the need for an 8-bit single-byte coded character set was recognized in ECMA as well as in ANSI/X3L2 and numerous working papers were exchanged between the two groups. In February 1984 ECMA TC1 submitted to ISO/TC97/SC2 a proposal for such a coded character set. At its meeting of April 1984 SC decided to submit to TC97 a proposal for a new item of work for this topic. Technical discussions during and after this meeting led TC1 to adopt the coding scheme proposed by X3L2. Part 1 of Draft International Standard DTS 8859 is based on this joint ANSI/ECMA proposal. […] Adopted as an ECMA Standard by the General Assembly of Dec. 13–14, 1984. […]"
  5. Standard ECMA-94, second edition (June 1986).
  6. HTML 5 Draft Recommendation, 12 April 2010, section 8.1 "Character encodings". Retrieved 2010-04-12.
  7. WHATWG. "Encoding Standard". Retrieved 2016-11-15.
  8. HP 82240B Infrared Printer (1 ed.). Corvallis, OR, USA: Hewlett Packard. August 1989. HP reorder number 82240-90014. Retrieved 2016-08-01.