Windows-1252: Difference between revisions
→top: Removed confusing parenthetical that made it sound like this is not used for Spanish or German |
→Character set: change border to red to make it more noticable |
||
Line 187: | Line 187: | ||
|- |
|- |
||
!{{chset-left2|8_<br/>128}} |
!{{chset-left2|8_<br/>128}} |
||
|{{chset-color-graph-box}}|{{chset-cell3|20AC|[[Euro sign|€]]|0128}} |
|{{chset-color-graph-box|red}}|{{chset-cell3|20AC|[[Euro sign|€]]|0128}} |
||
|{{chset-color-undef}}| |
|{{chset-color-undef}}| |
||
|{{chset-color-punct-box}}|{{chset-cell3|201A|[[Curved quotes|‚]]|0130}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|201A|[[Curved quotes|‚]]|0130}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|0192|[[Florin sign|ƒ]]|0131}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|0192|[[Florin sign|ƒ]]|0131}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|201E|[[Curved quotes|„]]|0132}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|201E|[[Curved quotes|„]]|0132}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2026|[[Ellipsis|…]]|0133}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2026|[[Ellipsis|…]]|0133}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2020|[[Dagger (typography)|†]]|0134}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2020|[[Dagger (typography)|†]]|0134}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2021|[[Dagger (typography)|‡]]|0135}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2021|[[Dagger (typography)|‡]]|0135}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|02C6|[[Circumflex|ˆ]]|0136}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|02C6|[[Circumflex|ˆ]]|0136}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2030|[[Permille|‰]]|0137}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2030|[[Permille|‰]]|0137}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|0160|[[Š]]|0138}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|0160|[[Š]]|0138}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2039|[[Guillemet|‹]]|0139}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2039|[[Guillemet|‹]]|0139}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|0152|[[Œ]]|0140}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|0152|[[Œ]]|0140}} |
||
|{{chset-color-undef}}| |
|{{chset-color-undef}}| |
||
|{{chset-color-letter-box}}|{{chset-cell3|017D|[[Ž]]|0142}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|017D|[[Ž]]|0142}} |
||
|{{chset-color-undef}}| |
|{{chset-color-undef}}| |
||
|- |
|- |
||
!{{chset-left2|9_<br/>144}} |
!{{chset-left2|9_<br/>144}} |
||
|{{chset-color-undef}}| |
|{{chset-color-undef}}| |
||
|{{chset-color-punct-box}}|{{chset-cell3|2018|[[‘]]|0145}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2018|[[‘]]|0145}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2019|[[’]]|0146}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2019|[[’]]|0146}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|201C|[[“]]|0147}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|201C|[[“]]|0147}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|201D|[[”]]|0148}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|201D|[[”]]|0148}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2022|[[Bullet (typography)|•]]|0149}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2022|[[Bullet (typography)|•]]|0149}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2013|[[En Dash|–]]|0150}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2013|[[En Dash|–]]|0150}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|2014|[[Em Dash|—]]|0151}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|2014|[[Em Dash|—]]|0151}} |
||
|{{chset-color-graph-box}}|{{chset-cell3|02DC|[[Small tilde|˜ ]]|0152}} |
|{{chset-color-graph-box|red}}|{{chset-cell3|02DC|[[Small tilde|˜ ]]|0152}} |
||
|{{chset-color-graph-box}}|{{chset-cell3|2122|[[Trademark symbol|™]]|0153}} |
|{{chset-color-graph-box|red}}|{{chset-cell3|2122|[[Trademark symbol|™]]|0153}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|0161|[[š]]|0154}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|0161|[[š]]|0154}} |
||
|{{chset-color-punct-box}}|{{chset-cell3|203A|[[Guillemet|›]]|0155}} |
|{{chset-color-punct-box|red}}|{{chset-cell3|203A|[[Guillemet|›]]|0155}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|0153|[[œ]]|0156}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|0153|[[œ]]|0156}} |
||
|{{chset-color-undef}}| |
|{{chset-color-undef}}| |
||
|{{chset-color-letter-box}}|{{chset-cell3|017E|[[ž]]|0158}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|017E|[[ž]]|0158}} |
||
|{{chset-color-letter-box}}|{{chset-cell3|0178|[[Ÿ]]|0159}} |
|{{chset-color-letter-box|red}}|{{chset-cell3|0178|[[Ÿ]]|0159}} |
||
|- |
|- |
||
!{{chset-left2|A_<br/>160}} |
!{{chset-left2|A_<br/>160}} |
||
Line 330: | Line 330: | ||
|{{chset-color-letter}}|{{chset-cell3|00FF|[[ÿ]]|0255}} |
|{{chset-color-letter}}|{{chset-cell3|00FF|[[ÿ]]|0255}} |
||
|} |
|} |
||
{{ |
{{Legend inline|Transparent|border=medium solid red|Differences from [[ISO-8859-1]]}} |
||
<!-- See {{chset-tableformat}} --> |
<!-- See {{chset-tableformat}} --> |
Revision as of 09:16, 27 July 2020
MIME / IANA | windows-1252[1] |
---|---|
Language(s) | Essentially all languages supported by ISO/IEC 8859-1, e.g. English, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish; also German, Finnish, and French, and Dutch except for the IJ character. |
Created by | Microsoft |
Standard | WHATWG Encoding Standard |
Classification | extended ASCII, Windows-125x |
Extends | ISO 8859-1 (excluding C1 controls) |
Transforms / Encodes | ISO 8859-15 |
Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages such as Spanish, French, and German.
It is the most-used single-byte character encoding in the world (especially if the compatible ASCII and ISO-8859-1 encodings are included). As of July 2020, 0.4% of all web sites declared use of Windows-1252,[2][3] while 2.0%[2] declared ISO 8859-1 (but only 0.7% of the top-1000 websites[4]). Because HTML5 requires that both labels be treated as the same encoding,[5] about 2.4% of web sites effectively use Windows-1252.
Details
This character encoding is a superset of ISO 8859-1 in terms of printable characters, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range. Notable additional characters include curly quotation marks and all the printable characters that are in ISO 8859-15 (at different places than ISO 8859-15). It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252".
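As a minimal sketch of this difference, the following Python snippet decodes the same bytes with the standard library's `cp1252` and `latin-1` codecs. It assumes the codec tables shipped with CPython; the sample bytes and the helper name `printable_in_c1_range` are illustrative and not part of the article.

```python
# Bytes 0x93/0x94 are curly quotes and 0x80 is the euro sign in Windows-1252,
# but all three are C1 control codes in ISO-8859-1.
data = b"\x93smart quotes\x94 cost \x80 5"

as_cp1252 = data.decode("cp1252")    # printable characters
as_latin1 = data.decode("latin-1")   # invisible C1 controls


def printable_in_c1_range():
    """Collect the 0x80-0x9F positions that Python's cp1252 codec defines."""
    result = {}
    for b in range(0x80, 0xA0):
        try:
            result[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # position left unassigned in Python's cp1252 table
    return result


if __name__ == "__main__":
    print(as_cp1252)
    print(repr(as_latin1))
    for b, ch in printable_in_c1_range().items():
        print(f"0x{b:02X} -> U+{ord(ch):04X} {ch}")
```

Outside the 80 to 9F range the two codecs agree, so the loop only needs to inspect those 32 positions.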
It is very common to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result of this was that quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers and e-mail clients treat the media type charset ISO-8859-1 as Windows-1252 to accommodate such mislabeling. This is now standard behavior in the HTML5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[5]
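The label handling described above might be sketched as follows. This is an illustration only: `WINDOWS_1252_LABELS`, `effective_codec`, and `decode_labeled` are hypothetical names, the label set is a small subset of what the WHATWG Encoding Standard lists (consult the standard for the full list), and Python's `cp1252` codec stands in for a browser's decoder.

```python
# Hypothetical subset of labels that resolve to windows-1252.
WINDOWS_1252_LABELS = {
    "windows-1252", "cp1252", "x-cp1252",
    "iso-8859-1", "iso8859-1", "latin1", "l1",
    "ascii", "us-ascii",
}


def effective_codec(label: str) -> str:
    """Return the Python codec name to use for a declared charset label."""
    normalized = label.strip().lower()
    if normalized in WINDOWS_1252_LABELS:
        return "cp1252"
    return normalized


def decode_labeled(data: bytes, label: str) -> str:
    # errors="replace" turns the five unassigned bytes into U+FFFD; a real
    # browser decoder maps them to C1 controls instead, so this is a simplification.
    return data.decode(effective_codec(label), errors="replace")


if __name__ == "__main__":
    # Mislabeled text: 0x92 is a right single quotation mark in Windows-1252.
    print(decode_labeled(b"It\x92s mislabeled", "ISO-8859-1"))
```

Routing the ISO-8859-1 label to the Windows-1252 decoder loses nothing for correctly labeled text unless it relies on C1 control codes, which are rare in practice.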
Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far the most popular code page so named in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."[6]
In LaTeX packages, CP-1252 is referred to as "ansinew".
IBM uses code page 1252 (CCSID 1252 and euro sign extended CCSID 5348) for Windows-1252.[7][8][9]
Character set
The following table shows Windows-1252. Each character is shown with its Unicode equivalent based on the Unicode.org mapping of Windows-1252 with "best fit". The decimal numbers are the Alt codes that can be used to type these characters on Windows systems.
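The relationship between a decimal Alt code, the byte value, and the Unicode code point can be sketched in Python. The helper `alt_code_to_char` is a hypothetical name, and the sketch assumes that a zero-prefixed Alt code such as 0128 is simply the decimal byte value in Windows-1252, as the table's numbering suggests.

```python
def alt_code_to_char(alt_code: str) -> str:
    """Map a zero-prefixed Alt code such as '0128' to its Windows-1252 character."""
    value = int(alt_code, 10)
    if not 0 <= value <= 255:
        raise ValueError(f"Alt code out of single-byte range: {alt_code}")
    # Raises UnicodeDecodeError for the five positions Python leaves unassigned.
    return bytes([value]).decode("cp1252")


def describe(alt_code: str) -> str:
    """Format one table cell: Alt code, Unicode code point, character."""
    ch = alt_code_to_char(alt_code)
    return f"Alt+{alt_code} -> U+{ord(ch):04X} {ch}"


if __name__ == "__main__":
    for code in ("0128", "0153", "0255"):
        print(describe(code))
```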
Differences from ISO-8859-1
According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.[10]
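A rough Python emulation of that "best fit" behavior is sketched below. It does not call the Windows API; `decode_cp1252_best_fit` is a hypothetical helper that assumes Python's strict `cp1252` codec (which rejects the five unused positions) and passes those bytes through as the C1 control code with the same value.

```python
# The five positions listed above as unused in Windows-1252.
UNUSED_POSITIONS = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_cp1252_best_fit(data: bytes) -> str:
    """Decode Windows-1252, mapping unused bytes to the matching C1 controls."""
    chars = []
    for b in data:
        if b in UNUSED_POSITIONS:
            chars.append(chr(b))  # e.g. 0x81 -> U+0081
        else:
            chars.append(bytes([b]).decode("cp1252"))
    return "".join(chars)


if __name__ == "__main__":
    print(repr(decode_cp1252_best_fit(b"\x80\x81\x9d")))
```

A decoder written this way never fails on arbitrary input, which is one reason a lenient mapping is convenient for legacy data.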
History
- The first version of code page 1252, used in Microsoft Windows 1.0, did not have positions D7 and F7 defined. All the characters in the range 80–9F were undefined too.
- In the second version, used in Microsoft Windows 2.0, positions D7, F7, 91, and 92 were defined.
- The third version, used since Microsoft Windows 3.1, had all the present-day positions defined, except the euro sign and the Z with caron character pair.
- The final version listed above debuted in Microsoft Windows 98 and was ported to older versions of Windows with the euro symbol update.
OS/2 extensions
The OS/2 operating system supports an encoding by the name of Code page 1004 (CCSID 1004) or "Windows Extended".[15][16] This mostly matches code page 1252, with the exception of certain C0 control characters being replaced by diacritic characters.
 | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0_ 0 | NUL 0000 | SOH 0001 | STX 0002 | ETX 0003 | ˉ 02C9 | ˘ 02D8 | ˙ 02D9 | BEL 0007 | ˚ 02DA | HT 0009 | ˝ 02DD | ˛ 02DB | ˇ 02C7 | CR 000D | SO 000E | SI 000F |
Differences from Windows-1252
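The replacements in row 0_ can be expressed as a small override table on top of a Windows-1252 decoder. The sketch below is an assumption-laden illustration: `CP1004_OVERRIDES` and `decode_cp1004_sketch` are hypothetical names, only the positions visible in the row above are covered, and Python's `cp1252` codec is used as the base mapping.

```python
# C0 positions that row 0_ of the table shows as diacritic characters.
CP1004_OVERRIDES = {
    0x04: "\u02C9",  # modifier letter macron
    0x05: "\u02D8",  # breve
    0x06: "\u02D9",  # dot above
    0x08: "\u02DA",  # ring above
    0x0A: "\u02DD",  # double acute accent
    0x0B: "\u02DB",  # ogonek
    0x0C: "\u02C7",  # caron
}


def decode_cp1004_sketch(data: bytes) -> str:
    """Decode bytes as Windows-1252, applying the row 0_ overrides first."""
    chars = []
    for b in data:
        if b in CP1004_OVERRIDES:
            chars.append(CP1004_OVERRIDES[b])
        else:
            # Unassigned Windows-1252 positions become U+FFFD in this sketch.
            chars.append(bytes([b]).decode("cp1252", errors="replace"))
    return "".join(chars)


if __name__ == "__main__":
    print(decode_cp1004_sketch(b"\x04 \x0c \x80"))
```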
MS-DOS extensions [rare]
There is a rarely used graphics-extended variant of code page 1252 in which codes 0x00 to 0x1F provide box-drawing characters, as used in applications such as MS-DOS Edit and CodeView. One of the applications to use this code page was an Intel Corporation Install/Recovery disk image utility from mid-to-late 1995. These programs were written for its P6 User Test Program machines (US example[21]). The code page was used exclusively in Intel's then EMEA region (Europe, Middle East & Africa). In time the programs were changed to use code page 850.
See also
References
- ^ Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
- ^ a b "Historical trends in the usage of character encodings, July 2020". Retrieved 2020-07-23.
- ^ "Frequently Asked Questions".
- ^ "Usage Survey of Character Encodings broken down by Ranking". w3techs.com. Retrieved 2020-07-24.
- ^ a b "Encoding". WHATWG. 27 January 2015. sec. 5.2 Names and labels. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ Wissink, Cathy (5 April 2002). "Unicode and Windows XP" (PDF). Microsoft. p. 1. Archived (PDF) from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ "Code page 1252 information document". Archived from the original on 2016-03-03.
- ^ "CCSID 1252 information document". Archived from the original on 2016-03-26.
- ^ "CCSID 5348 information document". Archived from the original on 2014-11-29.
- ^ a b "Unicode mappings of Windows-1252 with 'Best Fit'". Unicode. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ Code Page CPGID 01252 (pdf) (PDF), IBM
- ^ Code Page CPGID 01252 (txt), IBM
- ^ International Components for Unicode (ICU), ibm-1252_P100-2000.ucm, 2002-12-03
- ^ International Components for Unicode (ICU), ibm-5348_P100-1997.ucm, 2002-12-03
- ^ "Code page 1004 information document". Archived from the original on 2015-06-25.
- ^ "CCSID 1004 information document". Archived from the original on 2016-03-26.
- ^ "Code Page 01004" (PDF). IBM. Archived from the original (PDF) on 2015-07-08. (version based on Windows 3.1 version of Windows-1252)
- ^ Code Page CPGID 01004 (pdf) (PDF), IBM
- ^ Code Page CPGID 01004 (txt), IBM
- ^ Borgendale, Ken (2001). "Codepage 1004 - Windows Extended". OS/2 codepages by number. Archived from the original on 2018-05-13. Retrieved 2018-05-13. (version based on current version of Windows-1252)
- ^ "Performance of NASA Equation Solvers on Computational Mechanics Applications" (PDF). NASA.
External links
- Microsoft's code charts for Windows-1252 ("Code Page 1252 Windows Latin 1 (ANSI)")
- Unicode mapping table and code page definition with best fit mappings for Windows-1252