Talk:Windows-1252

From Wikipedia, the free encyclopedia
WikiProject Computing / CompSci (Rated C-class, High-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
This article is supported by WikiProject Computer science (marked as Low-importance).
 
WikiProject Typography (Rated C-class, Low-importance)
This article is within the scope of WikiProject Typography, a collaborative effort to improve the coverage of articles related to Typography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the quality scale.
This article has been rated as Low-importance on the importance scale.
 

Outlining of differences from ISO-8859-1

The article says: "The following table shows Windows-1252, with differences from ISO-8859-1 outlined." How are they outlined? I can't see it.


—Preceding unsigned comment added by 88.79.125.76 (talk) 07:19, 9 July 2009 (UTC)

The introduction to this page needs to state whether Windows-1252 is CP1252 or what relation they have to each other. Only in the table caption are the two terms associated, and this page is the #1 Google hit for CP1252. RCanine 21:14, 17 October 2007 (UTC)
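To illustrate that "Windows-1252" and "CP1252" name the same encoding, here is a minimal Python sketch. It relies only on CPython's standard `codecs` registry, and assumes (as a property of that implementation, not of any Microsoft document) that the registry resolves the label `windows-1252` to its built-in `cp1252` codec. `canonical_codec_name` is a hypothetical helper name.

```python
import codecs

# Spellings used in this thread; Python's codec lookup is case-insensitive.
LABELS = ["Windows-1252", "windows-1252", "CP1252", "cp1252"]


def canonical_codec_name(label: str) -> str:
    """Hypothetical helper: name of the codec Python uses for a label."""
    return codecs.lookup(label).name


if __name__ == "__main__":
    for label in LABELS:
        print(f"{label:14} -> {canonical_codec_name(label)}")
    # Both names encode the euro sign to the same single byte, 0x80.
    print("\u20ac".encode("windows-1252") == "\u20ac".encode("cp1252") == b"\x80")
```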

Why can the code page Windows-1252 not be used on MacOS or Linux? --84.61.69.103 16:10, 8 June 2006 (UTC)

I'm not sure exactly what you mean by that; some software, even in non-Windows operating systems, might recognize this character encoding and properly render documents transmitted or stored using it; however, as its name implies, it's an encoding which was devised by Microsoft for use in Windows, and is not one of the standard, platform-neutral ones such as iso-8859-1 or utf-8. *Dan T.* 16:47, 8 June 2006 (UTC)
Most operating systems use some Unicode encoding and/or a small selection of legacy encodings as an internal format. What is supported by the conversion libs and by apps that communicate over open protocols is generally a much wider selection, and the windows-125x encodings certainly get this level of support on all major platforms. Windows-1252 is also the unofficial default character set of the web (officially it's ISO-8859-1, but since the C1 control codes are banned anyway...) Plugwash 22:42, 8 June 2006 (UTC)
I'm not sure exactly what you mean by that, but I guess it's this: for example, in Linux you cannot mount FAT32 volumes with 'iocharset=cp1252', and 'locales' too often make things worse. In the end, it's almost impossible to interchange files between Windows and Linux without messing up the filenames (at least for me, an average Linux user outside the USA).

Is Windows-1252 a proprietary codepage? --84.61.43.60 16:44, 31 August 2006 (UTC)

Well, it was introduced by a vendor without any standards body approval, but there is nothing preventing anyone who wants to from implementing support for it. As simple statements of facts about an encoding method, I'm pretty sure raw code page tables are not copyrightable, and information on the code page is freely available to anyone who wants it from Microsoft themselves, unicode.org and countless other places. Plugwash 16:14, 1 September 2006 (UTC)

"AFAIK, only IE and other MS products do this"[edit]

The above was the edit summary for the addition of a {{fact}} tag. I don't have a cite, but I tried it with IE and Firefox on Windows and with Firefox and Konqueror on Linux, and they all did it. Plugwash 15:51, 6 September 2006 (UTC)

Whoops, got my facts screwed up. IE and MS products do this for ISO-8859-15. The others do it for ISO-8859-1 only. Removing request for citation. --ChrisRuvolo (t) 19:03, 6 September 2006 (UTC)
Ineligible for copyright. --▦Frogger3140▦ (talk) 12:20, 26 September 2008 (UTC)

0x80-0x9F was not used in ISO Latin-1

The article says, "The encoding is a superset of ISO 8859-1, but differs by using displayable characters rather than control characters in the 0x80 to 0x9F range." But the ISO 8859-1 page has it that 0x80-0x9F is left unused. Which is right? --Apantomimehorse 02:44, 11 September 2006 (UTC)

I've made a minor correction to this article now. Plugwash 00:25, 27 September 2006 (UTC)
This mistake is still present. Did someone revert it, and why? Danadocus (talk) 18:25, 12 May 2010 (UTC)
ISO_8859-1:1987 (better known as the MIME type ISO-8859-1 — note extra dash) does map these characters as control codes. See ISO 8859-1#ISO-8859-1. You are in a maze of twisty encodings, all alike. --ChrisRuvolo (t) 19:19, 12 May 2010 (UTC)
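A quick way to see the distinction described here is to decode the same bytes both ways. In this Python sketch the standard `latin-1` codec stands in for the IANA charset ISO-8859-1 (it maps 0x80-0x9F to the C1 controls U+0080-U+009F); as far as I know, no standard-library codec models ISO/IEC 8859-1 proper, which leaves those positions undefined.

```python
import unicodedata

# Euro sign and curly double quotes in Windows-1252.
raw = bytes([0x80, 0x93, 0x94])

as_cp1252 = raw.decode("cp1252")   # printable characters
as_latin1 = raw.decode("latin-1")  # assumption: stands in for IANA ISO-8859-1 (C1 controls)

for ch_1252, ch_8859 in zip(as_cp1252, as_latin1):
    # Control characters have no Unicode name, so report their category instead.
    print(f"U+{ord(ch_1252):04X} {unicodedata.name(ch_1252):30} "
          f"vs U+{ord(ch_8859):04X} (category {unicodedata.category(ch_8859)})")
```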

Table

The table in this article is really poor; for instance, the difference between zero and O is almost invisible, and there is no difference between lower case l and upper case i. Worse, the difference between the various types of quotes is completely lost. I think we should follow the nice example at Code page 437 or at least use a font that can convey the differences between the characters. AxelBoldt 21:00, 6 October 2006 (UTC)

The FF hex character code (lower right corner of table) is AFAIK only defined in Windows ANSI, not in Latin-1. - Alf—The preceding unsigned comment was added by 81.191.161.87 (talk) 22:39, 18 February 2007 (UTC).

What annoys me is that there is no legend explaining what the different background colors actually mean. --JVersteeg 17:29, 24 September 2007 (UTC)


Copy. Anyway, I just wanted to make this table look like the one in windows-1250, which omits the lower half. This provides the same information in a more concise way so I guess that's nicer.
MaxDZ8 talk 13:01, 8 April 2008 (UTC)

I am going to revert this change to take out the bottom half. It is convenient to have it all available for anyone looking for this. While this view shows what is explicitly different, the rest of the table is part of the code page. Perhaps the page for 1250 should be changed? This is especially relevant because many MS products still use 1252. An example of this is VBA in Excel. tseabrooks —Preceding unsigned comment added by 64.221.222.142 (talk) 16:06, 10 April 2008 (UTC)

I understand. I'm not 100% sure this is a valuable metric; after all, US-ASCII is well defined. Although a consistent look would be desirable, 1250 has been that way for a while and it seemed to work.
MaxDZ8 talk 07:21, 12 April 2008 (UTC)

Alt key input

In fact, this method enters characters from the "ANSI" and "OEM" code pages associated with the current keyboard language/layout, not just 1251 and 437. Thus, by switching the keyboard between, say, Russian and Norwegian, one can enter different sets of characters.
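The behaviour described above can be modelled as a lookup that picks one of two code pages. This is a hypothetical Python sketch, not the Windows implementation: it assumes the usual convention that a leading zero (Alt+0nnn) selects the "ANSI" code page and no leading zero selects the "OEM" one, takes both code pages as parameters (defaulting to cp1252 and cp437, i.e. a US-English setup), and ignores values above 255.

```python
def alt_code_to_char(digits: str, ansi: str = "cp1252", oem: str = "cp437") -> str:
    """Hypothetical model of Windows Alt+numpad entry for values 0-255.

    Assumption: a leading zero selects the ANSI code page, otherwise the OEM
    code page. Both are parameters because they follow the keyboard layout.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("Alt codes are typed as decimal digits")
    value = int(digits)
    if value > 255:
        raise ValueError("this sketch only models single-byte values")
    codec = ansi if digits.startswith("0") else oem
    # Bytes a code page leaves undefined (e.g. 0x81 in cp1252) raise here.
    return bytes([value]).decode(codec)


# Same digits, different code pages.
print(ascii(alt_code_to_char("0150")))                 # ANSI 1252: en dash
print(ascii(alt_code_to_char("130")))                  # OEM 437: e with acute
print(ascii(alt_code_to_char("0192", ansi="cp1251")))  # ANSI 1251: Cyrillic A
```

Passing the code pages as arguments is what lets the same digits produce different characters, which is the point made above about switching keyboard layouts.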

Subset

Wow! I thought about making my change a minor one; I did not think anyone could seriously object to my modifications. I am not an experienced Wikipedia contributor, so I wanted to play it safe... well.

I am not convinced by your arguments. It doesn't matter whether people use the C1 control codes of ISO 8859-1 or not. The octets in question are defined in both encodings, with different meanings. The term "subset" is, thus, wrong. One can argue that the average user benefits more from the extra glyphs he can produce by using Windows-1252 than from the control codes of ISO 8859-1. However, this is a technical matter we are talking about; it helps to be precise. More so because readability was not disturbed and no information was lost due to my editing. Traxer 17:04, 19 February 2007 (UTC)

One could say that the printable characters (the ones with visible glyphs) of windows-1252 are a superset of those in iso-8859-1, but not the complete set of characters (printable or control) in both encodings. The encodings themselves aren't "sets" (or "subsets" or "supersets"), because, strictly speaking, a mathematical set has no ordinal numbers assigned to its components (there is just a cardinal number of the set's membership), while a character encoding consists of a series of ordinal pairings between numbers and characters. *Dan T.* 19:50, 19 February 2007 (UTC)
The thing is, for the purposes of this page, abstract mathematical set theory and the Zermelo-Fraenkel axioms (or whatever) are all completely irrelevant. What matters is what people commonly mean when they use "superset" in the context of computer software and character "sets". However, if you want to assuage your mathematical conscience, you can reflect that the official ISO/IEC 8859-1 specification technically doesn't define the "C1" control area -- see the green areas in the table on that article page. AnonMoos 04:54, 20 February 2007 (UTC)
"The encodings themselves aren't sets"?. Hogwash. An encoding isn't a set of characters, but it is a set of (character, code) pairs, and in that sense one encoding can certainly be a subset of another. Mhkay 17:55, 12 September 2007 (UTC)
If the ISO/IEC 8859-1 specification does not define the code values 0x80 to 0x9F, then the phrase "using displayable characters rather than control characters in the 0x80 to 0x9F range" is misleading. Traxer 17:08, 21 February 2007 (UTC)
Note the distinction between the standard "ISO/IEC 8859-1" and the IANA charset "ISO-8859-1": the former does not define any control codes, the latter does. Plugwash 20:34, 21 February 2007 (UTC)
The text compares Windows-1252 and ISO/IEC 8859-1, not Windows-1252 and IANA's ISO-8859-1. AnonMoos is right on that point, 0x80 to 0x9F are not defined. Traxer 10:27, 22 February 2007 (UTC)
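Mhkay's framing of an encoding as a set of (code, character) pairs can be checked mechanically. This Python sketch assumes Python's `cp1252` codec matches the Unicode mapping table for Windows-1252 and uses `latin-1` for the IANA charset ISO-8859-1; ISO/IEC 8859-1 proper is approximated by keeping only its graphic positions 0x20-0x7E and 0xA0-0xFF, since the standard itself defines no control codes. `code_pairs` is a hypothetical helper.

```python
def code_pairs(codec: str) -> set[tuple[int, str]]:
    """Hypothetical helper: a single-byte encoding as (byte, character) pairs."""
    pairs = set()
    for b in range(256):
        try:
            pairs.add((b, bytes([b]).decode(codec)))
        except UnicodeDecodeError:
            pass  # byte value left undefined by this codec
    return pairs


cp1252 = code_pairs("cp1252")
iana_latin1 = code_pairs("latin-1")  # assumption: IANA ISO-8859-1, C0 and C1 included

# Approximation of ISO/IEC 8859-1: graphic characters 0x20-0x7E and 0xA0-0xFF only.
iso_graphic = {(b, c) for b, c in iana_latin1 if 0x20 <= b <= 0x7E or b >= 0xA0}

print(iso_graphic <= cp1252)      # subset test against the graphic repertoire
print(iana_latin1 <= cp1252)      # subset test against the full IANA charset
print(len(cp1252 - iana_latin1))  # pairs that occur only in Windows-1252
```

Under these assumptions the first test holds and the second fails, which lines up with both sides of the argument: Windows-1252 is a superset of the ISO/IEC 8859-1 graphic characters, but not of the IANA charset with its C1 controls.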

Historical Accuracy

This statement strikes me as post-hoc rationalisation rather than historical fact:

"The name has been taken from an early ANSI draft, that later, was modified and became ISO-8859-1."

It seems unlikely because ISO-8859-1 was developed within ECMA and was an ISO standard before it was an ANSI standard. Personally, I have always suspected that Microsoft initially called it ANSI because they were proposing to implement the ANSI standard (that is, the ISO standard which is published in the US rebranded as ANSI), and they continued to refer to it internally as ANSI after they started making changes to it.

Mhkay 17:50, 12 September 2007 (UTC)

If you can show that ISO-8859-1 was developed internally within ECMA and that Microsoft had no access to its drafts, you will have disproven the statement you cite. Note however that Microsoft is an ECMA member.
I cannot figure out exactly what you are saying in the second half of the post. Are you saying perhaps that Microsoft intended to submit a modified version of the ISO standard as an ANSI proposal? If so, do you have a citation for that? — Preceding unsigned comment added by 82.139.87.39 (talk) 06:23, 2 October 2011 (UTC)
I think that the line of development that led to ISO-8859-1 went through several incarnations, and wasn't a Europe-only thing. The DEC VT220 character set seems to have been somewhat influential at the beginning. AnonMoos (talk) 12:02, 2 October 2011 (UTC)

See below for a plausible cause of the ANSI misnomer. — Preceding unsigned comment added by 82.139.81.0 (talk) 16:58, 28 May 2014 (UTC)

ansinew

The following statement is very imprecise, to say the least:

In LaTeX packages, it is referred to as ansinew.

Precisely, there's fundamentally one package for dealing with input encoding, namely inputenc.sty, and in it Windows-1252 can be referred to as either cp1252 or ansinew, of which the documentation literally says "Windows 3.1 ANSI encoding, extension of Latin-1 (synonym for cp1252.)" --Blazar.writeto() 23:00, 19 August 2008 (UTC)

Outlines

The text says: "The following table shows Windows-1252, with differences from ISO-8859-1 outlined." That's not true at this moment, apparently because of a MediaWiki limitation. The part that says style="border-width:3px" gets overridden, rather than added to, by the template {{chset-color-punct}}, which as of this writing expands to style="background:#DFF7FF;". How should that be fixed? The two solutions I see are to report it to MediaWiki hoping for a fix, or to create new templates such as {{chset-color-punct-outlined}} expanding to style="border-width:3px;background:#DFF7FF;". The latter has the advantage of not having to deal with style=... tags. The former would have the advantage of being able to combine templates, e.g. {{chset-color-punct}}{{outlined}} --pgimeno (talk) 07:49, 14 April 2009 (UTC)

That's annoying; it's also affecting ISO/IEC 8859-9 (they used to work). AnonMoos (talk) 09:27, 14 April 2009 (UTC)

HTML

The most recent edit removed the interesting fact that HTML 5 says that the character encoding ISO-8859-1 should be handled as though it were CP1252. Can somebody confirm this is true? If so, I think the text should be reverted. The current text just says that HTML 5 can accept CP1252 as a character encoding, which is nowhere near as interesting and useful a fact. Spitzak (talk) 21:19, 5 February 2010 (UTC)

There is info confirming this in an HTML5 draft here: [1]. However, it notes that this is a willful violation of the W3C spec. For that reason, the final HTML5 spec may not include this section, and it would be inappropriate to mention it until it is finalized, IMO. --ChrisRuvolo (t) 19:13, 12 May 2010 (UTC)
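For reference, the behaviour in that draft (the WHATWG Encoding Standard specifies the same label mapping) can be sketched in Python. The label set below is a partial, illustrative subset of the labels that standard maps to windows-1252, and `web_decode` is a hypothetical name. The fallback for 0x81, 0x8D, 0x8F, 0x90 and 0x9D mirrors that standard's index, which maps those bytes to the C1 control with the same value instead of failing.

```python
# Partial, illustrative subset of labels that resolve to windows-1252 on the web.
WINDOWS_1252_LABELS = {
    "iso-8859-1", "iso8859-1", "latin1", "l1",
    "us-ascii", "ascii", "cp1252", "windows-1252",
}


def web_decode(data: bytes, label: str) -> str:
    """Hypothetical helper: decode bytes declared with a windows-1252 label."""
    if label.strip().lower() not in WINDOWS_1252_LABELS:
        raise LookupError(f"label not covered by this sketch: {label!r}")
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90, 0x9D: use the same-valued C1 control.
            chars.append(chr(b))
    return "".join(chars)


page = b"\x93Caf\xe9\x94"
print(ascii(web_decode(page, "ISO-8859-1")))  # curly quotes, read as windows-1252
print(ascii(page.decode("latin-1")))          # C1 controls, read as IANA ISO-8859-1
```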

Codepage layout asterisks

In the Codepage layout section there are asterisks after decimal values 128 to 159. It's not clear why these are there. Below the table are two paragraphs. The first paragraph is about the color coding and the second is about character positions 80, 81, 8D, 8F, 90, and 9D.

I'm assuming the asterisks are meant to lead the reader to the second paragraph, but that's confusing to me and, I suspect, to other readers. I'm not sure what the goal of the asterisks is:

  • The asterisks are next to decimal values and yet the decimal values are never mentioned in the paragraph. Obviously easy to fix but that's not the core issue.
  • The paragraph mentions six specific code positions, 80, 81, 8D, 8F, 90, and 9D (hex), and yet there are asterisks next to 27 (decimal) positions in the table.
  • Perhaps the asterisks are about the entire C1 control code range. If so, then why don't cells 129, 141, 143, 144, and 157 (decimal) have numbers and asterisks?
  • The reference to C1 control code on the second paragraph escapes me (pun intended). Windows-1252 defines glyphs for various points in the 80-9F positions. I do not believe it concerns itself at all with the C1 control codes. --Marc Kupper|talk 05:22, 12 August 2011 (UTC)

It says above the chart "differences from ISO-8859-1 marked with thick borders and asterisks". Not sure whether both are needed (the asterisks were added because of a past technical problem with the borders, see section "Outlines" above), but that's why it's there... AnonMoos (talk) 11:54, 12 August 2011 (UTC)
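To make the numbering question concrete, this Python sketch walks the 0x80-0x9F block and prints each position in hex and decimal, marking the ones Windows-1252 leaves undefined. It assumes Python's `cp1252` codec (which follows the Unicode mapping table for Windows-1252) is a fair stand-in for the article's table.

```python
import unicodedata

undefined = []
# Assumption: Python's cp1252 codec reflects the table in the article.
for b in range(0x80, 0xA0):
    try:
        ch = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)
        print(f"0x{b:02X} ({b:3d})  undefined")
        continue
    print(f"0x{b:02X} ({b:3d})  U+{ord(ch):04X} {unicodedata.name(ch)}")

print("undefined (hex):", [f"{b:02X}" for b in undefined])
print("undefined (decimal):", undefined)
print("defined positions in the block:", 32 - len(undefined))
```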

Characters 0...31

It states on the Latin-1 page that Unicode codepoints in the Latin-1 range are often interpreted as 1252 by software. According to this page characters 0...31 are empty / control characters. But when I actually print them on Windows (using TextOutW or TextOutA) they show up as code page 437 except for {1...6 16 21...23 25} which are box-drawing characters and {0 9 10 13 28...31} which are empty or possibly control (but TextOut doesn't interpret control codes).

Does anyone know why this is? Are the first 32 characters of 1252 dual-purpose as in 437, are these characters of a different code page (handled specially by TextOut), or is it something else? — Preceding unsigned comment added by 82.139.87.39 (talk) 08:44, 1 October 2011 (UTC)

That's probably a Windows API thing. I don't think that Microsoft would have made the "MS LineDraw" font if they had incorporated CP437 specials into Windows-1252... AnonMoos (talk) 12:08, 2 October 2011 (UTC)
Even though MS LineDraw contains a subset of 437, it is still a superset of the characters talked about above by a wide margin. — Preceding unsigned comment added by 82.139.87.39 (talk) 23:44, 27 January 2012 (UTC)
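One way to check what the code page mapping itself says about bytes 0-31 is to decode them. In this Python sketch, assuming Python's `cp1252` codec reflects the published mapping, every byte in 0x00-0x1F decodes to the C0 control with the same value. That is consistent with (though it does not prove) the suggestion that the glyphs seen via TextOut come from the API or font rather than from Windows-1252.

```python
import unicodedata

# Assumption: Python's cp1252 codec reflects the published Windows-1252 mapping.
low = [bytes([b]).decode("cp1252") for b in range(0x20)]

# Every byte maps to the C0 control code point with the same value...
print(all(ch == chr(b) for b, ch in enumerate(low)))
# ...and all of them have Unicode general category Cc (control).
print({unicodedata.category(ch) for ch in low})
```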

Windows-1252.svg legend

The legend for Windows-1252.svg only mentions the blue content. The reader can't be sure about the green and red content, nor about the black content, though that should really be unchanged content. So I'm assuming the red content is removed and the green added, going from ANSI to Windows-1252. Even if my assumption is right, not everyone would make the same assumption, so a complete legend should be made. — Preceding unsigned comment added by DynV (talkcontribs) 20:20, 30 July 2012 (UTC)

"Microsoft-affiliated bloggers"?[edit]

"Details" section, last sentence: Microsoft-affiliated bloggers now state that “The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community.”

"Microsoft-affiliated bloggers"? What's that supposed to mean? — Preceding unsigned comment added by Tharos (talkcontribs) 12:55, 20 August 2013 (UTC)

Well, a quick Google search finds an MSDN blog ( http://blogs.msdn.com/b/oldnewthing/archive/2004/05/31/144893.aspx ) which links to a document ( http://download.microsoft.com/download/5/6/8/56803da0-e4a0-4796-a62c-ca920b73bb17/21-Unicode_WinXP.pdf ) supposedly from "Cathy Wissink, Program Manager, Windows Globalization, Microsoft Corporation", which contains that sentence. Presumably this is what was referred to. Plugwash (talk) 13:40, 20 August 2013 (UTC)

Time of origin

I've tried to figure out when 1252 originated. The closest I've come is a table from the Windows 1.0 Programmer's Reference (cited by Charles Petzold in Programming Windows, the old version, not the new one which isn't really about Windows proper any more) that shows what later would become 1252, although many characters are still missing.

1) The × and ÷ symbols weren't added yet, though they did exist in 8859-1.

2) Charles Petzold says the NBSP and SHY didn't exist yet, but I think he was mistaken. Their symbols are present in the table and it'd be silly to assume them to be duplicates of SP and -.

3) DEL is present in 1252 but not in 8859-1.

4) All characters in the ranges 0x, 1x, 8x, 9x were classified as unsupported. Clearly Windows 1.0 did have CR and LF and such. I can only assume the author of the table didn't consider such control bytes part of the character encoding proper. In any case, many characters that are present in 1252 now are still missing. These characters aren't present in 8859-1.

Since Windows 1.0 came out in 1985, I think this has important implications for the origin of the ANSI misnomer. Back then, 1252 pretty much was 8859-1 apart from the control codes and two symbols which were added later anyway. I think this set was already referred to as ANSI before any talk of Unicode in 1990 or so, in order to differentiate it from the OEM (=IBM) set.

So when the name ANSI for 1252 originated, it was probably correct. — Preceding unsigned comment added by 82.139.81.0 (talk) 18:10, 27 May 2014 (UTC)

There definitely were characters 91h and 92h in Windows 1.0. 178.49.152.66 (talk) 21:06, 2 January 2015 (UTC)
No there weren't. Windows 1.0 didn't support any of the characters in that range and neither did it support × and ÷. NBSP was supported, SHY behaved like a non-breaking hyphen in Write whereas Notepad didn't wrap hyphens at all. — Preceding unsigned comment added by 82.139.82.82 (talk) 04:27, 4 September 2015 (UTC)
The symbols × and ÷ are both precisely at positions which differ in DEC's Multinational Character Set, even though the positions around them are identical. The only explanation is that for some reason the Œ and œ were taken out, this was the draft that the Windows code page was based on and × and ÷ were added later. This proves that when Windows 1.0 came out, two years before 8859-1 was published, the term ANSI for the Windows code page was correct. — Preceding unsigned comment added by 82.139.82.82 (talk) 20:06, 4 September 2015 (UTC)
By the way, ECMA-94 was published more than half a year after Windows 1.0 RTM. — Preceding unsigned comment added by 82.139.82.82 (talk) 22:18, 4 September 2015 (UTC)
I just tested a copy (using pcjs.org) and it does not exist. - Yuhong (talk) 23:45, 9 February 2017 (UTC)

The replacement of Œ and œ with × and ÷ was an official standards body deliberation melodrama, which had nothing to do with Microsoft. Some French representatives to the standards committee actually insisted at almost the last moment that Œ and œ should be dropped. (Of course, later on the French tended to come down on the other side, one of the factors leading to ISO/IEC_8859-15...) AnonMoos (talk) 08:08, 11 February 2017 (UTC)
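For anyone comparing positions, this Python sketch shows where Windows-1252 places the four characters discussed above: × and ÷ at 0xD7 and 0xF7, as in ISO-8859-1, and Œ and œ in the 0x80-0x9F extension block. The claim about DEC's Multinational Character Set is not checked here, since Python's standard library has no codec for it.

```python
import unicodedata

# Repertoire of the IANA ISO-8859-1 charset, via Python's latin-1 codec.
LATIN1 = bytes(range(256)).decode("latin-1")

# Multiplication sign, division sign, and the OE ligatures.
for ch in "\u00d7\u00f7\u0152\u0153":
    byte = ch.encode("cp1252")[0]
    print(f"0x{byte:02X}  {unicodedata.name(ch):28}  in ISO-8859-1: {ch in LATIN1}")
```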

Errors

Let's pick this paragraph apart:

Historically, the phrase "ANSI Code Page" (ACP) is used in Windows to refer to various code pages considered as native. The intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft-affiliated bloggers now state that "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."[2]

Historically, the phrase "ANSI Code Page" (ACP) is used in Windows to refer to various code pages considered as native.

Native in what sense? To the machine? No, that would be the OEM code page. ANSI is used to refer to the Windows code page, and is still unofficially used in this manner. So much for historically.

The intention was that most of these would be ANSI standards such as ISO-8859-1.

Is that so? That doesn't follow from the reference cited, and in any case, given the character table in the Windows 1.0 reference, it seems that when the ANSI name was given, it was correct, which is a much simpler explanation.

Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard.

I don't think this is true. What we now know as code page 1252 has had many characters added over the years. When it was the first it still was as good as identical to ISO-8859-1, and when it was the most popular it wasn't.

Microsoft-affiliated bloggers now state that "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."[2]

Microsoft-affiliated bloggers is a bit of weasel wording. Who? (Answer: Cathy Wissink, who is probably paraphrasing other people who are left anonymous.) Why is their opinion authoritative? The reference [2] doesn't go to the original page and it contains an error: ISO-8859-1 does not reserve space for control codes, it simply leaves some characters undefined. (So you can use them for control codes, but also for different purposes. I think people at the time didn't consider control codes part of the encoding. I wonder why DEL was included in the original Windows 1.0 character set though. An oversight? Or maybe it wasn't considered a real control code?) Later 1252 formalised the control codes and added characters.

It is tempting to think that the ANSI draft Cathy Wissink mentions was identical to the final standard except that it didn't contain × and ÷, but we cannot be sure of that. — Preceding unsigned comment added by 82.139.81.0 (talk) 16:55, 28 May 2014 (UTC)

Character table

Legend: yellow cells are control characters, blue cells are punctuation, purple cells are numbers, green cells are ASCII letters, and tan cells are international letters. Differences from ISO-8859-1 are marked with thick green borders.

‘Differences from ISO-8859-1’: these are as far as I can tell all extensions to ISO-8859-1. The wording seems to imply to me that ISO-8859-1 does something else with that space, but it doesn't.

Also, the florin sign is tan, but it is a letter-like currency symbol, like the euro, centime, yen, pound and dollar signs.

Some colours aren't mentioned in the legend, and the legend isn't formatted like a proper legend with coloured boxes. And there are too many colours anyway, it would be better to simply outline the ASCII portion and use the same colours throughout. — Preceding unsigned comment added by 82.139.81.0 (talk) 17:06, 28 May 2014 (UTC)
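As a data point for the colour question, Unicode's general categories can be looked up with Python's `unicodedata` module. This sketch assumes, purely for illustration, that the table's colours might follow Unicode categories (the legend does not say so). It shows that U+0192, at 0x83, is classified as a lowercase letter (Ll), while the euro, cent, pound, yen and dollar signs are currency symbols (Sc).

```python
import unicodedata

# Dollar, euro, florin, cent, pound and yen positions in Windows-1252.
for b in (0x24, 0x80, 0x83, 0xA2, 0xA3, 0xA5):
    ch = bytes([b]).decode("cp1252")
    print(f"0x{b:02X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```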

The blurb

Windows-1252 or CP-1252 is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages. It is one version within the group of Windows code pages. In LaTeX packages, it is referred to as "ansinew".

The blurb should be a short introduction to and/or summary of the topic. Yet it doesn't properly or fully summarise what code page 1252 is, and it contains a very specific detail about LaTeX that no one cares about.

A lot is missing, but there are also some more specific issues:

Windows-1252

Not the original name. Insofar as it originally had a name it would have been called ‘code page 1252’.

CP-1252

What does CP mean? (I know, but does the average reader?)

of the Latin alphabet

And control codes, digits, punctuation and a swathe of other symbols.

in the legacy components

Actually those components themselves aren't legacy in any way, although they contain legacy support for code pages. The legacy moniker applies to the software that uses these components.

some other

By which we apparently mean the vast majority.

It is one version within the group of Windows code pages.

o_O ... What a terrible sentence. I don't know what it's trying to say beyond ‘it's a code page’ but it doesn't come across. — Preceding unsigned comment added by 82.139.81.0 (talk) 17:26, 28 May 2014 (UTC)


First off, what you're talking about is usually called the "lead paragraph" or "lead section", not the "blurb"...
And "Windows-1252" is the official IANA registry and MIME name. Wikipedia articles go by common name, not "original name". Much of the rest of your comments appear to be semantic quibbling. This article is not the place to explain what a character set is, but if you already know what a character set is, then the first paragraph should not be too mystifying. The last sentence is not really about Latex, but explaining the name "ansinew". If there were a place in the article devoted to discussing alternative names for the character set, then the sentence could go there. AnonMoos (talk) 07:51, 29 May 2014 (UTC)

I wouldn't agree that the IANA name is necessarily the common name, and you completely ignored the fact that the original name must have some importance but isn't mentioned at all.

And you cannot read my comments and honestly say that it's ‘just semantic quibbling’. I have pointed out many real errors and flaws in the writing. I give you constructive criticism, and what do you do? Bite. For shame.

The name ‘ansinew’ has no significance outside LaTeX circles, and since those circles are as marginal as they are, it is more a property of LaTeX than of code page 1252; hence it doesn't belong in the intro.

In general, whatever your opinions may be, the article must be able to be understood by a layman, and then not leave a wrong impression. I contend that as is, the article doesn't measure up to that standard. It is inaccurate, incomplete and ill structured. -- 15:13, 30 May 2014‎ 82.139.81.0


I moved the LaTeX note from the lede to the body, since it does appear to be a detail that belongs more in the technical specifics of the code page than in the more general lede summary. — Loadmaster (talk) 20:41, 30 May 2014 (UTC)