
Talk:Code page 437

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 82.139.87.39 (talk) at 01:02, 28 January 2012 (→‎windows 1253 vs codepage437). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Glyph origins

In addition to characters lifted from the Wang word processing set, some of the glyphs may have originated from work Gates did with Microsoft BASIC for Commodore International ... but I can't find any full character maps of all PETSCII glyphs including 0-31 ... http://americanhistory.si.edu/collections/comphist/gates.htm

In terms of Commodore PET, they started with us from the very beginning. Because we helped Chuck Pedal, who was at Commodore at that time, really think about the design of the machine. Adding lots of fun characters to the character set, things like smiley faces, and suit symbols.
Hobart 19:20, 21 September 2006 (UTC)

Isn't this code page also known as PC-8?

Added with reference. —Coroboy (talk) 00:57, 15 November 2011 (UTC)

null

The null character should be empty -- the IBM PC did not display "NULL" when the 0 byte was put into the frame buffer. The null, space, and blank characters (0x00, 0x20, and 0xFF) were visually indistinguishable.

I agree - so I changed it. -- 212.63.43.180 (talk) 21:12, 24 January 2008 (UTC)
However, characters 0, 32, and 255 were used differently in IBM PC files. The way the table has been, the NULL, SP, and NBSP texts link to relevant articles where people can read what the specific functions of those characters were -- whereas leaving things blank conveys no information. AnonMoos (talk) 22:33, 24 January 2008 (UTC)
I solved the issue by providing two versions of the table for values 0-31. Ricardo Cancho Niemietz (talk) 09:46, 29 January 2008 (UTC)
Most of the changes were good, but now the English does need some clean-up... AnonMoos (talk) 11:24, 29 January 2008 (UTC)
Sorry, I'm not a native English speaker (I'm from and live in Spain). Help requested, please. Ricardo Cancho Niemietz (talk) 12:54, 29 January 2008 (UTC)
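For what it's worth, a quick check with Python's built-in `cp437` codec (an assumption about tooling; the codec follows the Microsoft mapping) shows that the three visually blank codes still decode to distinct Unicode code points, which is the distinction the NULL, SP and NBSP labels preserve. A minimal sketch:

```python
# The three "blank-looking" CP437 bytes and the labels used in the table.
BLANKS = {0x00: "NULL", 0x20: "SPACE", 0xFF: "NO-BREAK SPACE"}

def blank_code_points() -> dict[int, tuple[str, str]]:
    """Return {byte: (label, 'U+XXXX')} for the visually blank CP437 bytes."""
    result = {}
    for byte, label in BLANKS.items():
        char = bytes([byte]).decode("cp437")  # Python's cp437 codec
        result[byte] = (label, f"U+{ord(char):04X}")
    return result

for byte, (label, code_point) in blank_code_points().items():
    print(f"0x{byte:02X} -> {code_point} ({label})")
```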

Multiple Bases

While the table header rows and the Unicode values are in hexadecimal, the CP437 codes are in decimal. This makes it less obvious where the two encodings map to the same character. I would recommend making it entirely hexadecimal.
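As an illustration of the suggestion, here is a minimal sketch that lists one table row entirely in hexadecimal, so the CP437 code and the Unicode code point can be compared digit by digit. It assumes Python's `cp437` codec as the source of the mappings; note that this codec decodes 0x00 to 0x1F as control characters, not as the graphic glyphs.

```python
def hex_row(row: int) -> list[str]:
    """Format one 16-code row of CP437 as 'XX U+YYYY' strings.

    `row` is the high nibble (the table's row header); the column index
    supplies the low nibble.
    """
    cells = []
    for col in range(16):
        code = (row << 4) | col
        char = bytes([code]).decode("cp437")
        cells.append(f"{code:02X} U+{ord(char):04X}")
    return cells

print("  ".join(hex_row(0xE)[:4]))  # E0 U+03B1  E1 U+00DF  E2 U+0393  E3 U+03C0
```

In the printable ASCII rows (0x20 to 0x7E) the two hex columns are identical, which is exactly what a mixed decimal/hex table hides.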

The overloaded character number 237 in CP437

The character for place 237 in the CP437 table should change from U+03C6 GREEK SMALL LETTER PHI to U+03D5 GREEK PHI SYMBOL.

In CP437, this position was used as U+03D5 GREEK PHI SYMBOL in italics, U+2205 EMPTY SET, U+2300 DIAMETER SIGN, and even as a substitute for U+00F8 LATIN SMALL LETTER O WITH STROKE, but rarely as U+03C6 GREEK SMALL LETTER PHI, because its original IBM shape (merely a circle with a slash, it seems) does not closely resemble that Greek lowercase letter.

Also, character 238 should effectively be changed to U+2208 ELEMENT OF. In addition to being used as U+03B5 GREEK SMALL LETTER EPSILON, it is used today as the U+20AC EURO SIGN by some dot-matrix ticket printers in the European countries where the euro is the official currency.

On the other hand, character 236 is U+221E INFINITY, not a Greek letter at all, so its background colour should be changed to grey.

As you can see, characters 236 to 253 in CP437 were all primarily intended as maths symbols, so positions 237 and 238 are not "real" Greek letters. Despite that, many people have of course used these characters as Greek letters (to name angles and so on).

Another issue: character 235, U+03B4 GREEK SMALL LETTER DELTA, was also used as U+00F0 LATIN SMALL LETTER ETH, an Icelandic Latin character.

A popular maths program for MS-DOS in the late 1980s, Derive, uses the full CP437 character set to display complex formulae, with very good results.

People are able to do incredible things with very limited means...

Yours, Ricardo Cancho Niemietz (talk) 15:53, 25 January 2008 (UTC)

I made the changes myself! :-D Ricardo Cancho Niemietz (talk) 14:00, 28 January 2008 (UTC)
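One way to make the overloading explicit is to keep the codec's default reading alongside the alternatives described above. A minimal sketch, assuming Python's `cp437` codec supplies the default (it maps 0xED to U+03C6 GREEK SMALL LETTER PHI, not to U+03D5); the `ALTERNATE_READINGS` table is hypothetical and only restates the uses listed in this thread:

```python
# Hypothetical table of alternative uses of some CP437 codes, restating the
# readings listed in this thread. The default reading comes from the codec.
ALTERNATE_READINGS = {
    0xEB: ["\u00f0"],                                # eth (Icelandic)
    0xEC: [],                                        # infinity, not a Greek letter
    0xED: ["\u03d5", "\u2205", "\u2300", "\u00f8"],  # phi symbol, empty set, diameter, o with stroke
    0xEE: ["\u2208", "\u20ac"],                      # element of, euro sign on some printers
}

def readings(code: int) -> list[str]:
    """Return the cp437 codec's character followed by any alternative readings."""
    default = bytes([code]).decode("cp437")
    return [default] + ALTERNATE_READINGS.get(code, [])

for code in sorted(ALTERNATE_READINGS):
    print(f"0x{code:02X}:", " ".join(f"U+{ord(c):04X}" for c in readings(code)))
```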

Codes for 16 and 17

I just added the image to the top, which is a printout of the code page in order using QEMU. I noticed a discrepancy - positions 16 and 17 are swapped around, relative to the codes given in this article. Note that in the image, there is a right arrow then a left arrow (in the top row). In the table in the article, there is first the left arrow (U+25C4), then the right arrow (U+25BA). Is there an error in the article? I can't find a source for the first 32 characters. EatMyShortz (talk) 12:48, 18 February 2009 (UTC)

There are sources on the Microsoft site, the Unicode.org site, or if you insist on paper, you can look at Appendix C of The New Peter Norton Programmer's Guide to the IBM PC & PS/2 by Peter Norton and Richard Wilton (Microsoft Press, 1987 ISBN 1-55615-131-4). From what I can see if you place 16 and 17 side-by-side, they point at each other, as is also the case for 26 and 27... AnonMoos (talk) 09:33, 19 February 2009 (UTC)

The characters at 0x10 and 0x11

Consider:
U+25BA : BLACK RIGHT-POINTING POINTER
U+25B6 : BLACK RIGHT-POINTING TRIANGLE
U+25C4 : BLACK LEFT-POINTING POINTER
U+25C0 : BLACK LEFT-POINTING TRIANGLE

Compare with the characters at 0x1E and 0x1F:
U+25B2 : BLACK UP-POINTING TRIANGLE
U+25BC : BLACK DOWN-POINTING TRIANGLE

I would recommend replacing 25BA with 25B6 and 25C4 with 25C0 in the table.

The Terminus font (which is designed to include all CP437 characters) does not include 25BA and 25C4, but it does include 25B6 and 25C0. (Reference: [1])

I won't make this edit, because I'm uncomfortable with non-ASCII characters in Firefox's text input box.
-- 'x' 92.225.64.211 (talk) 07:56, 25 May 2009 (UTC)

Note that the following characters render properly in IE7, which has an incomplete graphic rendering character set:
U+25B2 ▲ Triangle up
U+25BA ► Triangle right
U+25BC ▼ Triangle down
U+25C4 ◄ Triangle left
The characters U+25B6 (▶) and U+25C0 (◀) are not rendered properly by IE7; they are displayed as empty squares.
— Loadmaster (talk) 17:18, 26 May 2009 (UTC)

The decision is not really up to us -- the standard equivalences recognized by Unicode are at http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT and http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT ... AnonMoos (talk) 18:39, 26 May 2009 (UTC)

Thanks for the links. I'm not sure how "standard" IBMGRAPH.TXT really is.
Comparing the chart http://www.unicode.org/charts/PDF/U25A0.pdf with the rendering by an IBM PC, I believe the author of IBMGRAPH.TXT has made a mistake. I won't write to him though, since I'm perfectly happy with unicode.org hosting a suboptimal document, and wikipedia perpetuating its content.
-- 'x' 85.179.155.203 (talk) 05:59, 2 June 2009 (UTC)
I'd be more comfortable with making clear where the suggestion comes from and clarifying that this information is provided principally from vendors and is not part of the Unicode standard nor referenced for historical accuracy (or indeed for current practice). "The standard equivalences recognized by Unicode" definitely overstates the case. -- Elphion (talk) 16:42, 5 April 2010 (UTC)
The mappings on that site aren't even consistent. IIRC, they have at least 2 versions of the Macintosh encoding that don't agree on ¤ vs. €. DanBishop (talk) 20:12, 26 June 2011 (UTC)
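The vendor files linked above are plain text: one row per byte, with the byte and the Unicode code point as hexadecimal columns and the character name as a `#` comment. A minimal parser sketch, assuming that two-column layout; the sample rows below are illustrative, written in the style of CP437.TXT rather than copied from it, and `parse_mapping` is a hypothetical name:

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Parse 'byte<TAB>codepoint<TAB>#name' rows into {byte: code point}.

    Assumes two hexadecimal columns; everything after '#' is a comment,
    and rows with no second column (undefined bytes) are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # undefined byte: no Unicode target given
        mapping[int(fields[0], 16)] = int(fields[1], 16)  # int() accepts the 0x prefix
    return mapping

# Illustrative excerpt in the style of the vendor files (not copied from them).
SAMPLE = """\
#    Name: sample excerpt for illustration
0x10\t0x0010\t#DATA LINK ESCAPE
0x80\t0x00c7\t#LATIN CAPITAL LETTER C WITH CEDILLA
0xe1\t0x00df\t#LATIN SMALL LETTER SHARP S
"""
table = parse_mapping(SAMPLE)
print({f"0x{k:02X}": f"U+{v:04X}" for k, v in table.items()})
```

In a file of that style, 0x10 maps to the control character U+0010, which is presumably why the graphic readings of 0x00 to 0x1F are given in a separate file such as IBMGRAPH.TXT.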

ascii art

Would be nice to have a reference to the fact that this code page is often used for ASCII art —Preceding unsigned comment added by 85.146.181.17 (talk) 01:15, 1 July 2009 (UTC)

It was actually mainly BBS-type text graphics with ANSI.SYS terminal codes (or other control of video colors), which isn't quite the same thing as plain ASCII art... AnonMoos (talk) 17:43, 1 July 2009 (UTC)
Also, many DOS applications made extensive use of these characters to represent menus and other GUI-like elements and relied on them to display properly. sPAzzMatiC 18:22, 6 August 2009 (UTC) —Preceding unsigned comment added by Spazzmatic (talk • contribs)
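To illustrate the distinction, here is a minimal sketch of BBS-style output: CP437 double-line box-drawing bytes wrapped in ANSI SGR color escapes (ESC [ 33 m for yellow, ESC [ 0 m to reset) of the kind ANSI.SYS interpreted. The function name and layout are hypothetical, for illustration only.

```python
# CP437 double-line box-drawing codes.
TOP_LEFT, TOP_RIGHT = 0xC9, 0xBB        # ╔ ╗
BOTTOM_LEFT, BOTTOM_RIGHT = 0xC8, 0xBC  # ╚ ╝
HORIZONTAL, VERTICAL = 0xCD, 0xBA       # ═ ║

CSI = b"\x1b["  # ESC [ starts an ANSI control sequence

def ansi_box(text: str, color: int = 33) -> bytes:
    """Return CP437 bytes for a colored double-line box around `text`.

    `color` is an ANSI SGR foreground code (30-37); 33 is yellow.
    """
    inner = text.encode("cp437")
    bar = bytes([HORIZONTAL]) * len(inner)
    parts = [
        CSI + str(color).encode("ascii") + b"m",  # set foreground color
        bytes([TOP_LEFT]) + bar + bytes([TOP_RIGHT]) + b"\r\n",
        bytes([VERTICAL]) + inner + bytes([VERTICAL]) + b"\r\n",
        bytes([BOTTOM_LEFT]) + bar + bytes([BOTTOM_RIGHT]) + b"\r\n",
        CSI + b"0m",  # reset attributes
    ]
    return b"".join(parts)

data = ansi_box("BBS")
print(data.decode("cp437"))
```

Without the escape sequences this would be plain CP437 text graphics; with them, the output only looks right on a terminal that interprets ANSI codes.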

ess-zet/beta

The article says: "It has umlauts for German (Ä, ä, Ö, ö, Ü, ü), but sharp S (ß) must be represented with the beta symbol (β)." To me, it looks like it should be the other way around: the beta symbol is not in CP437, and the German sharp S should be used. Thoughts? —Preceding unsigned comment added by Nlhenk (talk • contribs) 11:51, 9 March 2010 (UTC)

The character is "overloaded" with multiple meanings, but the fact that it's found between alpha and gamma is a pretty good indication that it was originally intended as a beta... -- AnonMoos (talk) 13:29, 5 April 2010 (UTC)
The order doesn't really mean very much; in the upper set the order was pretty haphazard (e.g., delta and epsilon much farther down the list). The "Greek" section doesn't conform to the usual order anyway, and the characters were chosen primarily for non-Greek uses. The beta/eszett was almost certainly intended for double duty from the beginning, as were several other characters. -- Elphion (talk) 15:38, 5 April 2010 (UTC)
FWIW, RFC 1345 defines character 0xE1 as β. The Windows MultiByteToWideChar function uses ß, while WideCharToMultiByte accepts both. DanBishop (talk) 20:19, 11 July 2010 (UTC)
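Python's built-in `cp437` codec behaves like the Microsoft mapping on the decode side: byte 0xE1 comes back as U+00DF (ß), and strict encoding of U+03B2 (β) fails. A minimal sketch; `encode_best_fit` is a hypothetical helper imitating the lenient direction, not a reproduction of `WideCharToMultiByte`:

```python
def decode_e1() -> str:
    """Decode byte 0xE1 with Python's cp437 codec."""
    return bytes([0xE1]).decode("cp437")

def encode_best_fit(text: str) -> bytes:
    """Encode to CP437 after folding Greek beta onto sharp s.

    Hypothetical stand-in for a "best fit" encoder: the strict cp437
    codec has no entry for U+03B2 and would raise UnicodeEncodeError.
    """
    return text.replace("\u03b2", "\u00df").encode("cp437")

print(f"U+{ord(decode_e1()):04X}")            # U+00DF
print(encode_best_fit("\u00df\u03b2").hex())  # e1e1
```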

Is the image really EGA?

The image purports to show the characters in an EGA display. But the character size in the image is 9 × 16 pixels, the standard VGA size. Isn't EGA limited to 8 × 14? -- Elphion (talk) 00:28, 5 April 2010 (UTC)

Of course, you are right. EGA had neither 16-dot-high glyphs nor a 9-dot-wide mode. File:Codepage-437.png is VGA. Incnis Mrsi (talk) 11:33, 5 April 2010 (UTC)
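The cell sizes also fix the full-screen resolution of an 80 × 25 text screen, which is a quick way to tell the two apart in a screenshot. A small worked sketch; the mode table only restates the cell sizes given above:

```python
# Character-cell sizes (width, height) in pixels, as given in this thread.
CELL_SIZES = {"EGA": (8, 14), "VGA": (9, 16)}

def screen_size(cell_w: int, cell_h: int, cols: int = 80, rows: int = 25) -> tuple[int, int]:
    """Pixel resolution of a text screen with the given cell size."""
    return cols * cell_w, rows * cell_h

for name, (w, h) in CELL_SIZES.items():
    width, height = screen_size(w, h)
    print(f"{name}: {width} x {height}")
# EGA: 640 x 350
# VGA: 720 x 400
```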

I like having this screenshot; would it be possible to make one for the other DOS code pages? DanBishop (talk) 01:51, 27 June 2011 (UTC)

Like this? I could provide you with a simple C program which makes such BMPs (unfortunately, not PNGs) from PSF raster fonts. Incnis Mrsi (talk) 14:17, 27 June 2011 (UTC)
Yes, like that. DanBishop (talk) 23:29, 27 June 2011 (UTC)
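The C program itself isn't shown here, but the first step of such a tool, reading glyph bitmaps from a PSF font, can be sketched briefly. This is a hypothetical Python sketch for the PSF1 format only (2-byte magic 0x36 0x04, a mode byte whose bit 0 selects 512 glyphs, a glyph-height byte, then 8-pixel-wide bitmaps, one byte per row). PSF2 fonts and the optional Unicode table are not handled, and it renders to text rather than BMP.

```python
import struct

PSF1_MAGIC = b"\x36\x04"
PSF1_MODE512 = 0x01  # mode bit: font has 512 glyphs instead of 256
PSF1_HEADER_SIZE = 4

def read_psf1(data: bytes) -> list[bytes]:
    """Return glyph bitmaps from a PSF1 font (one byte per 8-pixel row)."""
    if len(data) < PSF1_HEADER_SIZE or data[:2] != PSF1_MAGIC:
        raise ValueError("not a PSF1 font")
    mode, charsize = struct.unpack_from("BB", data, 2)
    count = 512 if mode & PSF1_MODE512 else 256
    if len(data) < PSF1_HEADER_SIZE + count * charsize:
        raise ValueError("truncated glyph data")
    return [
        data[PSF1_HEADER_SIZE + i * charsize : PSF1_HEADER_SIZE + (i + 1) * charsize]
        for i in range(count)
    ]

def glyph_to_text(glyph: bytes) -> str:
    """Render a glyph as '#'/'.' rows; the most significant bit is the leftmost pixel."""
    return "\n".join(
        "".join("#" if row & (0x80 >> bit) else "." for bit in range(8))
        for row in glyph
    )

# Synthetic test font: 256 glyphs, 4 rows each; glyph 1 is a small box.
font = bytearray(PSF1_MAGIC + bytes([0x00, 4]) + bytes(256 * 4))
font[PSF1_HEADER_SIZE + 4 : PSF1_HEADER_SIZE + 8] = bytes([0xFF, 0x81, 0x81, 0xFF])
glyphs = read_psf1(bytes(font))
print(glyph_to_text(glyphs[1]))
```

A BMP writer would then place these bitmaps in a 16 × 16 grid, which is how the screenshot in the article is laid out.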

Entry on keyboards

The section "Entry on keyboards" says that "programs that support only Windows-1252" attempt to transliterate the "Greek" characters when entered via keyboard. Is there some support for this? (I've never seen this behavior -- what I get is always the corresponding Windows-1252 character.) -- Elphion (talk) 16:30, 5 April 2010 (UTC)

Hearing no response, I've removed the statement from the article. -- Elphion (talk) 01:05, 28 June 2010 (UTC)

The character at 0xE1

The table shows the character at position 0xE1 equating to U+00DF (LATIN SMALL LETTER SHARP S) which contradicts the text: "Table rows 14 and 15 (E and F), codes 224 to 255 (E0 to FF) are devoted to mathematical symbols, where the first twelve are a selection of Greek letters commonly used in physics." — Ksn (talk) 04:07, 20 November 2010 (UTC)

It's actually ambiguous or "overloaded" (see section "Multiple-meaning character glyphs"). AnonMoos (talk) 11:50, 20 November 2010 (UTC)

tan is not pink

I changed the text "and tan cells are international letters." to "and pink cells are international letters.", since the cells in question are pink. —Preceding unsigned comment added by 72.91.177.153 (talk) 00:45, 26 November 2010 (UTC)

windows 1253 vs codepage 437

Windows requires me to save in Unicode when I type text using code page 437. Do you also have to choose Unicode encoding to save text using character codes from the code page 437 table? Paul 188.25.109.227 (talk) 15:44, 29 April 2011 (UTC)

mean is when to enter a chr-code from the table and saving isn’t codepage437 to not use unicode ? —Preceding unsigned comment added by 188.25.109.227 (talk) 15:47, 29 April 2011 (UTC)
It's not clear what you're asking. When you save a document, typically what is saved are the numerical codes of the characters. But the interpretation of those codes depends on the active code page. If a document created with one code page is displayed while another code page is active, many of the codes with the high bit set ("upper ASCII") will display as different characters. To get around that, you can save the text as Unicode (as WP does), so that each character is saved with its more or less unique code, and is therefore more or less unambiguous. I say "more or less" because some characters, even in Unicode, have multiple uses, and some may differ significantly in appearance from font to font. -- Elphion (talk) 04:57, 30 April 2011 (UTC)
Saying "more or less" is extremely misleading. Unicode characters are (apart from private use characters) unique. A character is not the same thing as a glyph. The letter A will look different in different fonts, but that doesn't mean it's ambiguous. Even the CJK-unification opponents' views stem from this basic misunderstanding (often combined with a dose of rabid paranoid nationalism).
When you care about the exact look of a character, save an image or specify the font. Otherwise, for all intents and purposes, Unicode is unambiguous.
Saying "Unicode characters are unique" or "Unicode is unambiguous" is also misleading. The same character may have many different "meanings" -- the code charts give several readings for some characters. The problem is that in some user communities, the different readings are conventionally represented by different forms of the glyph, so the boundary between glyph and character is not necessarily clear-cut. Ideally, Unicode would provide different code points for the different readings, and in some cases it has; but in many it has not. The solution of using an image is a stop-gap at best; as a long-term solution it's a non-starter. -- Elphion (talk) 01:00, 28 November 2011 (UTC)
When on Windows you hold down Alt and type 3 digits that are the code value in 437, it is immediately translated by Windows into the UTF-16/UCS-2 value. The fact that the number was in CP437 is lost at that point. Most applications (such as MS Word) will place the character into the file in UTF-16 or UTF-8 encoding. You usually have to do something special to make a CP437-encoded file nowadays. Spitzak (talk) 16:28, 1 May 2011 (UTC)
I think something slightly different is going on. The keyboard module converts the key sequence Ctrl + Alt + decimal code into a code in the range 0..255 and presents that to the active application, which can interpret it as it sees fit (testing the active code page if it wants). For example, with the default code page (437), Notepad converts the character code to the equivalent Unicode value (and prompts you to save the document in a Unicode format to avoid losing character information), while Command Prompt simply displays the character in the active code page and, on redirection, stores the untranslated code. Thus if I execute "echo £>junk.txt" in Command Prompt (entering £ as Ctrl + Alt + 156) and open junk.txt in Notepad, what I see depends on the mode I use to open the file in Notepad: in ANSI mode, I see œ, which is the ANSI character with code 156. -- Elphion (talk) 19:17, 1 May 2011 (UTC)
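The Command Prompt/Notepad experiment can be reproduced at the byte level. A minimal Python sketch, assuming the ANSI code page is Windows-1252 (typical for Western European installations) and the OEM code page is 437; `show_byte` is a hypothetical helper:

```python
def show_byte(code: int, codepages: tuple[str, ...] = ("cp437", "cp1252")) -> dict[str, str]:
    """Decode a single byte under several code pages."""
    return {cp: bytes([code]).decode(cp) for cp in codepages}

raw = "£".encode("cp437")    # the byte the console writes to junk.txt
print(raw.hex())             # 9c
print(show_byte(raw[0]))     # {'cp437': '£', 'cp1252': 'œ'}

# Saving as Unicode (UTF-8 here) stores the character, not a code-page index.
print("£".encode("utf-8").hex())  # c2a3
```

The same byte 0x9C is "£" or "œ" depending only on which table is used to read it, which is the ambiguity that saving as Unicode avoids.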
When you hold alt and press some digits on the numeric keypad, the following messages are sent to the application:
One or more WM_SYSKEYDOWN(VK_MENU, left-alt, alt pressed)
For every digit one or more WM_SYSKEYDOWN(VK_INSERT...VK_PRIOR* or VK_NUMPAD0...VK_NUMPAD9, numpad-0...numpad-9*, alt pressed)
 * Note: not contiguous.
Followed by one corresponding WM_SYSKEYUP message.
This then results in WM_CHAR(Unicode codepoint, left-alt, alt not pressed)
If the number entered starts with one or more zeroes, the active code page will be used; otherwise the OEM code page will be used. If the number is not in the range 0...255, it is masked with 255 (FFh) (edited).
Most applications won't bother with the WM_SYSKEY* messages and only look at the WM_CHAR message. Also, this assumes the window is a Unicode window*, which nowadays is almost always the case even when talking about Notepad.
 * If the window is an ‘ANSI’ window (which isn't really ANSI), the character is translated to the best match in the active code page. For example, if you entered a box-drawing character that isn't present in the active code page, you'll often get a +. In an SBCS you'll get one WM_CHAR; in an MBCS you'll get one or more.
I'm pretty sure that you won't get the WM_CHAR unless you pass the WM_SYS* messages to the default window procedure.
I hadn't realized leading 0s made a difference (which explains a lot of puzzlement over the years, thanks!) So yes: a leading 0 gives you some approximation to the character with that code in the active code page, otherwise some approximation to the character with that code from the OEM set; and the form of the approximation (Extended ASCII code or Unicode codepoint) depends on the mode of the window. So the upshot is that a default translation is performed automatically by Windows in processing keyboard messages, depending on the active code page and the presence or absence of leading 0s; and the code presented in WM_CHAR will depend on the mode of the window. The app is of course still free to interpret the WM_CHAR or the other kbd messages if it doesn't want the default translation.
For control codes 1..31, the "OEM translation" of code with or without 0 is the control code itself, as is the "Unicode translation" of code with 0 (which Notepad mostly ignores except for \n, \r, \t, \b). But the "Unicode translation" of code without 0 is the Unicode codepoint of the "equivalent" graphical character (smiley face, eighth note, etc.). (At least, that's what Notepad seems to do.)
On my system, values out of range 0..255 (with or without leading 0s) appear to get masked with 0xFF, so, e.g., Alt+254, Alt+510 give the same result.
-- Elphion (talk) 00:45, 28 November 2011 (UTC)
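The rules worked out above can be summarized as a small model. This is a hypothetical Python sketch, not Windows code: it assumes the ANSI code page is Windows-1252 and the OEM code page is 437, and it includes only a few of the graphic glyphs for codes 1 to 31 (anything else in that range falls back to the control character). Bytes undefined in Windows-1252 decode to U+FFFD here, which may differ from what Windows does.

```python
# A few graphic readings of CP437 control codes (partial, for the sketch).
GRAPHIC = {0x01: "\u263a", 0x02: "\u263b", 0x03: "\u2665", 0x0D: "\u266a"}

def alt_code(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Approximate the character Alt+<digits> yields in a Unicode window.

    Assumptions taken from the discussion above: a leading 0 selects the
    ANSI code page, otherwise the OEM code page; the value is masked with
    0xFF; without a leading 0, codes 1-31 map to their graphic glyphs.
    """
    code = int(digits) & 0xFF
    if digits.startswith("0"):
        return bytes([code]).decode(ansi, errors="replace")
    if 1 <= code <= 31:
        return GRAPHIC.get(code, chr(code))  # fall back to the control code
    return bytes([code]).decode(oem, errors="replace")

for keys in ("156", "0156", "254", "510", "13", "013"):
    print(keys, repr(alt_code(keys)))
```

With these assumptions, Alt+156 gives £, Alt+0156 gives œ, Alt+254 and Alt+510 both give ■, and Alt+13 gives ♪ while Alt+013 gives a carriage return.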
Exactly. Keep in mind that in practice ‘some approximation’ will turn out to mean ‘exact match’. See this summary:
Ansi to Ansi Window: exact match
Ansi/OEM to Unicode Window: exact match, since when Unicode was drawn up the existing code pages were integrated in it. However:
- Some Chinese glyphs were added to their codepages before they were/are added to Unicode. These will not be entered through Alt-codes however, you will use an IME (I don't even know if any multiple-byte characters can be entered with Alt-codes at all). These glyphs will map to PUA codepoints. Fonts which support both Unicode and that Chinese code page will have the same glyphs at those PUA locations as at the corresponding code page locations.
- Some code pages (notably 437) are actually two-in-one. A graphical code page and a partly semantic code page. Since nowadays semantic information tends to be stored at a different conceptual level in the file format (in HTML for example one would use <table> &c. instead of the tab character) the character data is considered to be just text so nowadays the spotlight will mostly fall on the graphical code page.
- Some code pages contain characters which are invalid/unmapped.
OEM to Ansi window: this can be lossy, for example if the OEM code page is Greek, but the Ansi code page is Hebrew. But since Ansi windows generally belong to legacy applications, this generally won't be an issue.
Notepad is a wrapper around the edit control. Nowadays it's a Unicode application, and it is capable of storing Unicode files, which means it is now possible for a text file to contain both versions of 437 character 13: ♪ and CR. (Back in the day, this was impossible, since there'd be no way to distinguish them. You'd either interpret all 13s like CR, like Edit, or you'd draw all of them as ♪, like the graphics card.)
- When you enter a character in code page 437, the graphical representation will be used, as noted above. On a lot of computers 437 is the OEM code page.
- Other code pages may not have graphical glyphs in the 1...31 range. The edit control interprets some control codes specially and ignores/swallows the rest (to the best of my knowledge).
@masked: my mistake, I fixed it.
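The "two-in-one" nature of CP437 can be made concrete with a decoder that switches between the two readings of bytes 0x01 to 0x1F and 0x7F. A minimal Python sketch, assuming the usual IBM PC glyph assignments (the ones the article table uses, including U+25BA and U+25C4 for 0x10 and 0x11); the function name is hypothetical:

```python
# Graphic glyphs the IBM PC displays for bytes 0x01-0x1F, in order.
GLYPHS_01_1F = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
GRAPHIC = {i + 1: ch for i, ch in enumerate(GLYPHS_01_1F)}
GRAPHIC[0x7F] = "⌂"

def decode_cp437(data: bytes, graphical: bool = True) -> str:
    """Decode CP437 bytes, showing control codes either as glyphs or as controls.

    Python's cp437 codec follows the Microsoft mapping, so 0x01-0x1F and
    0x7F come back as control characters; the graphical reading swaps them.
    """
    text = data.decode("cp437")
    if not graphical:
        return text
    return "".join(GRAPHIC.get(ord(ch), ch) for ch in text)

print(decode_cp437(b"\x0d\x0a"))                          # ♪◙
print(repr(decode_cp437(b"\x0d\x0a", graphical=False)))   # '\r\n'
```

This mirrors the Notepad point above: the byte 13 is ♪ in one reading and CR in the other, and only the chosen reading, not the byte, tells them apart.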

Why 437?

Why is it called 437? — Preceding unsigned comment added by 82.139.87.39 (talk) 08:32, 1 October 2011 (UTC)[reply]

IBM picked the number. See here: [2]. The CP numbers are dispensed neither consecutively nor monotonically (CP00425 in the year 2000, CP00437 in 1984, CP00500 in 1986). I guess we'll never know the etymology of "437" unless we ask the person who was in charge of CP numbers in 1984. 85.178.182.94 (talk) 23:43, 7 December 2011 (UTC)
Any idea how we could go about finding the ‘culprit’? ;-) — Preceding unsigned comment added by 82.139.87.39 (talk) 23:47, 27 January 2012 (UTC)[reply]