This is the talk page for discussing improvements to the EBCDIC article.
- 1 EBCD
- 2 Addition Request
- 3 Query
- 4 Support
- 5 Pronunciation?
- 6 Relation to Hollerith Code
- 7 Usage of EBCDIC
- 8 Redundant
- 9 EBCDIC niceties?
- 10 Sort Merge EBCDIC Order versus ASCII
- 11 IBM ASCII support citation request
- 12 Spreadsheet format
- 13 Tables don't correspond
- 14 EBCDIC vs. ASCII
- 15 Control codes
- 16 Criticism and humor
- No, the 2741 used a 6-bit character code (plus shift-up and shift-down control codes). The communications controller (an IBM 270x-series device) interpreted the up/down shift codes and converted them into a seventh bit to store in memory, and generated up-down shift codes from the seventh bit on output. This seven-bit code was generally converted to/from EBCDIC by software. The 6-bit code was an encoding of the Selectric's typeball tilt amount (2 bits) and rotation (4 bits). In models where an ordinary office typeball was used, the resulting character code was called "correspondence code". Other models used a code closer to computational BCD, achieved mostly by using a typeball with the characters arranged differently. (This latter form required more logic circuitry to translate keyboard inputs appropriately.) 126.96.36.199 (talk) 18:07, 28 November 2007 (UTC)
The ISO/IEC 8859 article has a nice table showing the various parts. It would be nice to have a similar table showing the EBCDIC variants. One could then see at a glance where they were the same and where they were different. Such a table should have (at a minimum) CCSIDs 037, 285, and 500.
- Agreed. The most common code page used in the U.S. was 037, but this has been replaced in recent years by 1047 (at least on S/390 systems running Linux). — Loadmaster 23:28, 13 November 2006 (UTC)
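A comparison table like the one requested could be generated rather than written by hand. The following is only a sketch: it assumes Python's built-in `cp037` and `cp500` codecs, and it leaves out CCSID 285 because the standard library has no codec for it. The helper names `decode_table` and `differences` are made up for illustration.

```python
# Sketch: list the byte values at which two EBCDIC code pages disagree.
# Assumes Python's built-in "cp037" and "cp500" codecs; CCSID 285 is
# omitted because the standard library does not ship a codec for it.

def decode_table(codec):
    """Return the 256 characters that `codec` assigns to bytes 0x00-0xFF."""
    return [bytes([b]).decode(codec) for b in range(256)]

def differences(codec_a, codec_b):
    """Map each differing byte value to its (char_a, char_b) pair."""
    table_a = decode_table(codec_a)
    table_b = decode_table(codec_b)
    return {b: (table_a[b], table_b[b])
            for b in range(256) if table_a[b] != table_b[b]}

if __name__ == "__main__":
    for byte, (a, b) in sorted(differences("cp037", "cp500").items()):
        print(f"0x{byte:02X}: cp037={a!r} cp500={b!r}")
```

Positions that do not appear in the output are identical in both code pages, so the letters and digits should never be listed.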
It seems that the code page table was wrong. Row 8, with the letters a-i, should be moved one column to the left, that is, a=81, b=82, and so on. —Preceding unsigned comment added by 188.8.131.52 (talk) 16:53, 17 December 2007 (UTC)
- Fixed. JoeBackward (talk) 03:31, 9 January 2008 (UTC)
I would guess that the word support, as in "the computer supports EBCDIC", was originally marketspeak. It implies that the use of EBCDIC is a desirable option instead of a requirement. --Gbleem 22:15, 31 August 2006 (UTC)
- The IBM S/360 had an "ASCII/EBCDIC" bit in the program status word "register", supposedly to control what zone nibbles were created in zone decimal conversion opcodes. I think the theory was that instead of generating "F0 F0" (which is EBCDIC "00"), it would generate "30 30" (which is ASCII "00"). This control bit was removed in later versions of the hardware. — Loadmaster 23:26, 13 November 2006 (UTC)
- Actually, the S/360 would generate zone values of "50 50" for ASCII zeroes, because IBM assumed (was hoping) that the industry would accept extending 7-bit ASCII to 8 bits by shifting the first 3 bits to the left and inserting a duplicate of the high-order bit into the fourth bit position (from the left). The logic of packed decimal arithmetic in some instructions, such as "Edit", depended on the notion that the "sign digit" would have a value that fell beyond the 0-9 range. (A full discussion of this might be interesting content for Wikipedia, but not in the EBCDIC article.) In the end, IBM apparently decided that the best way to support ASCII was to use EBCDIC internally and then convert character data to ASCII by use of the TR (Translate) instruction. The bit in the PSW (Program Status Word) assigned to specifying "ASCII" mode was re-assigned in S/370 to control "Extended Control" mode. This was safe because IBM never created an operating system that set the ASCII bit to 1, and setting the bit could only be done by privileged (i.e. OS) code. -- RPH 12:29, 27 June 2007 (UTC)
- Correction: Actually IBM's concept of an 8-bit version of ASCII (or USASCII, as it was known later in the life of the System/360) was more complex, as described in the System/360 Principles of Operation. IBM had proposed an 8-bit extension of (US)ASCII by applying a mapping transform in which the three high-order bits of the byte were taken from the first two bits of the ASCII character code, followed by the high-order bit, repeated. This had the effect of "stretching" the ASCII code points across the 0-255 range. For example, the numeric values would be mapped to hex 50-59 instead of 30-39. IBM apparently hoped that this arrangement would be accepted by the committee, because it would avoid architectural problems with the implementation of packed-decimal instructions. For example, the "Edit" (ED) and "Edit-and-Mark" (EDMK) instructions used character values of 20 and 21 as "digit select" and "significance start" characters, but that wouldn't work properly if the space were still mapped to the hex 20 code point. Under IBM's re-mapping, the value of a Space character would be hex 40 (the same as in EBCDIC). Since the standards committee never agreed to IBM's 8-bit mapping, IBM dropped the "ASCII-mode" bit in the Program Status Word in the following generation of processors, replacing the bit with one that indicated "extended control mode". ASCII would be supported by using the Translate instruction upon input and output. This information would be an interesting historical background for both EBCDIC and the System/360, although probably in a separate article. RPH 20:40, 10 September 2007 (UTC) (reedit: RPH 21:30, 23 October 2007 (UTC)) There is a discussion of the ASCII-bit in Note 2 of the IBM System/360 Wikipedia article and its support of the proposed "Decimal ASCII" 80-column punched card that was rejected by the user community.
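The bit transform described in the correction above can be written out as a short sketch. It follows only the wording of that comment (top two ASCII bits, then the high-order bit repeated, then the remaining five bits); the function name `ascii7_to_proposed8` is hypothetical and this is not taken from any IBM manual.

```python
# Sketch of the proposed 8-bit ASCII mapping as described in the comment
# above: the 8-bit code starts with the top two bits of the 7-bit code,
# then repeats the high-order bit, then keeps the low five bits.
# The function name is illustrative, not an IBM term.

def ascii7_to_proposed8(code):
    """Stretch a 7-bit ASCII code (0x00-0x7F) to the proposed 8-bit form."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    top_two = (code >> 5) & 0b11   # first two bits of the 7-bit code
    high_bit = (code >> 6) & 0b1   # high-order bit, repeated
    low_five = code & 0x1F         # remaining five bits, unchanged
    return (top_two << 6) | (high_bit << 5) | low_five

if __name__ == "__main__":
    for ch in " 09":
        print(repr(ch), hex(ord(ch)), "->", hex(ascii7_to_proposed8(ord(ch))))
```

With this reading, the digits land on hex 50-59 and the space on hex 40, which matches the values quoted in the discussion.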
How is EBCDIC pronounced? Eb-ka-dic? --Dgies 18:10, 3 November 2006 (UTC)
- The jargon file gives "eb-see-dic" together with two less euphonic variants; but I'd really like an "official pronunciation" added into the article. --tyomitch 03:49, 7 November 2006 (UTC)
- Most mainframe programmers I've heard (in the U.S.) pronounce it "eb'-se-dik". — Loadmaster 23:23, 13 November 2006 (UTC)
- I've heard "eb-see-dic" a lot too. --Memming (talk) 21:27, 5 May 2008 (UTC)
- Yes, "eb-see-dic" or "eb-sa-dic" is what most people say. 184.108.40.206 (talk) 22:07, 17 May 2009 (UTC)
Relation to Hollerith Code
I have read that EBCDIC is a descendant of Hollerith Code (e.g. http://www.columbia.edu/acis/history/census-tabulator.html). Unless this is not accurate, it should be mentioned. (Even if it is false, that should be mentioned, since it is out there.) —überRegenbogen 12:26, 24 February 2007 (UTC)
- I added a mention of "Extended Hollerith" as the card-code that corresponds to EBCDIC in the S/360+ systems. Much more could be said on this topic, such as including a code-chart that demonstrates the logical nature of the mapping between EBCDIC and the extended card-code. Such a chart is found in the IBM S/360 Principles of Operation manual and many other publications from that period. Actually, such a chart, showing EBCDIC in its original form, would be more instructive than the somewhat disingenuous inclusion of one of the National Language Support extensions to EBCDIC, apparently to make the point that EBCDIC is a chaotic mess, even though extensions to ASCII for this purpose have had essentially the same effect on that code as well. -- RPH 12:42, 27 June 2007 (UTC)
Usage of EBCDIC
All IBM mainframe peripherals and operating systems (except Linux on zSeries or iSeries) use EBCDIC as their inherent encoding but software can translate to and from other encodings.
At exactly which places is EBCDIC used within IBM products? I can only think up EBCDIC being used as a text file encoding, but with Unicode even that usage is obsolete. --Abdull 09:51, 7 June 2007 (UTC)
- As far as I know, every IBM mainframe still uses EBCDIC as its primary character set, which means that every mainframe disk file (dataset), data storage tape, or CD contains text data in EBCDIC form. Can you name any IBM mainframe systems that actually use Unicode? — Loadmaster (talk) 19:19, 12 April 2008 (UTC)
Under the "Criticism and humor" section, isn't "Another popular complaint is that the EBCDIC alphabetic characters follow an archaic punch card encoding rather than a linear ordering like ASCII. " equivalent to the snippet from esr: "...such delights as non-contiguous letter sequences..."? --WayneMokane 22:36, 18 October 2007 (UTC)
The example given, "while in EBCDIC there is one bit which indicates upper or lower case", is not valid, since the same applies to ASCII-- the third most significant bit signifies lowercase. Anyone have a replacement, or is EBCDIC without niceties? --Luke-Jr (talk) 07:43, 23 March 2008 (UTC)
- Yes. In its original form, that is, as an extension of Binary Coded Decimal Interchange Code (BCDIC), a six-bit IBM code, the addition of two high-order bits allowed the characters to be unfolded into four groups, or "quadrants", numbered 0 to 3. Quadrant 0 contained control codes, generally only used for terminals. Quadrant 1 contained the space and all "special" characters (punctuation marks and symbols). Quadrant 2 was for lower-case letters (rarely used in the 1960s, and not part of the original BCD code). Quadrant 3 contained the capital letters and numeric characters, in that order. Overall, this mapping put the characters into a good sorting order, while at the same time simplifying the logic circuits that handled the translation of the old 6-bit BCD into EBCDIC, into quadrants 1 and 3. Since BCDIC had no control characters or lower-case letters, this was done by setting the leftmost bit according to whether the character was alpha or numeric, and the second bit was always 1. Early peripherals for the 360, such as the 1403 and 1443 printers, carried over from the previous generation of systems, worked without modification using the last 6 bits of the character code, although an extra-cost feature, called UCS (for Universal Character Set), permitted use of the full 8 bits, to support lower-case and other characters. The old 7-track tapes (6 bits plus parity) written on earlier systems could be read and translated into EBCDIC by the tape control unit electronics. Most installations had mostly 9-track drives, and one or two 7-track drives for tape compatibility with the older IBM systems, which continued to be used alongside the 360s until they were phased out. So, EBCDIC was meant to be a transitional code, but conversion to ASCII turned out to be tougher than anticipated, made more difficult when IBM's proposal for an 8-bit mapping of ASCII was rejected by the standards committee, in which IBM had little voting power.
This made the ASCII-mode bit in the 360's Program Status Word essentially useless. The ASCII-mode was dropped in the System/370, and the bit became the "extended control mode" bit, enabling the new 370 features, such as virtual memory. Since the bit was always set to 0 by older operating systems, it became a handy way of enabling compatible operation of 360-based operating system code. RPH (talk) 14:49, 12 April 2008 (UTC)
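The quadrant layout described in this comment comes down to reading the two high-order bits of each byte. A minimal sketch, assuming Python's `cp037` codec as a stand-in for basic EBCDIC; the names `QUADRANT_ROLES` and `quadrant` are invented for the example.

```python
# Sketch: classify an EBCDIC byte by its two high-order bits ("quadrant").
# Roles follow the description in the comment above; cp037 is assumed as
# a representative EBCDIC code page.

QUADRANT_ROLES = {
    0: "control codes",
    1: "space and special characters",
    2: "lower-case letters",
    3: "capital letters and digits",
}

def quadrant(ebcdic_byte):
    """Return the quadrant number (0-3) of one EBCDIC byte value."""
    return (ebcdic_byte >> 6) & 0b11

if __name__ == "__main__":
    for ch in " aA0":
        code = ch.encode("cp037")[0]
        q = quadrant(code)
        print(f"{ch!r} = 0x{code:02X} -> quadrant {q} ({QUADRANT_ROLES[q]})")
```

National-language code pages place extra characters in all four quadrants, so the roles only hold for the basic character set.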
- The "nicety" of turning one bit on or off to change case in character codes is nice for building case-insensitive sort keys for straight keyboard text data sets: you OR an ASCII character value with $20 to make text chars all lowercase, or OR an EBCDIC character value with $40 to make text chars all uppercase. ORing figure (numeral, digit) values with $20 in ASCII ($30-$39 for figures 0-9) or $40 in EBCDIC ($F0-$F9 for figures 0-9) does not change the values of the figure codes. EBCDIC and ASCII are about equally easy to work with as far as uppercase, lowercase and figures are concerned, but EBCDIC, by grouping the common keyboard punctuation in a "quadrant" of $40-$7F, was slightly easier to work with. (In programming other than sort/merge, the splits in the EBCDIC A-I, J-R and S-Z code assignments, dictated by conformity to the Hollerith punch card code, make "bumping" through the alphabet by incrementing the char code value more complicated in EBCDIC than in ASCII.) In both EBCDIC and ASCII, typesetting systems map additional characters (accented letters, small caps, old style figures, inferior figures, superior figures, symbols) into unused corners of the $00-$FF binary matrix. The RCA GSD extended EBCDIC for PAGE-1 and the Postscript font ASCII assignments (to name two I have had to work with) require a translate table as the most efficient way to build a case-insensitive sort key. At that remap stage, neither EBCDIC nor ASCII is nice and both become "necessary evils". Naaman Brown (talk) 13:34, 15 May 2009 (UTC)
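The OR trick described above can be shown in a few lines. This is a sketch for plain keyboard text only, assuming Python's `cp037` codec for the EBCDIC side; the function names are illustrative.

```python
# Sketch: build case-insensitive sort keys by ORing one bit into each byte.
# ASCII: OR with 0x20 folds letters to lowercase.
# EBCDIC (cp037 assumed): OR with 0x40 folds letters to uppercase.
# Digits are unchanged in both because the bit is already set.

def ascii_fold_lower(data: bytes) -> bytes:
    """Fold ASCII text to lowercase by setting bit 0x20 in every byte."""
    return bytes(b | 0x20 for b in data)

def ebcdic_fold_upper(data: bytes) -> bytes:
    """Fold EBCDIC text to uppercase by setting bit 0x40 in every byte."""
    return bytes(b | 0x40 for b in data)

if __name__ == "__main__":
    print(ascii_fold_lower(b"Abc 123"))
    print(ebcdic_fold_upper("Abc 123".encode("cp037")).decode("cp037"))
```

The difference in punctuation handling is visible here: in EBCDIC every code in $40-$7F already has the $40 bit set, so the common punctuation survives the fold, while in ASCII the fold changes some punctuation, for example "[" becomes "{".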
Sort Merge EBCDIC Order versus ASCII
- Sorting keyboarded text in EBCDIC you group (low to high):
- control code values $00-$3F,
- punctuation $40-$7F,
- lowercase letters $81-$A9,
- uppercase letters $C1-$E9,
- numbers $F0-$F9;
- sorting keyboarded text in ASCII you group (low to high):
- control code values $00-$1F,
- some punctuation $20-$2F,
- numbers $30-$39,
- some more punctuation $3A-$40,
- uppercase letters $41-$5A,
- some more punctuation $5B-$60,
- lowercase letters $61-$7A,
- some more punctuation $7B-$7F.
- The superiority of EBCDIC over ASCII in sort/merge applications should be obvious. And punch cards are immune to EMP. Naaman Brown (talk) 22:49, 14 May 2009 (UTC)
- Perhaps, but you should add a citation to back up your claims of "superiority". Otherwise it sounds too much like opinion. And besides, you're forgetting the punctuation and international character codes interspersed among the English alphabetic characters, which vary from codepage to codepage. — Loadmaster (talk) 22:49, 19 May 2009 (UTC)
- See the heading Criticism and Humor. My comment "punch cards are immune to electromagnetic pulse" was meant to be a clue. Seriously though the sort order advantage -- common punctuation, lowercase, uppercase, digits -- is minor, but ASCII having common punctuation between uppercase and lowercase was sometimes annoying. In sorting composition files for various projects at Kingsport Press, I usually had to build translated sort fields whether sorting EBCDIC or ASCII files because of the different mappings for accented letters: they both gave me problems. It is like selecting which is your least favorite of slimy green vegetables. Naaman Brown (talk) 04:05, 18 September 2009 (UTC)
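The two collating sequences listed in this thread can be compared directly by sorting the same strings on their encoded bytes. A small sketch, assuming Python's `cp037` codec for EBCDIC; the sample words and function names are arbitrary.

```python
# Sketch: sort the same strings by ASCII byte order and by EBCDIC byte
# order (cp037 assumed) to show the different collating sequences.

def ascii_order(items):
    """Sort strings by their ASCII byte values."""
    return sorted(items, key=lambda s: s.encode("ascii"))

def ebcdic_order(items, codec="cp037"):
    """Sort strings by their EBCDIC byte values."""
    return sorted(items, key=lambda s: s.encode(codec))

if __name__ == "__main__":
    words = ["apple", "Zebra", "42"]
    print("ASCII :", ascii_order(words))
    print("EBCDIC:", ebcdic_order(words))
```

For these three words the ASCII order is digits, uppercase, lowercase, and the EBCDIC order is the reverse, which is why sorted data often needs re-sorting after moving between the two kinds of system.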
IBM ASCII support citation request
Can someone find a citation to the following paragraph?
Interestingly, IBM was a chief proponent of the ASCII standardization committee. However, IBM did not have time to prepare ASCII peripherals (such as card punch machines) to ship with its System/360 computers, so the company settled on EBCDIC at the time. The System/360 became wildly successful, and thus so did EBCDIC.
How about Wiki itself?
- The article on Bob Bemer mentions that he is commonly called the Father of ASCII, and he was an IBM employee at the time, and for the next quarter-century. And IBM sent him to the ASCII meetings, and paid his way, and allowed him to do his extensive work on ASCII during company time. —Preceding unsigned comment added by T-bonham (talk • contribs) 06:35, 12 March 2009 (UTC)
What confuses me is that it implies that EBCDIC was developed because IBM didn't have time to implement ASCII. If they had enough time to develop and implement EBCDIC, then surely they could have saved time by just implementing ASCII. Was it perhaps the case that IBM foresaw that the final ASCII spec would not be ready in time? — User:ACupOfCoffee@ 18:42, 16 November 2011 (UTC)
Many EBCDIC files were tables of data without separating characters. For example, every 100 chars a row ends and that is broken up into 20 '5 character' fields.
- It's not really a format; it's known as fixed-length records, in which there are zero or more records which are delimited by lines. The format itself is usually defined by some property in the system or through an accompanying file. In a lot of cases there is no file: what you know as a file is just an extract of part of a database, and nothing about this phenomenon is specific to EBCDIC. Jeffz1 (talk) 08:44, 4 June 2009 (UTC)
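The layout in the original question (100-character rows made of twenty 5-character fields, with no separators) could be read along these lines. This is a sketch under those assumptions only: the record and field lengths come from the example above, `cp037` is assumed as the code page, and `split_fixed` is a made-up name.

```python
# Sketch: split EBCDIC data into fixed-length records and fields.
# Record length (100), field length (5) and the cp037 code page are
# assumptions taken from the example in this thread.

def split_fixed(data: bytes, record_len=100, field_len=5, codec="cp037"):
    """Return a list of rows, each a list of decoded fixed-width fields."""
    if len(data) % record_len != 0:
        raise ValueError("data is not a whole number of records")
    rows = []
    for start in range(0, len(data), record_len):
        text = data[start:start + record_len].decode(codec)
        rows.append([text[i:i + field_len]
                     for i in range(0, record_len, field_len)])
    return rows

if __name__ == "__main__":
    # Two sample records: forty zero-padded 5-digit numbers, no separators.
    sample = "".join(f"{n:05d}" for n in range(40)).encode("cp037")
    for row in split_fixed(sample):
        print(row[:3], "...", len(row), "fields")
```

In practice the field boundaries would come from an external layout description, as the reply above notes, rather than being uniform.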
Tables don't correspond
The table in the article is said to be "derived from" CCSID 500. The wording implies that all the characters shown are identical to CCSID 500, and the only difference is the omission of certain characters that aren't "basic English". In fact, though, the table differs from that given at EBCDIC 500 in several positions; for example, 4F, 5A, BA and BB (I don't know if that's a complete list). I think the reason for this, and the meaning of "derived from", should be explained better. 220.127.116.11 (talk) 20:32, 25 November 2009 (UTC).
EBCDIC vs. ASCII
Rather than start an edit war, I'm going to ask for comments.
I changed this:
EBCDIC has no modern technical advantage over ASCII-based code pages such as the ISO-8859 series or Unicode. There are some technical niceties in each, e.g., ASCII and EBCDIC both have one bit which indicates upper or lower case. But there are some aspects of EBCDIC which make it much less pleasant to work with than ASCII, such as a non-contiguous alphabet.
to this:
EBCDIC has no technical advantage or disadvantage compared to ASCII-based code pages such as the ISO-8859 series or Unicode. There are some technical niceties in each, e.g., ASCII and EBCDIC both have one bit which indicates upper or lower case. Unlike ASCII the EBCDIC alphabet is non-contiguous and is interleaved with some non alphabetic characters. Data portability is hindered by the fact that EBCDIC lower case alphabetic characters are lower in the collating sequence than upper case, and numerics are higher than both— the exact opposite of ASCII.
and had it reverted with the comment:
Nonsense, the non-contiguous alphabet is obviously a problem. The order of case & numbers is trivial in comparison.
It seems to me that there are a lot of rabid anti-EBCDICians, but I believe there is no real difference between code pages containing the same characters as long as the alphabetic and numeric characters sort correctly. The contiguity or non-contiguity of the alphabetic characters is immaterial, since code usually uses something like "isalpha" rather than comparing for >='A' and <='Z'. On the other hand, the difference in the way data sorts (I should have said "data interchange" rather than "data portability") causes a lot of problems porting data between ASCII-based and EBCDIC-based systems, since programs often expect data to be sorted in a particular order.
- I restored your edit because it changed the paragraph so that it presented both sides of the argument very well, and one of the core policies of Wikipedia is neutral point of view. In all my years working with EBCDIC and ASCII, I've never seen the non-contiguous alphabet as a problem, but I would be wrong to remove statements to the contrary from the article. Pages on tech subjects are gonna draw a lot of opinions, but I will oppose anyone who claims their opinion is the only "right" one. There is no such thing as "the one correct opinion"; such is a hallmark of fallacy. — UncleBubba ( T @ C ) 20:14, 3 June 2012 (UTC)
- It seems that someone has changed the article text again, moving around some text and--conveniently--removing the no-technical-advantage wording. I don't know about you folks, but I'm going to insist this be discussed here before the article text is changed, which is--after all--the way things are supposed to work on Wikipedia.
- Since the paragraph does not in any way anymore talk about any "technical advantage or disadvantage", except for the non-contiguous alphabet which is already mentioned 3 times in the "criticism and humor" section, it seemed best to remove the totally meaningless opening sentence and the only comparison (which goes against EBCDIC and thus conflicts with the "no disadvantage" statement). The paragraph did discuss two unrelated subjects: I18N extensions to EBCDIC, and the fact that sorting order causes more difficulty for interoperating with ASCII than the code point differences. My edit was an attempt to sort this out.
- I have to say that I find it pretty shocking and disgusting that comparing inanimate objects is considered NPOV. But if that is the case, it is best to say nothing. Spitzak (talk) 20:00, 13 June 2012 (UTC)
- Half (32) of the control codes have exact ASCII equivalents. The other half were mostly control codes for IBM 3270 displays and other IBM hardware devices. A few code points were not assigned at all. — Loadmaster (talk) 21:15, 25 September 2013 (UTC)
Criticism and humor
"programming a simple control loop to cycle through only the alphabetic characters is problematic." I think people are reaching for things to criticize EBCDIC for out of an anti-IBM bias. Sure you can't simply add one to 'I' and get 'J', but how "problematic" is it to code a table? No competent programmer would have any problems with this. Peter Flass (talk) 13:30, 19 September 2013 (UTC)
- I decided to rewrite this sentence. The sequence of EBCDIC characters was only problematic to programmers used to ASCII and the C-ism 'I'++ == 'J'. I hope my version is neutral. Peter Flass (talk) 14:11, 19 September 2013 (UTC)
- OK, I wrote this, but had it reverted:
Some programmers accustomed to ASCII were confused that adding one to the binary value of an EBCDIC character might produce an unexpected result
My feeling is that prior to ASCII most or all character sets didn't have contiguous alphabetics. BCD, the most widely used encoding, was similar to EBCDIC, so EBCDIC would not have been problematic. Comments? Peter Flass (talk) 12:29, 21 September 2013 (UTC)
- But for some reason the criticism that it was not standard is not suffixed with "Some programmers accustomed to ASCII were confused that this was not a standard", and that there were several versions is not suffixed with "Some programmers accustomed to ASCII were confused that there were several versions", so I see no reason to add this sentence either. It also reads like astroturfing (which is really weird as the argument is 50 years old and will have no impact on IBM's income today). Spitzak (talk) 15:15, 22 September 2013 (UTC)
- Reading above it looks like you claim it is not a problem because "everybody uses isalpha()". This is FALSE. In 1970 or so everybody did "c >= 'A' && c <= 'Z'"; in fact it was a common problem that even lower-case letters failed. In the first C libraries isalpha() was a macro that did this (it was changed to a table lookup to avoid the double evaluation of the argument, and only after much brow-beating, as using 256 bytes for a lookup table was considered a horrible waste of memory; quite often versions failed for bytes with the high bit set because they only used 128-byte tables, as saving 128 bytes was considered very, very important). In any case you can't go fishing for excuses, claiming that both old practice and modern practice somehow make a point, when at the time of the argument it was brain-dead obvious to any programmer which encoding was superior. Spitzak (talk) 15:23, 22 September 2013 (UTC)
- Obviously we disagree ;-) I don't think calling something a problem because many programmers were sloppy or lazy is justified. In any case, my point is that ASCII and EBCDIC originated in roughly the same time period after BCD had been in use for many years. No one going from BCD to EBCDIC would have been confused; it was only the programmers who had used ASCII first who would see anything wrong with it. At any rate, I'll wait and see if anyone has anything to say on the talk page. Peter Flass (talk) 15:39, 22 September 2013 (UTC)
- Stating that programmers who used ASCII were confused about the non-linear arrangement of EBCDIC is a retrofitting of history. I would guess that far more programmers moved from EBCDIC to ASCII machines in the 1970s (if they moved away from IBM at all) than programmers accustomed to ASCII (or any of the half-dozen or so other character sets in use during the 1950s-1960s) moved to EBCDIC machines. In any case, the idiomatic ⟨c >= 'A' && c <= 'Z'⟩ was an invention of C on ASCII machines. I doubt that any similar idiom existed on any EBCDIC machine at the time. (IIRC, the white book mentions this character set difference.) Obviously, S/360 programmers had some means of testing for alphabetic characters other than a single range comparison. — Loadmaster (talk) 21:46, 24 September 2013 (UTC)
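For readers following the range-check versus table argument, the difference can be made concrete. The sketch below assumes Python's `cp037` codec as the EBCDIC code page; it illustrates the two techniques discussed in this thread and is not a reconstruction of any historical library code. All names are invented for the example.

```python
# Sketch: compare a single range test ('A' <= c <= 'Z') with a table
# lookup on EBCDIC byte values. cp037 is assumed as the code page.

CODEC = "cp037"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
UPPER_BYTES = frozenset(UPPER.encode(CODEC))  # table of the 26 letter codes

A_CODE = "A".encode(CODEC)[0]  # 0xC1
Z_CODE = "Z".encode(CODEC)[0]  # 0xE9

def in_range_a_to_z(byte):
    """The ASCII-style idiom: a single range comparison."""
    return A_CODE <= byte <= Z_CODE

def is_upper_letter(byte):
    """Table-based test: true only for the 26 upper-case letter codes."""
    return byte in UPPER_BYTES

# Byte values the range test accepts even though they are not A-Z.
FALSE_HITS = [b for b in range(256)
              if in_range_a_to_z(b) and not is_upper_letter(b)]

if __name__ == "__main__":
    print(len(FALSE_HITS), "non-letter codes fall between 'A' and 'Z'")
    print("0xD0 decodes to", repr(bytes([0xD0]).decode(CODEC)))
    next_code = "I".encode(CODEC)[0] + 1
    print("'I' + 1 is 0x%02X, 'J' is 0x%02X" % (next_code, "J".encode(CODEC)[0]))
```

The range from 'A' to 'Z' spans 41 code points, of which 26 are letters, so the range test admits 15 other codes (including "}" at 0xD0), while the table lookup costs one set or array access per character.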