Talk:ASCII/Archive 2

From Wikipedia, the free encyclopedia

CP/M

Don't CP/M files end with Ctrl-Z (following DEC conventions), not Ctrl-C? AnonMoos 07:04, 31 May 2006 (UTC)

Err, I think you mean Ctrl-D, the default end-of-file input character, not Ctrl-C, the interrupt character. Like Ctrl-Z on a DEC OS, Ctrl-D was typed on a terminal to tell the keyboard driver to report end of file, but was not stored in files. Unix and most DEC OSes stored the length of a file in bytes, so there was no need for an in-band end-of-file indicator.
(In ASCII, Ctrl-D is "End-of-transmission character" and Ctrl-Z is "Substitute character". Apparently Dennis Ritchie and Ken Thompson followed the standard, but DEC went for the easy-to-remember character.)
There was one low-end DEC OS, RT-11, that stored the length of a file only as a number of blocks; RT-11 programs which read text files simply ignored any contiguous sequence of zero bytes at the end of the last block. CP/M was reputedly influenced by RT-11 (or at least that's what I heard from fellow RT-11 programmers when CP/M appeared) and stored file lengths as a number of 128-byte "sectors", so they might well have used Ctrl-Z as an in-band indicator for text files. Certainly, MS-DOS used Ctrl-Z as an in-file marker (see PNG#File_header for a clever trick based on this), and I believe this, like most of MS-DOS V1, came from CP/M.
But I've never used CP/M, so I'm unable to answer your main question. Cheers, CWC(talk) 13:55, 31 May 2006 (UTC), former RT-11 guru
It says Ctrl-C on the article page right now: ASCII#ASCII control characters; however VAX/VMS (the one DEC operating system I've had experience with) didn't use Ctrl-D to mark end of text input from the console, but rather Ctrl-Z -- and I have strong reason to suspect that CP/M did the same. AnonMoos 16:44, 31 May 2006 (UTC)
Sorry, I didn't explain myself well enough. Despite its meaning in ASCII, Ctrl-C is not used as end-of-file in either Unix or DEC OSes; instead it's the stop-what-you're-doing character in both (though Unix allows you to change which character does what). Ctrl-D as the default end-of-file character for Unix is "in the spirit of" ASCII. Ctrl-Z as EOF for all(?) the PDP-11 DEC OSes and VMS is totally unrelated to the ASCII meaning. (I always assumed they chose it because it's easy to remember that end-of-alphabet means end-of-file.) I think you're right that (1) CP/M used Ctrl-Z for end-of-file, and (2) they got it from DEC (along with PIP).
The relevance of this is that most of the ASCII control characters are used in ways that have nothing to do with their ASCII definitions. Basically, they are hangovers from the teletype era. Computer people have kept inventing new meanings for "unused" control characters, which rarely had anything to do with the original meaning. Cheers, CWC(talk) 20:38, 31 May 2006 (UTC)
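The Ctrl-key names used in this thread map directly onto ASCII codes: on most terminals of that era, Ctrl plus a letter sends the letter's code with only its low five bits kept. A minimal sketch, assuming that usual masking rule (the helper name `ctrl_code` is illustrative, not from any real terminal driver), showing that Ctrl-C, Ctrl-D and Ctrl-Z are ETX, EOT and SUB:

```python
# ASCII names for the control codes discussed in this thread.
CONTROL_NAMES = {0x03: "ETX", 0x04: "EOT", 0x15: "NAK", 0x1A: "SUB"}

def ctrl_code(letter: str) -> int:
    """Code sent by Ctrl+letter, assuming the common 'keep the low 5 bits' rule."""
    return ord(letter.upper()) & 0x1F

for key in "CDUZ":
    code = ctrl_code(key)
    print(f"Ctrl-{key} -> 0x{code:02X} {CONTROL_NAMES[code]}")
```

So Unix's Ctrl-D EOF really is ASCII "End of Transmission", while DEC's Ctrl-Z EOF is ASCII "Substitute".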

I'm 100% sure that DEC's OS/8 used ^Z in the file to represent the end of file, and 90-something % sure that the same was true for RT-11. I'm less sure about TOPS-10, TOPS-20, and RSTS/E, though I think they did, and haven't a clue about VMS and RSX-11. Jordan Brown 05:21, 11 September 2006 (UTC)

UNIX control characters

Originally UNIX (6th and 7th editions) used # as erase and @ as line kill (with ^D as EOF, DEL/RUBOUT as Interrupt and ^\ as Quit) - see unix/tty.h on sheet 79 in chapter 5 of "Lions' Commentary on UNIX 6th Edition with Source Code" ISBN 1-57398-013-7. This was because of the use of hard copy teletype terminals (I assume). Later on when glass ttys became popular the erase character was changed to ^H. Still later @ became a regular character (allowing it to be used for email addresses). BSD UNIX changed things to match the characters used by DEC on its VAX VMS operating system - DEL was reassigned to be erase (instead of ^H) and ^C replaced it for interrupt. TheGiantHogweed 07:02, 9 July 2006 (UTC)

Many years ago, when I first used Unix, I was very startled to find #=erase. I seem to recall that the way to input "#" was to press "\" and "#". (In other words, backslash was the terminal-driver-escape character. In modern Unix terms, LNEXT defaulted to \ instead of ^V.) Does anyone know for sure?
DEC had a fairly standard set of Control-key actions, but they were hard-coded. E.g., ^U=erase line (Unix "kill"), ^C=interrupt (like Unix SIGINT). On TOPS-10 and TOPS-20, ^T (T for Tell?) produced a status report (CPU time used, etc). VMS had ^T and added ^Y (= Unix SIGQUIT). BSD (4.3 onwards) has a STATUS key, defaulting to ^T, which generates a SIGINFO.
I guess all the developer teams borrowed each other's better ideas. Does anyone know for sure? Cheers, CWC(talk) 10:45, 9 July 2006 (UTC)
Yes, Unix terminal drivers use "\" to escape the next character typed, so "\#" is required to enter a single "#" if that is the currently set delete character for the user's terminal. For historical ancestry, a lot of DEC TOPS-10 stuff was borrowed by CP/M, which was in turn borrowed by Microsoft DOS, including ^Z for end-of-file. — Loadmaster 16:24, 5 December 2006 (UTC)
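The old #/@ conventions can be modelled with a few lines of code. This is a simplified sketch, not the actual 6th Edition tty.c logic: it assumes # erases one character, @ discards the whole line, and a backslash makes a following # or @ literal (a backslash before any other character is kept as typed). The function name `canon` is illustrative.

```python
ERASE, KILL, ESCAPE = "#", "@", "\\"

def canon(typed: str) -> str:
    """Simplified model of early Unix canonical line editing (hard-copy era)."""
    line = []
    escaped = False
    for ch in typed:
        if escaped:
            if ch in (ERASE, KILL):
                line[-1] = ch  # the backslash is replaced by the literal character
            else:
                line.append(ch)  # backslash stays; ordinary char follows it
            escaped = False
        elif ch == ESCAPE:
            line.append(ch)
            escaped = True
        elif ch == ERASE:
            if line:
                line.pop()
        elif ch == KILL:
            line.clear()
        else:
            line.append(ch)
    return "".join(line)

print(canon("dat#te"))      # date
print(canon("junk@ls -l"))  # ls -l
print(canon("echo \\#1"))   # echo #1
```

On a hard-copy Teletype nothing can be rubbed out on paper, which is why printable characters made sense as erase and kill.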

\n is not always ASCII LF in C

The article Newline states :

The C programming language provides the escape sequences '\n' (newline) and '\r' (carriage return). However, contrary to popular belief, these are in fact not required to be equivalent to the ASCII LF and CR control characters.

But the table on the ASCII page suggests that \n is always mapped to ASCII LF by some (C) standard, which isn't true for all C compilers. Not sure if it's worth mentioning though...

Ervee 10:29, 25 January 2007 (UTC)

Well I guess that would depend... I've never actually used a C compiler with abnormal \n and \r, but if you can find one, that is somewhat notable and designed for ASCII based platforms, I would be interested. Shinobu (talk) 14:07, 19 October 2008 (UTC)
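In practice, on ASCII-based platforms the more visible difference is not the value of '\n' but the bytes that end up on disk when a newline is written in text mode. A sketch in Python rather than C (in Python '\n' is always LF), showing the same character stored as LF or CR LF depending on the newline translation setting of the standard `open()` function; the helper name `bytes_written` is illustrative:

```python
import os
import tempfile

def bytes_written(newline: str) -> bytes:
    """Write 'a\\nb' in text mode with the given newline policy; return the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline, encoding="ascii") as f:
            f.write("a\nb")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

print(ord("\n"), ord("\r"))   # 10 13
print(bytes_written("\n"))    # b'a\nb'
print(bytes_written("\r\n"))  # b'a\r\nb'
```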

The Table with notes

This article has a table with notes inside the table. The notes are currently implemented with {{ref}}/{{note}}, but they have been converted to cite.php a few times with an automatic tool. This conversion moves the notes away from the table. I think these notes should stay with the table. Gimmetrow 12:45, 9 February 2007 (UTC)

Thanks for your excellent work on this article, Gimmetrow.
I agree that the notes should stay with the table.
Perhaps we should (shudder) "subst:" the {{ref}} and {{note}} tags? That would stop the well-intentioned automated conversion, but it would make subsequent editing much harder. Just a thought.
Cheers, CWC(talk) 14:01, 9 February 2007 (UTC)
No substing please. Instead tell people not to change the notes to refs, because they in fact aren't refs. Shinobu (talk) 14:08, 19 October 2008 (UTC)
Did you notice you're replying to an obsolete discussion? The table in question was converted to use Cite.php's new group feature on 2008-08-03, as that made it possible to keep the notes with the table. Anomie 16:07, 19 October 2008 (UTC)

rest of values

Why does the page not show the values up to 255? What do those mean? Nate | Talk Esperanza! 19:56, 13 March 2007 (UTC)

Whatever they might mean, they are not ASCII character codes. ASCII is a 7-bit character code standard. There are no ASCII characters for codes other than 00 to 7F. — Loadmaster 20:44, 13 March 2007 (UTC)
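Because ASCII stops at 0x7F, checking whether data is pure ASCII amounts to checking that no byte has its eighth bit set. A minimal sketch (the function name `is_ascii` is illustrative; Python 3.7 and later also provide the built-in `bytes.isascii()`):

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is a 7-bit ASCII code (0x00 to 0x7F)."""
    return all(b <= 0x7F for b in data)

print(is_ascii(b"Hello"))                   # True
print(is_ascii("café".encode("latin-1")))  # False: 0xE9 is outside ASCII

try:
    bytes([0xE9]).decode("ascii")
except UnicodeDecodeError:
    print("0xE9 is not an ASCII code")
```

Codes 0x80 to 0xFF only acquire a meaning through some other, ASCII-extending encoding.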

Clearer identification of overstrike characters

I have three comments or suggestions, all related to this paragraph from the section on "ASCII printable characters":

"Seven-bit ASCII provided seven "national" characters and, if the combined hardware and software permit, can use overstrikes to simulate some additional international characters: in such a scenario a backspace can precede a grave accent (which the American and British standards, but only those standards, also call "opening single quotation mark"), a backtick, or a breath mark (inverted vel)."

1. I had no idea what an "inverted vel" was, so I followed the link (http://en.wikipedia.org/wiki/Vel) - but it was not at all relevant. I think the link should be removed -- or better, replaced with a link that does explain what a "vel" or "inverted vel" is. I did some search engine queries but found no explanation.

2. In this same sentence, I also wish that the references to "grave accent", "backtick", and "inverted vel" clearly specified the ASCII character/symbol being referred to. The reader should not have to figure this out (and this particular reader, in fact, cannot figure it out, as I explain below).

For example, "grave accent" clearly refers to character 96, and the link to the Wikipedia article explicitly states this: "In the ASCII character set the grave accent is encoded as character 96, hex 60." So in this article, I would suggest something like "grave accent (character 96)" to clearly specify the ASCII character being referred to.

Next, what character does "backtick" refer to? If it is character 96, as I believe, then perhaps the sentence should be modified to show the equivalence, as in "a backspace can precede a backtick or grave accent (character 96)". There is no Wikipedia article on the backtick, but "backtick" redirects to the Grave accent article. That article uses the term "backtick" but never explicitly states that "backtick" is synonymous with (or a homoglyph for) "grave accent".

Finally, "inverted vel" ... to which ASCII character does this refer? You tell me! Could someone in possession of this knowledge add it to the article?

3. Finally, in the same sentence, the existing text says "a backspace can precede ...", followed by a list of exactly three characters. One way to read this sentence is that the list is exhaustive - only those three characters - which I hope was not intended. It would be better to say something like "a backspace can, for example, precede ...". I note that the text has omitted other characters commonly used in the same way (as overstrikes), such as the comma (character 44), used to simulate a c with cedilla (Ç or ç), and the forward slash / (character 47), used to simulate a Scandinavian slashed "o" character or a Greek phi (ø), or the hyphen/minus symbol (character 45), used with zero (character 48) or with uppercase O (character 79) to simulate the Greek letter theta (θ)...

I hope these comments meet with favorable consideration, and that someone will rewrite this paragraph. Or I'd be glad to do so, if someone can point me to definitive information on the "inverted vel".

Aeolopile 06:16, 20 April 2007 (UTC)

I suspect that the "inverted vel" refers to character 94 (^) KerryVeenstra 06:54, 20 April 2007 (UTC)
The original author's sentence (from 11:42, 24 June 2003) considers an inverted vel to be a breath mark: "ASCII provides some internationalization for French and Spanish (both spoken in the U.S.) by providing a backspace with the grave, accent (miscalled a "single quote"), tilde, and breath mark (inverted vel)." I seem to remember some early glass terminals that provided this backspace capability. From elsewhere, a vel is a spear (Murugan). KerryVeenstra 22:43, 20 April 2007 (UTC)
IIRC, an early version of ASCII had "↑" (up-arrow) instead of "^" as character 94. CWC 03:02, 21 April 2007 (UTC)
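To make the overstrike idea concrete: a backspacing terminal receives base letter, BS (0x08), then the mark, and prints the mark over the letter. The sketch below reinterprets such triples as precomposed characters. The mapping from ASCII marks to Unicode combining marks is an assumption made for illustration, and the helper names are hypothetical:

```python
import unicodedata

BS = "\x08"
# Assumed mapping from ASCII spacing marks to Unicode combining marks.
COMBINING = {"`": "\u0300", "'": "\u0301", "^": "\u0302", "~": "\u0303", ",": "\u0327"}

def overstrike(base: str, mark: str) -> str:
    """The byte sequence a backspacing terminal would receive."""
    return base + BS + mark

def render(text: str) -> str:
    """Collapse 'letter BS mark' triples into one composed character where possible."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS and text[i + 2] in COMBINING:
            out.append(unicodedata.normalize("NFC", text[i] + COMBINING[text[i + 2]]))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(render("caf" + overstrike("e", "'")))         # café
print(render("fa" + overstrike("c", ",") + "ade"))  # façade
```

Sequences whose mark is not in the table (for example underlining with _) are passed through unchanged.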

Pound symbol

In England the currency symbol (£) mapped to character 35, so that telex transmissions which included the currency symbol did not automatically convert to a confusing value when transmitted between the two countries. That is, the $ symbol mapped to # when transmitted from US to GB, the LSD (£) symbol mapped to # when transmitted GB to US, and $ never mapped to LSD.—Preceding unsigned comment added by 150.101.166.15 (talk) 01:25, 2 July 2007 (UTC)

That makes sense. Do you happen to have a Reliable Source we could cite for that? CWC 07:38, 2 July 2007 (UTC)
ftp://ftp.isi.edu/in-notes/rfc20.txt
3 These characters should not be used in international interchange without determining that there is agreement between sender and recipient. (See Appendix B4.)
4 In applications where there is no requirement for the symbol #, the symbol (Pounds Sterling) may be used in position 2/3.
and
http://wps.com/projects/codes/Revised-ASCII/page4.JPG
http://wps.com/projects/codes/ECMA-6.pdf
also, in more detail:
http://www.transbay.net/~enf/ascii/ascii.pdf —Preceding unsigned comment added by 150.101.166.15 (talk) 05:03, 21 November 2007 (UTC)
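RFC 20's note 4 means one and the same byte, 0x23, is displayed as # under US conventions and as £ under a UK arrangement, while 0x24 ($) is the same in both. A minimal sketch, assuming a hypothetical variant table that differs from US ASCII only at 0x23 (real national variants may differ in more positions; the names `VARIANTS` and `decode_7bit` are illustrative):

```python
# Hypothetical per-country overrides on top of US ASCII; only 0x23 differs here.
VARIANTS = {
    "US": {},
    "UK": {0x23: "\u00a3"},  # pound sign in place of the number sign
}

def decode_7bit(data: bytes, variant: str) -> str:
    """Decode 7-bit data using the chosen national variant table."""
    if any(b > 0x7F for b in data):
        raise ValueError("not 7-bit data")
    overrides = VARIANTS[variant]
    return "".join(overrides.get(b, chr(b)) for b in data)

wire = b"Price: #5"
print(decode_7bit(wire, "US"))  # Price: #5
print(decode_7bit(wire, "UK"))  # Price: £5
```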

Character Names

Would be nice if the printable char table included the English names for each character. —Preceding unsigned comment added by 74.93.101.81 (talk) 21:13, 23 September 2007 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Done - WP:COMMONNAME seems to apply here. Neıl 11:03, 4 January 2008 (UTC)

Requested move

American Standard Code for Information Interchange → ASCII — Much more popular as an acronym than spelled out. (60,000,000 vs 182,000 Google hits.) Was at ASCII for years, including getting to be a featured article. —Callmederek (talk) 21:15, 29 December 2007 (UTC)

Survey

Feel free to state your position on the renaming proposal by beginning a new line in this section with *'''Support''' or *'''Oppose''', then sign your comment with ~~~~. Since polling is not a substitute for discussion, please explain your reasons, taking into account Wikipedia's naming conventions.

Discussion

Any additional comments:

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


My ASCII

My ascii produces all characters as printable except 000. Is the article wrong or is it just my computer? Thanks, George D. Watson (Dendodge).TalkHelp and assistance 15:25, 20 February 2008 (UTC)

ASCII acronym

The term ASCII is an acronym which appears to have two potential expansions, one is the American Standard Code for Information Interchange, the other is the American Standards Committee for Information Interchange. The available online literature is unable to produce an absolute and unambiguous corroboration that it is one or the other.

The former, whilst appearing to make sense, makes less sense when one considers that ASCII was developed by ASC: the American Standards Committee.

Is there a citation available to the original ASCII standard? If not, then it should be acknowledged that two alternatives are in common use.

Gregmal (talk) 00:00, 11 March 2008 (UTC)

I think the relative google hits should be indicative. Standard Code (-wiki) = 225,000 hits, Standards Committee (-wiki) = 726 hits. But if that isn't enough, see scans of ASA standard X3.4-1963 and 1964 article on X3.4-1963. Don't trust O'Reilly. Gimmetrow 03:55, 11 March 2008 (UTC)
I scanned an ASCII article from the July 1964 issue of Electronics World and put it on my web site. This was written when the first version of the American Standard Code for Information Interchange was news. Notice it is Upper Case only and has early control codes. Electronics World, July 1964 -- SWTPC6800 (talk) 04:32, 1 April 2008 (UTC)
Yes, ASA X3.4-1963 did not specify codes for lower case characters, though it had a lot of undefined codes. Gimmetrow 04:40, 1 April 2008 (UTC)

Article freshness

In my opinion the article has grown too large; I believe its size impedes its appeal. Sasepeev (talk) —Preceding comment was added at 21:35, 31 March 2008 (UTC)

The pound sign in ASCII

On the BBC computer, the pound sign was in the lower ASCII table, as well as the hash and the dollar. However, on today's PCs, the pound sign is no longer in the first 7 bits - it's now 8-bit ASCII. The symbol that replaced it is the backwards apostrophe (`). Can anyone tell me why the pound sign (£) was taken out of 7-bit ASCII on today's PCs, when it was there on the BBC computer from the 80s? Thanks!

I've just checked, and the pound sign was CHR$96 or CHR$&60 on the BBC (96 decimal, 60 hex). The hash was in its usual place of 35 (decimal).

90.205.80.229 (talk) 21:33, 7 April 2008 (UTC)

The sign ‘£’ was actually never in true ASCII, merely in encodings based upon ASCII but adapted to other use. As more powerful systems have been produced, they have supported more sophisticated encodings, supporting larger character sets. In order to facilitate inter-system compatibility, efforts are made to converge on an encoding. It was natural that true ASCII should form the core of most extensions. —SlamDiego←T 21:58, 7 April 2008 (UTC)
The term ASCII has been misused. It means American Standard Code etc, and has characters needed in the USA, no other. ASCII has been used as a name for a lot of 7-bit and 8-bit encodings. --BIL (talk) 07:44, 8 April 2008 (UTC)
  1. I don't need to be told this.
  2. The article begins by expanding the abbreviation to “American Standard Code for Information Interchange”.
  3. Many characters used in America are not found in ASCII, and the ‘£’ is used in America, albeït not as often as were the characters in the ASCII encoding.
SlamDiego←T 23:43, 8 April 2008 (UTC)
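A quick check supports the point that £ has no ASCII code at all: it only appears in 8-bit extensions such as ISO 8859-1 (at 0xA3) or as a multi-byte sequence in UTF-8. The BBC Micro's placement at 0x60, as described above, was a machine-specific substitution for the backtick. A minimal sketch using Python's standard codecs:

```python
pound = "\u00a3"  # £

print(pound.encode("latin-1"))  # b'\xa3'  (ISO 8859-1, an 8-bit ASCII extension)
print(pound.encode("utf-8"))    # b'\xc2\xa3'
try:
    pound.encode("ascii")
except UnicodeEncodeError:
    print("£ has no code in 7-bit ASCII")

# The neighbours discussed above, as true ASCII codes:
print("#$`".encode("ascii"))    # b'#$`', i.e. 0x23, 0x24, 0x60
```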

Removed text

IMO this edit removed some interesting material. If the article is too long, let's split it, not throw away good content. Andrewa (talk) 06:41, 3 June 2008 (UTC)

The "structural features" are still there in history, which describes how it came to be. Gimmetrow 07:23, 3 June 2008 (UTC)
Well, yes, of course they are, I linked to the history above, didn't I? I don't see any justification for the removal in the summaries. Would you like to give one? Andrewa (talk) 01:39, 4 June 2008 (UTC)
The text which was moved (not removed) had no link to the history section. Gimmetrow 01:44, 4 June 2008 (UTC)
Oops, so it was. Missed that somehow. Thanks! Andrewa (talk) 10:40, 4 June 2008 (UTC)

More removed text

Here's another edit removing what seems to be useful references. I think these are worth having in a separate subsection. The neologism "asciify" is not mentioned elsewhere in the article and I'm hesitant to put up a new article just for that. --A12n (talk) 15:09, 11 June 2008 (UTC)

It doesn't need an article, it's at best a dictionary definition, and since one of the "refs" is a wiki and the other is a blog about software called "asciify", is it even notable? Gimmetrow 20:47, 11 June 2008 (UTC)
The principle I think is that it is worthwhile to show how the key term of the article is used. So for instance the neologism "ASCIIbetical" is indicated elsewhere in the article. I don't think it's inappropriate at all to mention such usage (which has a place also in dictionaries we'd agree) in an encyclopedic article. Is this knowledge really so irrelevant to the subject that it is excluded from the article? And since we agree a separate article is not currently merited, effectively excluded from being explained anywhere in Wikipedia as a whole? I think not. "Asciify" is obviously a coinage, but is established enough to get 58,000 hits on Google (I cherry-picked a couple of refs; perhaps better ones can be found); "asciified" 1420 hits; and "asciification" 463. That ain't huge I admit, but it is significant, representing at least an emerging usage related to use of ASCII. In fact maybe the core issue here should be that conversion into ASCII is something people do for various purposes (namely certain text needs or preferences and ASCII art), and there's a word for that.--A12n (talk) 14:05, 12 June 2008 (UTC)
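For what "asciify" usually means in practice: one common technique is to decompose accented letters and drop whatever does not fit in ASCII. A minimal sketch using Python's standard `unicodedata` module; the function name `asciify` is illustrative, and characters with no ASCII decomposition (such as £) are simply lost, which is a deliberate simplification:

```python
import unicodedata

def asciify(text: str) -> str:
    """Strip accents via compatibility decomposition, then drop non-ASCII leftovers."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(asciify("Crème brûlée"))  # Creme brulee
print(asciify("£5"))            # 5
```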

Space?

Space is 000 0000? or just the 10101's and then a space? Androo123 (talk) 05:28, 2 July 2008 (UTC)

Space is 010 0000, which is 32 decimal and 20 hex. Anomie 22:09, 3 July 2008 (UTC)
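The same conversion for any character can be done mechanically; note that space (010 0000) is a different code from NUL (000 0000). A minimal sketch (the helper name `describe` is illustrative):

```python
def describe(ch: str) -> tuple:
    """Return (7-bit binary, decimal, hex) for a single ASCII character."""
    code = ord(ch)
    return format(code, "07b"), code, format(code, "02X")

for ch in [" ", "0", "A"]:
    print(repr(ch), *describe(ch))
```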

Extended ASCII Code

There is nothing here about the Extended ASCII code. I think we should write something about it. --Mustafaahmedhussien (talk) 19:59, 17 October 2008 (UTC)

The Variants section covers various extensions to ascii in detail as does the Extended ASCII article. --Salix (talk): 20:25, 17 October 2008 (UTC)
I meant the representation of some other control characters like the arrows & backspace. They are represented in two bytes, one of which is NULL.--Mustafaahmedhussien (talk) 03:35, 21 October 2008 (UTC)
Nothing in ASCII is represented in two bytes, and ASCII does not contain arrows. Are you confusing ASCII with keyboard scan codes? Anomie 11:29, 21 October 2008 (UTC)

Removed from lead

The standard character set on the World Wide Web was originally ISO Latin-1 (also called ISO-8859-1)[1] (an ASCII extension), but became the ISO 10646 Universal Character Set in 1997.[2]

This doesn't seem directly relevant to this article. Gimmetrow 02:06, 29 March 2009 (UTC)

printable version of non-printable characters

copy-pasted from QBasic's help file page containing a table of ASCII characters:

000 (nul)
001 ☺ (soh)
002 ☻ (stx)
003 ♥ (etx)
004 ♦ (eot)
005 ♣ (enq)
006 ♠ (ack)
007 • (bel)
008 ◘ (bs)
009 (tab)
010 (lf)
011 ♂ (vt)
012 ♀ (np)
013 (cr)
014 ♫ (so)
015 ☼ (si)
016 ► (dle)
017 ◄ (dc1)
018 ↕ (dc2)
019 ‼ (dc3)
020 ¶ (dc4)
021 § (nak)
022 ▬ (syn)
023 ↨ (etb)
024 ↑ (can)
025 ↓ (em)
026 (eof)
027 ← (esc)
028 ∟ (fs)
029 ↔ (gs)
030 ▲ (rs)
031 ▼ (us)


(I hope Wikipedia can handle control characters being posted like this)

Not all of them actually got a printable glyph, at least on the DOS screen and in the font they are being displayed in for me in Firefox. --TiagoTiago (talk) 01:34, 1 August 2009 (UTC)


Some of them I can get with Alt+xx, but I gave up trying to get a complete list for the moment, since some of them triggered browser functions as if I had actually pressed the corresponding keys...--TiagoTiago (talk) 01:40, 1 August 2009 (UTC)
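A font-independent alternative to the code page 437 glyphs listed above is Unicode's Control Pictures block, where U+2400 to U+241F are symbols for the 32 control codes 0x00 to 0x1F and U+2421 is the symbol for DEL. A minimal sketch (the function name `visible` is illustrative), which also avoids pasting raw control characters that a browser might act on:

```python
def visible(text: str) -> str:
    """Replace ASCII control codes with their Unicode Control Pictures symbols."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            out.append(chr(0x2400 + code))  # U+2400 SYMBOL FOR NULL ... U+241F
        elif code == 0x7F:
            out.append("\u2421")            # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)

print(visible("A\tB\r\n\x1a"))  # A␉B␍␊␚
```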

Misuse of the term "ASCII"

The term "ASCII" is sometimes misused to refer to a superset, or almost superset, of ASCII such as ISO-8859-1. I even once talked to someone who apparently thought "ASCII" was a general term for the whole concept of organising character glyphs in an ordered set. JIP | Talk 6 July 2005 07:45 (UTC)

The image on the article depicting the character set

Not to be an anti-imagist, but I'm questioning the usefulness of including the image depicting the character set. I'm interested in removing it. Would there be support (or opposition) among you for doing so? Thanks. Courtland 01:08, July 27, 2005 (UTC)

FIFA WorldCup 06 live in ASCII

for those interested: [2] (^_^)

EOF character in MSDOS.

MSDOS included two methods of file handling, 'text' and 'binary'. 'Text' method used the EOF character. When reading a file, file position and size of file was ignored, characters streamed to the EOF character. This method was faster. The MSDOS internal Copy command used 'binary' method for copying files with .com and .exe extension, used 'text' method for other files unless the /b option was specified. This meant that other files (for example .zip) files could be accidentally truncated if used with the internal Copy command. The MSDOS external command xcopy did not have this behaviour. This behaviour was standard for all versions of MSDOS including Windows 98 SE, and probably for the copy of command.com included with Windows 2000. Windows XP does not include a copy of command.com (the command shell that included the copy command), and the copy command provided by the cmd shell probably does not have this behaviour.—Preceding unsigned comment added by 150.101.166.15 (talk) 01:48, 2 July 2007 (UTC)

Are you sure that this is correct? I would find it weird to make copy-as-text the default but maybe that was really what it did - I've seen weirder things. However, it is the speed difference which I find hard to believe. When copying files, the speed of the disk is the bottleneck and any software processing shouldn't noticeably affect the transfer speed since it's waiting for the hardware most of the time anyway. Shinobu (talk) 14:15, 19 October 2008 (UTC)
This was not faster; it was due to CP/M compatibility. CP/M could only make files a multiple of 128 bytes long, so the ^Z was used to mark the end of a text file. MSDOS2 added actual file lengths to the file system so the files had exact lengths and this was no longer necessary. However old files would still be padded up to a multiple of 128, so there was an option to find the ^Z and truncate it when copying. I think this was always an option; the default copy command always copied the entire file. In MSDOS1 there could be no difference as there was no way to shorten the copied file to a non-multiple of 128. Spitzak (talk) 04:27, 29 September 2009 (UTC)
Correct. Current incarnations of the MS-DOS and Windows copy command provide a /A switch to explicitly specify an ASCII text source file, which stops copying at the first Ctrl-Z (SUB) character. It also provides a /B switch to specify a binary source file, which is the default. — Loadmaster (talk) 16:46, 29 September 2009 (UTC)
I know this is late, and not really relevant to the ASCII article, but I have some comments:
  • Binary mode was faster than ASCII mode. The reason is that ASCII mode included the additional scanning for the EOF (control-Z) character.
  • MS-DOS did not special case for .EXE/.COM. The MS-DOS COPY command defaulted to binary mode for all copies with these exceptions. 1) You were combining files using either file+file or file* into a single file, or 2) you were copying to a device such as CON: or LPT:. Those two exceptions defaulted to ASCII mode and you needed to use /B to force a binary copy.
  • Windows XP includes command.com meaning if you have that then you can test some of the copy command behavior. I'll say "some" as the version of copy provided supports long file names implying its behavior may not be exactly the same as the MS-DOS's command.com. FWIW, the CD command only works with 8.3 names. 75.55.120.132 (talk) 01:45, 6 September 2010 (UTC)
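The difference between the two copy modes described in this thread can be sketched in a few lines. This is a simplified model, not COMMAND.COM's actual code: "ASCII mode" keeps bytes up to the first Ctrl-Z (SUB, 0x1A), "binary mode" keeps everything, and the CP/M-style padding assumes Ctrl-Z fills out the last 128-byte record. All function names are illustrative.

```python
SUB = 0x1A    # Ctrl-Z, ASCII "Substitute"
RECORD = 128  # CP/M record ("sector") size in bytes

def cpm_store(text: bytes) -> bytes:
    """Pad text with Ctrl-Z up to a whole number of 128-byte records.

    Simplification: a file that exactly fills its last record gets no marker.
    """
    padding = (-len(text)) % RECORD
    return text + bytes([SUB]) * padding

def copy_ascii(data: bytes) -> bytes:
    """Like COPY /A: keep everything before the first Ctrl-Z."""
    end = data.find(bytes([SUB]))
    return data if end == -1 else data[:end]

def copy_binary(data: bytes) -> bytes:
    """Like COPY /B: copy all bytes unchanged."""
    return bytes(data)

stored = cpm_store(b"HELLO\r\n")
print(len(stored))                     # 128
print(copy_ascii(stored))              # b'HELLO\r\n'
print(copy_ascii(b"PK\x03\x04\x1a\x00"))  # b'PK\x03\x04' (binary data truncated)
```

The last line shows why an ASCII-mode copy could silently damage a .zip or other binary file: any 0x1A byte inside it is taken as end-of-file.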

Printable versus Graphic characters

I would think that every serious computer-programmer has at some point programmed in C, C++, Java, or one of the many languages based on C, and is therefore familiar with the printable versus graphic distinction made by the character-class tests isprint and isgraph.

Why are parts of this article consistent with that distinction and other parts not? Eugenwpg (talk) 21:00, 26 September 2009 (UTC)

The opposing view leads to many contradictions within this article and elsewhere within the Wikipedia. Consider the Ctype.h article: it clearly explains the isprint and isgraph tests wrt where the space-character falls, then it links to this article which, under the opposing view, would then promptly contradict it. Eugenwpg (talk) 15:27, 27 September 2009 (UTC)

The definition of "graphic" and "printable" used by the C library does not necessarily correspond to the definitions used in this article. In particular, the definition here seems to be that "printable" and "graphic" are essentially synonymous; in the C library, "isprint" seems to be used for this definition and "isgraph" seems to mean "any visible graphic". I'd certainly wait for Gimmetrow to reply before trying to force any "decision" here. Anomie 16:50, 27 September 2009 (UTC)
There may not be a contradiction here - K&R may be talking about something else. Mackenzie is used here for the history of ASCII development, where the space is considered a "graphic" (Mackenzie's term) as opposed to a control: "But was the Space character a control character or a graphic character?... It is, of course, both. However, from the point of view of a parallel printer, it is only one of those things, the invisible graphic. By this rather hair-splitting reasoning, the standards committee persuaded itself that the Space character must be regarded as a graphic character; that is, it must be positioned in a column of graphics, not in a column of controls." I have to imagine Mackenzie knew something about programming, but nevertheless called it an invisible "graphic" rather than a non-graphic or invisible "printable". I don't know why he used that terminology, but if the discrepancy is an actual substantial difference among authorities, then we really should keep both phrases and references. If we want to use the 95 count based on definitions from C, then the article could identify the basis for the count. Gimmetrow 18:26, 27 September 2009 (UTC)
Not to confuse things any further, but the Unicode article says this:
Graphic characters are characters defined by Unicode to have a particular semantic, and either have a visible glyph shape or represent a visible space.
Should we be using a more, um, modern accepted meaning of the word "graphic", as per the Unicode standard for example? (I also note that "visible space" seems to be an oxymoron, but let's not go down that alley just yet.) — Loadmaster (talk) 21:43, 28 September 2009 (UTC)
On the other hand, since this article is about ASCII it would make sense to use the ASCII definition. Anomie 23:08, 28 September 2009 (UTC)
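For reference, the C library split comes down to a single character. In the "C" locale, isprint() is true for 0x20 through 0x7E and isgraph() is true for the same range minus the space, which gives the counts of 95 and 94. A sketch that mirrors those definitions by hand (the function names are illustrative; Python's str has no isgraph):

```python
def c_isprint(code: int) -> bool:
    """Mirror of C isprint() in the "C" locale for 7-bit codes."""
    return 0x20 <= code <= 0x7E

def c_isgraph(code: int) -> bool:
    """Mirror of C isgraph(): printable and not the space character."""
    return c_isprint(code) and code != 0x20

printable = [c for c in range(128) if c_isprint(c)]
graphic = [c for c in range(128) if c_isgraph(c)]
print(len(printable), len(graphic))   # 95 94
print(set(printable) - set(graphic))  # {32}
```

Under Mackenzie's usage, by contrast, space is an (invisible) graphic, so his count of graphics would match the 95 here.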

Equivalent symbol....???

See Talk:Table of mathematical symbols#Why is the equivalent symbol not here....??? —Preceding unsigned comment added by 222.64.27.154 (talk) 02:02, 20 March 2010 (UTC)

ASCII is only the first 128 characters, which does not have such a symbol. Replied on above page for actual symbol. --Salix (talk): 09:16, 20 March 2010 (UTC)

Ambiguity

(in the article, not in ASCII!) Speaking of CR and LF, the article says, "Transmission of text over the Internet, for protocols as E-mail and the World Wide Web, uses both characters." Meaning it uses the sequence CR-LF all the time, or it uses either interchangeably, or? Mcswell (talk) 02:35, 19 July 2010 (UTC)

Most standard protocols, such as FTP, HTTP, and SMTP, are supposed to use CRLF line endings. But thanks to the prevalence of Unix systems as servers with their native LF-only line endings, we have Postel's Law which in practice means implementations should be prepared to accept either CRLF or LF line endings. Would it be sufficiently unambiguous in the context of the article to say "Transmission of text over the Internet, for protocols as E-mail and the World Wide Web, also uses both characters"? Anomie 02:57, 19 July 2010 (UTC)
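In code, "send CRLF, accept CRLF or bare LF" might look like the following minimal sketch; the function names are illustrative and not taken from any particular protocol library:

```python
def split_lines(data: bytes) -> list:
    """Split protocol text on CRLF, but also accept a bare LF (lenient reading)."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing terminator leaves an empty final element
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]

def join_lines(lines: list) -> bytes:
    """Emit the strict form: every line terminated by CR LF."""
    return b"".join(line + b"\r\n" for line in lines)

received = b"HELO example\r\nMAIL FROM:<a@b>\nQUIT\r\n"
print(split_lines(received))  # [b'HELO example', b'MAIL FROM:<a@b>', b'QUIT']
print(join_lines(split_lines(received)))
```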

Incorrect history info?

Frankly, I don't think ASCII is based on the ordering of the English alphabet at all, because it was just a definition made by humans. How can we be sure that a lowercase letter must be larger than an uppercase letter? Seriously, those things are defined by humans, not by the nature of the English alphabet. —Preceding unsigned comment added by UnknownzD (talkcontribs) 6 Aug 2010

Well based does not have to mean that every detail has to come from the English alphabet, just a good proportion such as the ordering of the letters. Yes someone had to decide which to put first, but that is just one bit of information.--Salix (talk): 10:48, 6 August 2010 (UTC)
When the ASCII standard was first released in 1963, it did not define any lower case characters. In the 1960s most data processing was uppercase only. Computer terminals did not support lower case; in the 1970s lower case was an extra cost option on CRT terminals. -- SWTPC6800 (talk) 02:55, 6 September 2010 (UTC)
Computer terminals built from Selectric typewriters certainly *did* support lower case long before 1970. Spitzak (talk) 20:23, 7 September 2010 (UTC)
I used an IBM 2741 Selectric terminal in the 1970s but they were not as common as the ASR-33 Teletype which was uppercase only. -- SWTPC6800 (talk) 22:48, 7 September 2010 (UTC)
The IBM 2741 did not use ASCII, it had "correspondence coding" that was related to the tilt and rotation of the print ball. I wrote an I/O driver to connect one of these to a computer. -- SWTPC6800 (talk) 13:31, 8 September 2010 (UTC)
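Whoever chose it, the layout that was added for lower case has a convenient property: letters are contiguous in alphabetical order, and each lowercase letter sits exactly 0x20 (one bit) above its capital. A plain code-point sort therefore puts every capital before every lowercase letter ("ASCIIbetical" order). A minimal sketch (the helper name `toggle_case` is illustrative):

```python
for upper in "AMZ":
    lower = upper.lower()
    print(upper, hex(ord(upper)), lower, hex(ord(lower)), ord(lower) - ord(upper))

def toggle_case(ch: str) -> str:
    """Flip ASCII letter case by toggling bit 5 (0x20); other characters unchanged."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ 0x20)
    return ch

print(sorted(["banana", "Apple", "cherry", "Banana"]))
# ['Apple', 'Banana', 'banana', 'cherry']
```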

Old vs. new printable characters chart

I viewed the ASCII page today and noticed that the printable characters chart had changed format significantly since the last time I visited the page, several days previous. While Cybercobra made several edits over the past few days that were of benefit to the article, I contend that we should stick with the old chart simply because it was easier to use. The old chart had the binary, octal, decimal, and hexadecimal values neatly labeled. The fact that the new chart does not have these labels certainly doesn't make it impossible to use, anyone with even a little bit of a background in computer science or math can distinguish which is which, but people with that kind of background knowledge aren't the only users of Wikipedia. For the layperson who views the ASCII article, the new chart is likely very confusing and would force them to spend some unnecessary time deciphering it. I also preferred the vertical columns style and the font of the previous chart, however those are just matters of personal preference. I personally would advocate reverting to the previous chart. Neil Clancy 22:30, 15 August 2010 (UTC)

It was done to comply with the {{cleanup-chartable}} tag added by User:Spitzak --Cybercobra (talk) 22:42, 15 August 2010 (UTC)
So you're saying I should be asking him for his reasoning? Neil Clancy 23:13, 15 August 2010 (UTC)
No, just that (apparently) {{Chset-tableformat}} is the way to do character charts, as reinforced by the existence of {{cleanup-chartable}}. Sounds like your issue is with {{Chset-tableformat}}'s presentation generally; it's used on several other articles. Perhaps you should propose a change to said template? Or that {{cleanup-chartable}} should be modified/deleted. --Cybercobra (talk) 00:22, 16 August 2010 (UTC)
I stuck that tag in there because the table did not match the others. I greatly prefer a much smaller table, but I don't think the standard one really does the job. For an encyclopaedia I would like to see a single cell with the glyph (or an abbreviation for non-printing glyphs) in it and nothing else. There certainly is no reason to show over and over again the translation of the table location to decimal, and certainly not to octal (!). Also the 4 digits of Unicode are probably redundant and confusing (witness the erroneous edit just now by somebody who thought it was the hex version of the table entry number), though I think it is clever that the macro uses it to produce the correct character. I would like to see all these templates scrapped and the tables reduced to a single glyph in a cell, perhaps with row and column headers designed to help you convert to decimal and hex if people really think that is important. Spitzak (talk) 19:17, 16 August 2010 (UTC)
Providing decimal and octal is quite useful and fairly conventional for reference works. --Cybercobra (talk) 22:56, 16 August 2010 (UTC)
Well, this really sucks. They broke the chset macro so that it no longer uses the Unicode value, which will break any other tables that are updated to match. It is supposed to put &#nnnn; in, not the text. The chset-char3 macro works correctly and would reveal that putting 0xNN in is wrong.
I also like to think the readers of Wikipedia are able to add two numbers together. Just put the decimal equivalent in the left column and top row and let them add them together. And I just do not see the reason for octal. Spitzak (talk) 09:11, 17 August 2010 (UTC)
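Spitzak's "add the row and column headers" suggestion comes down to one line of arithmetic. A minimal Python sketch, assuming a chart where one header carries the high hex digit (0-7) and the other the low hex digit (0-F); the function names are hypothetical. The same arithmetic gives 0x7C for the vertical bar at "column 7 row 12" mentioned further down this page.

```python
def table_position_to_code(high: int, low: int) -> int:
    """Combine the high-digit header (0-7) and low-digit header (0-15)
    of an ASCII chart into a single 7-bit code."""
    if not (0 <= high <= 7 and 0 <= low <= 15):
        raise ValueError("ASCII chart positions are 0-7 by 0-15")
    return high * 16 + low


def describe(code: int) -> str:
    # The same code in the bases argued about above: decimal, hex, octal, binary.
    return f"dec {code}, hex 0x{code:02X}, oct {code:03o}, bin {code:07b}"


code = table_position_to_code(4, 1)
print(chr(code), describe(code))  # A dec 65, hex 0x41, oct 101, bin 1000001
```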
I don't think it's too important to get hung up on the {{chset-cell4}} template. The template provides a means of rendering the cell correctly; the use here may not match the documentation of the template, but I don't see that as any great problem. To be really precise we should create another cell template designed for one-byte character sets, but that would have no visible effect on the page. What is important is that we only show a one-byte hexadecimal representation, so 0x41 for "A"; a two-byte representation 0x0041 would be wrong, as in ASCII that's NUL followed by "A". I agree with Cybercobra that showing hex and decimal representations is useful, and octal is probably also useful, as ASCII dates back to the times when octal was more prevalent and I can imagine a scenario where someone has some old code in octal they want to decode. I'm not particularly worried about whether it's 0x41 or just 41, though the 0x does help distinguish the hex from the decimal representation.
I'm agnostic about the two versions of the chart, but I do find the old version somewhat clearer than the new version.--Salix (talk): 09:53, 17 August 2010 (UTC)
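Salix's point that 0x0041 would read as NUL followed by "A" can be checked with Python's standard `ascii` and `utf-16-be` codecs. A small illustration, not tied to any Wikipedia template; the variable names are arbitrary.

```python
# One-byte ASCII form of "A".
ascii_bytes = "A".encode("ascii")
# A two-byte (UTF-16 big-endian) form of the same character.
utf16_bytes = "A".encode("utf-16-be")

print(ascii_bytes.hex())   # 41
print(utf16_bytes.hex())   # 0041

# Read back as ASCII, the two-byte form is two characters: NUL, then "A".
misread = utf16_bytes.decode("ascii")
print([hex(ord(ch)) for ch in misread])  # ['0x0', '0x41']
```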
All other tables using chset stuff place the Unicode equivalent at that location, not the hex version of the location (which is trivial to figure out by putting the letters in the row & column headers next to each other). This can easily be seen in any example that is not ASCII or ISO-8859-1. See ISO-8859-5 for an example. Spitzak (talk) 10:09, 17 August 2010 (UTC)
Perhaps it would be more clear to put (or have {{chset-cell4}} output) "U+0041" instead of just "0041". <joke>BTW, isn't "0x41" also misleading as it implies the code is the 8 bits 01000001 when it is really just the 7 bits 1000001?</joke> Anomie 15:41, 17 August 2010 (UTC)
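The labelling styles discussed here differ only in formatting. A minimal Python sketch with hypothetical helper names that produces the suggested "U+0041" code-point label next to the one-byte value, and prints the 7-bit and 8-bit binary forms behind the joke.

```python
def unicode_label(ch: str) -> str:
    # Code points are conventionally written "U+" plus at least four hex digits.
    return f"U+{ord(ch):04X}"


def byte_label(ch: str) -> str:
    # The single byte ASCII uses for the character.
    return f"0x{ch.encode('ascii')[0]:02X}"


print(unicode_label("A"), byte_label("A"))   # U+0041 0x41
print(f"{ord('A'):07b}", f"{ord('A'):08b}")  # 1000001 01000001
```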

I, too, prefer the old version of the chart. It's far more readable. I think the layout of the new table is more confusing. The last username left was taken (talk) 01:33, 4 September 2010 (UTC)

The former version matched the other table in the text. I do prefer the former table, but in any event I think the two tables should look similar. Gimmetoo (talk) 00:57, 8 September 2010 (UTC)

Font size for Unicode control character glyphs

We seem to have a potential edit war starting over whether the Unicode glyphs in the U+2400 block should be displayed at the normal font size or at an arbitrarily-increased size. Let's discuss it. For reference, these characters are:

␀␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏␐␑␒␓␔␕␖␗␘␙␚␛␜␝␞␟␠␡␢␣

On my Firefox on Linux, they look something like this:

[Image: Unicode 2400 block (Firefox on Linux).png]

On Safari on OS X, they look something like this:

[Image: Unicode 2400 block (Safari on OS X).png]

In both cases the characters are rather hard to read, but this is presumably intentional on the part of the font design and IMO there is no point in arbitrarily increasing the font size. Anomie 15:02, 25 October 2010 (UTC)
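For reference, the glyphs listed above sit in the Unicode Control Pictures block in the same order as the codes they depict: U+2400 plus the code for NUL through SPACE (0x00 to 0x20), with DEL's symbol placed separately at U+2421, followed by the blank symbol and open box (U+2422, U+2423). A minimal Python sketch of that mapping (the `control_picture` name is hypothetical); it says nothing about the font-size question itself.

```python
import unicodedata


def control_picture(code: int) -> str:
    """Return the Control Pictures glyph for an ASCII control code or SPACE."""
    if 0x00 <= code <= 0x20:
        # NUL..US occupy U+2400..U+241F; SPACE follows at U+2420.
        return chr(0x2400 + code)
    if code == 0x7F:
        # DEL is not contiguous with the other controls; its symbol is U+2421.
        return "\u2421"
    raise ValueError(f"0x{code:02X} is a printable ASCII character")


for code in (0x00, 0x1B, 0x20, 0x7F):
    glyph = control_picture(code)
    print(f"0x{code:02X}", glyph, unicodedata.name(glyph))
```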

My Safari on OS X looks similar, except "esc" is drawn like the other letters and ␢␣ is narrower. Spitzak (talk) 05:24, 26 October 2010 (UTC)

On Ubuntu 10.10, both Firefox and Chrome look like this:

[Image: Unicode 2400 Chrome Ubuntu.png] Spitzak (talk) 05:24, 26 October 2010 (UTC)
Interesting, it must be using a different font. Chromium for me uses the same font as Firefox. Anomie 11:15, 26 October 2010 (UTC)
Better readability seems a good enough reason to me. Screw the font designers' vision. Sometimes workarounds are necessary. None of the renderings presented are easily legible, although Linux seems to do better at least. --Cybercobra (talk) 06:28, 26 October 2010 (UTC)
The thing is, these characters really are that small. While a case might be made that all characters should be blown up to show detail, that would require it be done for ASCII#ASCII printable characters too and it would require they all be the same size. Expanding them arbitrarily so they look "right" to one random person's sensibilities with their particular browser and font is certainly not the thing to do. Anomie 11:15, 26 October 2010 (UTC)
I would certainly not put a *different* scale on some of the characters. That will look obviously wrong in all the above examples. Also, surely somebody still uses IE and Windows; can they post a screenshot? Spitzak (talk) 19:34, 26 October 2010 (UTC)
I would favour readability over strict adherence to font sizes. It is an article about ASCII, not about a specific font used to represent Unicode. I'm guessing the fonts are designed the way they are so that all characters can fit into a standard size for use in a character table. This is not an issue for us, as we have two separate tables. I'm happy to keep with font-size:large, which is just about readable, just so we don't end up on WP:LAME.--Salix (talk): 20:57, 26 October 2010 (UTC)

Vertical bar in second picture

There is a slight problem with the vertical bar character (hex 0x7C, column 7 row 12) in the second picture. It looks identical to the slash (0x2F), but it should be completely vertical. In the original scan (jpg, see picture source) the bar appears slightly slanted, probably caused by inaccuracies in the printing and/or scanning process, but there is a clear distinction between the slash and the vertical bar. Lemming (talk) 03:07, 6 November 2010 (UTC)

I guess the "inaccuracies in the printing and/or scanning process" made me type a slash. Fixed now. Thanks for the notice on my talk page. - LWChris (talk) 12:09, 6 November 2010 (UTC)

"most recent update during 1986"

Why was the standard updated in 1986? What could have needed changing? I'd imagined it was pretty stable since the 1960s and there wouldn't have been any cause to mess with it. The control codes are obsolete but there's no need to remove them when they can be ignored (and indeed they haven't been removed), and for jobs that go beyond the capabilities of ASCII there are all kinds of more recent schemes that have superseded it, so I'm struggling to imagine what they changed. The source given is "American National Standard for Information Systems — Coded Character Sets — 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII), ANSI X3.4-1986, American National Standards Institute, Inc., March 26, 1986". It would be good if someone could have a look at this, if they have access. Beorhtwulf (talk) 22:40, 28 February 2011 (UTC)

Invisible?

Ifyoutakeallthespacesoutthetextisveryhardtoread.Clearlywecanseethespacessotheyarenotinvisible. --Wtshymanski (talk) 03:36, 3 March 2011 (UTC)

Just because something is not visible doesn't mean it doesn't take up space. Anomie 03:58, 3 March 2011 (UTC)
Too subtle for me. If I can see it, I call it "visible". "Is that window closed?" someone will ask me, and "No", I'll say "I can see a space between the window and the sill". Or am I like Alice, who could see Nobody on the road a great way off? --Wtshymanski (talk) 14:42, 3 March 2011 (UTC)
Let me put it this way. In the following box, there is one (non-breaking) space character. Tell me where it is, without highlighting the text to make it a different color or looking at the source or cheating in some other manner.
 
Is it at the left? The right? In the middle? It's not visible, but it still takes up space. You can only "see" a space character normally because of the gap it leaves between the characters that are visible. Anomie 19:34, 3 March 2011 (UTC)
But if you take the spaces out, the document looks different. True, I wouldn't be able to tell by looking at the paper lying in the printer if the printer had just received a "form feed" character (apparently obsolete), or 66 lines of 72 CHR$(32) followed by a CR LF sequence (surely not obsolete control characters), or if someone had just left a blank sheet of paper in the output tray. The intent of sending 0X20 to the printer is to make a *visible* space in the line, though. If we didn't want to see the space, we'd send 0X00 (apparently another obsolete control character). Or possibly even 0X1A, but probably not 0X1B (definitely not obsolete). --Wtshymanski (talk) 20:22, 3 March 2011 (UTC)
You both have good points here. Like most things on wikipedia the resolution comes from the sources. Do the reliable sources class a space character as a visible character or not? --Salix (talk): 00:21, 4 March 2011 (UTC)
Sources? Sure:
  • "But was the Space character a control character or a graphic character?... It is, of course, both. However, from the point of view of a parallel printer, it is only one of those things, the invisible graphic. By this rather hair-splitting reasoning, the standards committee persuaded itself that the Space character must be regarded as a graphic character; that is, it must be positioned in a column of graphics, not in a column of controls." — Mackenzie, Charles E. (1980). Coded Character Sets, History and Development. Addison-Wesley. ISBN 0-201-14460-3. 
  • "SP (Space): A normally non-printing graphic character used to separate words." — RFC 20
HTH. Anomie 04:00, 4 March 2011 (UTC)
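The "invisible graphic" classification Mackenzie describes is also how Python's standard library treats SPACE: `str.isprintable()` documents the ASCII space as printable, even though it leaves no mark. A short sketch; the `is_graphic` helper is hypothetical and mirrors C's `isgraph()` for the ASCII range.

```python
def is_graphic(ch: str) -> bool:
    # Visible marks in ASCII: "!" (0x21) through "~" (0x7E); SPACE (0x20) is excluded.
    return 0x21 <= ord(ch) <= 0x7E


printable_count = sum(chr(c).isprintable() for c in range(128))
graphic_count = sum(is_graphic(chr(c)) for c in range(128))
print(printable_count, graphic_count)  # 95 94

for ch in (" ", "A", "\t", "\x7f"):
    print(repr(ch), ch.isprintable(), is_graphic(ch))
```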
[Image: CT-1024 Terminal with monitor] [1]

The ASCII space is most certainly a printing character. Please put up with my history lesson of video terminals in the early 1970s and I will explain.

When ASCII was developed, video terminals had character-only displays. These used a single font, often only upper case. The terminal had a read/write memory to hold every character on every line of the display (including the space character). The IC that converted the ASCII code to a bit pattern for display was known as a "character generator". The most popular one was the Signetics 2513 MOS ROM. This would produce characters 5 dots wide and 7 dots high for raster-scan CRTs. It just handled the 64 upper case ASCII characters. Deluxe terminals would offer a lower case option that increased the read/write screen memory and added a second character generator for the lower case characters. (The early terminals used shift registers, not RAM, for screen memory.)

The control characters in the first two columns of the ASCII chart were not stored in memory. The next four columns were stored in screen memory and displayed on the CRT. (The last two columns were only used on terminals that supported lower case.) The 2513 character generator converted the ASCII code into a 35-dot display pattern for each letter. (There were additional blank dots between each character and blank lines between the rows.) The ASCII space just produced a pattern where all 35 dots were off.
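The character-generator behaviour described above can be mimicked in a few lines. A minimal Python sketch, assuming a 64-entry 5x7 ROM covering 0x20 to 0x5F and addressed by code minus 0x20 (an assumption about the addressing, not the 2513's documented pinout). Only SPACE and "A" are filled in, and the "A" bitmap is illustrative rather than copied from a real 2513. SPACE is simply the all-dots-off pattern, and reverse video inverts every dot.

```python
HEIGHT = 7
FULL_ROW = 0b11111  # five dots across

# Hypothetical 64-entry ROM: one 7-row pattern per code, each row a 5-bit value
# (bit 4 = leftmost dot). Unfilled entries stay blank in this sketch.
ROM = [[0] * HEIGHT for _ in range(64)]
ROM[0x41 - 0x20] = [0b01110, 0b10001, 0b10001, 0b11111, 0b10001, 0b10001, 0b10001]
# ROM[0x20 - 0x20] stays all zeros: SPACE is the pattern with all 35 dots off.


def rom_address(code: int) -> int:
    # Columns 2-5 of the chart (0x20-0x5F) are exactly 64 codes: 6 address bits.
    if not 0x20 <= code <= 0x5F:
        raise ValueError(f"0x{code:02X} is not in the 64-character upper-case set")
    return code - 0x20


def dots(code: int, reverse_video: bool = False) -> list:
    rows = ROM[rom_address(code)]
    if reverse_video:
        # Inverse video flips every dot in the 5x7 cell.
        rows = [row ^ FULL_ROW for row in rows]
    return [format(row, "05b").replace("1", "#").replace("0", ".") for row in rows]


def lit_dots(code: int, reverse_video: bool = False) -> int:
    return sum(row.count("#") for row in dots(code, reverse_video))


print("\n".join(dots(0x41)))
print(lit_dots(0x20), lit_dots(0x20, reverse_video=True))  # 0 35
```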

The computer did not have direct access to the screen memory, but video terminals would allow cursor control to any location on the screen. The next character would print at the cursor location. To erase a six-letter word on the screen, you would set the cursor at the beginning of the word and send six space characters. The spaces overprinted the characters in memory and the new 35-dot pattern was displayed. The ASCII space printed all dots off, unless the terminal used inverted video, in which case it was all dots on.
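The erase-by-overprinting technique works because screen memory simply stores codes, and SPACE is stored like any other character. A minimal Python sketch with a hypothetical `Screen` class (arbitrary dimensions) standing in for the terminal's screen memory; real terminals used their own cursor-addressing sequences, which are not modelled here.

```python
class Screen:
    """Hypothetical model of a character-only terminal's screen memory."""

    def __init__(self, rows: int = 16, cols: int = 64) -> None:
        self.cells = [[" "] * cols for _ in range(rows)]
        self.row = self.col = 0

    def set_cursor(self, row: int, col: int) -> None:
        self.row, self.col = row, col

    def write(self, text: str) -> None:
        # Each character overwrites the cell at the cursor, which then moves right.
        for ch in text:
            self.cells[self.row][self.col] = ch
            self.col += 1

    def line(self, row: int) -> str:
        return "".join(self.cells[row]).rstrip()


screen = Screen()
screen.write("PLEASE REMOVE IT")
screen.set_cursor(0, 7)   # cursor to the start of the six-letter word "REMOVE"
screen.write(" " * 6)     # six SPACE characters overprint it in screen memory
print(repr(screen.line(0)))
```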

Here is a Signetics data book that has the 2513 Character Generator. This online copy is the 1972 edition; my personal copy is the 1971 edition.

Here is a web site that explains how a character generator works.[3]

-- SWTPC6800 (talk) 03:00, 4 March 2011 (UTC)

All of which means little. DEL was used to "print" over sections of paper tape to erase them, but it's not considered printable. Anomie 04:00, 4 March 2011 (UTC)
A DEL turned a printable character into a control character on the paper tape. In a character generator ROM, a space was one of the 64 character dot patterns. When the video terminal came to a screen memory location holding hex 20, it displayed a 5 by 7 pattern of 35 dots off (or 35 dots on in reverse video). When RFC 20 was written in 1969, printing terminals such as a Teletype were far more common than video terminals. On a Teletype, a space doesn't put ink on the paper (not printing).-- SWTPC6800 (talk) 04:49, 4 March 2011 (UTC)
A DEL is a "punchable" character, however. It causes the punch mechanism to punch a pattern of holes and advance the tape. For the tape a DEL is the same class as any other character, unless there is one that prevents punching.
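Why DEL works as an eraser on paper tape: a punch can only add holes, never remove them, and DEL (0x7F) has all seven data bits set, so punching it over any character yields DEL. A minimal Python sketch modelling overpunching as a bitwise OR; the `overpunch` name is hypothetical, and parity and any eighth tape channel are ignored.

```python
DEL = 0x7F  # all seven data holes punched


def overpunch(existing: int, new: int) -> int:
    # Punching can only add holes, so the result is the union of the two patterns.
    return existing | new


# Every 7-bit code becomes DEL when DEL is punched over it...
assert all(overpunch(code, DEL) == DEL for code in range(128))
# ...while punching another character over one usually yields an unintended code.
print(chr(overpunch(ord("A"), ord("B"))))  # C
```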
I agree with the original poster: SPACE is a character. The hardware argument is pretty persuasive: it is enormously easier to treat SPACE as a character with no bits turned on (thus reusing all the existing hardware that puts any other character on the screen and moves the cursor right) than to treat it as a control character with special handling. In addition, ASCII chose bit patterns so that it was easy to group SPACE with the "printable" characters, because the hardware manufacturers demanded it so they could treat it as a printing character. Spitzak (talk) 17:04, 4 March 2011 (UTC)
Something that has always amused me in the computer racket is the keen appreciation you develop for the different kinds of nothingness: space, blank, null, NUL, 0...--Wtshymanski (talk) 18:46, 4 March 2011 (UTC)
When ASCII was being developed, video terminals, by and large, didn't exist. An ASCII computer terminal would be something such as a Teletype Model 33. Guy Harris (talk) 19:30, 24 August 2012 (UTC)

You might be interested in the work of Roy Sorensen. See for example "Nothingness" at the Stanford Encyclopedia of Philosophy. —Ruud 15:32, 19 March 2011 (UTC)

ASCII85 printable characters

I added a mention of ASCII85 in the section about the printable characters. Previously there was no indication of why the printable characters should be set apart from the rest of the ASCII characters. ASCII85 is an old encoding format, but the article about it says that it is still in use in modern times in PostScript and PDF. Working that info, and any other similar info, into a small paragraph will illustrate why the printable characters are distinctly important.

Off the top of my head, I can't think of anything that uses only the 95 printable characters. ASCII85 was the best that I could come up with to show the notability of the printable characters, but it does not include the space. I know this seems obvious, but there really ought to be another example that uses all 95 printable characters, and I can't think of one.

Badon (talk) 08:55, 28 August 2011 (UTC)
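For comparison, the ranges involved: the 95 printable characters run from SPACE (0x20) to "~" (0x7E), while the Ascii85 format uses the 85 characters "!" (0x21) through "u" (0x75), plus "z" as an abbreviation for a group of four zero bytes. A short Python check using the standard library's `base64.a85encode` and `base64.a85decode`; the input string is arbitrary and contains no zero bytes, so no "z" appears.

```python
import base64

printable = [chr(c) for c in range(0x20, 0x7F)]     # SPACE through "~"
a85_alphabet = [chr(c) for c in range(0x21, 0x76)]  # "!" through "u"
print(len(printable), len(a85_alphabet))  # 95 85

encoded = base64.a85encode(b"ASCII printable characters")
# Every output character is printable, and none of them is SPACE.
assert set(encoded.decode("ascii")) <= set(a85_alphabet)
print(encoded.decode("ascii"))
print(base64.a85decode(encoded))
```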

Do we really need to explain why it's important that you can *read* the characters? The thing that uses the 95 printable characters is *printing*. Weird encoding schemes dreamed up by grad students are a decidedly secondary application and somewhat beside the point of an alphabet. --Wtshymanski (talk) 14:31, 28 August 2011 (UTC)

The name US-ASCII

The article currently says:

 The IANA prefers the name US-ASCII to avoid ambiguity.

But in the cited reference, there is no mention made of why US-ASCII is preferred.

Axnicho (talk) 14:38, 24 August 2012 (UTC)

  1. ^ "Hypertext Markup Language (HTML)" version 1.2 by Tim Berners-Lee and Daniel Connolly, 1993
  2. ^ HTML 4.0: "Document Representation", 1997