Talk:End-of-file


November 2006

"ASCII control characters are out-of-band non-printing characters". Actually, ASCII control characters are in-band. -- 18:47, 13 November 2006 84.165.168.107


This article requires cleanup. Like so many people care how to get this symbol by typing into a DOS console or what C thinks EOF is... the bottom line that EOF is ASCII symbol #26 isn't even mentioned.--89.212.75.6 18:28, 24 November 2006 (UTC)

People who care about the subject matter of the article would often be interested in the details. And EOF is not ASCII character 26 (Ctrl-Z) -- rather, one (unofficial) interpretation of ASCII character 26 (Ctrl-Z) is as having the EOF function. However, other entities can also symbolize "EOF" in other contexts, including 255 (i.e. -1), 4 (i.e. control-D), etc. AnonMoos 19:12, 24 November 2006 (UTC)
I do not understand this comment. Ctrl-Z is actually a key combination rather than a character. And as far as I know end-of-file is character 26 (0x1a) in ASCII. So I added it to the article. Rbakels (talk) 04:33, 23 May 2012 (UTC)
I think the terms you're looking for are signal and software interrupt. Using CTRL-D will invoke an interrupt to send SIGQUIT in a *nix system. The ASCII character is obtained by using a CTRL-V CTRL-D sequence. I'm not terribly sure that the character code is all that relevant. Xe7al (talk) 07:51, 26 November 2013 (UTC)
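
To illustrate the distinction drawn in this thread between an end-of-file condition and the byte value 26, here is a minimal Python sketch. The function name and the sample data are made up for illustration. When a stream is read in binary mode, the bytes 0x1A (Ctrl-Z) and 0x04 (Ctrl-D) come through as ordinary data, and the end of the stream is reported by an empty read rather than by any character.

```python
import io


def read_all_bytes(stream, chunk_size=4):
    """Read a binary stream until the end-of-file condition is reported."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if chunk == b"":  # an empty read is the EOF condition, not a character
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Hypothetical sample: Ctrl-Z (0x1A) and Ctrl-D (0x04) embedded in the data.
sample = b"abc\x1adef\x04ghi"
data = read_all_bytes(io.BytesIO(sample))
```

Whether a particular program then chooses to interpret byte 26 as a marker is a convention layered on top of this, which is the point AnonMoos makes above.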

end-of-stream

I came across the phrase end-of-stream (EOS) a couple of times while reading about streams, for example here. Is it correct that end-of-stream is synonymous with end-of-file, and if so, would it be good to mention EOS in this article? --Abdull 19:03, 8 May 2007 (UTC)

Tape mark implementation

I'd suggest that Tape Marks as EOF indicators and file separators belong in this article, but the details of the implementation belong in the 7 track & 9 track articles (and QIC, DDS, Exabyte, ....). RDBrown (talk) 22:42, 31 March 2008 (UTC)

Perhaps. The beginning of the article discusses what EOF is in Unix etc., which are in fact details of a particular implementation. There would not be much left. Do those articles explain what the tape marks are? Maybe a separate article on tape marks? Jmath666 (talk) 22:52, 31 March 2008 (UTC)

word processors

Question: Somewhere in the dark cobwebs of my memory I seem to remember some word processing software that, if it did not require it, then strongly recommended putting an EOF at the "end of the file". Maybe a Unix app, an early incarnation of WordStar, or maybe even old edlin. It's not important, I just wondered. Ched (talk) 01:51, 4 January 2009 (UTC)

Under CP/M, all files whose exact effective length was not indicated by the binary structuring of the data itself had to use some marker (generally ASCII 26, Ctrl-Z) to indicate the end of data within the file, since file lengths were only recorded by the CP/M filesystem in terms of 128-byte blocks. This persisted in various contexts in MS-DOS for reasons of backward compatibility (but not very often in Unix, as far as I've ever heard). This is explained in the article itself. AnonMoos (talk) 09:32, 4 January 2009 (UTC)
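
The CP/M convention described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from any CP/M program; the function names are hypothetical, and the choice to add no marker when the text exactly fills its last record is an assumption of this sketch.

```python
RECORD_SIZE = 128   # CP/M recorded file lengths in 128-byte units
CTRL_Z = b"\x1a"    # ASCII 26, used as the end-of-text marker


def pad_to_record(text: bytes) -> bytes:
    """Pad text with Ctrl-Z up to the next multiple of RECORD_SIZE."""
    remainder = len(text) % RECORD_SIZE
    if remainder == 0:
        # Assumption: text that exactly fills its last record needs no marker.
        return text
    return text + CTRL_Z * (RECORD_SIZE - remainder)


def effective_text(stored: bytes) -> bytes:
    """Return the text up to (not including) the first Ctrl-Z, if any."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]


stored = pad_to_record(b"hello")
recovered = effective_text(stored)
```

Here `stored` occupies one full 128-byte record, and `recovered` is the original five bytes.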

MSDOS stuff

I keep trying to fix this and being reverted.

My main problem with AnonMoos's description is the claim that "COMMAND.COM" produced the ^Z. This is false: it was produced and handled by numerous programs that read/wrote files, and has nothing to do with COMMAND.COM or the operating system. COMMAND.COM's built-in functions such as copy certainly do not treat ^Z specially!

The claim that reading was covered in previous paragraph is false. On Unix the translation from ^D to an actual EOF is done by the TTY driver, the program cannot see what character was typed. I am not sure if this is how MSDOS acts, but from your claim that it was COMMAND.COM it sounds like it might be the same way. So at least on Unix this is irrelevant to the embedded ^Z discussion.

I am also still unable to come up with proof that MSDOS 1.0 could store file lengths other than in multiples of 128. The CP/M compatibility file I/O was incapable of reading or writing blocks other than 128. My own experience is that we certainly did append ^Z to files (we filled the remainder of the last block with ^Z), and we tried to "improve" the file I/O behavior by only deleting trailing ^Z characters and preserving embedded ones. I cannot recall any attempt to do anything better for file sizes. All of this is pretty vague; we switched to the POSIX api immediately when MSDOS 2.0 came out. Spitzak (talk) 19:24, 3 December 2009 (UTC)

1) If you have two files, say "A.TXT" and "B.TXT", and you issue the following command in DOS:
COPY A.TXT+B.TXT
then the file "A.TXT" will have the contents of "B.TXT" appended to it, followed by a CTRL-Z character (ASCII 26). I really don't know what's inserting the CTRL-Z character other than COMMAND.COM. Certainly there's nothing in the underlying MSDOS.SYS file IO calls that would account for it... AnonMoos (talk) 14:52, 4 December 2009 (UTC)
If that is still true then yes COMMAND.COM must be doing it. All I can guess is that this "append" operation (which I did not even know about) triggers this. The command "copy a b" certainly does not add a ^Z to b however, and does not stop at a ^Z in a, that is what I was basing it on. -- 21:20, 4 December 2009 User:Spitzak
Tried it on a Windows ME system just before I made the post above. Actually, the COMMAND.COM copy-with-append command (signalled by the use of "+" on the command line) truncates each of the input files at the first Ctrl-Z character, appends the remaining data from the files together, and then adds a Ctrl-Z at the end of the result. To turn this behavior off (i.e. to do a pure binary append), you have to use the "/b" option switch... AnonMoos (talk) 22:46, 4 December 2009 (UTC)
Is there a way to append and put the result in another file? I'm guessing not, because the degenerate version of "appending" a single file is a copy and that certainly does not treat ^Z specially. -- 23:01, 4 December 2009 User:Spitzak
I don't think so, in one single operation on the standard MS-DOS command-line. Anyway, we actually have an article copy (command)... AnonMoos (talk) 23:06, 4 December 2009 (UTC)
2) In MS-DOS version 1, much of the read file / write file API was strongly CP/M compatible, but the filesystem on disk was FAT. If you used only strictly CP/M compatible file IO calls in a rigidly CP/M-like way, then file lengths would be multiples of 128, yet the on-disk filesystem could still record the exact length of a file in bytes. Consult The New Peter Norton Programmer's Guide to the IBM PC & PS/2 by Peter Norton and Richard Wilton, Microsoft Press, 1987 ISBN 1-55615-131-4. etc. etc. AnonMoos (talk) 14:52, 4 December 2009 (UTC)
Interesting, and that would explain why they did not have to change the disk format to add the POSIX calls. I'm unsure, though, whether not providing a system api to set the field counts as "supporting" variable-length files. -- 21:20, 4 December 2009 User:Spitzak
You really should not talk about the DOS 2.0 changes to the MS-DOS API as being "Posix", since they were done in 1982-1983, while the Posix standard didn't even start to be worked on until about five years later. The DOS 2.0 changes were somewhat loosely influenced by Unix concepts, but they were not fully compatible with Unix or compliant with any Unix standard, and were not claimed to be. AnonMoos (talk) 22:40, 4 December 2009 (UTC)
Ok using "Unix" or "Xenix" would be more accurate. They certainly were VERY closely matched to the Unix api design, and trying to claim they were not influenced by it is silly. -- 23:01, 4 December 2009 User:Spitzak
As far as individual directory entries go (i.e. the 32-byte area which records a file's metadata under the FAT filesystem), the only change from MS-DOS 1 to MS-DOS 2 was that the attribute byte was given the two additional bit-flags 0x10 to signify a subdirectory and 0x20 to signal whether a file needed backing up (since hard drives were also introduced in DOS 2). AnonMoos (talk) 22:40, 4 December 2009 (UTC)
Question still remains: was there a way with MSDOS 1.0 to influence the file length value? -- 23:01, 4 December 2009 User:Spitzak
File Control Block allowed for setting the record size to a value other than 128, so if you specified single-byte records, then you could, as far as I can see. The File Control Block also has a 4-byte area specifying the current length of the file in bytes, but that's mainly for internal DOS record-keeping (Norton advises caution if a programmer wants to directly manipulate that value). AnonMoos (talk) 23:15, 4 December 2009 (UTC)
See http://www.ctyme.com/intr/rb-2574.htm etc. AnonMoos (talk) 23:19, 4 December 2009 (UTC)
I don't believe you could alter the record size. It was information passed from the system to the program, not the other way around. I believe the problem with setting the value is that you had to write the new block yourself with direct io to the disk, as MSDOS would never write exactly the block you had to the disk. In any case I do believe the result is that file length was not supported until MSDOS 2.0, as I originally stated. —Preceding unsigned comment added by Spitzak (talkcontribs) 05:31, 5 December 2009 (UTC)
I believe that the evidence is that you could change the record size in DOS 1. The site http://www.ctyme.com/intr/rb-2574.htm carefully records all DOS variations, including DOS 1, but provides no indication that the "logical record size" and "file size" fields weren't already present in DOS 1, nor does the Norton book. Having a single-byte record size could well have led to inefficient disk I/O, but I don't see anything that would forbid it. If you want to research the matter further, you could look at the Interrupt List, but I don't feel like doing so at present. AnonMoos (talk) 16:47, 5 December 2009 (UTC)
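
The text-mode behaviour of COPY with "+" reported under point 1 above (each input truncated at its first Ctrl-Z, the pieces joined, one Ctrl-Z added at the end, and "/b" giving a plain binary append) can be modelled as follows. This is a Python simulation of the behaviour as described in this thread, not a reimplementation of COMMAND.COM; the function name and the `binary` parameter are hypothetical.

```python
CTRL_Z = b"\x1a"  # ASCII 26


def copy_append(parts, binary=False):
    """Model of 'COPY A+B' as described above.

    parts  -- list of bytes objects, one per input file
    binary -- True models the '/b' switch (pure binary append)
    """
    if binary:
        return b"".join(parts)
    # Text mode: keep only the data before the first Ctrl-Z in each input.
    truncated = [part.split(CTRL_Z, 1)[0] for part in parts]
    # Join the pieces and terminate the result with a single Ctrl-Z.
    return b"".join(truncated) + CTRL_Z


text_result = copy_append([b"one\x1ajunk", b"two"])
binary_result = copy_append([b"one\x1ajunk", b"two"], binary=True)
```

In this model the bytes after the embedded Ctrl-Z in the first input are lost in text mode and preserved in binary mode.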

Ok I searched the int21 calls. There is no call to set the record length. All calls (such as create-file http://www.ctyme.com/intr/rb-2581.htm) take an "unopened FCB" that is defined http://www.ctyme.com/intr/rb-2574.htm to be all zeros, implying that it is an output only. Oddly redundant, the FCB has a 16-bit block number (ie the offset in fixed 128-byte units) and an 8-bit "record in the current block" (ie the offset in "records"), and also a field called the "random access record number" which is 32 (or 24 for blocks larger than 64) bits and is the total offset measured in records. There are "random access" read/write that use this last number: http://www.ctyme.com/intr/rb-2598.htm or http://www.ctyme.com/intr/rb-2599.htm. There are also "sequential access" that uses the block+record in block http://www.ctyme.com/intr/rb-2579.htm and http://www.ctyme.com/intr/rb-2580.htm. Oddly enough these don't even cooperate and there is a call http://www.ctyme.com/intr/rb-2601.htm to set the random one to the sequential one (there does not appear to be the reverse call, also unknown why this is a call when it can be done by direct manipulation of the FCB).

The "file length" is set by the call http://www.ctyme.com/intr/rb-2600.htm which avoids opening the file. I would suspect it is also set by the file-open call but can't find any information. We certainly never looked at this field. Like the record length there appears to be no way to change it.

Anyway this does bring back memories of working with this. I can tell you we abandoned this crap INSTANTLY when MSDOS 2.0 added the Unix-style api. But I am pretty certain now that the programmers at MOTU knew what was going on and they were correct in saying that exact file length was in effect impossible and claiming it was there is misleading. I can assure you that if file length worked in any way we would have abandoned the ^Z handling when going from CP/M to MSDOS.

I am going to update this to restore the fact that the fix was in MSDOS 2.0 —Preceding unsigned comment added by Spitzak (talkcontribs) 18:51, 6 December 2009 (UTC)

I'm sorry, but there's no "call" to set the record length, and never was one in any DOS version; rather, you set up the FCB with the record length the way you wanted it, and then you made a call (to read, write, etc.), and when DOS did its work, it made use of the record length information (and other information) present in the FCB at the time the call occurred. The Norton book explicitly mentions the possibility of setting the record length to a value of 1 with respect to Int 21h, AH=23H in order to get the exact length of a file in bytes. AnonMoos (talk) 19:20, 6 December 2009 (UTC)
Okay that was not mentioned anywhere. That is the call to *get* the file size, and it sounds like the "uninitialized FCB block" could have the record length set to a non-zero value to indicate the size that it wanted the file length in. This is for reading the length of an *unopened* file, as you pointed out the actual file length is right there in the FCB of an *opened* file. I don't remember this working, remember we were trying to get software to run as fast as possible, and I certainly don't remember filling the FCB with zeros as requested here. Then again we did not really use this call very much so may have not noticed that it failed in strange ways with uninitialized FCB argument.
There still appears to be no way to *set* the file length, other than to use direct I/O to write over the FCB on the disk, or perhaps the length can be modified in the FCB before the file is closed, but your comments on Norton seem to indicate it is not this easy. Spitzak (talk) 02:49, 7 December 2009 (UTC)
Dude, you're going further and further off -- the FCB is NOT on disk and never was on disk. It is an in-memory control structure by means of which the programmer tells DOS how the programmer wants to work with a particular file, and also contains an (undocumented) internal working area which DOS uses to record current open file status. The public (documented) areas of the FCB can be manipulated by the programmer to communicate to DOS, and DOS also updates some of the fields to communicate back to the programmer certain information about what the results of a requested file operation were. I really don't know what the relevance of an uninitialized FCB would be, since an uninitialized FCB would have a name + extension field of 11 spaces or NUL bytes, and could not be used to affect any actual file, but if you're intimating that the default record length was 128 bytes (unless the programmer took action to alter the record length), then yes, that's true... AnonMoos (talk) 06:39, 7 December 2009 (UTC)
I'm beginning to feel you dislike the fact that the calls that fixed this were copied from Unix.Spitzak (talk) 02:49, 7 December 2009 (UTC)
Whatever -- they were Unix-influenced, but classic Unix snobs or purists would haughtily look down their nose and scorn the idea that they were a true implementation of a subsystem of Unix, or in full compliance with any formal or de facto Unix standard... AnonMoos (talk) 06:39, 7 December 2009 (UTC)
Also I want to fix the idea that this is COMMAND.COM's fault. I certainly personally wrote software that read & wrote ^Z so it should be explained that it is many pieces of software.Spitzak (talk) 02:49, 7 December 2009 (UTC)
A lot of people blame Ctrl-Z characters to mark the end of disk files on the underlying MS-DOS kernel, and I was trying to clarify that within MS-DOS, it is only COMMAND.COM and certain auxiliary programs (EDLIN etc.) which ever insert Ctrl-Z characters to mark the end of disk files. AnonMoos (talk) 06:39, 7 December 2009 (UTC)
It was certainly more convenient in many contexts (as well as probably often more efficient for I/O) to go with the default DOS record length of 128, but the evidence points strongly towards the option of changing the record length being available even in DOS version 1. AnonMoos (talk) 19:20, 6 December 2009 (UTC)
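
The record-length point in this thread comes down to simple arithmetic, sketched here in Python. The FCB "get file size" function mentioned above (Int 21h, AH=23h) reports a size in records of whatever length the FCB specifies, rounded up to a whole record. The sketch models only that arithmetic and makes no DOS calls; the function name is hypothetical.

```python
def size_in_records(byte_length: int, record_size: int = 128) -> int:
    """Number of records needed to hold byte_length bytes, rounded up."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    return -(-byte_length // record_size)  # ceiling division


default_count = size_in_records(300)      # with the default 128-byte records
exact_count = size_in_records(300, 1)     # with a record length of 1
```

With the default record length the result only locates the end of the file to within 128 bytes, while a record length of 1 makes the record count equal to the byte count.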

How to settle the matter

There is one piece of evidence which would clearly settle the matter -- that is, if you were to find any explicit and positive indication that the record-size and file-size fields of the File Control Block were not present, or were not accessible to programmer manipulation, in DOS version 1. (I strongly suspect that that's not the case, since the whole point of DOS 2.0 was to supersede the FCB calls, so there would be no reason to bulk up the FCB calls in DOS 2.0 or later.) Until and unless you find evidence to that effect, the current situation strongly indicates that programmers could specify exact file sizes in DOS version 1 (but would have to go slightly beyond the default CP/M-compatible way of doing things to accomplish this). I have no reason to doubt that your DOS 1 programming shop (which by your admission did things in a quick-and-dirty way) went along with the CP/M-compatible defaults (and so generated files with lengths which were invariably multiples of 128) -- and it might have been very difficult to do otherwise with the particular language compiler or programming tools used. But that doesn't change the publicly-available facts about how there were other ways to do things even under DOS 1... AnonMoos (talk) 06:39, 7 December 2009 (UTC)

Sorry, it is up to you to find a way to change these values, other than a quote that says "not recommended by Norton". It was not possible. I went through all the INT 21 calls and there is none. It is your job to find it.Spitzak (talk) 18:34, 7 December 2009 (UTC)
Unfortunately for you, there is no such "call", and never has been any such "call", as I previously already explained at length in great detail above. The onus and burden is really on you to provide detailed specific valid documented evidence that the record-size and file-size fields of the File Control Block were not present, or were not accessible to programmer manipulation, in DOS version 1. Absent such documentation, your contention that programmers in DOS 1 had no way to specify file sizes which were not multiples of 128 bytes is simply unsupported by the available facts, whether you like it or not. AnonMoos (talk) 22:56, 7 December 2009 (UTC)
I have no idea why the code I have seems to indicate that setting this field before Close does not work. However I now suspect the problem was not MSDOS 1, but other software that did not read the file-length field. Such programs would instead always read 128-byte blocks, and if the last block was short MSDOS put garbage (or unchanged data) at the end, so the program could not tell where the file really ended. Therefore our software had to always pad out to the next 128-byte boundary with ^Z to get the files to be correctly read by other software. This is not MSDOS 1's fault. The Unix-style calls returned the length and I think made us assume other programmers would read the files correctly. Spitzak (talk) 03:12, 23 December 2009 (UTC)
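
The failure mode described in the last comment can be modelled with a small Python sketch. It assumes a hypothetical reader that ignores the recorded file length, always consumes whole 128-byte records, and stops at the first Ctrl-Z. The filler byte 0xE5 merely stands in for whatever leftover data happened to be in the buffer; it is an assumption of this sketch.

```python
RECORD_SIZE = 128
CTRL_Z = b"\x1a"


def naive_block_read(file_bytes: bytes, filler: int = 0xE5) -> bytes:
    """Model a reader that only ever sees whole 128-byte records."""
    seen = bytearray()
    for start in range(0, len(file_bytes), RECORD_SIZE):
        record = file_bytes[start:start + RECORD_SIZE]
        # A short final record is completed with filler ("garbage") bytes.
        seen += record.ljust(RECORD_SIZE, bytes([filler]))
    end = seen.find(CTRL_Z)
    return bytes(seen) if end == -1 else bytes(seen[:end])


unpadded = b"hello"                   # exact length, no Ctrl-Z padding
padded = b"hello" + CTRL_Z * 123      # padded to a full record with Ctrl-Z

from_unpadded = naive_block_read(unpadded)
from_padded = naive_block_read(padded)
```

The unpadded file comes back with 123 bytes of filler attached, while the padded file comes back as the intended five bytes, which is why writers kept padding even when exact lengths could be stored.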

Unicode

I consulted the article in vain to find out whether Unicode files may contain an end-of-file character as well. Google tells me that the answer is probably "no" - but it seems that it is just as redundant as it is for plain ASCII files, which do (occasionally) have an EOF, so programs had better be prepared to handle them. Rbakels (talk) 04:37, 23 May 2012 (UTC)

Yes it is perfectly legal for a Unicode file to contain the Unicode code point 26.Spitzak (talk) 02:33, 26 May 2012 (UTC)
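
As a quick check of the statement above, this Python sketch encodes a string containing code point 26 (U+001A) and decodes it again. The sample text is arbitrary.

```python
# A string with code point 26 (U+001A) embedded in the middle.
text = "before\u001aafter"

utf8 = text.encode("utf-8")          # U+001A becomes the single byte 0x1A
utf16 = text.encode("utf-16-le")     # U+001A becomes the bytes 0x1A 0x00
round_trip = utf8.decode("utf-8")    # decoding restores the original string
```

The code point survives encoding and decoding like any other character; treating it as an end-of-file marker is left to the program reading the data.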

Ctrl-Z in CP/M

How did CP/M distinguish a valid data byte whose value was 26 from the one that denoted the end-of-file mark? 108.1.140.108 (talk) 00:42, 9 July 2014 (UTC)

Most programs did not. They would stop reading the file at the first ^Z. It would be possible to detect that the ^Z is not in the last block of the file and assume it is not the eof. I think also some programs tried to make sure that all the trailing bytes in the last block were ^Z, which would allow a ^Z that was not at the end of the file to be detected. However most programs did not obey this.Spitzak (talk) 04:50, 9 July 2014 (UTC)
108.1.140.108 -- for non-operating-system files, CP/M just did whatever the currently-running program told it to do. For programs that used structured binary data files, the end of data was at the end of the last internally-defined field, and Ctrl-Z didn't matter. It was text and quasi-text (WordStar) files that needed the Ctrl-Z convention... AnonMoos (talk) 04:57, 9 July 2014 (UTC)
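
The stricter approach mentioned above (only treating Ctrl-Z as the end marker when it forms the trailing padding of the last block) might look like this in Python. It is a sketch of the heuristic as described in this thread, not code from any CP/M program, and the function name is hypothetical.

```python
RECORD_SIZE = 128
CTRL_Z = 0x1A


def logical_length(stored: bytes) -> int:
    """Length of the data once trailing Ctrl-Z padding in the last record is removed.

    A Ctrl-Z in an earlier record, or one followed by other data in the
    last record, is treated as ordinary data.
    """
    # Offset at which the last record begins (0 for an empty file).
    last_start = max(0, ((len(stored) - 1) // RECORD_SIZE) * RECORD_SIZE)
    end = len(stored)
    # Walk back over the run of Ctrl-Z bytes at the end of the last record.
    while end > last_start and stored[end - 1] == CTRL_Z:
        end -= 1
    return end


first_record = b"a\x1ab" + b"x" * 125           # embedded Ctrl-Z, 128 bytes
last_record = b"tail" + bytes([CTRL_Z]) * 124   # data plus Ctrl-Z padding
length = logical_length(first_record + last_record)
```

The heuristic still cannot recover data that genuinely ends in Ctrl-Z bytes, since those are indistinguishable from padding.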