Talk:Text file

Computing B‑class High‑importance

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
B	This article has been rated as B-class on Wikipedia's content assessment scale.
High	This article has been rated as High-importance on the project's importance scale.

This article is written in British English, which has its own spelling conventions (colour, travelled, centre, defence, artefact, analyse) and some terms that are used in it may be different or absent from other varieties of English. According to the relevant style guide, this should not be changed without broad consensus.

Merge with plain text

I agree that this should be merged with plain text. I think the result should be named "plain text", but the contents should mostly or completely come from this page, text file. --NealMcB 17:30, 25 April 2006 (UTC)

I disagree. These two terms differ completely, a text file being a file composed of text alone, and plain text being a snippet of usually unformatted text (that can come inside any type of document, e.g. a presentation or a data sheet). Obviously, their differences should be clarified by correcting their definitions in their respective pages. PaF 21:41, 13 May 2006 (UTC)

Yes 'plain text' and 'text file' are not synonymous, but that doesn't mean they shouldn't be merged. The material of this article should be fitted into plain text, and 'text file' should be a sub-section thereof which simply says something like: 'a text file is simply a file that contains text data without any binary data.' If someone searches for 'text file', they should be directed to plain text. --Apantomimehorse 01:23, 5 July 2006 (UTC)

I also disagree. 70.58.81.57 22:47, 4 July 2006 (UTC)

I support the merge; the concepts are close enough that the distinction can be explained within one article. --Gerry Ashton 02:35, 1 August 2006 (UTC)

I object to the merge. While there is a degree of overlap in the terms, they are distinct enough to warrant separate articles. A text file can be plain text, RTF or HTML for example. Plain text files are text files but not all text files are plain text. Dread Lord CyberSkull ✎☠ 12:21, 10 August 2006 (UTC)

I'm going to take the merge tags off, as there seems to be more opposition to the merger than agreement and there's an awful backlog of articles to be merged. Let's get this off the list. Jaye 13:55, 24 August 2006 (UTC)

The .txt article says roughly the same thing. 16@r 00:05, 24 October 2006 (UTC)

I also object. A text file can contain plain text, but they're two very different entities. To agree with an above poster, a text file can contain plain text or it can contain RTF or HTML. Also, the plain text article notes that plain text is only usually stored in files. It doesn't have to be. Alex Peppe 00:31, 2 January 2007 (UTC)

I also object. Plaintext may be transmitted through a network connection or held in memory. A text file refers to plaintext *stored* on a disk or tape. --Nil0lab 03:10, 24 March 2007 (UTC)

I object, because just now I have trouble classifying text with my program that uses file (Unix). Structured text, such as XML, has a lot of non-textual tags, while plain text (by definition) has none. Otherwise, I agree with PaF, Apantomimehorse, 70.58.81.57, Dread Lord CyberSkull, Alex Peppe and Nil0lab. Said: Rursus ☺ ★ 17:19, 25 June 2007 (UTC)

I disagree. A protocol using plain text is different to a text file (a protocol doesn't necessarily transfer files). 83.254.215.231 (talk) 12:21, 21 January 2008 (UTC)

binary (software)

Hi, I don't think you should move binary (software) to binary (computing) because it actually discusses binaries, i.e. compiled applications, whereas binary (computing) sounds like it's going to discuss how computers use 1s and 0s... Evercat 00:35 2 Jun 2003 (UTC)

First, my apologies for forgetting to discuss the naming issue on the talk page first. I think binary file is a more accurate term. What do you think? -- Taku 00:47 2 Jun 2003 (UTC)

If this article will predominantly be talking about binaries (as opposed to source code or any other form of text file), then I concur: binary file is a better name. -- Wapcaplet 00:50 2 Jun 2003 (UTC)

No, actually I think we need a broader article. Now that I know plain text has more to say about the characteristics of binary files, we may want to have a combined article, probably called binary and text file or something. Any thoughts? -- Taku 00:55 2 Jun 2003 (UTC)

We probably do need a broader article. There isn't too much to say about binary files (aside from the fact that they're not human-readable, for whatever reason). In this context, the word "binary" is more of a piece of terminology, rather than a person/place/thing that needs an encyclopedia article. If anything, it should maybe be incorporated into File format or some related article. Some file formats are considered binary, yadda yadda. -- Wapcaplet 01:02 2 Jun 2003 (UTC) (Though, "human-readable" is pretty vague. Some humans, myself included, are capable of reading binary files and occasionally understanding them :) -- Wapcaplet

I was thinking of what title might be good. As people know, I tend to merge small articles into one big article because I believe Wikipedia is not a dictionary, so we don't want an article that just defines its own title. I am not sure file format is a good article in which to talk about the distinction between binary and text files. Actually, I want to discuss, for example, the fopen function of C, where you need to specify whether a file is binary or text. I mean, the topic of the distinction between binary and text can be expanded a lot more. So after all, to avoid making one big article, an independent article called binary and text files seems fine. Any other ideas? -- Taku 01:20 2 Jun 2003 (UTC)

You're probably right. Evercat 01:22 2 Jun 2003 (UTC)

Sounds OK. Personally I'd stick it under File format, but you make a good point that there is the need to distinguish between these two broad classifications of file types. Some ideas:

"Text files" almost always refers to strict ASCII, though I suppose any information which can be interpreted according to some standardized character code (Unicode, UTF, or whatever) would qualify. Informally, as has already been established, it usually means "human readable," though there are times when ASCII can be used for other things (ASCII art, for instance) which don't really fall under the 'readable' category. Another way of looking at it is that you don't need any special software to view them (though, the definition of 'special software' could be hairy. I get sick of trying to explain to people that you don't need to have Dreamweaver in order to edit an HTML file! :)
Binary files could be practically anything. As already pointed out, all files on a computer are binary in the strictest sense (text files are just special cases). Binary can be compiled executable code, object code or libraries, images, media such as audio or video, ZIP archives, or you name it.
In the context of downloading software, you often see "source code" versus "compiled binary executable", which is another apt analogy.

-- Wapcaplet 01:39 2 Jun 2003 (UTC)

Hi. Um, I have a slight problem with this article because it is rather Unix-centric. In Unix systems (traditionally), there was a very clear distinction between text files and binary files, primarily owing to the ASCII standard, i.e. by (Unix) definition, a file couldn't be a text file if it contained any character with a byte value over 127. Under Macintosh (and Windows???) systems, an extended, 256-character encoding was always used. It was completely accurate on a Macintosh to refer to a file as text so long as it was human readable. Today, the point is perhaps mostly moot, as Mac OS X is now Unix-based, and Unicode has become the standard, but I think it still confuses Mac and Windows people today when a Unixer talks about text files as being different from, say, a file that makes use of curly quotes or other high-bit characters in a particular encoding. AdmN 18:23, 30 Aug 2004 (UTC)

Hmm, in my experience, Unix doesn't make this confusion at all. A file with only text data is a text file; everything else is not. Perhaps you are referring to the Unix utilities that try to infer whether a file is plain text or not, for it's true that these are historically ASCII-centric. More up-to-date utilities don't make this mistake, nor is any text/binary distinction built into the OS proper.

In fact, I would say that Windows makes the much worse confusion, for the binary/text option of the C file functions has no effect in Unix, but it does in Windows; all 'text mode' really does, however, is automatically convert any line feed byte into a carriage return followed by a line feed when writing (and the reverse when reading). (I don't know about Macs, but I'm guessing it converts line feed to carriage return and vice versa.) --Apantomimehorse 01:39, 5 July 2006 (UTC)
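For anyone who wants to see the line-feed conversion described above without needing a Windows machine, here is a minimal sketch in Python rather than C. It uses the standard io.TextIOWrapper class, whose newline argument controls the translation explicitly; the helper names write_text_mode and read_text_mode are made up for this illustration and are not part of any library.

```python
import io


def write_text_mode(text: str, newline: str) -> bytes:
    """Return the bytes a text-mode stream would write for `text`.

    `newline` is the line terminator that "\n" is translated to;
    passing "\n" itself means no translation (the Unix behaviour).
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # keep the wrapper from closing the buffer later
    return data


def read_text_mode(data: bytes) -> str:
    """Decode bytes as a text-mode stream with universal newlines would."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None)
    return wrapper.read()


windows_bytes = write_text_mode("one\ntwo\n", "\r\n")  # CRLF on disk
unix_bytes = write_text_mode("one\ntwo\n", "\n")       # bytes unchanged
round_trip = read_text_mode(windows_bytes)             # back to "\n"
```

Opening the same data in binary mode skips the translation entirely, which is the whole of the text/binary difference being discussed here.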

Requested move

Text files → Text file – {use singular form as per std}

Add *Support or *Oppose followed by an optional one-sentence explanation, then sign your vote with ~~~~

Support Singular is more encyclopedic, I feel. UrbaneLegend 23:01, 17 February 2006 (UTC)

Done, for standardization. Rd232 ^talk 22:17, 18 February 2006 (UTC)

Disputed

Confusing!!

The article defines text files as approximately plain text, but this is essentially wrong. A text file contains text that is intended for human information transfer, as opposed to binary or data files that for the most part will remain opaque to the ordinary user. This means that

MS Word produces text files.
A yet more confusing example is HTML files, which cannot be called plain text but are usually regarded as text files.

The article must be enhanced to treat structured text alongside plain/flat text. Said: Rursus ☺ ★ 19:50, 25 June 2007 (UTC)

Besides, having MIME in the article contradicts the intro. MIME is a somewhat structured text. Said: Rursus ☺ ★ 20:10, 25 June 2007 (UTC)

MS Word doesn't produce text files, since it doesn't produce files that are human-readable without special software (unless you want to look through all the binary blobs for the occasional bit of your text). Happysmileman (talk) 19:32, 6 December 2007 (UTC)

Analysis of intro

A text file (or plain text file) is a computer file which contains only ordinary textual characters with essentially no formatting. [a text file is a file intended to be human readable, not computer readable (binary)] The term 'text file' is typically used in contrast with the term 'binary file', even though any file is fundamentally a sequence of arbitrary bits, and many computer components (for example, all hard disk circuitry and most system software) make no distinction between file types. [that's confusing!] However, a large percentage of application programs can understand and use text files in some way, but few programs can typically understand and use the contents of any particular binary file. Hence the distinction can be useful to computer users. [this misses the point: since text files are intended for humans to read, not computers, data loss and data confusion hurt much less. Plain text is just the readability extreme; structured text can either be processed to look nice, or the structural elements can be removed, and the purpose is still not lost.] Said: Rursus ☺ ★ 20:05, 25 June 2007 (UTC)

Evil rewrite made

Yihiheee!! (Giggering evilly, twirling the moustaches)! I simply rewrote the intro to refer to human text information files. I know there's a conflict between three different interpretations of text files:

text file = plain text file - doesn't contain control characters, but may contain newline characters,
text file - is intended for humans to read,
text file - is readable by any unspecialized text editor, and may be compiled or interpreted by a programming language.

So in essence my change was too drastic, and if the original meaning is also reinserted alongside mine, I will be happy too. Said: Rursus ☺ ★ 20:59, 25 June 2007 (UTC)

tag for rewrite ;; what is going on with this article?

There is some seriously dubious content going into this article, and it is consequently tagged for re-write. It may be suitable to revert this to a previous version, but something needs to be done.

For example:

   A text file is a file intended for humans to read, so it mainly 
   contains character data that can be processed to display a readable 
   text in any natural language.

Where does this definition come from? This whole "intended for humans" definition sounds vague, unencyclopedic and pointless. Assembly programmers are humans too, blind people are humans too; and what is with the "red on green" coloring in the article body? Can someone show where this formatting is recommended under WP style guidelines?

Please, have some citations and reliable sources nearby when making substantial modifications to this article. They are desperately needed. dr.ef.tymac 23:57, 25 June 2007 (UTC)

Basic cleanup done: Initial cleanup has been done. This is a start on cleanup, but the article still needs attention. Please: do not add definitions or substantial revisions unless you can back them up with citations to reliable sources. Thanks. dr.ef.tymac 00:54, 26 June 2007 (UTC)

It's me! Your objections are highly relevant, since I felt that redefining in the direction I proposed was taken too far. However: the original text seemed quite doubtful to me, because it was at odds with the merging debates between plain text and text file, where many opinions objected to the merger on the basis of structured text (e.g. XML). The problem is that there is an ambiguity in the term text file. I think there is no official definition of what a text file is. One definition regards usage of control codes 0x00..0x1F and 0x80..0x9F within the file, the other regards the usage of the file. Both definitions have limitations and cases where they become absurd. For human text heavily tagged with control codes (e.g. MS Word DOC), which is then both binary and textual, the usage definition becomes absurd; plain-text XML code doesn't use control codes, yet is pretty unreadable and heavily tagged. The trouble is that "text file" is ambiguous and that the article must reflect this ambiguity.

I adhere to your standpoint that the text shouldn't be touched without adding citations, with one exception: if anyone objects to my changes and wishes to restore some of the former text alongside the current (referring to the text before, say, 24 June 2007), it's better than OK by me! Now, I'm going to improve by finding the citations that we wish to add. Said: Rursus ☺ ★ 11:26, 26 June 2007 (UTC)

BTW: wiktionary treats "text file" as "human readable relatively unformatted text" and later on "not being binary", and "being distinguished from word processing files". The trouble is structured texts, which are really regarded as text files. Wiktionary is a tertiary source and cannot be used for citations, but it gives a preliminary hint on where to go from here. Said: Rursus ☺ ★ 11:42, 26 June 2007 (UTC)

I agree with your basic point that "text file" has ambiguity. To be blunt, I don't personally like the term very much, but it is sufficiently widespread and common to be notable and citable, and hence this article gets to stay instead of being summarily deleted.

Nevertheless, because the term is so ambiguous, reliable cites (to sources that fully acknowledge and understand the inherent ambiguity) are the only authoritative solution here. This is why the "human-readable" definition lacks merit. With all due respect to the contributors to Wiktionary, the term "human-readable" is worse than meaningless. Unless you come from a planet where computers program themselves, and there are no humans who do that work, then even many "binary" files are intended to be "human-readable" at least at some point or another.

Bottom line: thanks for clarifying, and for helping to clear up some of the problems with the ambiguity. I think the best way to proceed is for us to resolve the ambiguity (and debates) by requiring contributors to start adding more cites. Good job on the work you've done so far to bring these issues to light. dr.ef.tymac 15:00, 26 June 2007 (UTC)

You don't like the term text file, and I may in a sense agree, because computer science is so full of confusion arising from the perspectives of various actors: the customers, the programmer's boss, the programmer and the end user. "Text file" as an official term might not exist, or it exists differently according to various technical committees in the Anglo-Saxon world. Then as a linguistic compound, it certainly exists, but then it means what "text" and "file" imply to us, at the same time that it has a de facto usage (probably technical) that is as valid as the linguistic compound. This is the eternal struggle to be understood that we compsciers must fight all the time, till we've invented our own separate language. Said: Rursus ☺ ★ 08:22, 27 June 2007 (UTC)
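To make the control-code definition mentioned in this thread concrete (codes 0x00..0x1F and 0x80..0x9F), here is a minimal sketch of it as a byte-level check. The function name looks_like_plain_text and the choice to let tab, line feed and carriage return through are assumptions made for this illustration, not something taken from any of the sources under discussion. It also assumes a single-byte encoding such as ISO 8859-1; in UTF-8 the bytes 0x80..0x9F turn up inside ordinary multi-byte characters, so the rule would misfire there.

```python
# Control bytes tolerated in "plain text" for this sketch (an assumption):
# horizontal tab, line feed, carriage return.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def looks_like_plain_text(data: bytes) -> bool:
    """Apply the control-code rule: reject C0 (0x00..0x1F) and C1 (0x80..0x9F) bytes."""
    for byte in data:
        if byte in ALLOWED_CONTROLS:
            continue
        if byte <= 0x1F or 0x80 <= byte <= 0x9F:
            return False
    return True


prose = looks_like_plain_text(b"Hello, world.\nSecond line.\n")   # passes
markup = looks_like_plain_text(b"<doc><p>Hello</p></doc>\n")      # also passes
blob = looks_like_plain_text(b"\x00\x01\x02 some text \x1f")      # rejected
```

Note that heavily tagged XML passes the check just as easily as prose does, which is exactly the limitation of this definition raised above.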

Sources of variable quality

Please add everything you find here (!):

"wiktionary.org on "text file"" - source: tertiary (3), confuses the definition by usage with the definition by technical criteria;
"The Jargon File (version 4.4.7) on "text"" - source: primary? (1?), only uses the definition by technical criteria, and another meaning unknown to me;
"Webopedia on "text file"" - source: tertiary (3), very vaguely distinguishes between files containing text, and files only using ASCII (obsolete, but generalize to any character encoding);
"MSN encarta on "text file"" - source: secondary (2), only uses the "contains only alphanumeric characters" definition (which benevolently must also be interpreted to include space, parentheses and punctuation);
"lookwayup.com on "text file"" - source: tertiary (3), uses a definition declaring only ASCII to be used, including "formatting instructions";
"Foldoc.org on "text file"" - source: primary (1), says that text files don't contain "invisible" control characters; contrasts with rich text, binary file, flat file.

Sample usages: "MAVID file input description". Kidisk //musikk -- Preceding unsigned comment added by 88.91.88.113 (talk) 21:05, 17 July 2010 (UTC)

the criticism is distracting

I was distracted by the criticism in the document, e.g. citation needed, vague... I'm lazy right now; maybe when I get home I will fix the criticisms.

--146.145.210.126 12:55, 20 July 2007 (UTC)

Stupid image

The current image accompanying this article should be replaced with something non-stupid. —Preceding unsigned comment added by Radishes (talk • contribs) 2007-08-03 20:59:26

Cool. Fire up your favorite SVG editor and create a better one; if the image is so bad, this should be a piece of cake. dr.ef.tymac 02:25, 4 August 2007 (UTC)

Highly questionable wording in intro

Sourced or not, this is nonsense:

"text files are intended to be viewed or interpreted by application software, whereas binary files are executable by the operating system."

Just to prove my point: Blender .blend files, MS Word DOC files, JPG, PNG, GIF, TIFF, OGG, MP3, WAV, AIFF, etc are all examples of "binary files" which are "intended to be viewed or interpreted by application software".

Meanwhile, MS-DOS .bat files, Unix shell scripts, and programs written in Perl, Python, BASIC, and other interpreted languages are examples of "text files" which are "executable by the operating system". The footnote about "source code" doesn't alter this fact: compilation or interpretation often happens in RAM and often no binary file is created in the process.

So, this is a totally wrong distinction to make.

Also, note [4] is NOT a source for this statement, it's an explanatory footnote.

What distinguishes "text files" from "binary files" is that the byte stream in a text file has a simple, unambiguous mapping to a sequence of characters which may be rendered as human-readable glyphs, arranged in a simple human-readable form.

We do use application software to do this rendering, but that is equally true of many binary files, so it is not a distinctive property of text files. The definition of text file is also tightly linked with the concept of a "text editor", which is an application specifically designed to manipulate text files. Indeed, a good definition of a text file is "a file which may be easily processed using a text editor".

"Plain text", though it probably has more than one distinct meaning (and probably therefore deserves a disambiguation?), in this context, means a text file which furthermore does not contain special formatting instructions (unlike XML or HTML, for example), usually called "markup". Thus, "plain text" contains little or no structure (this is fuzzy because paragraph breaks, newline characters, and setting headings off by empty lines can all be thought of as exceptions).

The term "text file" actually dates from a time when it was essentially synonymous with "ASCII encoded file", but the rise of other encodings, and particularly Unicode, has stretched the meaning by making what we think of as "text files" more complex and less unambiguous. But even so, there is a clear distinction between a straightforward representation of text and a rich-text or page-description language which contains complex formatting information. Digitante (talk) 14:32, 11 February 2008 (UTC)

I agree that statement was very misleading. It couldn't be allowed to stand, so I removed it. The intro needs to be enlarged now, I'd guess, but I'm not prepared to do that; feel free. Note that there are plain text, plaintext, etc. articles in existence. -R. S. Shaw (talk) 21:32, 11 February 2008 (UTC)
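One hedged way to picture the "simple, unambiguous mapping from bytes to characters" criterion proposed in this section is a strict decode: if every byte of the file decodes under the stated encoding, the mapping exists. The sketch below uses Python's built-in bytes.decode; the helper name decodes_as_text is invented for this example, and UTF-8 as the default encoding is an assumption.

```python
def decodes_as_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if `data` maps cleanly onto characters in `encoding`."""
    try:
        data.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


utf8_prose = decodes_as_text("naïve café\n".encode("utf-8"))
png_header = decodes_as_text(b"\x89PNG\r\n\x1a\n")
```

This only tests whether a mapping exists, not whether the result is readable: an encoding such as ISO 8859-1 assigns a character to every byte value, so under it any file at all would pass. That is one more reason the definition stays fuzzy.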

End-of-file marker?

The article says:

"The end of a text file is often denoted by placing one or more special characters, known as an end-of-file marker, after the last line in a text file."

This is incorrect information supported by a [weasel word]. Most file systems don't use the concept of "end-of-file" marker, and most systems definitely don't use a special marker for the last character in a text file.

It is arguable, on the other hand, whether the last line of a text file is or isn't ended by a newline marker (whichever it is, CRLF, CR or LF). But the newline marker is definitely not a marker for the end of the file.

-- Rgiusti (talk) 15:19, 9 August 2012 (UTC)

Sentence lacking grammar

"According to Unicode Microsoft protocol for txt files use UTF-8." I cannot parse this sentence. Can someone who knows what it is trying to say make it meaningful?

Curious observation

I happened to notice today a peculiar dogma or "double-standard" that is being implicitly asserted in this article with regard to textual/string representations. On the one hand, a "modern" OS is said to no longer require end-of-file markers, seemingly equating progress to this feature. However, on the other hand, lines themselves are still typically ended with new line character(s), so it would seem that this "anachronism" survives at the line level of detail. To my knowledge, this would be due to the difference in the abstractions themselves. File lengths, being the domain of the OS, are apparently more "modern" than the file formats, which are effectively invisible to the OS.

All in all, to me, this seems to presume a bit of glibness within the article in recognizing the trifling optimization at the OS-level, while ignoring the bigger potential optimization at the line-level. I'm not sure what to make of this, except that I find this observation interesting, and would prefer that the article not be so dogmatic. 75.139.254.117 (talk) 04:22, 20 November 2016 (UTC)

Well, end-of-file markers are redundant because the information about file size is otherwise stored in file metadata, and this information is required there because of different properties of modern file systems. File metadata does not store the information about individual lines within a file, mostly because there is no good use for such information outside text file editing/displaying, in which case significant portions of the file would be read anyway. Of course such metadata could be stored in some file system, but then associated data structures would end up taking more space than a single LF byte, which is hardly an optimization. So basically the article is right in recognising deprecation of explicit EOF markers but not of explicit EOL markers. -- Dmitrij D. Czarkoff (talk•track) 18:02, 27 June 2017 (UTC)
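The point about file size living in metadata can be checked directly. The following is a small sketch, assuming an ordinary modern file system and using only Python's standard os and tempfile modules; the function name size_and_content is hypothetical. It writes a payload whose last line has no trailing newline, then compares the size reported by the file system with what a read returns.

```python
import os
import tempfile
from typing import Tuple


def size_and_content(payload: bytes) -> Tuple[int, bytes]:
    """Write `payload` to a temporary file; return (metadata size, bytes read back)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        size = os.stat(path).st_size       # length as recorded in file metadata
        with open(path, "rb") as handle:
            content = handle.read()        # reading stops at that length
    finally:
        os.remove(path)
    return size, content


size, content = size_and_content(b"line one\nline two")
```

The read ends because the recorded length is exhausted, not because a marker byte was found, and the only line-level information is the LF byte inside the data itself.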