Talk:Newline

WikiProject Computing (Rated Start-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.

Merge

Please discuss merging this article with End-of-line in the "Merge?" section.

Examples to convert line endings

The article gives examples of how sed handles line endings. These examples work only with some variants of sed: they rely on extensions that POSIX does not require (such as those in the sed available on most Linux systems). See e.g. http://www.grymoire.com/Unix/Sed.html#uh-nl; the variant described there does not work with the sed shipped with Mac OS X.
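
A possible way around the portability problem described above, as a minimal sketch: generate the carriage return with printf and pass it to sed as a literal character, so that nothing depends on sed understanding the \r escape. The function name crlf_to_lf is made up for illustration, and the sketch assumes a POSIX shell.

```shell
# Hypothetical helper: remove one trailing CR from each line read on stdin.
crlf_to_lf() {
    # Put a literal carriage return in a variable, so the sed script does not
    # depend on sed itself understanding the \r escape.
    cr=$(printf '\r')
    sed "s/${cr}\$//"
}

# Example: dump the converted bytes to confirm that no CR remains.
printf 'one\r\ntwo\r\n' | crlf_to_lf | od -c
```

Only a CR directly before the line feed is removed; a CR in the middle of a line is left alone.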

1958 version of ASCII

This requires further explanation, because of an apparent contradiction with "1960 Originated what have become the ASCII and ISO character codes." on http://www.unt.edu/isrc/Faculty/FacultyFellows/bemer.htm, and with the material on http://www.bobbemer.com. Also, http://homepages.cwi.nl/~dik/english/codes/stand.html says that the first standardised version dates from 1963.

AIX

AIX is a Unix variant after all. Does it really follow OS/360?? Yaron 21:47, Aug 2, 2004 (UTC)

Unclear text

While doing a thorough copyedit of this article, I marked a few places where the text was unclear to me with HTML comments. Please see the page source. - dcljr 08:10, 17 Sep 2004 (UTC)

You write "what is X'15'??" I believe it's simply what a C programmer would call 0x15 -- it's a convention for hexadecimal I've seen in a few places. (Given that it appears in the EBCDIC section, perhaps it's an IBM-specific notation?) JTN 20:47, 2004 Oct 4 (UTC)

Solutions

 Would be nice if this article included or linked to methods for converting the various 
 formats since this is still occasionally problematic.  I doubt my long-standing method of 
 running a sed command from vi involving CTRL-V CTRL-M is optimal...  --16:13, 17 May 
 2005 (UTC)

I just added a Perl program and a couple of other conversion hints, including one for emacs. Hope that helps? It's the best I know. Grem 10:43, 13 September 2005 (UTC)
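
For the conversions asked about above that go in the other directions, a rough sketch using only tr and sed might look like this. The function names lf_to_crlf and cr_to_lf are hypothetical, and a POSIX shell is assumed.

```shell
# Hypothetical helpers; each reads stdin and writes stdout.
lf_to_crlf() {
    cr=$(printf '\r')
    # Replace any run of trailing CRs (possibly none) with exactly one,
    # so lines that already end in CR LF are left unchanged.
    sed "s/${cr}*\$/${cr}/"
}

cr_to_lf() {
    # Classic Mac OS text uses a bare CR as the line terminator.
    tr '\r' '\n'
}

printf 'unix\nline\n' | lf_to_crlf | od -c
printf 'mac\rline\r'  | cr_to_lf   | od -c
```

Note that cr_to_lf must not be applied to CR LF input, where it would turn every line ending into two line feeds.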

Merge?

This has been a problem for some time, and it's only getting worse. The articles Line feed, CRLF, Carriage Return, and Newline all contain pretty much the same information, just phrased differently -- I propose we merge and redirect all other articles to this one, as it is the most complete and has a platform-neutral title. Comments? 63.188.144.35 19:16, 11 Jun 2005 (UTC)

I'm not entirely convinced that line feed and carriage return shouldn't retain their own articles, as their distinct semantics don't quite sit right under newline. I agree that all the stuff about line-ending conventions, including the whole of CRLF, should come to newline, though, with prominent pointers from the individual articles. -- JTN 22:48, 2005 Jun 11 (UTC)
I agree. It was a bad idea to begin with to have these separate articles for very closely related topics. Today, line feed and carriage return simply denote a newline, and the historical background can easily be put in each corresponding section. -- Taku 22:53, Jun 11, 2005 (UTC)

Well, it seems there's consensus for CRLF at the very least, so I'll go ahead and merge it in. As for line feed and carriage return, I realize they have historical significance, but it's not exactly a practical usage nowadays. I think the historical implications of the terms can be best described in the Typewriter article; it already explains that the terms have been adopted amongst computer users. 63.188.168.95 17:48, 12 Jun 2005 (UTC)

This article should be merged with End-of-line, which has a {{mergeto|newline}} posted in it. --Mareklug talk 18:20, 5 September 2006 (UTC)

  • Yes, merge CWC(talk) 06:44, 6 September 2006 (UTC)
  • Yes. End-of-line could remain as a pointer to the Newline article, but I think the content fits best in newline. \js 17:04, 7 November 2006 (UTC)

"Line anchors"

I haven't found any reference to "line anchors" on the web, except in the context of regular expressions, where the term seems to be used for the '^' and maybe the '$' zero-width assertions when in multi-line mode. While these constructs obviously work with newlines, they are not newlines, so I think the statement in the intro that newlines are sometimes called line anchors is wrong. If there is some other use of the term line anchor that I'm not aware of, could somebody please cite a source? --K. Sperling 10:38, July 25, 2005 (UTC)

I thought I had heard the term "line anchor" somewhere, but I don't remember where. I will restore the mention of this if I find some reference. -- Taku 23:18, July 25, 2005 (UTC)

Entering Special Characters / Editor Treatment and Conversion

I'm going to remove these two sections, because the content is inaccurate and partially also irrelevant. I wanted to give a more detailed explanation here first, though:

  • ctrl-j / ctrl-m: These aren't codes for LF and CR. The ctrl-? / C^? / \c? notation / keyboard translation / escape sequence is just a way of referring to a character with a certain number. J is the 10th letter in the alphabet (counting from A=1), so ctrl-J is simply the character with number 10, i.e. 0x0A. Similarly, Ctrl-M is 0x0D. Saying that these are "usually" LF and CR is wrong, unless you assume that computers "usually" use ASCII.
  • Entering them in vi/emacs/etc: I don't think this article is about teaching people how to enter control characters into various shells and editors.
  • The Common Problems section already says that modern text editors generally recognize all flavours of CR / LF newlines; this obviously includes the mentioned vi, emacs and eclipse (even though some people might not consider vi modern ;-). I don't think the article needs to mention how to perform the conversion in emacs specifically, though.
  • The perl one-liner will actually work on UNIX. Whether or not it works on Cygwin depends on the Cygwin configuration: If Cygwin is configured to use DOS/Windows newlines, it won't work, because the script won't see any CRs on input and they will be re-added on output (Perl uses the same text/binary IO modes as C does, and files are in text mode by default).
  • Neither GNU make nor bash ignore a final unterminated line in the versions I have tested (GNU Make 3.80, GNU bash 2.05b.0(1)-release). The only program I can think of off-hand that still has this bug is cron.
  • And last but not least, questions should go on the talk page, never in the article!

--K. Sperling (talk) 23:53, 13 September 2005 (UTC)



  • ctrl-j / ctrl-m: These aren't codes for LF and CR. The ctrl-? / C^? / \c? notation / keyboard translation / escape sequence is just a way of referring to a character with a certain number. J is the 10th letter in the alphabet (counting from A=1), so ctrl-J is simply the character with number 10, i.e. 0x0A. Similarly, Ctrl-M is 0x0D. Saying that these are "usually" LF and CR is wrong, unless you assume that computers "usually" use ASCII.

Most computers DO usually use ASCII.

  • Entering them in vi/emacs/etc: I don't think this article is about teaching people how to enter control characters into various shells and editors.

This is one of the most common questions I get. I would like to be able to point users at Wikipedia to help solve it. Should I start a new article? Can you suggest a title?

  • The Common Problems section already says that modern text editors generally recognize all flavours of CR / LF newlines; this obviously includes the mentioned vi, emacs and eclipse (even though some people might not consider vi modern ;-). I don't think the article needs to mention how to perform the conversion in emacs specifically, though.

Again, conversion is a very common question, and it is not addressed elsewhere. Instead of just removing useful information, you might instead move it to a better place? Why is it wrong for the conversion section?

  • The perl one-liner will actually work on UNIX. Whether or not it works on Cygwin depends on the Cygwin configuration: If Cygwin is configured to use DOS/Windows newlines, it won't work, because the script won't see any CRs on input and they will be re-added on output (Perl uses the same text/binary IO modes as C does, and files are in text mode by default).

Good point. I would be happy to see the one-liner in Conversion, with a mention that it works under UNIX. Any reason why not?

  • Neither GNU make nor bash ignore a final unterminated line in the versions I have tested (GNU Make 3.80, GNU bash 2.05b.0(1)-release). The only program I can think of off-hand that still has this bug is cron.

Who said GNU? The fact is that not everyone is using the latest version of all-GNU products. In any case, the caveat is useful even without the particular examples.

  • And last but not least, questions should go on the talk page, never in the article!

I don't really understand this. As a reader, seeing a question in the article points me to relevant information where I am encouraged to edit and contribute. Isn't that the point of Wikipedia? There was some discussion of "publishable" versions, and I understand that questions look unprofessional in such a text. Which is more important?

Grem 11:35, 15 September 2005 (UTC)


I'm well aware that many computers use ASCII, but it's also a fact that many don't. Especially seeing that there is a fair bit of confusion even among programmers (e.g. many people don't realize that CR and LF exist in other codes besides ASCII and have different numerical representations there), it's important not to gloss over these details. A statement like "A line feed is usually typed ctrl-j" is simply too imprecise in this context -- not only because it ignores the issue of character sets other than ASCII; in a GUI-based application (e.g. on Windows) pressing ctrl-j will often not produce any character at all.
Bash is GNU Bash, and 2.05 is not the latest version; they're up to 3.something. GNU make is probably one of the more widely used make implementations, too. You can't just go claiming that "Some unix programs (like make and bash) will silently ignore the last line if there is no newline at its end.", listing bash and make (without naming any versions or specific make implementations) as examples when the problem doesn't exist in very widely used versions. It really wouldn't have hurt if you'd tried to verify these claims before adding them to the article. (Incidentally, the introduction already says that some programs have problems if the last line isn't NL terminated, without restricting it to Unix.)
About editors (and conversion utilities), if you include Emacs and vi, why not also include pico, nano, Scite, KWrite, UltraEdit, ...? This article is about newlines, not about how to use one editor or another. Also see Wikipedia:What_Wikipedia_is_not, particularly "wikipedia is not instructive". I realize that viewing text files from other platforms is a somewhat common problem for end-users, so I think it's OK to have a few hints for the most commonly used platforms, and I've added one way to do it for Windows and listed two methods for Unix, but generally this is not what this article (or probably any article of an encyclopedia) is about. I don't see much reason for including a Perl version; there's already the comfortable dos2unix one mentioned, and tr (which is part of the POSIX standard, and available on practically every Unix platform) for where dos2unix isn't available. It could also be done with sed, awk, or even in plain bash; I don't see what merits inclusion of the Perl version. If you get asked about this often, and want to point people somewhere, why not point them to the manual of whatever editor they're using?
As for the questions to prompt contributions, I don't have a link handy, but I'm fairly certain it's mentioned on some policy or guide. It's just not done, and the fact that you edited without being prompted by a question also proves that it's not necessary ;-) --K. Sperling (talk) 13:38, 15 September 2005 (UTC)
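
The Ctrl-letter numbering discussed in this thread can be checked with a small sketch. It assumes an ASCII-based system, which is exactly the caveat raised above, and the function name ctrl_code is made up for illustration.

```shell
# Hypothetical helper: print the code that Ctrl plus a letter stands for,
# assuming the underlying character set is ASCII.
ctrl_code() {
    # printf '%d' "'X" prints the numeric value of the character X.
    letter_code=$(printf '%d' "'$1")
    # Keeping only the low five bits maps A..Z (and a..z) to 1..26.
    echo $(( letter_code & 31 ))
}

ctrl_code J   # prints 10 (0x0A, LF in ASCII)
ctrl_code M   # prints 13 (0x0D, CR in ASCII)
```

On a system with a different character set the same arithmetic would give different results, so this says nothing about LF and CR outside ASCII.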


Using the diff Program with Different Line Endings

When you use programs like diff to compare the text in two files which use different line endings, there are some ambiguities. Unlike most modern text editors, the original unix diff program and GNU diff seem to think that the files differ even though the content, except for the line endings, is the same. This makes porting between, for example, GNU/Linux and MS Windows more difficult. The http://www.GnuWin32.org/ port of GNU utilities has changed this behaviour so that diff does not care whether CR/LF or just LF is used as the line ending. This seems useful to me, and I think that, just like the text editors, the diff program ought to think of line endings as just line endings and not care about their actual format when comparing two files.
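
One way to get the comparison described above with an unmodified diff, as a rough sketch: strip the carriage returns into temporary copies and compare those. The function name diff_ignore_eol is hypothetical, and the sketch assumes that mktemp is available (it is common but not part of POSIX). GNU diff also documents a --strip-trailing-cr option for the same purpose.

```shell
# Hypothetical helper: compare two text files while ignoring CR LF vs LF.
# Usage: diff_ignore_eol file1 file2
diff_ignore_eol() {
    a=$(mktemp) && b=$(mktemp) || return 2
    # Note: this deletes every CR, not only those directly before a line feed.
    tr -d '\r' < "$1" > "$a"
    tr -d '\r' < "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

The exit status follows diff: 0 when the files match apart from line endings, 1 when they differ.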


Conversion using Windows Notepad/Edit?

It's true Notepad doesn't understand LF as "new line". But instead of using edit (and advising Windows users to use an old text-mode DOS program) I'd recommend Wordpad. It opens files greater than 64 KiB (Notepad can't do this) and easily converts LF to CR LF with just open/save. Wordpad is part of the standard Windows distribution and is for sure more Notepad-user-friendly than any DOS tool. I'd change the page myself, but, as can be seen in this text, my English skills won't suffice :)

Hm, I don't have wordpad installed on my windows xp... maybe I manually de-selected it during the installation, I don't really remember. Maybe we should just mention both Wordpad and EDIT then. --K. Sperling (talk) 13:20, 1 October 2005 (UTC)
From the command line (on Windows Vista; this may also apply to other versions such as XP), WordPad is started with the command "write"; there is no "wordpad" command. The advice above about using WordPad in Windows to convert line endings is valid and useful, but it should mention that to start WordPad from the command line the user needs to enter "write", as in C:>write sample.txt; otherwise it may appear that WordPad is not installed. --Pdegregorio (talk) 12:40, 7 May 2009 (UTC)

Is the DOA mnemonic Original Research?

I'm troubled by one of my own inclusions on this page. I have used the DOA mnemonic, which I added to the Newline in programming languages section, to help myself remember the hexadecimal equivalent of CR-LF in assembly language using the debug program on Windows. It is short, simple, and (I think) useful to programmers; but in the interest of fairness, my conscience requires I state:

  • I have not seen this mnemonic mentioned in any book or website, and so it could be argued that it is unverifiable/OR (not suitable for Wikipedia); this was particularly a problem with my original phrasing, which seemed to indicate that it was widely-used (I've rephrased it).
  • However, it is so simple that it could be thought of by anyone familiar with the hex code for CR-LF. In other words, it might have been thought of before and I am simply unaware of it.
  • Short pieces of code and such are regularly contributed to technical articles (including this one) without any source, and some are obviously at least semi-original. This is a form of "mental code", if you will, to aid memory. We do not hunt down every minutely original thing, do we? (If we did, how could we do anything but copy exact phrasings of others... and wouldn't that violate copyright?)

I'm really up in the air about this one. Any thoughts of anyone else would be appreciated. BlueGuy213 04:52, 30 January 2007 (UTC)

Any information which is not mentioned in any primary or secondary source (such as a textbook or research paper) is not appropriate for Wikipedia. That's covered by WP:OR. Also relevant: Wikipedia is not for things made up in school one day — which in the intro paragraph actually mentions "original mnemonics" as something that Wikipedia Is Not for. (I didn't know that that was there until just now, either.) WP:NOT also mentions that Wikipedia is not a "how-to" guide, so if an article contains a mnemonic, it should be because the mnemonic is encyclopedic (e.g., popular), not just "here's how to remember XYZ..."
That particular mnemonic is a tad morbid, don't you think? I'm not surprised it's not used in any textbooks!
I'm going to remove the "dead on arrival" mnemonic from the article, now that we've established that it's original research, and thus no citation can be provided. (Console yourself with the thought that since it's so simple, anyone who needs it will immediately think of it on their own, without needing help from Wikipedia. :) --Quuxplusone 08:17, 30 January 2007 (UTC)
I agree that it is somewhat morbid, but it has been useful to me also, so I was hoping it could be kept. I guess if policy specifically says no original mnemonics, then I've got no legs to stand on. Oh well, I'll move on to other things... but maybe someday I'll write a computer book using it (and then maybe somebody else will add it back)! 75.5.199.76 09:59, 30 January 2007 (UTC)
The previous post was mine (forgot to sign in). BlueGuy213 10:01, 30 January 2007 (UTC)

Conversion utilities

The overwriting example given like this:

 cat filename | tr -d '\r' > filename

only worked because both cat's own buffering and the in-kernel pipe buffer meant that the beginning of the file could be read and eventually given to tr before the shell would truncate it and set it up as tr's STDOUT. You would lose the rest of your file if it was larger than those buffers (usually no more than a few tens of KB). It doesn't work because the file is truncated before you begin to overwrite it, but it might have worked somewhat if it were just overwritten, since the new chunks would only be 1 char larger or smaller per line and the buffering would have allowed enough slack to prevent unread parts from being overwritten. But that would still have been risky. —Preceding unsigned comment added by 24.200.77.59 (talkcontribs)

True enough (depending on the OS and a bunch of other stuff, obviously). I've removed the useless use of cat. (Gee, that's a silly redirect, but it does seem to get used occasionally...) If some reader doesn't realize that infile and outfile must be different files on some operating systems, then that reader probably isn't going to be using tr to port text files between different OSes in the first place. There's no need to clutter the article itself with irrelevant technical minutiae. --Quuxplusone 06:06, 25 April 2007 (UTC)
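
For readers who do want the result to end up under the same file name, a minimal sketch of the usual workaround for the truncation problem described above is to write to a temporary file and rename it afterwards. The function name strip_cr_in_place is hypothetical, and mktemp is assumed to be available.

```shell
# Hypothetical helper: strip CRs from a file "in place" without truncating
# the input before it has been read.
strip_cr_in_place() {
    # Create the temporary file next to the original so that mv stays on
    # the same filesystem.
    tmp=$(mktemp "${1}.XXXXXX") || return 1
    if tr -d '\r' < "$1" > "$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Because mv replaces the original file, its permissions and any hard links are not preserved; whether that matters depends on the use case.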

Terminal conventions.

Does anyone know what conventions were used by terminals like the VT100? I'm aware the IBM PC compatibles had a ROM BIOS that converted the key code from what is now called the "Enter" key into CR (0x0D, Ctrl-M), but what did standard serial terminals send?

Usually CR, but you could tell the terminal to send LF instead using an escape sequence. CWC 08:04, 5 July 2007 (UTC)
They sent ^M, and required both ^M and ^J to go to the start of the next line. VT220's at least had a configuration option to make ^J act like both ^M and ^J, not sure about VT100. —Preceding unsigned comment added by Spitzak (talkcontribs) 16:51, 13 October 2009 (UTC)

Disambiguation

I have created a disambiguation page for line break and changed the link that redirects to newline from line break to line break (computing). I also have a copy of this page saved in case anyone is looking for it, but I'm pretty sure it's just as easy to find now. —Preceding unsigned comment added by Ark2120 (talkcontribs) 16:55, 18 October 2007 (UTC)

some modern Adobe products still exhibit the obsolete Mac OS 9 linefeed behavior

We encountered this issue while trying to svn checkin a Flash actionscript edited alternately on Mac OS X 10.4.10 with "Adobe Flash CS3" and "Adobe Flex beta 3", built on Eclipse. Adobe Flash CS3 saves with CR, while Flex saves with LF only (the right thing for a pseudo-UNIX like Mac OS X).

Anyhow, just thought Adobe should be publicly flogged for persisting Mac OS 9 behavior well past its obsolescence, and praised for moving in the right direction finally. —Preceding unsigned comment added by 208.72.192.23 (talk) 01:08, 22 November 2007 (UTC)

Office X does the "CR" thing too, if you export an Excel document as text it'll use CR line delimiters. I don't think CR is "obsolete Mac OS 9" behavior; Mac OS X is a platform consisting of multiple glued together systems, and as such it has more of these kinds of problems than most other platforms. Adobe shouldn't be flogged so much as Apple for not laying down a standard and ensuring the Carbon and Cocoa APIs encourage that standard's use. --Squiggleslash (talk) 15:27, 22 November 2007 (UTC)

Programming Helpers

I added a bit about special handling of newlines during program execution to the "Newline in programming languages" section. It feels a bit wordy to me, but is the best I could explain it. Someone more gifted with English may want to clean it up. badmonkey (talk) 03:58, 18 December 2007 (UTC)

I am deleting the part about C++ std::endl. Using it "because '\n' isn't portable" is a very frequent bug in C++ programs, hurting performance by forcing a stream flush for every line written. I see the section correctly mentions what std::endl does near the top, but further down it lists it among things "[facilitating] newlines during program execution", which is plainly wrong. JöG (talk) 10:49, 30 March 2008 (UTC)

AT command

When accessing modems using the AT command set, instructions are terminated with a carriage return character. You can read this in Command and Data modes (modem) and Hayes command set. I have also tested this with minicom and an old POTS modem. Hence this should be mentioned, but my English is not good enough to edit an article on such a major topic. --84.156.100.251 (talk) 17:50, 20 March 2008 (UTC)

End of Line detection

In the bash and ksh93 shells the following will not work as '\r\n' will be seen as the string 'rn'.

egrep -L '\r\n' myfile.txt # show UNIX style file (LF terminated)
egrep -l '\r\n' myfile.txt # show DOS style file (CRLF terminated)

For these shells, one needs to use the builtin shell expansion $'word':

egrep -L $'\r\n' myfile.txt # show UNIX style file (LF terminated)
egrep -l $'\r\n' myfile.txt # show DOS style file (CRLF terminated)

The file command should also detect the type of EOL used:

file myfile.txt
> myfile.txt: ASCII text, with CRLF line terminators

Other tools, such as the following commands, make the EOL characters visible:

od -a myfile.txt
cat -e myfile.txt
hexdump -c myfile.txt

--Ripat (talk) 13:51, 24 June 2008 (UTC)

  • The grep commands in the article (the same as these above) do not work with bash 4.1.5 and grep 2.6.3, which is the second-latest stable version. -Pgan002 (talk) 18:42, 14 April 2011 (UTC)
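
A likely reason the $'\r\n' form misbehaves is that grep treats a newline inside a pattern as a separator between patterns, so the argument becomes a CR pattern plus an empty pattern that matches every line. A rough sketch that avoids this searches for a CR at the end of a line instead. The function name eol_style is made up for illustration, and files that use a bare CR as terminator (classic Mac OS) are not handled.

```shell
# Hypothetical helper: classify a file by the line terminators it contains.
eol_style() {
    cr=$(printf '\r')
    # grep -c exits with status 1 when the count is 0, hence the "|| true".
    total=$(grep -c '' < "$1" || true)         # all lines
    crlf=$(grep -c "${cr}\$" < "$1" || true)   # lines ending in CR before the LF
    if [ "$total" -eq 0 ]; then
        echo none
    elif [ "$crlf" -eq 0 ]; then
        echo LF
    elif [ "$crlf" -eq "$total" ]; then
        echo CRLF
    else
        echo mixed
    fi
}
```

For example, eol_style myfile.txt would print one of the four words above, which can then be used in a shell test.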

Inherently smarter

Dear Sirs, I notice there are different ways to write a paragraph:

  1. No newline (\n) until the very end, maybe hundreds of characters along.
  2. Newlines every 80 characters or so. Then a pair of them at the end.

Please mention which format is inherently smarter. Jidanni (talk) 01:50, 2 February 2009 (UTC)

Microsoft Excel for Mac txt export

I recently noticed that the export plugin of Excel for Mac exports txt files with CR only. Maybe this should be mentioned, since it's a nasty source of errors for programmers who expect that LF and CR LF are all they need to take care of. CatzHoek (talk) 20:51, 4 April 2010 (UTC)

Please state the Excel version number. Hyungjin Ahn (talk) 05:22, 1 January 2011 (UTC)

Delete section 5.1 (Microsoft product)

Hi, I don't think that the section on one particular Microsoft product is relevant for this article. I am sure there are many more programs that have bugs / incorrect handling when it comes to newlines and this is not the place to document them. I suggest removing sections 5.1 and 5.1.1. Drdee (talk) 18:51, 3 June 2011 (UTC)

Clarification needed for operating system dependence

When one makes a statement such as "Windows uses CR+LF" or "UNIX uses LF", what precisely makes these operating systems dependent on the respective convention? Namely, are we referring to command line shells, native GUI text widgets, keyboard driver, etc? There are several examples of software that can interpret these characters however they wish (or give the option to the user). As such, it is misleading to say that "On this OS, xx is the newline sequence" when the more accurate statement would be "This software in the OS treats xx as the newline sequence". Otherwise please explain why it is that (without a hack of some sort), LF cannot be used as a newline on Windows. Ham Pastrami (talk) 01:42, 26 September 2011 (UTC)

Yes, but the text says "...usually represent a newline..." which seems simple and accurate. Of course LF can be used as a newline on Windows, and I can't think of anything in the kernel which "uses CR+LF", but is there any text in the article that is wrong or misleading? Johnuniq (talk) 03:35, 26 September 2011 (UTC)

Well, the operating system dependence exists in libc, one of the most fundamental libraries of both OSes. For example, printf("\n") has different runtime behavior by default (Windows outputs CR+LF while Unix outputs LF). An application can bypass the libc convention on its OS, but it may then be considered non-native. — Preceding unsigned comment added by 130.126.60.252 (talk) 06:12, 27 October 2012 (UTC)

libc is one of the most fundamental libraries of *nix, because by design and specification the C library is the OS library (see the POSIX specification). On Windows, MSVCRT.dll is the compatibility layer that makes it possible to program against the standard *nix library. When people first started porting C (a 'small language with only 32 keywords'), they used the OS library instead of the *nix library, but that made the code totally non-portable, which led to the definition of the 'standard C library'. The 'standard C library' (which includes the definition of 'text' and 'binary' modes) is not part of the definition of MS Windows.
The behaviour of the file operations in the standard C library is important in the standard C library, because the string operations assume terminated strings. Other languages which do not use the standard C library for string handling do not have file operations designed for interoperability with the standard C string handling library.
In DOS, there were two fundamental file APIs: 'text', a fast, forward-only character API which determined EOF by the presence of an EOF character, commonly used for line-oriented read and append; and 'binary', a slow block/record API with forward/reverse and edit-to-record-position access, which determined EOF by the file size recorded in the directory. This distinction does not exist in Windows. The Windows API (CreateFile, ReadFile, WriteFile etc.) does not include a 'text' mode. — Preceding unsigned comment added by 203.206.162.148 (talk) 07:34, 20 November 2012 (UTC)