
Actually, Windows Encodes New Lines Correctly

Posted by

  tech web


Ever open a text file somebody sent you in Notepad and wonder why it looked all wonky? As in, why everything is run together on one line? The reason has to do with line control characters, and it dates to the days when teletypewriters still ran the world. The two control characters in question are the carriage return (CR or \r) and the line feed (LF or \n), also known as new line. A carriage return, a manual process on a regular typewriter, returns the carriage to the beginning of the line. In contrast, a line feed advances the paper to the next line. Typically (but not always), these operations occur together.

When computers arrived on the scene, not all operating systems agreed on which control characters to use. MS-DOS, and subsequently Windows, adopted the traditional CR+LF. Unix, and subsequently Linux, adopted a lone LF to do both a carriage return and a line feed (in other words, the line feed implicitly does a carriage return). The reason usually given for this is economy. Axing the CR before the LF saved one byte per line, and in those days, a byte per line was often a big deal. Classic Mac OS opted for something even more arcane and used a lone CR instead (Mac OS X and later use a lone LF).
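To make the byte-per-line point concrete, here is a minimal Python sketch that encodes the same three lines under each convention and prints the resulting size. The sample lines and the `encode_lines` helper are made up for illustration; they don't come from any particular tool.

```python
# Hypothetical sample text; any list of lines would do.
LINES = ["first line", "second line", "third line"]

# The three line-ending conventions described above.
CONVENTIONS = {
    "CR+LF (MS-DOS, Windows)": "\r\n",
    "LF (Unix, Linux, Mac OS X)": "\n",
    "CR (classic Mac OS)": "\r",
}


def encode_lines(lines, terminator):
    """Join lines, ending each one with the given terminator, as ASCII bytes."""
    return "".join(line + terminator for line in lines).encode("ascii")


for name, terminator in CONVENTIONS.items():
    data = encode_lines(LINES, terminator)
    print(f"{name}: {len(data)} bytes")
```

The CR+LF version comes out exactly one byte per line larger than either single-character version, which is the saving in question.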

Ever since then, this has been a recipe for major cross-platform headaches. It's why Notepad sometimes displays text from other operating systems all on the same line. Notepad has traditionally expected both a CR and an LF and won't go to a new line unless it sees both, and a text file created in Linux contains only the LF. (Notepad in Windows 10 version 1809 and later also recognizes lone LF and lone CR line endings.) UNIX guys sometimes love to gripe about the "Notepad problem", but the joke is actually on them. Windows and MS-DOS both play by the official ISO standards, which have long dictated that CR and LF be used together and neither of them in isolation. UNIX and its derivatives have long violated this rule, although UNIX guys don't think it's a big deal. But CR and LF are separate control characters for a reason. Ironically, Windows and MS-DOS remain backwards-compatible with teleprinters, while UNIX (which traditionally was often accessed from a teleprinter) is not.
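The "everything on one line" effect is easy to reproduce. The sketch below is not Notepad's actual code; it just assumes a viewer that starts a new line only when a CR is immediately followed by an LF. The `split_strict_crlf` name and the sample strings are hypothetical.

```python
def split_strict_crlf(text):
    """Break lines only where CR is immediately followed by LF."""
    return text.split("\r\n")


unix_text = "alpha\nbeta\ngamma\n"           # LF only
windows_text = "alpha\r\nbeta\r\ngamma\r\n"  # CR+LF

# The LF-only text never matches, so it stays in one piece.
print(split_strict_crlf(unix_text))

# The CR+LF text splits into its three lines (plus an empty
# string after the final terminator).
print(split_strict_crlf(windows_text))
```

If you try this on a real file in Python, open it with `open(path, newline="")`. In the default text mode Python translates CR+LF and lone CR to LF as it reads, which hides the difference.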

While these issues occur less frequently these days thanks to programs that tolerate different line-ending conventions, it's worth noting for the record that Windows is not actually in the wrong here. I wouldn't necessarily say that UNIX is wrong, but it doesn't follow the standards in this particular case. The CR+LF standard used by Windows is "most correct", and you should use CR+LF for this reason unless you are working with programs that break when they see both characters (e.g. Linux shell scripts).
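If you do want to normalize a file to CR+LF, one possible approach in Python is a single regular-expression pass. This is a sketch, not the only way to do it, and `to_crlf` is a name invented here.

```python
import re


def to_crlf(text):
    """Rewrite every line ending (CR+LF, lone LF, or lone CR) as CR+LF."""
    # CR+LF is listed first so an existing pair is matched as one unit
    # and is not doubled up.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


mixed = "one\ntwo\r\nthree\rfour"
print(repr(to_crlf(mixed)))
```

Running the function on text that already uses CR+LF leaves it unchanged, so it is safe to apply more than once. When writing the result to disk, pass `newline=""` to `open()` so Python does not translate the line endings a second time.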

The takeaway here is that nothing is wrong with Windows Notepad. It "fails" to recognize UNIX-style newlines because that isn't the ISO standard. The standard is to use CR and LF together. If your text file follows the actual standards, Notepad displays it correctly. The problem isn't Notepad; it's your text file.
