DOS/Linux/Mac text file problems
Convert DOS Text Files to UNIX Text Files
Text File Formats
Text files contain human-readable information, such as script files and programming language source files. However, not all text files are created equal: there are operating system dependencies to be aware of. Linux/Unix text files end a line with an ASCII line-feed control character, which has a decimal value of 10 (Ctrl-J). Microsoft Windows (or MS-DOS) text files use two control characters: an ASCII carriage return (decimal 13 or Ctrl-M), followed by a line-feed. Just to mix things up, text files from classic Mac OS (the releases before OS X) use just a carriage return. OS X and later use a line-feed, like other Unix systems.
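To make the three conventions concrete, the sketch below writes the same short line in each format and prints the decimal value of every byte. The helper name byte_values is an assumption made for this example; od and printf are standard utilities.

```shell
# byte_values is a hypothetical helper name used only in this sketch.
# It prints the decimal value of every byte read from standard input
# (-A n suppresses the offset column, -t u1 selects unsigned 1-byte decimals).
byte_values() {
    od -A n -t u1
}

printf 'hi\n'   | byte_values   # Unix/Linux: line ends in 10
printf 'hi\r\n' | byte_values   # DOS/Windows: line ends in 13 10
printf 'hi\r'   | byte_values   # classic Mac OS: line ends in 13
```

The letters h and i show up as 104 and 105. Only the trailing values differ between the three formats.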
Problems can, and do, arise if there is a mismatch between a file's format and what the operating system expects. Compilers may be happy with source files in mixed formats, but other apps, like PBS, throw strange errors or just plain give up. So, it is important to make sure text file formats match what the operating system expects, especially if you move files back and forth between systems.
How To Tell
On most systems, the file command should be sufficient:
$ file <filename>
Assuming a file named foo is a DOS file on Linux, you may see something like this:
$ file foo
foo: ASCII text, with CRLF line terminators
This indicates foo is a DOS text file, since it uses CRLF (carriage-return/line-feed). Some editors, such as vim and Emacs, report the file format in their status lines. Other text-based tools, such as od, can also be used to examine the ASCII character values present, exposing the end-of-line control characters by their value.
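If the file command is not available, counting the control characters gives a rough answer. The function below is a minimal sketch, and eol_type is a hypothetical name, not a standard utility. It relies on a heuristic: when the carriage-return and line-feed counts match, it assumes they occur as CRLF pairs.

```shell
# eol_type is a hypothetical helper; it guesses a file's line-ending
# convention from the number of CR and LF bytes it contains.
eol_type() {
    # tr -cd keeps only the named character; wc -c counts what is left.
    # The final tr strips padding that some wc implementations add.
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')

    if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
        echo "no line terminators"
    elif [ "$cr" -eq 0 ]; then
        echo "LF (Unix)"
    elif [ "$lf" -eq 0 ]; then
        echo "CR (classic Mac OS)"
    elif [ "$cr" -eq "$lf" ]; then
        echo "CRLF (DOS)"          # heuristic: equal counts taken as pairs
    else
        echo "mixed"
    fi
}
```

Calling eol_type foo on the DOS file from the example above should report CRLF (DOS). To see the individual characters, od -c foo prints each carriage return as \r and each line-feed as \n.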
How To Convert
If you're lucky, the system may have utilities installed to convert file formats. Some common names include: dos2unix, unix2dos, cvt, and flip. If they are not present, then you can draw on one of the basic system utilities to do the job for you.
The simplest approach is to use an editor, such as vim or Emacs. For instance, to control the format of the files vim writes out, set the file format option with the command :set ff=unix or :set ff=dos. Regardless of the format the file was read in as, it takes the format last set when it is written back out.
Another option is to use a command line tool, such as tr (translate), awk, sed, or even a Perl or Python script. Here's a tr command that removes carriage returns and any Ctrl-Z end-of-file markers from a DOS file (note that the character values are octal, not decimal):
$ tr -d '\15\32' < dosfile.txt > linuxfile.txt
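The same conversion can be sketched with sed and awk, which also makes the reverse direction easy. The function names dos_to_unix and unix_to_dos are assumptions chosen for this example (they write to standard output and are not the installed utilities of similar names). Unlike the tr command, the sed version removes a carriage return only at the end of a line, and it does not touch Ctrl-Z markers.

```shell
# Hold a literal carriage return, since not every sed understands \r.
CR=$(printf '\r')

# DOS -> Unix: strip one trailing CR from each line (hypothetical helper).
dos_to_unix() {
    sed "s/${CR}\$//" "$1"
}

# Unix -> DOS: remove any CR already at the end of the line, then write
# the line followed by CR LF, so DOS lines are not given a second CR.
unix_to_dos() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1"
}
```

A typical use would be dos_to_unix dosfile.txt > linuxfile.txt, mirroring the redirection in the tr example.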
How To Avoid the Problem
There aren't many ways to reliably avoid this problem if you find yourself moving back and forth between operating systems.
- Use vim on all your systems, and modify the startup config file to always set ff=unix, regardless of the OS you are on.
- Use Windows tools that produce proper Linux files (vim or notepad++, for instance).
- Install a conversion tool on your system, and apply it before moving the file. Most tools are smart enough to convert a file only if required (e.g. if converting to DOS and a file is already in DOS format, leave it alone). flip is available in source form, and works on DOS, Mac OS X, and Linux.
- Move text files in a Zip archive, using the appropriate command line option to translate end-of-line characters. However, you'll have trouble if you accidentally translate a binary file!
- Just say NO to cross-platform development!
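If no conversion tool is installed, the "convert only if required" behavior mentioned in the list can be approximated with basic utilities. This is a minimal sketch under stated assumptions: to_unix_if_needed is a hypothetical name, it rewrites the file in place, and it should only be pointed at text files, since stripping carriage returns from a binary file will corrupt it.

```shell
# Hold a literal carriage return for use as a grep pattern.
CR=$(printf '\r')

# to_unix_if_needed is a hypothetical helper: it rewrites a file in
# Unix format only when the file actually contains a carriage return.
to_unix_if_needed() {
    if grep -q "$CR" "$1"; then
        tmp=$(mktemp) || return 1
        # Convert into a temporary file, then copy the result back so
        # the original file keeps its permissions.
        tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
        rm -f "$tmp"
        echo "converted: $1"
    else
        echo "unchanged: $1"
    fi
}
```

Running it twice on the same file converts the file the first time and leaves it alone the second time.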