Ability to edit files up to 2 gigabytes in size. On some platforms, the limit can be even higher.
Some TextEditors are limited by the size of RAM. Others implement a form of VirtualMemory or Paging that allows them to work with very large files with no practical limit. In recent years, a large number of text editors have boasted that they are limited only by available memory or disk space and are capable of editing "any size file." But in practice, editing large files involves at least five considerations:
- How many lines does the file have? There are large data files that can have 2 or 3 million lines. Several large-file editors are capable of handling several million lines.
- How long can the lines be? This is a hidden factor that impairs some editors that have no problem editing a file with 3 million lines but cannot reliably handle lines over 5,000 or 10,000 characters.
- Does the editor have to distinguish or detect Unix, DOS, or Macintosh line endings? Unix files end each line with a single linefeed (0A hex); Macintosh files end each line with a carriage return (0D hex); and DOS files end each line with 2 bytes: a carriage return and linefeed together (0D 0A hex). Can a large file editor detect which is being used?
- What do you want to do about wordwrap? Do you expect the editor to support "virtual wordwrap" on screen, and then to toggle back to "unwrapped" or normal long lines instantly? Is it supposed to handle line numbers in the margin also?
- As files get spectacularly large, they are more likely to be files from a database, perhaps with fixed-length fields or variable-length fields delimited with a comma, tab, or some other symbol. Can the editor handle the full range of binary code, including NUL (0x00)? VEDIT gets top marks for handling binary code well.
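The first two questions (how many lines, how long) can be answered before opening a file in any editor. The following is a minimal Python 3.8+ sketch, not taken from any editor discussed here, that streams a file in fixed-size chunks and reports the line count and the longest line in bytes, so memory use stays flat however big the file is. `line_stats` is a hypothetical helper name; it assumes a single-byte terminator, so pass `b"\r"` for a Macintosh file, and for DOS files the reported length includes the trailing carriage return.

```python
from typing import BinaryIO, Tuple


def line_stats(f: BinaryIO, terminator: bytes = b"\n",
               chunk_size: int = 1 << 20) -> Tuple[int, int]:
    """Return (line_count, longest_line_in_bytes) for a binary stream.

    Reads at most chunk_size bytes at a time. The terminator must be one byte.
    """
    if len(terminator) != 1:
        raise ValueError("terminator must be a single byte")
    lines = longest = current = 0  # current = bytes in the line being scanned
    while chunk := f.read(chunk_size):
        start = 0
        while (idx := chunk.find(terminator, start)) != -1:
            current += idx - start
            longest = max(longest, current)
            lines += 1
            current = 0
            start = idx + 1
        current += len(chunk) - start  # partial line carried into the next chunk
    if current:  # last line had no terminator
        lines += 1
        longest = max(longest, current)
    return lines, longest
```

Typical use is `with open("big.dat", "rb") as f: print(line_stats(f))`, where the filename is only a placeholder. Running it with the default LF terminator on a Macintosh file reports one giant line, which is exactly the line-ending problem raised above.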
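For the line-ending question, a file can be classified by counting each convention. Below is a minimal Python 3.8+ sketch; `count_line_endings` and `guess_convention` are hypothetical names, and this is not documented as how any particular editor detects line endings. The one subtle point is a CR LF pair split across two chunks, which the `pending_cr` flag handles. The optional `max_bytes` argument imitates an editor that inspects only the beginning of a file.

```python
from typing import BinaryIO, Dict, Optional


def count_line_endings(f: BinaryIO, chunk_size: int = 1 << 20,
                       max_bytes: Optional[int] = None) -> Dict[str, int]:
    """Count DOS (CR LF), Unix (LF) and Macintosh (CR) line endings."""
    crlf = lf = cr = 0
    pending_cr = False  # previous chunk ended with CR; its LF may start the next one
    seen = 0
    while True:
        want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - seen)
        if want <= 0:
            break
        chunk = f.read(want)
        if not chunk:
            break
        seen += len(chunk)
        if pending_cr:
            if chunk.startswith(b"\n"):
                crlf += 1
                chunk = chunk[1:]
            else:
                cr += 1
            pending_cr = False
        if chunk.endswith(b"\r"):
            pending_cr = True
            chunk = chunk[:-1]
        pairs = chunk.count(b"\r\n")
        crlf += pairs
        lf += chunk.count(b"\n") - pairs
        cr += chunk.count(b"\r") - pairs
    if pending_cr:  # file (or sample) ended with a bare CR
        cr += 1
    return {"CRLF (DOS)": crlf, "LF (Unix)": lf, "CR (Mac)": cr}


def guess_convention(counts: Dict[str, int]) -> Optional[str]:
    """Most frequent convention, or None if no line ending was seen."""
    best = max(counts, key=counts.get)
    return best if counts[best] else None
```

On a Mac file whose first line is 25,000 characters long, `max_bytes=4096` finds no line ending at all and `guess_convention` returns None; scanning the whole file identifies it as CR.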
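Virtual wordwrap changes only what is drawn, not the file, so the editor has to translate between logical positions (the line and column it reports in the status bar or margin) and screen rows. The sketch below makes a simplifying assumption: it wraps at a fixed character width rather than at word boundaries, as real wordwrap usually does. Both function names are hypothetical.

```python
from typing import List, Tuple


def screen_rows(line_length: int, width: int) -> int:
    """Screen rows a logical line occupies when wrapped at `width` columns."""
    return max(1, -(-line_length // width))  # ceiling division; an empty line still takes 1 row


def logical_to_screen(line_lengths: List[int], line_no: int, column: int,
                      width: int) -> Tuple[int, int]:
    """Map a 1-based (line, column) to a 1-based (screen row, screen column)."""
    rows_before = sum(screen_rows(n, width) for n in line_lengths[:line_no - 1])
    return rows_before + (column - 1) // width + 1, (column - 1) % width + 1
```

Toggling back to unwrapped lines can be instant because the file itself is untouched; only this mapping changes. The sum is linear in the line number, so an editor with millions of lines would plausibly cache running row totals instead. For scale, a single line of 5,704,464 characters wrapped at 80 columns needs 71,306 screen rows.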
War stories
A good choice for handling extremely large datafiles is [PDT-Windows], which boasts that it can handle files up to 18 quintillion bytes in size (that's 18 * 10^18 bytes). This editor majors on database handling, including interpreting 30 forms of numeric encoding (integer, BCD, double-byte, signed/unsigned, Comp-3, floating point, etc.). I tried it out on one particularly hairy Informix data file. Technically, PDT-Windows is not a text editor; it's a data editor which focuses on databases. It's not a hex editor or sector editor, however. It's entirely unique in its niche.
I once tried to find an editor to read and edit a 32 meg XML file with lines up to 25,000 characters long. I tried several well-known editors, including VEDIT, GnuEmacs, vim, MultiEdit, UltraEdit, PFE, and others, but most of them gave up on line-ending detection after 4K or 8K. Some of them could edit the file, but the line counter would be corrupted, or they would break lines where they shouldn't. Only NoteTab Pro immediately recognized it as a Mac file and flawlessly handled the line counter, including wrapping and unwrapping these extremely long lines. It also moved extremely fast through the file, where other editors were sluggish. NoteTab's free version, NoteTab Lite, failed, but the "Pro" version was definitely worth the price ($19.95), even against competitors that charged 10 times more.
For twice the price ($39.95), EmEditor is a Unicode editor for Windows that will handle files of up to 2 billion lines, with individual lines several million characters long. See my report below of having it properly wrap and unwrap lines of nearly 6 million characters while keeping track of the current line and character number. I found it to be extremely responsive. (EricPement)
Another War Story
I was a contractor at a large telco and worked on large billing output files. These were essentially large ASCII files, often 2-4GB or more in size. (In fact, just writing a file that big on 32-bit Unix was a challenge at the time, because the APIs were limited to 32-bit file lengths, but I digress.) I was able to edit these files using VIM back in 1998. I tried GNU Emacs and it failed miserably. As I showed our billing analysts how to use VIM, it became widely adopted; nothing else (free) would do.
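The recurring 2GB and 4GB ceilings on this page line up with the largest values of 32-bit integers, which is a plausible explanation (the page itself doesn't say so) when an editor or OS stores file offsets in a signed or unsigned 32-bit field. The arithmetic, as a small Python sketch with a hypothetical helper name:

```python
def max_offset(bits: int, signed: bool = True) -> int:
    """Largest byte offset an integer of the given width can represent."""
    return 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1


for bits, signed in [(32, True), (32, False), (64, True), (64, False)]:
    kind = "signed" if signed else "unsigned"
    limit = max_offset(bits, signed)
    print(f"{kind} {bits}-bit: {limit:,} bytes (~{limit:.3g})")
```

Signed 32-bit gives 2,147,483,647 bytes, the classic 2GB limit that large-file support (a 64-bit off_t) on 32-bit Unix was introduced to lift. Unsigned 64-bit gives about 1.84 * 10^19 bytes, which matches the "18 quintillion bytes" claimed for PDT-Windows below.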
Tips for large XML
I don't have it now, but there's an XSLT script about 4 lines long that will reformat nasty XML into nicely indented XML that fits on "shorter" lines. Combined with the Apache XSLT processor (C version), it can reformat a 35MB XML file in about 15 seconds.
Yes, I tried GnuEmacs and VIM, and both had a horrible time with a single 35MB line.
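The XSLT script mentioned above isn't reproduced here. As a swapped-in alternative, not the author's script, Python's standard library (3.9 or later) can do the same kind of re-indentation with `xml.etree.ElementTree.indent`. This minimal sketch loads the whole tree into memory, which is fine for tens of megabytes but not for multi-gigabyte files. It also replaces whitespace-only text between elements, and the default parser drops comments and the DOCTYPE, so documents where those matter need more care. `reindent_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO


def reindent_xml(src: BinaryIO, dst: BinaryIO, space: str = "  ") -> None:
    """Re-indent XML so each element starts on its own, shorter line."""
    tree = ET.parse(src)
    ET.indent(tree, space=space)  # requires Python 3.9+
    tree.write(dst, encoding="utf-8", xml_declaration=True)
```

Typical use is to open the input with `open("nasty.xml", "rb")` and the output with `open("pretty.xml", "wb")` (placeholder filenames) and pass both streams.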
Recommendations
Some popular large file editors include:
- EmEditor - Professional (version 7.0) will edit files up to 248 GB in size and over 2 billion lines (and yes, it also keeps track of the current line number, even for such large files). Its web page claims EmEditor will load an 809 MB file in 13 seconds. I loaded a file with lines 5.7MB in length, and EmEditor had no problem toggling word-wrap on/off, keeping track of its location, copying the lines, or informing me of the current column number (line 5, column 5,704,464). This is a winner.
- NotepadPlusPlus - Open-source (free) software; it was also able to read a 28 meg file (6 lines of 5,700,000 chars each) and accurately keep track of its position, line number, and column number. I could also toggle word-wrap on/off. However, it is many times slower than EmEditor and was overwhelmed when I tried deleting text in the middle. (Tested with NPP v4.9.1.)
- VEDIT - handles lines over 100K in length, but only autodetects line endings at 4K. Max filesize for standard VEDIT is 2GB (gigabytes); max filesize for VEDIT Pro64 is over 100GB.
- NoteTab Pro - handles lines up to 32K in length (including autodetect of line-endings), and max filesize of 2GB (gigabytes)
- GnuEmacs - for v19.28 and earlier, the max filesize was 8MB; for v19.29 and later on a 32-bit system, it is 128MB. Also limited by available disk space for the swap file. If Emacs is compiled on a 64-bit system, the max filesize is 556PB (petabytes), or 5.7 * 10^17 bytes.
- VIM - max line length is 2GB (gigabytes); max filesize is 2GB; also limited by available disk space for the swap file. If VIM is compiled where a long integer is 64 bits, max filesize is also 556PB (petabytes).
- JujuEdit - max filesize is 2GB (gigabytes); also edits directly on disk (faster)
- MultiEdit - v9+ handles lines up to 16K in length; filesize limited by available memory.
- UltraEdit - handles lines up to 8K chars in length; filesize up to 4GB (gigabytes); disk-based text editor.
- bvi - Binary VI; filesize limited by available virtual memory.
- PDT-Windows - database editor. Max filesize is 18EB (exabytes), or 18 * 10^18 bytes, though as far as I know, there is no disk storage that large anywhere.
- TextPad - can handle file sizes up to the largest contiguous chunk of 32-bit virtual memory.
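Several editors in this list are described as disk-based or as editing directly on disk, and the introduction mentions VirtualMemory or Paging. One common way to get that behaviour for reading is to keep only a few fixed-size pages of the file in RAM and fetch the rest on demand. The Python sketch below illustrates the idea under simple assumptions (read-only access, a hypothetical `PagedFile` class, least-recently-used eviction). It is not documented as how any of the listed editors works, and real editing additionally needs a structure that records insertions and deletions without rewriting the whole file.

```python
import os
from collections import OrderedDict
from typing import BinaryIO


class PagedFile:
    """Read-only view of a large file that keeps at most max_pages pages in RAM."""

    def __init__(self, f: BinaryIO, page_size: int = 64 * 1024, max_pages: int = 16):
        self._f = f
        self.page_size = page_size
        self.max_pages = max_pages
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()  # least recently used first
        self.size = f.seek(0, os.SEEK_END)  # seek returns the new absolute position

    def _page(self, n: int) -> bytes:
        if n in self._cache:
            self._cache.move_to_end(n)  # mark as most recently used
            return self._cache[n]
        self._f.seek(n * self.page_size)
        data = self._f.read(self.page_size)
        self._cache[n] = data
        if len(self._cache) > self.max_pages:
            self._cache.popitem(last=False)  # evict the least recently used page
        return data

    def read(self, offset: int, length: int) -> bytes:
        """Return up to `length` bytes starting at byte `offset`."""
        out = bytearray()
        end = min(offset + length, self.size)
        while offset < end:
            n, start = divmod(offset, self.page_size)
            piece = self._page(n)[start:start + (end - offset)]
            if not piece:  # file shrank underneath us; stop instead of looping forever
                break
            out += piece
            offset += len(piece)
        return bytes(out)

    def cached_pages(self) -> int:
        return len(self._cache)
```

Memory use is bounded by roughly page_size * max_pages (1 MiB with the defaults) regardless of file size. Python's standard mmap module is another route to a similar result, leaving the paging to the operating system.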