


CategoryLargeFileHandling

HomePage | RecentChanges | EditorIndex | TextEditorFamilies | Preferences

The ability to edit files up to 2 gigabytes in size. On some platforms, the limit may be even higher.

Some TextEditors are limited by the size of RAM. Others implement a form of VirtualMemory or Paging that allows them to work with very large files without a fixed limit. In recent years, many text editors have boasted that they are limited only by available memory or disk space, and are capable of editing "any size file." But in practice, editing large files involves at least five considerations:

War stories

A good choice for handling extremely large datafiles is [PDT-Windows], which boasts that it can handle files up to 18 quintillion bytes in size (that's 18 * 10^18 bytes). This editor majors on database handling, including interpreting 30 forms of numeric encoding (integer, BCD, double-byte, signed/unsigned, Comp-3, floating point, etc.). I tried it out on one particularly hairy Informix data file. Technically, PDT-Windows is not a text editor; it's a data editor which focuses on databases. It's not a hex editor or sector editor, however. It's entirely unique in its niche.

I once tried to find an editor to read and edit a 32 MB XML file with lines up to 25,000 characters wide. I tried several well-known editors, including VEDIT, GnuEmacs, vim, MultiEdit, UltraEdit, PFE, and others, but most of them gave up on line-ending detection after 4K or 8K characters. Some of them could edit the file, but the line counter would be corrupted, or they would break lines where they shouldn't. Only NoteTab Pro immediately recognized it as a Mac file and flawlessly handled the line counter, including wrapping and unwrapping these extremely long lines. It also moved through the file very quickly, where other editors were sluggish. NoteTab's free version, NoteTab Lite, failed, but the "Pro" version was definitely worth the price ($19.95), even against competitors that charged 10 times as much.
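The line-ending problem in the story above can be illustrated with a small Python sketch. It is only an illustration, not how NoteTab or any other editor mentioned here actually works: the function names and the 1 MB chunk size are my own assumptions. The point is that the file is scanned in fixed-size chunks and every line ending is counted, so a very long first line cannot cause detection to give up early.

```python
def count_line_endings(path, chunk_size=1 << 20):
    """Count CRLF, lone CR and lone LF endings, reading fixed-size chunks."""
    cr = lf = crlf = 0
    prev_ended_with_cr = False  # True if the previous chunk ended in CR
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            cr += chunk.count(b"\r")
            lf += chunk.count(b"\n")
            crlf += chunk.count(b"\r\n")
            # A CRLF pair may be split across two chunks.
            if prev_ended_with_cr and chunk.startswith(b"\n"):
                crlf += 1
            prev_ended_with_cr = chunk.endswith(b"\r")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs.
    return {"CRLF": crlf, "CR": cr - crlf, "LF": lf - crlf}


def guess_line_ending(path, chunk_size=1 << 20):
    """Return 'CRLF', 'CR' (classic Mac) or 'LF', or None if no ending is found."""
    counts = count_line_endings(path, chunk_size)
    if not any(counts.values()):
        return None
    return max(counts, key=counts.get)
```

Calling guess_line_ending on a classic Mac file should return 'CR'. Memory use stays at roughly one chunk regardless of file size or line length.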

For twice the price ($39.95), EmEditor is a Unicode editor for Windows that will handle files up to 2 billion lines long, with lines several million characters long. See my report below of having it properly wrap and unwrap lines of nearly 6 million characters while keeping track of the current line and character number. I found it to be extremely responsive. (EricPement)

Another War Story

I was a contractor at a large telco and worked on large billing output files. These were essentially large ASCII files, often 2-4 GB or more in size. (In fact, just writing a file that big on 32-bit Unix was a challenge at the time because the APIs were limited to 32-bit file lengths, but I digress.) I was able to edit these files using VIM back in 1998. I tried GNU Emacs and it failed miserably. As I showed our billing analysts how to use VIM, it became widely adopted -- nothing else (free) would do.
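The 32-bit limit mentioned in passing above comes down to simple arithmetic, sketched here in Python. The sketch assumes the file length was held in a signed 32-bit integer, which the story does not actually state; the names are made up for illustration.

```python
# Assumption: file lengths and offsets are stored as signed 32-bit integers.
MAX_SIGNED_32 = 2**31 - 1  # largest value such an integer can hold


def fits_in_signed_32bit_offset(size_in_bytes):
    """True if a file of this size can be addressed with a signed 32-bit offset."""
    return 0 <= size_in_bytes <= MAX_SIGNED_32


print(MAX_SIGNED_32)                               # 2147483647
print(fits_in_signed_32bit_offset(2 * 1024**3))    # False
```

Under that assumption the ceiling is one byte short of 2 GB (counting 1024^3 bytes per GB), which matches the 2 gigabyte figure at the top of this page.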

Tips for large XML

I don't have it now, but there's an XSLT script about 4 lines long that will reformat nasty XML into nicely indented XML that fits on "shorter" lines. Combine it with the Apache XSLT processor (C version) and you can reformat a 35MB XML file in about 15 seconds.

Yes, I tried using GnuEmacs and VIM, and both had a horrible time with a single 35 MB line.
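Since the XSLT script itself is not at hand, here is a rough stand-in written in Python. It is not the XSLT approach described above; it uses the standard library's xml.etree.ElementTree instead, and the function name and file paths are hypothetical. It also assumes Python 3.9 or later (for ET.indent) and that the whole document fits in memory.

```python
import xml.etree.ElementTree as ET


def reindent_xml(src_path, dst_path, indent="  "):
    """Rewrite an XML file with one element per line, nested by indentation."""
    tree = ET.parse(src_path)        # loads the whole document into memory
    ET.indent(tree, space=indent)    # adds whitespace between elements, in place
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
```

A call such as reindent_xml("nasty.xml", "pretty.xml") should turn a one-line file into something an ordinary editor can open comfortably. Note that ET.indent changes whitespace between elements, so it may not be appropriate for documents with mixed content where that whitespace matters.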

Recommendations

Some popular large file editors include:


CategoryFeatures


Last edited April 7, 2012 8:09 am