Roedy Green wrote:
> I have written quite a bit of code that loads an entire file into RAM
> then does indexOf, substring, regexes etc, working on the giant
> String, often creating a new giant String and writing it out.
> I wondered if anyone had developed some sort of package to allow such
> code to be easily transformed to work on arbitrarily large text files,
> e.g. 10 gigabytes.

Since a String can hold no more than two gigachars, such
a package couldn't be a drop-in replacement for your current
techniques. You'll need to divide the file into chunks, and
your code will need to deal with the cracks between them.
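
To make that concrete, here is a minimal sketch for the simplest case, a
plain indexOf-style search over a UTF-8 text file (the class name, chunk
size, and println reporting are invented for the example). The trick is to
carry the last needle.length() - 1 chars of each chunk into the next one,
so a hit that straddles a crack is still found, and found exactly once:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedScanner {

    /** Prints the char offset of every occurrence of 'needle' in 'file',
     *  holding only one chunk (plus a small overlap) in memory at a time. */
    public static void scan(Path file, String needle) throws IOException {
        final int chunkChars = 1 << 20;                        // 1M chars per read; an arbitrary choice
        final int overlap = Math.max(needle.length() - 1, 0);  // just enough to cover the cracks

        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] buf = new char[chunkChars];
            String carry = "";   // last (needle.length() - 1) chars of the previous window
            long offset = 0;     // char offset of 'carry' within the file
            int n;
            while ((n = in.read(buf)) != -1) {
                String window = carry + new String(buf, 0, n);
                // The carry is one char shorter than the needle, so every complete
                // match uses at least one newly read char and is reported only once.
                for (int hit = window.indexOf(needle); hit != -1;
                     hit = window.indexOf(needle, hit + 1)) {
                    System.out.println("match at char " + (offset + hit));
                }
                int keep = Math.min(overlap, window.length());
                offset += window.length() - keep;
                carry = window.substring(window.length() - keep);
            }
        }
    }
}

Regexes and rewriting are messier: you need some upper bound on how long a
match can be before you know how much overlap to carry, and the edited
output has to be streamed out to a new file chunk by chunk rather than
built up as one giant String.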
Using CharSequence instead of String should allow a sort of
"sliding window" to avoid storing the entire file in memory at
once, but CharSequence is also limited to two gigachars, since
length() returns an int and charAt() takes one: you can't
implement a CharSequence that encompasses the whole ten-gig file.
It could get you some space economy, but you'd still need to deal
with the chunks.
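
To give a flavour of what that sliding window might look like, here is a
rough sketch that presents one region of a file as a read-only
CharSequence, fetching a small window of bytes on demand. The class name
and window size are made up, and it assumes a single-byte encoding such as
ISO-8859-1 so that char index equals byte offset; with UTF-8 you'd first
have to build an index from char positions to byte positions.

import java.io.IOException;
import java.io.RandomAccessFile;

/** A read-only CharSequence over one region of a file. Only a small window
 *  of bytes is held in memory; everything else is fetched on demand.
 *  Assumes a single-byte encoding (char index == byte offset) and is not
 *  thread-safe, since it seeks the shared RandomAccessFile. */
public class FileRegionCharSequence implements CharSequence {

    private static final int WINDOW = 64 * 1024;  // bytes cached at a time

    private final RandomAccessFile file;
    private final long regionStart;               // byte offset of this region in the file
    private final int regionLength;               // still capped at Integer.MAX_VALUE

    private byte[] window = new byte[0];
    private long windowStart = -1;                // file offset of window[0]; -1 = nothing cached

    public FileRegionCharSequence(RandomAccessFile file, long regionStart, int regionLength) {
        this.file = file;
        this.regionStart = regionStart;
        this.regionLength = regionLength;
    }

    public int length() {
        return regionLength;
    }

    public char charAt(int index) {
        if (index < 0 || index >= regionLength)
            throw new IndexOutOfBoundsException(String.valueOf(index));
        long pos = regionStart + index;
        if (windowStart < 0 || pos < windowStart || pos >= windowStart + window.length)
            slideTo(pos);                         // cache miss: move the window
        return (char) (window[(int) (pos - windowStart)] & 0xFF);
    }

    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > regionLength || start > end)
            throw new IndexOutOfBoundsException();
        return new FileRegionCharSequence(file, regionStart + start, end - start);
    }

    public String toString() {
        // Beware: this materializes the whole region as a String.
        StringBuilder sb = new StringBuilder(regionLength);
        for (int i = 0; i < regionLength; i++)
            sb.append(charAt(i));
        return sb.toString();
    }

    private void slideTo(long pos) {
        try {
            long end = Math.min(regionStart + regionLength, pos + WINDOW);
            byte[] buf = new byte[(int) (end - pos)];
            file.seek(pos);
            file.readFully(buf);
            window = buf;
            windowStart = pos;
        } catch (IOException e) {
            throw new RuntimeException(e);        // CharSequence methods can't throw checked exceptions
        }
    }
}

Since Pattern.matcher() accepts any CharSequence, a regex can run straight
over such a region, at the cost of a method call per character, and still
only over an int-sized region at a time.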
I wonder if a java.nio.MappedByteBuffer might be of use here.
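
For what it's worth, here is roughly how that might look (class name and
piece size invented, same single-byte-encoding caveat as above). A single
MappedByteBuffer is itself capped at Integer.MAX_VALUE bytes, so the
ten-gig file has to be mapped in pieces, but the OS then does the paging
and nothing gets bulk-copied onto the Java heap; wrapping each piece as a
CharSequence lets the regex machinery run over it directly. The file name
and pattern come from the command line:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MappedSearch {

    /** Wraps one mapped piece of the file as a CharSequence without copying
     *  it onto the heap. Again assumes a single-byte encoding. */
    static CharSequence asCharSequence(final MappedByteBuffer buf) {
        return new CharSequence() {
            public int length() { return buf.limit(); }
            public char charAt(int i) { return (char) (buf.get(i) & 0xFF); }
            public CharSequence subSequence(int start, int end) {
                StringBuilder sb = new StringBuilder(end - start);
                for (int i = start; i < end; i++) sb.append(charAt(i));
                return sb;     // quick and dirty; a real version would slice the buffer
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Pattern needle = Pattern.compile(args[1]);
        final long pieceSize = 1L << 30;   // 1 GiB per mapping; one map() is capped at Integer.MAX_VALUE bytes

        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            FileChannel ch = raf.getChannel();
            long size = ch.size();
            for (long start = 0; start < size; start += pieceSize) {
                long len = Math.min(pieceSize, size - start);
                MappedByteBuffer piece = ch.map(FileChannel.MapMode.READ_ONLY, start, len);
                Matcher m = needle.matcher(asCharSequence(piece));
                while (m.find()) {
                    System.out.println("match at byte " + (start + m.start()));
                }
                // Matches that straddle a piece boundary are still missed here;
                // the same overlap trick as in the chunked version applies.
            }
        }
    }
}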