Re: Handling extremely large input files
On Wed, 28 Apr 2010, Hakan wrote:
> We need to scan a very big input file
Exactly how big?
> to see how many times each date occurs in it. This means that we want to
> check the number of times successive strings of the form "20020701",
> "20020702" and so on are in it from a given start to end date. The
> syntax is European format.
What do you mean by 'successive'? Could you give us a sample of the input
file?
> What is the most efficient way to do it? I have tried with 1) a system call to
> grep
Could you tell us the exact grep command you run?
> and 2) a RandomAccessFile reading each character and moving the file
> pointer ahead,
I'm not sure how much buffering that does. You might be better off with a
FileInputStream wrapped in a BufferedInputStream of generous size (or in
fact, wrapped in an InputStreamReader and some buffering somewhere), or
with a memory-mapped file obtained from a NIO FileChannel. Or you might
not.
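
For what it's worth, here is a rough sketch of the buffered-stream variant, not something I have benchmarked. It assumes the dates appear as plain ASCII digits in yyyyMMdd form, and that an 8-digit window inside a longer run of digits should still count. The class name DateCounter, the method countDates and the 1 MiB buffer size are all made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not from the original post.
class DateCounter {

    // Counts how often each yyyyMMdd date from start to end (inclusive)
    // occurs in the stream. Assumes dates are written as ASCII digits.
    static Map<Integer, Integer> countDates(InputStream raw, LocalDate start, LocalDate end)
            throws IOException {
        // One counter per calendar day, so only real dates in range are counted.
        DateTimeFormatter fmt = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd
        Map<Integer, Integer> counts = new TreeMap<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            counts.put(Integer.parseInt(d.format(fmt)), 0);
        }

        int window = 0; // numeric value of the last (up to) 8 digits seen
        int run = 0;    // length of the current digit run, capped at 8
        // Assumed buffer size of 1 MiB; tune it for the machine at hand.
        try (InputStream in = new BufferedInputStream(raw, 1 << 20)) {
            int b;
            while ((b = in.read()) != -1) {
                if (b >= '0' && b <= '9') {
                    // Drop the oldest digit, append the new one.
                    window = (window % 10_000_000) * 10 + (b - '0');
                    if (run < 8) {
                        run++;
                    }
                    if (run == 8) {
                        Integer seen = counts.get(window);
                        if (seen != null) {
                            counts.put(window, seen + 1);
                        }
                    }
                } else {
                    // Any non-digit byte ends the current run.
                    window = 0;
                    run = 0;
                }
            }
        }
        return counts;
    }

    // Usage: java DateCounter <file> <startDate yyyyMMdd> <endDate yyyyMMdd>
    public static void main(String[] args) throws IOException {
        DateTimeFormatter fmt = DateTimeFormatter.BASIC_ISO_DATE;
        LocalDate start = LocalDate.parse(args[1], fmt);
        LocalDate end = LocalDate.parse(args[2], fmt);
        Map<Integer, Integer> counts =
                countDates(new FileInputStream(args[0]), start, end);
        counts.forEach((date, n) -> System.out.println(date + " " + n));
    }
}
```

This makes a single pass over the file and counts every date in the range at once, so the cost should be dominated by reading the file rather than by the matching.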
> but none of them runs quickly enough. Another option might be to use a
> pattern matching, but then we would still probably have the problems of
> searching through most of the file.
As I understand your requirement, you'll have to scan the *entire* file.
What do you mean by "the problems of searching through most of the file"?
tom
--
Basically, at any given time, most people in the world are wasting time.