Re: byte stream vs char stream buffer

From:
Robert Klemme <shortcutter@googlemail.com>
Newsgroups:
comp.lang.java.programmer
Date:
Sun, 11 May 2014 21:34:47 +0200
Message-ID:
<bta1moFpubcU1@mid.individual.net>
On 11.05.2014 20:50, markspace wrote:

On 5/11/2014 7:02 AM, Robert Klemme wrote:

  - Reading in 1k chunks from the application level is already close to
optimal.


Good test suite here all around, Robert.


Thank you!

 I just wanted to address this one
point quickly. I'm not sure whether this is new information, but in
actual use I've found that it helps to allocate large buffers when
you're reading a large text file that must be processed quickly.

Sample:

       // Allocate a 64k direct buffer and fill it from the file channel.
       ByteBuffer bb = ByteBuffer.allocateDirect( 64 * 1024 );
       RandomAccessFile f = new RandomAccessFile( "blet", "rw" );
       FileChannel fc = f.getChannel();
       readToCapacity( fc, bb );
       bb.flip();
       // Note: asReadOnlyBuffer() returns a new buffer, so its result
       // must be assigned, or it is simply discarded.
       ByteBuffer b2 = bb.slice().asReadOnlyBuffer();

Allocating a direct byte buffer of 64k seems to help text processing
speed by about an order of magnitude. Files that take minutes to
process go to seconds when large direct byte buffers are used. It seems
important to get that first big chunk of bytes in before doing anything
else (like converting to chars).


I seem to recall having done similar tests in the past but found that
overall the benefit for processing speed was negligible because at some
point the data has to be moved into the Java heap. But my memory is
faint here and the situation may have changed in the meantime.

The input file for this was about 600k, so if you're reading much
smaller files the difference might not show up.

This is just a sample; I can't seem to find the original, so I hope
it's accurate. Here's the static method used by the above code:

    // Fill buf up to its capacity, or until EOF, from fc.
    public static void readToCapacity( FileChannel fc, ByteBuffer buf )
            throws IOException
    {
       final int capacity = buf.capacity();
       int totalRead = 0;
       int bytesRead;
       while( (bytesRead = fc.read( buf ) ) != -1 ) {
          totalRead += bytesRead;
          if( totalRead == capacity ) break;
       }
    }


The code shown so far differs in a few ways from the tests Roedy and I
were doing:

1. You are never transferring those bytes to Java land (i.e. into a
byte[] allocated on the heap) - the data stays in native land.

2. You are not reading chars, and hence also do not do the character
decoding.

You likely did some processing but we cannot see it. But I agree,
another test with nio would be in order to show whether there is more
potential for speedup.
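For illustration, the two steps such a test would have to add (moving
the bytes onto the Java heap, then decoding them to chars) might look
roughly like this. This is only a sketch; the class and method names
are mine, not from either test:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Copy a (direct) buffer's contents into the heap, then decode to chars.
    static String toHeapAndDecode(ByteBuffer bb) {
        // Step 1: bulk-copy into a heap byte[]; duplicate() leaves the
        // original buffer's position untouched.
        byte[] heap = new byte[bb.remaining()];
        bb.duplicate().get(heap);
        // Step 2: character decoding - what an InputStreamReader does
        // internally.
        CharBuffer cb = StandardCharsets.UTF_8.decode(bb);
        return cb.toString();
    }

    public static void main(String[] args) {
        // Pretend this direct buffer was just filled by FileChannel.read().
        ByteBuffer bb = ByteBuffer.allocateDirect(64 * 1024);
        bb.put("hello, nio".getBytes(StandardCharsets.UTF_8));
        bb.flip();
        System.out.println(toHeapAndDecode(bb)); // prints: hello, nio
    }
}
```

Both steps cost time proportional to the data size, which is why I
would expect them to eat into the raw read speedup.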

Btw., a memory-mapped file may also be an alternative; this can be done
with nio as well. It does have some downsides (e.g. different behavior
on different OSes in my experience, and address space limits), and it
proved no faster than plain sequential reading with InputStreams in an
application I have been working on.
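A minimal sketch of the memory-mapped variant, for completeness (file
handling and names are made up for illustration; the int-based
ByteBuffer API limits a single mapping to files under 2 GB):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {

    // Map a whole file read-only and decode it to a String.
    static String readMapped(Path p) throws IOException {
        try (FileChannel fc = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer mb =
                fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            return StandardCharsets.UTF_8.decode(mb).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".txt");
        Files.write(tmp, "mapped hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readMapped(tmp)); // prints: mapped hello
        Files.delete(tmp);
    }
}
```

Note that the OS pages the data in lazily, which is one source of the
platform-dependent behavior mentioned above.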

Kind regards

    robert
