Re: byte stream vs char stream buffer

From: markspace <markspace@nospam.nospam>
Newsgroups: comp.lang.java.programmer
Date: Sun, 11 May 2014 12:55:27 -0700
Message-ID: <lkokjk$3rn$1@dont-email.me>

On 5/11/2014 12:34 PM, Robert Klemme wrote:

> 1. You are never transferring those bytes to Java land (i.e. into a byte
> or byte[] which is allocated on the heap) - data stays in native land.
>
> 2. You are not reading chars and hence also do not do the character
> decoding.

Yes, I knew the data was "mostly ASCII" and therefore I didn't have to
do character decoding. An efficient UTF-8 converter shouldn't be much
more complicated, however.
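
For what it's worth, here is a minimal sketch of what decoding UTF-8 straight out of a ByteBuffer could look like with the standard java.nio.charset.CharsetDecoder. It is not the code I was running; the class and method names (Utf8DecodeSketch, decode, drain) are made up for illustration, and I'm assuming the whole input is already available in one buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8DecodeSketch {

    // Decodes everything remaining in 'bytes' as UTF-8, working through a
    // small reusable CharBuffer. Malformed input becomes U+FFFD.
    static String decode(ByteBuffer bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer chars = CharBuffer.allocate(1024);
        StringBuilder out = new StringBuilder();

        CoderResult result;
        do {
            // 'true' means no more input will follow this buffer
            result = decoder.decode(bytes, chars, true);
            drain(chars, out);
        } while (result.isOverflow());   // output chunk was full, go again

        do {
            result = decoder.flush(chars);
            drain(chars, out);
        } while (result.isOverflow());

        return out.toString();
    }

    // Moves the decoded chars into 'out' and resets the chunk for reuse.
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }
}
```

The same decode() call should work on a direct or memory-mapped buffer, so the bytes only cross into heap memory as decoded chars.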

I appear to be counting word frequencies in the file; I'm not sure why at
this point. Some more code I found:

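          FileInputStream fins = new FileInputStream( path.toFile() );
          FileByteBufferInputStream fbbins =
                  new FileByteBufferInputStream( fins );

          int charRead;
          HashedHistogram histogram = new HashedHistogram();
          charRead = fbbins.read();
          StringBuilder sb = new StringBuilder();
          while( charRead != -1 )
          {
             if( charRead < 128 && !Character.isWhitespace( charRead ) ) {
                // ASCII, non-whitespace: part of the current word
                sb.append( (char) charRead );
                charRead = fbbins.read();
             } else {
                // End of a word; don't count an empty "word" when the
                // file starts with whitespace
                if( sb.length() > 0 ) {
                   histogram.add( sb.toString() );
                   sb.setLength( 0 );
                }
                // Skip the run of whitespace and non-ASCII bytes
                do {
                   charRead = fbbins.read();
                } while( charRead != -1 &&
                        (Character.isWhitespace( charRead ) ||
                         charRead >= 128) );
             }
          }
          // Count the last word if the file doesn't end in whitespace
          if( sb.length() > 0 )
             histogram.add( sb.toString() );
          fins.close();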
          System.out.println( histogram.size() + " words" );
          Entry<Comparable,Integer>[] entries =
                  histogram.getSortedEntries();
          System.out.println( "Bottom words:" );
          // Guard against files with fewer than 20 distinct words
          for( int i = 0; i < Math.min( 20, entries.length ); i++ )
             System.out.println( entries[i].getKey()+", "
                     +entries[i].getValue() );
          System.out.println( "Top words:" );
          // Last 40 entries at most, without running off the front
          for( int i = entries.length-1;
                  i >= Math.max( 0, entries.length-40 ); i-- )
             System.out.println( entries[i].getKey()+", "
                     +entries[i].getValue() );

Kind of ugly, but that's what I have.
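
Since FileByteBufferInputStream and HashedHistogram are my own classes and aren't shown here, below is a rough self-contained sketch of the same idea using only the JDK. It is an approximation, not the original code: WordCountSketch, countWords, addWord and sortedEntries are hypothetical names, a plain HashMap stands in for the histogram, and a memory-mapped buffer stands in for the custom stream. The word rule is the same as above: ASCII non-whitespace bytes build a word, everything else separates words.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the classes used in the post.
class WordCountSketch {

    // Counts ASCII words in 'buf'; whitespace and bytes >= 128 are separators.
    static Map<String,Integer> countWords(ByteBuffer buf) {
        Map<String,Integer> counts = new HashMap<>();
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            int b = buf.get() & 0xFF;           // treat the byte as unsigned
            if (b < 128 && !Character.isWhitespace(b)) {
                sb.append((char) b);
            } else {
                addWord(counts, sb);
            }
        }
        addWord(counts, sb);                    // last word, if any
        return counts;
    }

    // Records the pending word (if non-empty) and clears the builder.
    private static void addWord(Map<String,Integer> counts, StringBuilder sb) {
        if (sb.length() == 0) return;
        String word = sb.toString();
        Integer old = counts.get(word);
        counts.put(word, old == null ? 1 : old + 1);
        sb.setLength(0);
    }

    // Entries sorted by ascending count, so the rarest words come first.
    static List<Map.Entry<String,Integer>> sortedEntries(Map<String,Integer> counts) {
        List<Map.Entry<String,Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            // A single mapping is limited to 2 GB; bigger files would need several.
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            List<Map.Entry<String,Integer>> entries = sortedEntries(countWords(buf));
            System.out.println(entries.size() + " words");
            System.out.println("Top words:");
            for (int i = entries.size() - 1; i >= Math.max(0, entries.size() - 40); i--)
                System.out.println(entries.get(i).getKey() + ", " + entries.get(i).getValue());
        }
    }
}
```

With the mapped buffer the file contents are read a byte at a time without first being copied into a heap byte[], which is the point Robert made above; only the words themselves end up as Strings on the heap.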