Re: StringBuffer java.lang.OutOfMemoryError

From:

Thomas Hawtin <usenet@tackline.plus.com>

Newsgroups:

comp.lang.java.help

Date:

Thu, 14 Dec 2006 17:45:48 +0000

Message-ID:

<45818d6f$0$8749$ed2619ec@ptn-nntp-reader02.plus.net>

Tor Iver Wilhelmsen wrote:

Thomas Hawtin <usenet@tackline.plus.com> writes:

It has to copy the n following characters the appropriate distance up
the array (backwards). It will therefore take a length of time roughly
proportional to the length of the buffer.

But that uses System.arrayCopy(), which can be assumed to be highly
optimized compared to the "manual" code equivalent would be.

The absolute amount of time is not at issue. O(n) is about how it
scales, for large n.

Below is a little microbenchmark. It takes a string with a repeating
pattern of nine spaces and a comma. It inserts an extra space before
each comma, using the algorithm further up this thread. The test is done
ten time over with strings of length 10, 100, 1000, 10 000 and 100 000.
The whole thing is the repeated five times, to show up warming up
artifacts. My results are (using the Sun 1.6.0 AMD64 HotSpot, on an
Opteron 148):

402069:634 1926404:1387 19228853:4385 69899554:8360 6989864460:83605
93747:306 119041:345 601348:775 33463101:5784 7012110064:83738
95844:309 121114:348 602081:775 33337512:5773 6982603943:83561
95995:309 111818:334 613319:783 33503032:5788 6967612662:83472
94645:307 119614:345 600026:774 33411182:5780 6982415413:83560

The results are in nanoseconds, with the square root of the time after
the colon. As expected for small n, other factors hold sway over the
O(n^2). For the third, fourth and fifth values, it is pretty accurate -
each square root is about ten times the previous. Perhaps blowing the
cache has some influence in the actual results.

Now, let's rewrite it without mixing source and destination.

             char[] buff = new char[str.length()*11/10];
             int write = 0;
             for (int j=0; j<str.length(); ++j) {
                 char c = str.charAt(j);
                 if (c == ',') {
                     buff[write++] = ' ';
                 }
                 buff[write++] = c;
             }
             new String(buff, 0, write);

I claim this is an O(n) algorithm (I haven't tampered with it to try and
make it look faster - no reusing the buffer, no trying different
permutations of code, no splitting into submethods). The results:

201295:448 928502:963 8793120:2965 64837622:8052 14568830:3816
87391:295 97777:312 190372:436 1141673:1068 13025497:3609
87909:296 87599:295 203727:451 1130767:1063 13002021:3605
84779:291 99787:315 191618:437 1199595:1095 13011692:3607
88198:296 98584:313 186264:431 1218971:1104 13033585:3610

Indeed the times (not the square roots) for the third, fourth and fifth
are around ten times the previous. For 1000 character strings we are
only three or four times faster. For 100,000 character strings we are a
handy fifty times faster than the original algorithm.

(Disclaimer: I've not actually checked the code does what it claims to.)

Tom Hawtin

class Ouch {
     public static void main(String[] args) throws Exception {
         for (int runCt=0; runCt<5; ++runCt) {
             String str = " ,";
             for (int lenCt=0; lenCt<5; ++lenCt) {
                  System.gc();
                  Thread.sleep(100);
                  long start = System.nanoTime();
                  doRun(str);
                  long end = System.nanoTime();
                  long period = end - start;
                  System.out.print(period+":"+(int)Math.sqrt(period)+" ");
                  StringBuilder buff = new StringBuilder(10*str.length());
                  for (int ct=0; ct<10; ++ct) {
                      buff.append(str);
                  }
                  str = buff.toString();
             }
             System.out.println();
         }
     }
     private static void doRun(String str) {
         for (int loopCt=0; loopCt<10; ++loopCt) {
             StringBuilder buff = new StringBuilder(str.length()*11/10);
             buff.append(str);
             for (int j=0; j<buff.length(); j++) {
                 if(buff.charAt(j) == ','){
                     buff.insert(j, ' ');
                     ++j;
                 }
             }
         }
     }
}