Re: email stop words

markspace <markspace@nospam.nospam>
Thu, 21 Mar 2013 09:33:12 -0700
On 3/21/2013 6:24 AM, Eric Sosman wrote:

     Integer count = map.get(word);
     map.put(word, count == null ? 1 : count + 1);

Basically, yes.

... and that you switched to something more like

     Integer count = map.get(word);
     map.put(word, new Integer(count == null
         ? 1 : count.intValue() + 1));

No, I made a Counter with a primitive and a reference to the word:

   Counter counter = map.get( word );
   if( counter == null ) {
     counter = new Counter();
     counter.word = word;
     counter.count = 1;
     map.put( word, counter );
   } else {
     counter.count++;   // existing word: bump the primitive in place
   }
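For anyone who wants to reproduce this, here is a self-contained sketch of the Counter approach described above. The post only shows the fields `word` and `count`; the class layout, the `countWords` helper, the increment in the else branch, and the sample words are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction: only the field names `word` and `count`
// come from the post; everything else is assumed.
class Counter {
    String word;
    int count;   // primitive, mutated in place instead of re-boxed
}

class WordCounterDemo {
    // Counts words using one mutable Counter per distinct word.
    static Map<String, Counter> countWords(String[] words) {
        Map<String, Counter> map = new HashMap<>();
        for (String word : words) {
            Counter counter = map.get(word);
            if (counter == null) {
                // First sighting: allocate exactly one Counter for this word.
                counter = new Counter();
                counter.word = word;
                counter.count = 1;
                map.put(word, counter);
            } else {
                // Seen before: no allocation, just bump the int.
                counter.count++;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] words = {"the", "cat", "the", "hat", "the"};
        // HashMap iteration order is unspecified, so the lines may print in any order.
        for (Counter c : countWords(words).values()) {
            System.out.println(c.word + " " + c.count);
        }
    }
}
```

With this shape there is one allocation per distinct word and none per repeat, which is why the number of Counters should track the number of distinct Strings.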

If so, the slowdown is probably due to increased memory pressure
and garbage collection: `new' actually creates a new object every

Yeah, that's what I thought too. Although since there are only as many
Counters as there are Strings (words), I don't get why merely doubling
the number of objects would slow the system as horribly as it did. There
should be only 4 million Strings and therefore also 4 million Counters.
I can't figure out why that would be a problem.

time, while auto-boxing uses (the equivalent of) Integer.valueOf().
The latter maintains a pool of a couple hundred small-valued Integers
and doles them out whenever needed, using `new' only for un-pooled
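The pooling behaviour quoted above can be checked with a small sketch. The class and method names here are made up for illustration. The Java Language Specification guarantees that boxed values from -128 to 127 are shared; whether larger values are shared depends on the JVM and its settings, so the program prints that result without assuming it.

```java
class BoxingDemo {
    // True when two Integer.valueOf() calls for the same value return
    // the same object (the comparison is by reference on purpose).
    static boolean sharedByValueOf(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    // True when two `new Integer(...)` calls return the same object.
    // `new` always allocates, so this should never be true.
    // Note: this constructor is deprecated in current Java releases.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean sharedByNew(int value) {
        Integer a = new Integer(value);
        Integer b = new Integer(value);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("valueOf(127) shared:  " + sharedByValueOf(127));
        System.out.println("valueOf(1000) shared: " + sharedByValueOf(1000));
        System.out.println("new Integer(127) shared: " + sharedByNew(127));
    }
}
```

Note that pooling only helps for small counts; a word seen thousands of times gets a fresh Integer on every update under either scheme.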

I think it would be worth it to change the JVM memory parameters from
the defaults and see if that makes a difference.

Also, any thoughts on the best way to observe a GC that is thrashing?
I'm really curious to pin this down to some sort of root cause. I
couldn't rule out a coding error somewhere either.
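One low-tech way to look for GC thrashing, sketched here under some assumptions: the standard `java.lang.management` API reports cumulative collection counts and times per collector, so sampling it before and after the counting loop shows how much wall time went to GC. The `GcProbe` class, its helper methods, and the stand-in workload are hypothetical; the real word-counting loop would go in place of the allocation loop.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcProbe {
    // Sums approximate collection time (ms) over all collectors.
    // A collector may report -1 for "undefined"; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    // Sums collection counts over all collectors, skipping "undefined" (-1).
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        long gcMsBefore = totalGcMillis();
        long gcCountBefore = totalGcCount();
        long wallBefore = System.nanoTime();

        // Stand-in workload (assumption): many short-lived allocations.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sink += new int[8].length;
        }

        long wallMs = (System.nanoTime() - wallBefore) / 1_000_000;
        System.out.println("wall ms:        " + wallMs);
        System.out.println("GC ms:          " + (totalGcMillis() - gcMsBefore));
        System.out.println("GC collections: " + (totalGcCount() - gcCountBefore));
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
        System.out.println("(sink = " + sink + ")"); // keeps the loop from being discarded
    }
}
```

If GC time is a large fraction of wall time, that points at memory pressure; rerunning with a larger heap (the `-Xmx` option) or with `-verbose:gc` for a per-collection log would be the next comparison to make.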

     My suggestion would be to implement a Counter class that
wraps a mutable integer value. Then you'd use

Thanks, I'll take a look at this when I get a chance. A good suggestion!

     Or, you could just go back to auto-boxing.

Yes, A-B-A testing works. Going back to auto-boxing restored the
previous run times, so I'm fairly certain it's related to memory
pressure or something similar.

