Re: Serious concurrency problems on fast systems
On 01.06.2010 13:39, Lew wrote:
Kevin McMurtrie wrote:
I've been assisting in load testing some new high performance servers
running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
suspending threads for time-slicing in very unfortunate locations. For
example, a thread might suspend in Hashtable.get(Object) after a call to
getProperty(String) on the system properties.
Just out of curiosity: How did you find out?
It's a synchronized
global so a few hundred threads might pile up until the lock holder
resumes. Odds are that those hundreds of threads won't finish before
another one stops to time slice again. The performance hit has a ton of
hysteresis so the server doesn't recover until it has a lower load than
before the backlog started.
The brute force fix is of course to eliminate calls to shared
synchronized objects. All of the easy stuff has been done. Some
You call that "brute force" as if it weren't the actual, correct answer.
operations aren't well suited to simple CAS. Bottlenecks that are part
of well established Java APIs are time consuming to fix/avoid.
But necessary. Having repeated calls on system properties that require
synchronization is just plain stupid. System properties are the ones
that don't change during a program run, so they should be (and should
have been) written once into an immutable structure at class-load time
and read thence thereafter. End of synchronization woes for that one.
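A minimal sketch of that idea, assuming the properties really do not
change after startup: copy them once during class initialization into an
unmodifiable map and read from that copy instead of calling
System.getProperty() on the hot path. The class name
CachedSystemProperties and its methods are hypothetical, not part of any
JDK or Tomcat API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: a read-only snapshot of the system properties.
final class CachedSystemProperties {
    // Built once during class initialization and never mutated afterwards,
    // so concurrent readers need no lock.
    private static final Map<String, String> SNAPSHOT = snapshot();

    private CachedSystemProperties() {}

    private static Map<String, String> snapshot() {
        Properties props = System.getProperties();
        Map<String, String> copy = new HashMap<String, String>();
        // stringPropertyNames() (Java 6+) returns the String keys,
        // including those inherited from default properties.
        for (String name : props.stringPropertyNames()) {
            copy.put(name, props.getProperty(name));
        }
        return Collections.unmodifiableMap(copy);
    }

    // Lock-free lookup; returns null when the property was not set at snapshot time.
    static String get(String key) {
        return SNAPSHOT.get(key);
    }

    // Lock-free lookup with a fallback value.
    static String get(String key, String defaultValue) {
        String value = SNAPSHOT.get(key);
        return value != null ? value : defaultValue;
    }
}
```

The trade-off is that a later System.setProperty() call is not visible
through the snapshot, so this only fits properties that are fixed for the
life of the process.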
Is there JVM or Linux tuning that will change the behavior of thread
time slicing or preemption? I checked the JDK 6 options page but didn't
find anything that appears to be applicable.
The jmap/jhat dump utilities have some of that, IIRC. Otherwise you
break into the add-on diagnostic tools.
But really it sounds like your code needs refactoring in that it did not
handle concurrency correctly from the start.
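On the question of how to see such a pile-up from inside the JVM: one
option, sketched here under the assumption that the threads are blocked
on an ordinary synchronized monitor, is the standard
java.lang.management API. It counts BLOCKED threads per monitor and
names the thread holding it. The class name LockContentionReport is
hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: summarizes which monitors threads are blocked on.
final class LockContentionReport {
    private LockContentionReport() {}

    // Returns a count of BLOCKED threads keyed by "<lock> held by <owner thread>".
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<String, Integer>();
        // false, false: skip the locked-monitor and synchronizer details;
        // the lock a thread is waiting on is reported either way.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            String key = info.getLockName() + " held by " + info.getLockOwnerName();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }
}
```

Calling blockedByLock() periodically under load and logging the entries
with the largest counts would show whether something like the
Hashtable behind the system properties is the monitor everyone is
queueing on.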
I couldn't agree more with what Lew wrote. If all your threads hammer on
single global resources you've got a design-level issue: that
application is simply not built with scalability in mind - even if the
effect has not yet shown up on other hardware or under a different load.
This is nothing you can blame the JVM or hardware for.
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/