Re: Serious concurrency problems on fast systems
On 02.06.2010 07:45, Kevin McMurtrie wrote:
In article <4c048acd$0$22090$742ec2ed@news.sonic.net>,
Kevin McMurtrie <mcmurtrie@pixelmemory.us> wrote:
I've been assisting in load testing some new high performance servers
running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
suspending threads for time-slicing in very unfortunate locations. For
example, a thread might suspend in Hashtable.get(Object) after a call to
getProperty(String) on the system properties. It's a synchronized
global so a few hundred threads might pile up until the lock holder
resumes. Odds are that those hundreds of threads won't finish before
another one stops to time slice again. The performance hit has a ton of
hysteresis so the server doesn't recover until it has a lower load than
before the backlog started.
The brute force fix is of course to eliminate calls to shared
synchronized objects. All of the easy stuff has been done. Some
operations aren't well suited to simple CAS. Bottlenecks that are part
of well established Java APIs are time consuming to fix/avoid.
Is there JVM or Linux tuning that will change the behavior of thread
time slicing or preemption? I checked the JDK 6 options page but didn't
find anything that appears to be applicable.
To clarify a bit, this isn't hammering a shared resource. I'm talking
about 100 to 800 synchronizations on a shared object per second for a
duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
cause a complete collapse of concurrency.
It's the nature of locking issues. Up to a particular point it works
pretty well and then locking delays explode because of the positive
feedback.
If you have "a few hundred threads" accessing a single shared lock at
a frequency of 800 Hz, then you have a design issue, whether you call it
"hammering" or not. It's simply not scalable, and if it doesn't break
now it will likely break at the next increase in load.
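To make the point concrete: one way to keep a hot path off the global system properties lock is to read each property once and serve later lookups from a concurrent map. This is only a sketch. The class name CachedProps and the sentinel are made up for illustration, and it assumes the properties in question do not change after startup (a later System.setProperty would not be seen).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of any library: caches system property
// lookups so repeated reads do not go through the shared Properties object.
class CachedProps {
    private static final ConcurrentMap<String, String> CACHE =
            new ConcurrentHashMap<String, String>();

    // ConcurrentHashMap cannot store null, so a unique sentinel object
    // marks "property not set". It is compared by identity only.
    private static final String ABSENT = new String("absent");

    static String get(String key) {
        String value = CACHE.get(key);
        if (value == null) {
            // First lookup for this key: pay for the synchronized call once.
            String fresh = System.getProperty(key);
            value = (fresh == null) ? ABSENT : fresh;
            String previous = CACHE.putIfAbsent(key, value);
            if (previous != null) {
                value = previous; // another thread cached it first
            }
        }
        return value == ABSENT ? null : value;
    }
}
```

Callers would use CachedProps.get("some.key") where they used System.getProperty("some.key") before. Whether stale values are acceptable depends on the application.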
My older 4-core Mac Xeon can have 64 threads call getProperty(String)
on a shared Properties instance 2 million times each in only 21 real
seconds. That's one call every 164 ns. It's not as good as
ConcurrentHashMap (one per 0.30 ns) but it's no collapse.
Well, then stick with the old CPU. :-) It's not uncommon that moving to
newer hardware with increased processing resources uncovers issues like
this.
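For anyone who wants to repeat the measurement quoted above on their own hardware, a crude sketch of such a test could look like the following. The class and method names are invented here, the key "bench.key" is arbitrary, and this is not the code that produced the 21 second figure. It is a naive timing loop, not a proper benchmark harness, and on newer JDKs Properties reads may no longer take a lock at all, so numbers are only comparable on the same JVM version.

```java
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

// Hypothetical micro-test: N threads call getProperty on one shared
// Properties instance and we report the average wall-clock time per call.
class PropertyBench {

    // Average wall-clock nanoseconds per call across all threads.
    static double nsPerCall(long elapsedNanos, int threads, long callsPerThread) {
        return (double) elapsedNanos / (threads * callsPerThread);
    }

    // Runs the workers and returns the elapsed wall-clock time in nanoseconds.
    static long run(final Properties props, int threads, final int callsPerThread)
            throws InterruptedException {
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await(); // release all workers together
                    } catch (InterruptedException e) {
                        return;
                    }
                    for (int n = 0; n < callsPerThread; n++) {
                        props.getProperty("bench.key");
                    }
                }
            });
            workers[i].start();
        }
        long t0 = System.nanoTime();
        start.countDown();
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.setProperty("bench.key", "value");
        long elapsed = run(props, 64, 2000000);
        System.out.printf("%.1f ns per call%n", nsPerCall(elapsed, 64, 2000000));
    }
}
```

As a sanity check on the arithmetic: 64 threads times 2 million calls is 128 million calls, and 21 s divided by 128 million is about 164 ns, which matches the number given.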
Many of the basic Sun Java classes are synchronized. Eliminating all
shared synchronized objects without making a mess of 3rd party library
integration is no easy task.
It would certainly help the discussion if you pointed out which exact
classes and methods you are referring to. I would readily agree that
Sun did a few things wrong initially in the std lib (Vector) which they
partly fixed later. But I am not inclined to believe in a massive (i.e.
affecting many areas) concurrency problem in the std lib.
If they synchronize they do it for good reasons - and you simply need to
limit the number of threads that try to access a resource. A globally
synchronized, frequently accessed resource in a system with several
hundred threads is a design problem - but not necessarily in the
implementation of the resource used but rather in the usage.
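One possible way to limit the number of threads that reach such a resource at the same time is a counting semaphore in front of it. The sketch below is only an illustration: BoundedResource is a made-up name, and the right limit (or whether a smaller thread pool is the better tool) depends on the application.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical wrapper: at most maxConcurrent threads run work against the
// guarded resource at once; the rest wait in the semaphore's queue instead
// of piling up on the resource's own monitor.
class BoundedResource {
    private final Semaphore permits;

    BoundedResource(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Callable<T> work) throws Exception {
        permits.acquire();
        try {
            return work.call();
        } finally {
            permits.release(); // always hand the permit back
        }
    }
}
```

Note that this only moves the waiting to a different place. It does not remove the need to wait, so it helps mainly when the alternative is hundreds of threads contending for one monitor.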
Next up is looking at the Linux scheduler version and the HotSpot
spinlock timeout. Maybe the two don't mesh and a thread is very likely
to enter a semaphore right as its quantum runs out.
Btw, as far as I can see you have not yet disclosed how you found out about
the point where the thread is suspended. I'm still curious to learn how
you did it. Might be a valuable addition to my toolbox.
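For comparison, one approach I know of for spotting this kind of pile-up (not necessarily what was used here) is to sample the JVM's own thread dump through ThreadMXBean and count the threads that are BLOCKED on each monitor. A minimal sketch, with an invented class name, could be:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: groups threads currently BLOCKED on a monitor by
// the lock they are waiting for and the thread that holds it.
class BlockedThreadReport {

    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<String, Integer>();
        // false, false: skip locked-monitor and synchronizer details,
        // which keeps the snapshot cheaper.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            String key = info.getLockName() + " held by " + info.getLockOwnerName();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }
}
```

Calling this every few seconds under load and logging the larger entries shows which lock the threads queue on and who holds it. It does not show why the holder is not running, so it cannot by itself confirm that the holder was preempted.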
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/