Re: Serious concurrency problems on fast systems

From: Robert Klemme <shortcutter@googlemail.com>
Newsgroups: comp.lang.java.programmer
Date: Mon, 07 Jun 2010 18:44:52 +0200
Message-ID: <874m08Fib7U1@mid.individual.net>

On 07.06.2010 08:25, Kevin McMurtrie wrote:

In article<4c0c57a7$0$282$14726298@news.sunsite.dk>,
Arne Vajhøj<arne@vajhoej.dk> wrote:

On 02-06-2010 01:45, Kevin McMurtrie wrote:

In article<4c048acd$0$22090$742ec2ed@news.sonic.net>,
Kevin McMurtrie<mcmurtrie@pixelmemory.us> wrote:

I've been assisting in load testing some new high performance servers
running Tomcat 6 and Java 1.6.0_20. It appears that the JVM or Linux is
suspending threads for time-slicing in very unfortunate locations. For
example, a thread might suspend in Hashtable.get(Object) after a call to
getProperty(String) on the system properties. It's a synchronized
global so a few hundred threads might pile up until the lock holder
resumes. Odds are that those hundreds of threads won't finish before
another one stops to time slice again. The performance hit has a ton of
hysteresis so the server doesn't recover until it has a lower load than
before the backlog started.

The brute force fix is of course to eliminate calls to shared
synchronized objects. All of the easy stuff has been done. Some
operations aren't well suited to simple CAS. Bottlenecks that are part
of well established Java APIs are time consuming to fix/avoid.

Is there JVM or Linux tuning that will change the behavior of thread
time slicing or preemption? I checked the JDK 6 options page but didn't
find anything that appears to be applicable.

To clarify a bit, this isn't hammering a shared resource. I'm talking
about 100 to 800 synchronizations on a shared object per second for a
duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
cause a complete collapse of concurrency.

But either it does or your entire problem analysis is wrong.

My older 4 core Mac Xenon can have 64 threads call getProperty(String)
on a shared Property instance 2 million times each in only 21 real
seconds. That's one call every 164 ns. It's not as good as
ConcurrentHashMap (one per 0.30 ns) but it's no collapse.

That is a call per clock cycle.
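
For anyone who wants to repeat the measurement quoted above, here is a rough sketch of such a test. It is not the code that produced the 21 second figure; the class name, the key and the value are made up for illustration, and the lambda needs Java 8 or later rather than the Java 6 under discussion. The arithmetic is the same as in the quote: 21 s / (64 threads x 2,000,000 calls) is about 164 ns per call.

```java
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

// Hypothetical micro-benchmark: many threads read one shared Properties object.
final class PropertiesContention {

    /** Average wall-clock nanoseconds per getProperty call, over all threads. */
    static double nanosPerCall(final int threads, final int callsPerThread) {
        final Properties shared = new Properties();
        shared.setProperty("key", "value");
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads);
        final int[] sink = new int[threads]; // keeps the loop results alive

        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                try {
                    start.await(); // all threads begin together
                    int n = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        n += shared.getProperty("key").length();
                    }
                    sink[id] = n;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }

        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        long elapsed = System.nanoTime() - begin;

        // Sanity check: every call must have returned "value" (length 5).
        long total = 0;
        for (int n : sink) {
            total += n;
        }
        if (total != 5L * threads * callsPerThread) {
            throw new IllegalStateException("unexpected result sum: " + total);
        }
        return (double) elapsed / ((double) threads * callsPerThread);
    }

    public static void main(String[] args) {
        System.out.printf("%.1f ns per call%n", nanosPerCall(64, 2_000_000));
    }
}
```

Treat the number it prints with suspicion. A loop this simple is exactly the kind of case the JIT optimizes heavily, and on newer JDKs Properties reads may no longer go through a synchronized Hashtable at all, so results are not comparable with the Java 6 figures in this thread.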

HotSpot has some (benchmark-driven?) optimizations for this case. It's
hard to not hit them when using simple tests on String and
ConcurrentHashMap.

What exactly do you mean by that? I can't seem to get rid of the
impression that you are doing the second step (micro optimization with
JVM internals in mind) before the first (proper design and implementation).

?!?!

Many of the basic Sun Java classes are synchronized.

Practically only old ones that you should not be using anymore
anyway.

Properties is a biggie. A brute-force replacement of Properties caused
the system throughput to collapse to almost nothing in Spring's
ResourceBundleMessageSource. There's definitely a JVM/OS problem. The
next test is to disable hyperthreading.

As someone else (Lew?) pointed out, it's a bad idea to always go to
System.properties. You should rather evaluate them once at startup and
initialize some other data structure from them - if only to avoid
checking the input values over and over again.
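
To make that concrete, here is a minimal sketch of what I mean. The class name, the property names ("app.workers", "app.dataDir") and the defaults are invented for the example and not taken from the application under discussion. Everything is read and validated exactly once, and afterwards threads only touch final fields of an immutable object, so no lock is involved.

```java
import java.util.Properties;

/**
 * Immutable snapshot of the settings an application needs, built once at
 * startup. Property names and defaults are hypothetical.
 */
final class StartupConfig {
    private final int workerCount;
    private final String dataDir;

    private StartupConfig(int workerCount, String dataDir) {
        this.workerCount = workerCount;
        this.dataDir = dataDir;
    }

    /** Reads and validates every setting exactly once. */
    static StartupConfig load(Properties props) {
        String rawWorkers = props.getProperty("app.workers", "8");
        int workers;
        try {
            workers = Integer.parseInt(rawWorkers.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "app.workers is not a number: " + rawWorkers, e);
        }
        if (workers < 1) {
            throw new IllegalArgumentException(
                    "app.workers must be >= 1: " + workers);
        }
        String dir = props.getProperty("app.dataDir", "/tmp");
        return new StartupConfig(workers, dir);
    }

    // Plain reads of final fields: no synchronization, no repeated parsing.
    int workerCount() { return workerCount; }
    String dataDir() { return dataDir; }
}
```

At startup you would call StartupConfig.load(System.getProperties()) once and hand the resulting object to whatever handles the requests. Bad input then fails fast at startup instead of on every request.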

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/