Re: Serious concurrency problems on fast systems

Robert Klemme <>
Wed, 09 Jun 2010 07:43:55 +0200
On 09.06.2010 07:13, Kevin McMurtrie wrote:

In article<>,
  Patricia Shanahan<> wrote:

Kevin McMurtrie wrote:

To clarify a bit, this isn't hammering a shared resource. I'm talking
about 100 to 800 synchronizations per second on a shared object, each
lasting 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
cause a complete collapse of concurrency.


Have you considered other possibilities, such as memory thrashing? The
resource does not seem heavily enough used for contention to be a big
issue, but it is about the sort of access rate that is low enough to
allow a page to be swapped out, but high enough for the time waiting for
it to matter.


It happened today again during testing of a different server class on
the same OS and hardware. This time it was under a microscope. There
were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
contention, and the total CPU was around 25%. There was no gridlock
effect - it always involved one synchronized method that did not depend
on other resources to complete. Throughput dropped to ~250 calls per
second at a specific method for several seconds then it recovered. Then
it happened again elsewhere, then recovered. After several minutes the
server was at top speed again. We then pushed traffic until its 1Gbps
Ethernet link saturated and there wasn't a trace of thread contention
ever returning.

Did you scrutinize the GC log? That is something I would definitely
look into. Other than that, it is difficult to offer concrete advice
given such a general problem description.
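
If the GC log is not already enabled (for example with -verbose:gc), one
way to cross-check is to sample the collector counters from inside the
JVM and compare the timestamps with the throughput drops. The following
is only a minimal sketch: the class name GcSampler, the helper methods
and the one-second interval are my own illustrative choices, not anything
from the application under discussion.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: periodically prints how many collections ran and
// how much time the collectors reported since the previous sample.
public class GcSampler {

    // Sum of collection counts across all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sum of approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long prevCount = totalCollections();
        long prevMillis = totalCollectionMillis();
        // Assumed sampling window: five samples, one second apart.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long count = totalCollections();
            long millis = totalCollectionMillis();
            System.out.printf("%d collections, %d ms in GC during the last interval%n",
                    count - prevCount, millis - prevMillis);
            prevCount = count;
            prevMillis = millis;
        }
    }
}
```

In a real server the sampling loop would run in a background thread next
to the workload. If the intervals with low throughput show no increase in
either counter, GC is an unlikely cause and the search can move on.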



remember.guy do |as, often| as.you_can - without end

