In article <l6udnc0kra5W-pvRnZ2dnUVZ_qadnZ2d@earthlink.com>,
Patricia Shanahan <pats@acm.org> wrote:
Kevin McMurtrie wrote:
...
To clarify a bit, this isn't hammering a shared resource. I'm talking
about 100 to 800 synchronizations on a shared object per second for a
duration of 10 to 1000 nanoseconds. Yes, nanoseconds. That shouldn't
cause a complete collapse of concurrency.
...
Have you considered other possibilities, such as memory thrashing? The
resource does not seem heavily enough used for contention to be a big
issue, but it is about the sort of access rate that is low enough to
allow a page to be swapped out, but high enough for the time waiting for
it to matter.
It happened today again during testing of a different server class on
the same OS and hardware. This time it was under a microscope. There
were 10 gigabytes of idle RAM, no DB contention, no tenured GC, no disk
contention, and the total CPU was around 25%. There was no gridlock
effect - it always involved one synchronized method that did not depend
on other resources to complete. Throughput dropped to ~250 calls per
second at a specific method for several seconds, then it recovered. Then
it happened again elsewhere, then recovered. After several minutes the
server was at top speed again. We then pushed traffic until its 1Gbps
Ethernet link saturated and there wasn't a trace of thread contention
ever returning.
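
For anyone who wants to watch for this kind of stall from inside the JVM, here is a rough sketch of how blocking on a monitor could be measured with the standard ThreadMXBean. The class ContentionProbe, the method hotMethod() and the thread/call counts are made up for illustration; they are not the code from the server described above. Note that getBlockedTime() reports milliseconds, so nanosecond-scale waits would mostly show up in the blocked count, while a multi-second stall should be visible in the time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

public class ContentionProbe {
    private static final Object LOCK = new Object();
    static long counter = 0; // only modified while holding LOCK

    // Hypothetical stand-in for a short synchronized method.
    static void hotMethod() {
        synchronized (LOCK) {
            counter++;
        }
    }

    // Runs the workers and returns {total blocked count, total blocked millis}
    // summed over all worker threads.
    static long[] measure(int threads, int callsPerThread) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }
        AtomicLong blockedCount = new AtomicLong();
        AtomicLong blockedMillis = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    hotMethod();
                }
                // Read this thread's own statistics before it terminates;
                // getThreadInfo returns null for threads that are no longer alive.
                ThreadInfo info = mx.getThreadInfo(Thread.currentThread().getId());
                if (info != null) {
                    blockedCount.addAndGet(info.getBlockedCount());
                    // getBlockedTime() is -1 when time monitoring is unavailable
                    blockedMillis.addAndGet(Math.max(0, info.getBlockedTime()));
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new long[] { blockedCount.get(), blockedMillis.get() };
    }

    public static void main(String[] args) {
        long[] r = measure(8, 100_000);
        System.out.println("calls=" + counter
                + " blockedCount=" + r[0] + " blockedMillis=" + r[1]);
    }
}
```

On a healthy run the blocked time should stay near zero; if it jumps during one of the slow periods, that would at least confirm the threads are parked on the monitor rather than waiting somewhere else.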
That periodic behavior points to something related to GC.
to see if it could change the behavior. If it does, that
goes some way toward confirming that it is related to GC.
JVM (from SUN, IBM and BEA/Oracle).
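
One cheap way to test the GC theory without changing any settings might be to snapshot the collector counters around a slow period and see whether they moved. The sketch below uses the standard GarbageCollectorMXBean interface; the class name GcSnapshot, its helper methods and the allocation loop standing in for a workload are assumptions for illustration only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
    // Returns {total collections, total collection millis} over all collectors.
    static long[] totals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] { count, millis };
    }

    // Element-wise difference between two snapshots.
    static long[] delta(long[] before, long[] after) {
        return new long[] { after[0] - before[0], after[1] - before[1] };
    }

    public static void main(String[] args) {
        long[] before = totals();

        // Hypothetical workload: churn through short-lived allocations.
        long sink = 0;
        for (int i = 0; i < 2000; i++) {
            byte[] junk = new byte[1 << 20];
            junk[i % junk.length] = 1;
            sink += junk[i % junk.length];
        }

        long[] d = delta(before, totals());
        System.out.println("collections=" + d[0] + " gcMillis=" + d[1] + " sink=" + sink);
    }
}
```

If the counters barely change across a stall, GC becomes a less likely suspect; if collections or GC time spike in the same window, that supports trying different collector settings next.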