Re: Locking objects in an array

From:

Daniel Pitts <newsgroup.spamfilter@virtualinfinity.net>

Newsgroups:

comp.lang.java.programmer

Date:

Mon, 18 May 2009 17:53:36 -0700

Message-ID:

<vqnQl.31181$hX2.24644@newsfe19.iad>

Sorry if this is a duplicate, news server problems.
Tom Anderson wrote:

On Thu, 14 May 2009, Daniel Pitts wrote:

Patricia Shanahan wrote:

Daniel Pitts wrote:
....

I was thinking more about not having explicit locks per object, but
instead having a "Lock Request" that puts its region into a specific
data structure. That Lock Request would block as long as the
current region intersects any region currently ahead of it in that
data structure. Once work is complete, that Lock Request can be
removed from that data-structure, unblocking other threads waiting
for that region.

Won't that create unnecessary contention?

Assume all requests are for 2x2 squares, identified by lowest index in
each dimension.

Requests: (0,0) (1,1) (2,2).

The first and third requests do not overlap, and could run in parallel,
but (1,1) has to block until (0,0) is over, and (2,2) would block
because it overlaps (1,1) which is ahead of it.

Yes, I was thinking about that.

It might be possible to have a "blocking" lock mark itself as "waiting
on another". If the latest lock can move past the waiting lock
successfully, then it can preempt that waiting lock.

An alternative, depending on the OPs actual problem, would be collect
the list of regions (and jobs) that need to execute, and create an
optimal ordering of those jobs such that the most can execute at once.

This still may lead to a less than optimal solution if the jobs
themselves have varying time that can't be accounted for in the
ordering process.

Funnily enough, i have exactly the same problem in something i'm working
on in my spare time at the moment. I'm writing a parallel JUnit runner,
which runs the tests in multiple threads (there are existing ways of
doing this, but they're all awful).

One of the features i want to add is the ability to control concurrency
in cases where tests can't safely run in parallel. For example, at work,
we're testing a system that has a 'publish' operation, where you can't
really run two publishes at the same time, so i want to be able to mark
tests as requiring exclusive use of the publishing operation, so the
runner won't try and run two at once.

One thing to consider is the difference between Unit tests and
Integration tests (and the fuzzy line in between). Unit test, if your
code was written properly, will always be safe to run in parallel.
Integration tests rely on external resources that may be limited, so
they might not be parallelizable.

I could do this with a normal locking mechanism, but that hurts
concurrency: if a publishing test is already running, then if another
thread picks up a second one, it will block, and so sit idle until the
first one finishes. As you observed, no good.

With long running tasks, that might not be such a bad thing. An
unscheduled thread takes up very few resources (though not none)

I could do it with a mechanism that scans the test queue and picks the
first test that doesn't need any currently-held locks. This avoids
blocking, avoids the contention problem Patricia described, but does
lead to a risk of starvation or at least queue-jumping - if i'm running
a test which uses publishing, the next one uses publishing and database
copying (another non-parallelisable operation), and there's one after
that just uses database copying, then this approach would run the third
one, which would then stop the second one from running when the first
one finishes.

True, but your eventual goal is that all the tests run, at some point
that second test will get a chance to run.

Currently, i lean towards a mechanism like your lock requests (the code
for which i will be examining in detail at a later date ...), which
would give me safety without starvation, at the expense of concurrency.
In my case, the great majority of tests don't need any exclusive access
at all, so there should always be some other test to run while one of
the publishing or database copying tests is blocked, and so the loss of
some concurrency should not be a big deal.

tom

My approach for your particular use case would probably be to have a
specialized queue which will return the next not-blocked test available.

It might not be perfect, but it probably gets you to 80% of your goal.
The queue "heuristic" can be improved without breaking the abstraction,
so the rest of your code needn't change. That would probably get you
that last 19%

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>