Re: Concurrent bidirectional one-to-many map?

From:

Robert Klemme <shortcutter@googlemail.com>

Newsgroups:

comp.lang.java.programmer

Date:

Mon, 9 May 2011 06:43:39 -0700 (PDT)

Message-ID:

<08586289-8935-4532-93d0-e8c7dd45cb24@c1g2000yqe.googlegroups.com>

On 8 Mai, 05:51, Tom Anderson <t...@urchin.earth.li> wrote:

On Sat, 7 May 2011, Patricia Shanahan wrote:

On 5/7/2011 2:43 AM, Sebastian wrote:
...

To give an example, I'm trying to solve a problem like this:
Associate tasks with workspaces, where a workspace may hold many
tasks,but a task may be associate with at most one workspace.

...

I'd deal with that sort of problem by having a custom data structure
that uses java.util structures in its implementation.

That's an eminently sensible course of action.

The question then is what structures does it use, and how? Sebastian was
quite specific about the behaviour he needs from this custom structure;

Actually I am missing more information about access patterns and maybe
estimations about how big the "too many" side is expected to be (at
least on average).

simply telling him he should use standard structures to build it is not
that much more helpful than telling him he should use classes and methods
to build it.

I say this because this is not - that i can see - one of those cases where
the solution is a simple matter of combining existing parts. The central
problem is actually quite a tricky one; managing a bidirectional mapping
is one thing, doing it in a threadsafe concurrent manner is not that much
harder, nor is making it one-to-many, but i think the combination of
efficient thread safety and one-to-manyness is actually pretty tricky. You
can make it work, and work safely, by using a coarse-grained lock over a
suitable agglomeration of data structures (a map of keys to sets of
values, plus a backwards map of values to keys, say), but that has very
poor concurrency.

I'd be cautious with this statement: as far as I can see we do not
know access patters. If reads far outweigh writes then read write
locking with a global lock works perfectly and has the added advantage
to be quite simple to do as you mentioned already.

It's not obvious to me how to make it work correctly with good
concurrency; the ConcurrentHashMap approach is to stripe the locks, so as
to partition the hashtable into independently-locked domains, but to use
the same set of stripes for the reverse mapping, you would need to be able
to tell which stripe a value belongs to - and that can only be done by
looking it up in the very reverse mapping you are worrying about locking!
Can you have a separate set of stripes for the reverse mapping? Do you
need some multi-phase approach, where you determine the stripes without
locking, then acquire the locks to do the actual lookup or insertion? Is
there some lock-free voodoo which could be used? Is there any mileage in
using a union-find structure to collapse the sets of values into a single
representative which could participate in a 1:1 key-value mapping? I am
certainly not any sort of data structure guru, but i don't have answers to
any of those questions. That makes it a problem worth discussing here, in
my book. Does anyone have any ideas on how to do this?

This could be solved by standard relationship implementations: Make
workspace a member of Task and synchronize accesses on Task. Example:

https://gist.github.com/962538

By using monitors of Task and Workspace we have quite fine granularity
of locking.

Kind regards

robert