Re: Concurrent Data Structures (Was: Concurrent Containers)

From:

"Balog Pal" <pasa@lib.hu>

Newsgroups:

comp.lang.c++

Date:

Fri, 3 Sep 2010 00:23:01 +0200

Message-ID:

<i5p7qi$mci$1@news.ett.com.ua>

"Scott Meyers" <NeverRead@aristeia.com>

On 9/1/2010 3:29 PM, Balog Pal wrote:

Possibly my mind is too petrified to abandon 'you shall lock at least
till the
if() part finishes, but more likely until the whole conditional.

I don't know about petrified, but I think that (1) you are assuming that
the API for a concurrent data structure will look like that for the
current STL data structures (which, as I posted earlier, is perhaps my
fault for wording my subject line poorly; I've modified it above) and (2)
you are assuming that the kinds of operations and combinations of
operations that are common on existing STL containers (i.e., on
concurrency-unfriendly data structures) will also be common on
concurrency-friendly data structures.

Probably so. :) As I mentioned earlier STL's interfaces are especially
hostile for these uses.

But even the other regular interface for 'collection; like stuff -- and even
for simple aggregates -- is bad for general cases.

The examples you gave are okay: mutating operations, like insert are about
good, and may turn useful here and there. The problem is with read
operations. What include all kinds of stuff retrieval, direct or indirect,
from a non-const object. The container has no chance to cover it up.
What is worse, my experience is that too many collegues believe that to have
thread-safe thing is to protect concurrent writes against each other. And
that read can go without lock. (if you recall the mass discussion of the
double-checked nightmare, it is just the surfece.

Misleading names don't help either. In another post you quoted the dox on
..size(). it is unfortunately not called approx_size() or outdated_size()
or bogus_size(), what reflects the semantics, but just size(). That
normally report, well, the count of elements. Firmly and correctly. Just
as its twin IsEmpty() [ empty() in STL :-( ]. For a 'concurrent' thing it
reports bogus info that we in turn use to make decisions. Not good.

As regards API, the TBB API for concurrent_queue includes a try_pop
function, so, as the documentation puts it, what you might write like this
with an STL queue,

  bool b=!q.empty();
  if(b) {
    x=q.front();
    q.pop();
  }

you write like this with a concurrent_queue:

  bool b = q.try_pop(x); // x holds the popped value if there was an
                          // element to pop

Yeah, this looks much better. (Actually at least one of my async_queue
classes has this same interface. )

But if I started fresh to create a multiproducer-singleconsumer queue for
the usual interthread messaging, it would not have pop at all. Only push
for producers, and some functor, that is the consumer, and gets called
whenever there is stuff -- with the already extracted message.

Not looking like a 'collection' at all, but rather like a
manager/dispatcher/...

So your "you shall lock at least till the if() part finishes" advice falls
by the wayside, because there is no if() part.

Good, the point is that the interface must not have a single function that
can be used that way, and be unsafe.

The if and pop are an atomic combination.

Yeah. Though it still miss the most often coupled signaling, and if you have
to make it by hand, there will (too likely) be excess locks and context
switches.

(And my other implementtions do not bother with pop -- but create/pick an
empty queue and swap it after waking on the signal -- then process the
grabbed content at leasure.)

For an example application of a concurrent hash_map, take a look at
http://www.devx.com/cplus/Article/33334/1954, where an engineer from Intel
(in an article designed to make TBB scalability look good) describes how a
concurrent hash map can be used to allow multiple threads to process
different parts of a large text simultaneously. Each thread performs only
inserts, in contrast to the combination of inserts, lookups, and erasures
that you seem to think are always present.

Err, I surely do not think they are always present ;-) When I design an
app, I carefully consider what operations happen where, document it, and do
locking (and other operations) accordingly. That normally makes the
solution *not* general, and not "reusable". Certainly that is not a
requirement, and the implementation itself is not worth mentioning in the
time measure -- the hard part is the design itself.

The hash_map you refer is supposedly a reusable component, from which we
expect an easily discoverable, reasonably safe interface. I hope we agree
that intuitive names for classes and functions are vital. From soemthing
called concurent_hash_map I'd probably not expect to be limited for such
asymmetric use. And a half-page of description on limitations and intended
uses are too rarely representable in just names.

Also, if the component covers my MT work only halfway -- I have to use
explicit synch incantations for at least some operations, then the gain on
the other operations is moot.

Also, if the object does locking that is not evidently showing its spots, it
is easy to create deadlock situations, you only need another mutex, say from
another such hash_map.