Re: STL Queue and multithreading

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Thu, 11 Nov 2010 10:46:59 -0800 (PST)

Message-ID:

<f6311a58-68da-425e-865d-5f46b72bd297@v12g2000vbh.googlegroups.com>

On Nov 11, 11:51 am, "Chris M. Thomasson" <cris...@charter.net> wrote:

"James Kanze" <james.ka...@gmail.com> wrote in message

news:b7531a6f-7c92-482c-a24b-342515d918e6@e20g2000vbn.googlegroups.com...

On Nov 10, 10:26 pm, "Chris M. Thomasson" <cris...@charter.net> wrote:

[...]

FWIW, mutex lock/unlock usually costs 2 atomic RMW operations
and 2 memory barriers, one of which may very well be
a #StoreLoad. All of that can be fairly expensive...

Compared to what?

Well, perhaps compared to efficient synchronization algorithms that are
based on plain atomic loads and stores for instance. Various
single-producer/single-consumer algorithms come to mind; it's basically
impossible for a mutex to outperform them... Period.

Certainly. But when is synchronization a bottleneck (except in
cases of contention)?

(And why two atomic operations? I think
I could implement a Mutex lock with a single CAS.)

Nearly all of the mutex implementations I have checked
out/disassembled have two atomic RMW operations. One for
locking, and the other one for unlocking.

Oops. Yes, I misread you. I was just thinking of the lock.
You have to do the same thing when unlocking, of course.

The memory barriers are expensive compared to any single
instruction, but in most programs, you shouldn't require the
lock very often. And they aren't expensive, or even
significant, compared to any other important calculations.

The `#StoreLoad' style memory barrier is fairly expensive. It
forces a strict ordering and basically ruins any accumulated
cache that the executing processor may have had...

I know, but if you're invoking it that often, you probably have
a problem of granularity. (But I'm sure that there are
exceptions.)

It's not a question of the ratio of time: you can probably
synchronize twice as fast as a mutex for some uses. But the
fact remains that the total time in both cases is an
insignificant part of the time of most applications. If I'm
passing a message in a queue, who cares whether it takes 750ns,
or 1.5us, if the calculations behind it take hundreds of
microseconds?

--
James Kanze