Re: Am I or Alexandrescu wrong about singletons?

From:

Herb Sutter <herb.sutter@gmail.com>

Newsgroups:

comp.lang.c++.moderated

Date:

Tue, 30 Mar 2010 21:15:33 CST

Message-ID:

<8fu4r55lqpfce9ufaigu2c6n9obh14vjlq@4ax.com>

On Tue, 30 Mar 2010 05:03:11 CST, Andy Venikov
<swojchelowek@gmail.com> wrote:

Herb Sutter wrote:

Please remember this: Standard ISO C/C++ volatile is useless for
multithreaded programming. No argument otherwise holds water; at best
the code may appear to work on some compilers/platforms, including all
attempted counterexamples I've seen on this thread.

You have enormous clout with C++ professionals, including myself, so
before permanently agreeing to such an all-encompassing statement allow
me to maybe step back a little and see what it is that's at the core of
this argument. Maybe we're arguing the same point. Or maybe I'm missing
something big in which case I'll be doubly glad to have been shown my
wrong assumptions.

Short answer: Note I deliberately said "Standard" above -- the above
statement is completely true for portable usage. You may get away with
it on some platforms today, but it's nonportable and even the
getting-away won't last.

Slightly longer answer follows:

I understand that volatile never was supposed to be of any help for
multithreaded programming. I don't expect it to issue any memory fences
nor make any guarantees whatsoever about anything thread-related...

Yes, and that's why it can't reliably be used for inter-thread
communication == synchronization.

Yet, on all the compilers I know of (gcc, mingw, MSVC, LLVM, Intel) it
produces just the code I need for my multithreaded programs. And I
really don't see how it wouldn't, given common-sense understanding of
what it should do in single-threaded programs. And I'm pretty sure that
it's not going to change in a foreseeable future.

So my use of volatile may not be standard-portable, but it sure is
real-life portable.

It's like relying on undefined behavior. UB may happen to do what you
expected, most of the time, on your current compiler and platform.
That doesn't mean it's correct or portable, and it will be less and
less real-life portable on multi-core systems.

Because there was no better hook, volatile was strengthened (in
non-standard ways) on various systems. For example, on MS VC++ prior
to VC++ 2005 (I think), volatile had no ordering semantics at all, but
people thought it was used for inter-thread communications because the
Windows InterlockedXxx APIs happened to take a volatile variable. But
that was just using volatile as a type system tag to help you not
accidentally pass a plain variable, and a little bit to leverage the
lack of optimizations on volatile -- the real reason it worked was
because you were calling the InterlockedXxx APIs because *those* are
correctly synchronized for lock-free coding.

Even now in VC++ 2005 and later, when volatile was strengthened so
that reads and writes are (almost) SC, to get fully SC lock-free code
in all cases you still have to use the InterlockedXxx APIs rather than
direct reads and writes of the volatile variable. With the strengthened
volatile semantics, on that compiler and when targeting x86/x64,
direct reads and writes are enough to make most examples like DCL
work, but they aren't enough to make examples like Dekker's work
-- for Dekker's to work correctly you still have to use the
InterlockedXxx APIs.
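
To make the Dekker's case concrete, here is a minimal sketch of its entry protocol (each thread stores to its own flag, then loads the other's), written with std::atomic as a portable stand-in for the InterlockedXxx calls. The names flag0, flag1, both_saw_zero and count_violations are made up for illustration. With the default sequentially consistent ordering, the outcome where both threads read 0 is ruled out; with plain or merely release/acquire accesses it would not be.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names; the pattern is the store-then-load entry step of Dekker's.
std::atomic<int> flag0{0};
std::atomic<int> flag1{0};

// Runs one round: each thread raises its own flag, then reads the other's.
// Returns true only if both threads saw the other's flag still at 0.
bool both_saw_zero() {
    flag0.store(0);
    flag1.store(0);
    int r0 = -1;
    int r1 = -1;
    // store() and load() default to std::memory_order_seq_cst.
    std::thread t0([&] { flag0.store(1); r0 = flag1.load(); });
    std::thread t1([&] { flag1.store(1); r1 = flag0.load(); });
    t0.join();
    t1.join();
    return r0 == 0 && r1 == 0;
}

// Repeats the round and counts how often the forbidden outcome appeared.
int count_violations(int rounds) {
    int violations = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_saw_zero()) {
            ++violations;
        }
    }
    return violations;
}
```

Under sequential consistency count_violations() should always return 0, because one of the two stores must come first in the single total order and the other thread's load then has to see it.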

Here's the point of view I'm coming from.
Imagine that someone needs to implement a library that provides certain
multithreading (multiprogramming) tools like atomic access,
synchronization primitives and some lock-free algorithms that will be
used by other developers so that they wouldn't have to worry about
things like volatile. (Now that boost.atomic is almost out, I'll happily
use it.

Important note: Using std::atomic<> is exactly the correct answer!

The only caveat is that it's not yet widely available, but this year
we're getting over the hump of wide availability thanks to Boost and
others.
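
As a rough illustration of what "use std::atomic<>" looks like for simple inter-thread communication, here is a minimal sketch, assuming a C++0x/C++11 library is available. The names payload, ready, producer, consumer and run_handoff are invented for this example.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration only.
int payload = 0;                  // ordinary data, published through `ready`
std::atomic<bool> ready{false};   // the only variable used for synchronization

// Writes the data, then publishes it with a sequentially consistent store.
void producer() {
    payload = 42;
    ready.store(true);
}

// Spins until the flag is set, then reads the data.
int consumer() {
    while (!ready.load()) {
        std::this_thread::yield();
    }
    // The store to `ready` happens-before this point, so 42 is visible here.
    return payload;
}

// Runs one producer and one consumer; returns what the consumer observed.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Declaring ready as a volatile bool instead would compile and would probably appear to work on some platforms, but nothing in the Standard would then order the write to payload before the write to the flag.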

But Helge Bahmann (the author of the library) didn't have such a

Isn't it Anthony Williams who's doing Boost's atomic<> implementation?
Hmm.

luxury, so to make his higher-level APIs work he had to internally
resort to low-level tools like volatiles where appropriate.)

Of course, sure. The implementation of std::atomic<> on any given
platform needs to use platform-specific tools, including things like
explicit fences/membars (e.g., mf+st.rel on IA64), ordered APIs (e.g.,
InterlockedIncrement on Windows), and/or other nonstandard and
nonportable goo (e.g., platform-specific variants of volatile).

The implementation of any standard feature typically will internally
use nonstandard system-specific features. That's the standard
feature's purpose, to shield users from those details and make this
particular system do the right particular thing.

[...]

Look at line D5: it needs to check if Q->Head is still the same as what
we read from it before. Otherwise two possibilities for breaking the
correctness arise: 1) it would be possible for the element pointed to by

[...]

This piece of pseudo code could be naively translated to the following
C++ code:

while (true)
{
    Node * localHead = head_;
    Node * localTail = tail_;
    Node * localNext = localHead->next;
    if (localHead == head_)
    {
        ...
    }
}

But it wouldn't work for the obvious reasons.
One needs to insert MemoryFences in the right places.

[...]

Fences are evil. Nearly nobody can use them consistently correctly,
including people who have years of experience with them. Those people
(write once, and from then on) use the Linux atomics package or C++0x
std::atomic.

Every mutable shared object should be protected by a mutex (99.9%
case) or be atomic (0.1% case).
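
A small sketch of the two options, assuming C++0x/C++11 library support. GuardedCounter, atomic_total and run_counters are hypothetical names, and the counter is just the simplest shared object I could pick.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// The common case: shared state guarded by a mutex.
class GuardedCounter {
public:
    void add(long n) {
        std::lock_guard<std::mutex> lock(m_);
        total_ += n;
    }
    long get() const {
        std::lock_guard<std::mutex> lock(m_);
        return total_;
    }
private:
    mutable std::mutex m_;
    long total_ = 0;
};

// The rare case: a single word that can simply be atomic.
std::atomic<long> atomic_total{0};

// Hypothetical driver: `threads` workers each add 1, `per_thread` times,
// to both counters. Returns the final value of the atomic counter.
long run_counters(int threads, int per_thread, GuardedCounter& guarded) {
    atomic_total.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&guarded, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                guarded.add(1);
                atomic_total.fetch_add(1);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return atomic_total.load();
}
```

Neither version contains a volatile or an explicit fence; the mutex and the atomic supply the ordering.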

If you're going to write lock-free code, it's really, really, really
important to just make the shared variables be C++0x std::atomic<> (or
equivalently Java or .NET volatile, which isn't the same thing as ISO
C and C++ volatile). If you do, you won't have to reason about where
the fences need to go. Reasoning about where the fences need to go is
such a futile and error-prone job that most lock-free papers don't
even try to say where to put them and just assume SC execution.
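
For example, here is a minimal sketch of a lock-free stack (not Maged's queue) where the only shared variable is a std::atomic pointer and every operation uses the default sequentially consistent ordering. StackNode, top, push and drain_and_sum are names made up for this illustration, and the sketch deliberately detaches the whole list at once instead of popping single nodes, so that it doesn't have to deal with ABA or memory reclamation.

```cpp
#include <atomic>

struct StackNode {
    int value;
    StackNode* next;
};

// The one shared variable: note that the pointer itself is the atomic object.
std::atomic<StackNode*> top{nullptr};

// Pushes a value with a compare-and-swap loop; no explicit fences anywhere.
void push(int value) {
    StackNode* node = new StackNode{value, top.load()};
    // On failure, compare_exchange_weak writes the current top into
    // node->next, so the loop simply retries with the fresh value.
    while (!top.compare_exchange_weak(node->next, node)) {
    }
}

// Detaches the entire list in one atomic step, then sums and frees it.
int drain_and_sum() {
    StackNode* node = top.exchange(nullptr);
    int sum = 0;
    while (node != nullptr) {
        sum += node->value;
        StackNode* next = node->next;
        delete node;
        node = next;
    }
    return sum;
}
```

The reasoning needed here is about the algorithm (what happens if top changes between the load and the CAS), not about where a fence has to go.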

Here's the final code:

I apologize for not having time to read your transformations of
Maged's code closely, but in all of the following, why is the volatile
on the Node, not on the pointer? Even if volatile did all the magic
you want it to do (like Java/.NET volatile), that's broken because
it's in the wrong place, isn't it? Of course, the usual manifestation
of the problem is that the code will compile, run, and appear to
work...
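
As an aside before the quoted code, the placement question can be checked mechanically. This is only a sketch using standard type traits; the alias names PtrToVolatileNode and VolatilePtrToNode are invented for the illustration.

```cpp
#include <type_traits>

struct Node;  // an incomplete type is enough for this illustration

using PtrToVolatileNode = Node volatile *;   // the Node is volatile, the pointer is not
using VolatilePtrToNode = Node * volatile;   // the pointer itself is volatile

// "Node volatile *" qualifies what is pointed to, not the pointer variable.
static_assert(!std::is_volatile<PtrToVolatileNode>::value,
              "pointer object is not volatile");
static_assert(std::is_volatile<std::remove_pointer<PtrToVolatileNode>::type>::value,
              "pointee is volatile");

// "Node * volatile" qualifies the pointer variable, not what is pointed to.
static_assert(std::is_volatile<VolatilePtrToNode>::value,
              "pointer object is volatile");
static_assert(!std::is_volatile<std::remove_pointer<VolatilePtrToNode>::type>::value,
              "pointee is not volatile");
```

In a lock-free queue the thing being read and written concurrently is the link itself, which is why the std::atomic equivalent would be spelled std::atomic<Node*>, with the pointer as the atomic object.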

struct Node
{
   <unspecified> data;
   Node volatile * pNext;
};
Node volatile * volatile head_;
Node volatile * volatile tail_;

dequeue()
{
  while (true)
  {
    Node volatile * localHead = head_;
    Node volatile * localTail = tail_;
    DataDependencyBarrier();
    Node volatile * localNext = localHead->pNext;

    if (localHead == head_)
    {
     ...
    }
....
}

Now this code will produce the intended correct object code on all the
compilers I've listed above and on at least these CPUs: x86, itanium,
mips, PowerPC (assuming that all the MemoryBarriers have been defined
for all the platforms). And without any modifications to the above code.
How's that for portability?

Without even reading the code logic and looking for races, I doubt it.

For a detailed analysis of multiple lock-free implementations of a
similar queue example, including an exceedingly rare race that even
under sustained heavy stress on a 24-core system only manifested once
every tens of millions of insertions, see:

Measuring Parallel Performance: Optimizing a Concurrent Queue
http://www.drdobbs.com/high-performance-computing/212201163

Now, after writing all this, I realize that I could've used a simpler
example - a simple Peterson's algorithm for two threads wouldn't work
without the use of volatile: the "turn" variable is assigned the same
value that it's compared to later, so the compiler will omit the "if
turn == x" part in the if statement.

Actually, Dekker's/Peterson's is broken even with VC++ 2008's
heavily-strengthened volatile. (Sorry.) To make it correct you have to
store to the flag variable using InterlockedExchange() or similar, not
using a simple write to the flag variable.
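
For comparison, here is a minimal portable sketch of two-thread Peterson's written with std::atomic rather than volatile or InterlockedExchange(). All names (interested, turn, peterson_lock, peterson_unlock, shared_counter, run_peterson) are hypothetical, and it relies on the default sequentially consistent ordering of std::atomic operations.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names; `self` is 0 or 1.
std::atomic<bool> interested[2] = {{false}, {false}};
std::atomic<int> turn{0};
long shared_counter = 0;   // plain variable, protected only by the lock below

void peterson_lock(int self) {
    const int other = 1 - self;
    interested[self].store(true);   // sequentially consistent store
    turn.store(other);              // give priority to the other thread
    while (interested[other].load() && turn.load() == other) {
        std::this_thread::yield();
    }
}

void peterson_unlock(int self) {
    interested[self].store(false);
}

// Two threads each increment the plain counter `iterations` times
// inside the critical section; returns the final count.
long run_peterson(int iterations) {
    shared_counter = 0;
    auto worker = [iterations](int self) {
        for (int i = 0; i < iterations; ++i) {
            peterson_lock(self);
            ++shared_counter;
            peterson_unlock(self);
        }
    };
    std::thread a(worker, 0);
    std::thread b(worker, 1);
    a.join();
    b.join();
    return shared_counter;
}
```

Because turn and the flags are atomic, the compiler can't treat the comparison as redundant, and the store to the flag can't be reordered after the load of the other thread's flag. In practice you would of course just use a mutex.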

Herb

---
Herb Sutter (herbsutter.wordpress.com) (www.gotw.ca)

Convener, SC22/WG21 (C++) (www.gotw.ca/iso)
Architect, Visual C++ (www.gotw.ca/microsoft)

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]