Re: Am I or Alexandrescu wrong about singletons?

From:
Andy Venikov <swojchelowek@gmail.com>
Newsgroups:
comp.lang.c++.moderated
Date:
Wed, 31 Mar 2010 16:35:53 CST
Message-ID:
<hp0avi$nlt$1@news.eternal-september.org>
James Kanze wrote:
<snip>

> I'm not sure I follow. Basically, the fence guarantees that the
> hardware can't do specific optimizations. The same
> optimizations that the software can't do in the case of
> volatile. If you think you need volatile, then you certainly
> need a fence. (And if you have the fence, you no longer need
> the volatile.)


Ah, finally I think I see where you are coming from. You think that if
you have the fence you no longer need a volatile.

I think you assume too much about how a fence is really implemented.
Since the standard says nothing about fences, you have to rely on a
library that provides them, and if you don't have such a library, you'll
have to implement one yourself. A reasonable way to implement a barrier
is with macros that, depending on the platform you run on, expand to
inline assembly containing the right instruction. In that case the
inline asm will make sure that the compiler doesn't reorder the emitted
instructions, but it won't make sure that the optimizer doesn't throw
away instructions that are still needed.

For example, following my post where I described Maged Michael's
algorithm, here's how the relevant excerpt would look without volatiles:

//x86-related defines:
#define LoadLoadBarrier() asm volatile ("mfence")

//Common code
#include <cstdio>
struct Node
{
     Node * pNext;
};
Node * head_;

void f()
{
     Node * pLocalHead = head_;
     Node * pLocalNext = pLocalHead->pNext;

     LoadLoadBarrier();

     if (pLocalHead == head_)
     {
         printf("pNext = %p\n", pLocalNext);
     }
}

Just to make you happy I defined LoadLoadBarrier as a full mfence
instruction, even though on x86 there is no need for a barrier here,
even on a multicore/multiprocessor.

And here's the object code that gcc 4.3.2 generated on Linux/x86-64:

0000000000400630 <_Z1fv>:
   400630: 0f ae f0 mfence
   400633: 48 8b 05 fe 09 20 00 mov 0x2009fe(%rip),%rax # 601038 <head_>
   40063a: bf 5c 07 40 00 mov $0x40075c,%edi
   40063f: 48 8b 30 mov (%rax),%rsi
   400642: 31 c0 xor %eax,%eax
   400644: e9 bf fe ff ff jmpq 400508 <printf@plt>
   400649: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

As you can see, it uselessly put mfence right at the beginning of
function f() and threw away the second read of head_ and the whole if
statement altogether.

Naively, you could say that we could put a "memory" clobber in the
inline assembly's clobber list, like this:
#define LoadLoadBarrier() asm volatile ("mfence" : : : "memory")

This will work, but it is huge overkill, because after it the compiler
has to re-read all variables, even unrelated ones. And when f() gets
inlined, you take a big performance hit.

Volatile saves the day nicely and beautifully, albeit not
"standards-portably". But as I said elsewhere, this will work on most
compilers and hardware. Of course I'd need to test it on the
compiler/hardware combination the client is going to run it on, but such
is the peril of trying to provide a portable interface with a
non-portable implementation. So far, though, I haven't found a single
combination that wouldn't correctly compile the code with volatiles. And
of course I'll gladly embrace C++0x atomic<>... when it becomes
available. Right now, though, I'm slowly migrating to boost::atomic
(which, again, internally HAS TO use and IS using volatiles).

Thanks,
     Andy.

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
