Re: Am I or Alexandrescu wrong about singletons?
From: Joshua Maurice <joshuamaurice@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Sat, 27 Mar 2010 15:13:05 CST
Message-ID: <a3af4262-c1ed-4123-982c-34e92168feeb@u5g2000prd.googlegroups.com>
On Mar 26, 4:05 am, Andy Venikov <swojchelo...@gmail.com> wrote:
> James Kanze wrote:
> > On Mar 25, 7:10 pm, George Neuner <gneun...@comcast.net> wrote:
> > > On Thu, 25 Mar 2010 00:20:43 CST, Andy Venikov wrote:
> > > > [...]

> > > As you noted, 'volatile' does not guarantee that an OoO CPU will
> > > execute the stores in program order ...


> > Arguably, the original intent was that it should. But it
> > doesn't, and of course, the ordering guarantee only applies to
> > variables actually declared volatile.

> > > for that you need to add a write fence between them. However,
> > > neither 'volatile' nor write fence guarantees that any written
> > > value will be flushed all the way to memory - depending on
> > > other factors - cache snooping by another CPU/core, cache
> > > write back policies and/or delays, the span to the next use of
> > > the variable, etc. - the value may only reach some level of
> > > cache before the variable is referenced again. The value may
> > > never reach memory at all.


> > If that's the case, then the fence instruction is seriously
> > broken. The whole purpose of a fence instruction is to
> > guarantee that another CPU (with another thread) can see the
> > changes. (Of course, the other thread also needs a fence.)


> Hmm, the way I understand fences is that they introduce ordering and
> not necessarily guarantee visibility. For example:
>
> 1. Store to location 1
> 2. StoreStore fence
> 3. Store to location 2
>
> will guarantee only that if the store to location 2 is visible to
> some thread, then the store to location 1 is guaranteed to be visible
> to the same thread as well. But it doesn't necessarily guarantee that
> the stores will ever be visible to some other thread. Yes, on certain
> CPUs fences are implemented as "flushes", but they don't need to be.


Well yes. Volatile does not change that, though. Most of my
understanding comes from
   http://www.mjmwired.net/kernel/Documentation/memory-barriers.txt
and
   The JSR-133 Cookbook for Compiler Writers
   http://g.oswego.edu/dl/jmm/cookbook.html
(Note that the discussion of volatile in the cookbook refers to Java's
volatile as of Java 1.5, not to C and C++ volatile.)

I'm not the most versed on this, so please correct me if I'm wrong. As
an example:

main thread:
a = 0
b = 0
start thread 2
a = 1
write barrier
b = 2

thread 2:
print b
read barrier
print a

Without the read and write memory barriers, this can print any of the
4 possible combinations (shown as "b a", the order thread 2 prints
them):
0 0, 2 0, 0 1, 2 1

With the barriers, one combination is ruled out, leaving:
0 0, 0 1, 2 1
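
To make the barrier placement concrete, here is a minimal sketch of the
same two-thread example in C++11 terms. (The pseudocode above does not
assume C++11; the relaxed atomics plus explicit fences below merely
stand in for the plain variables and the read/write barriers.)

#include <atomic>
#include <cstdio>
#include <thread>

// Relaxed atomics stand in for the pseudocode's plain ints so the
// example is well-defined; the fences carry all the ordering.
std::atomic<int> a(0);
std::atomic<int> b(0);

void thread2()
{
    int vb = b.load(std::memory_order_relaxed);           // "print b"
    std::atomic_thread_fence(std::memory_order_acquire);  // read barrier
    int va = a.load(std::memory_order_relaxed);           // "print a"
    std::printf("%d %d\n", vb, va);                       // "2 0" never appears
}

int main()
{
    std::thread t(thread2);
    a.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // write barrier
    b.store(2, std::memory_order_relaxed);
    t.join();
}

If thread 2 observes b == 2, the release/acquire fence pair guarantees
it also observes a == 1; nothing guarantees that thread 2 ever observes
b == 2 at all.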

As I understand "read" and "write" barriers (which are a subset of
"store/store, store/load, load/store, load/load", the semantics are:
"If a read before the read barrier sees a write after the write
barrier, then all reads after the read barrier will see all writes
before the write barrier." Yes, the semantics are conditional. It does
not guarantee that a write will ever become visible. However, volatile
will not change that. If thread 2 prints b == 2, then thread 2 will
print a == 1, volatile or no volatile. If thread 2 prints b == 0, then
thread 2 can print a == 0 or a == 1, volatile or no volatile. For some
lock free algorithms, these guarantees are very useful, such as making
double checked locking correct. Ex:

singleton_t* get_singleton()
{
   // All objects with static storage duration are zero-initialized
   // before runtime, so p starts out null.
   static singleton_t * p;

   if (0 != p) //check 1: fast path, no lock taken
   {
     READ_BARRIER();
     return p;
   }
   Lock lock; //scoped lock on the mutex guarding initialization
   if (0 != p) //check 2: another thread may have initialized p meanwhile
     return p;
   singleton_t * tmp = new singleton_t;
   WRITE_BARRIER();
   p = tmp; //publish only after the constructor's writes
   return p;
}

If a thread reads p != 0 at check 1, which is before the read barrier,
then it has seen the write "p = tmp" that comes after the write
barrier, and it is thus guaranteed that all subsequent reads after the
read barrier (in the caller's code) will see all writes before the
write barrier (the writes from the singleton_t constructor). This
conditional visibility is exactly what we need in this case, what DCLP
really wants. If the read at check 1 gives us 0, then we do have to
take the mutex to force visibility, but most of the time the thread
will read p as nonzero at check 1, and the barriers will guarantee
correct semantics. Also, from what I remember, the read barrier is
quite cheap on most systems, possibly free on the x86 (?). (See the
JSR-133 Cookbook linked above.) I don't quite grasp the nuances enough
yet to say anything more concrete than this at this time.
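
For comparison, here is a sketch of the same pattern written with
C++0x/C++11 atomics instead of the hypothetical READ_BARRIER /
WRITE_BARRIER macros above (assuming a C++11 implementation;
singleton_t and the shape of the function are carried over from the
example above):

#include <atomic>
#include <mutex>

struct singleton_t { /* ... */ };

singleton_t* get_singleton()
{
    // Both are constant-initialized before any thread runs.
    static std::atomic<singleton_t*> p(nullptr);
    static std::mutex m;

    singleton_t* tmp = p.load(std::memory_order_acquire); // check 1 + read barrier
    if (tmp != nullptr)
        return tmp;

    std::lock_guard<std::mutex> lock(m);
    tmp = p.load(std::memory_order_relaxed); // check 2; the mutex provides ordering
    if (tmp != nullptr)
        return tmp;

    tmp = new singleton_t;
    p.store(tmp, std::memory_order_release); // write barrier + publish
    return tmp;
}

The acquire load at check 1 pairs with the release store, which gives
exactly the conditional visibility described above. (A C++11
implementation also guarantees thread-safe initialization of
function-local statics, so in new code a plain "static singleton_t
instance;" often makes the whole dance unnecessary; the sketch is only
meant to mirror the barrier placement discussed here.)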

Again, I'm coding this up from memory, so please correct if any
mistakes.

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
