Re: atomically thread-safe Meyers singleton impl (fixed)...

From:

"Chris M. Thomasson" <no@spam.invalid>

Newsgroups:

comp.lang.c++,comp.programming.threads

Date:

Wed, 30 Jul 2008 08:37:49 -0700

Message-ID:

<XR%jk.6770$KZ.1205@newsfe03.iad>

"Anthony Williams" <anthony.ajw@gmail.com> wrote in message
news:ud4kv4fbp.fsf@gmail.com...

"Chris M. Thomasson" <no@spam.invalid> writes:

[...]

the Boost mechanism is not 100% portable, but is elegant in
practice.

Yes. If you look at the whole thread, you'll see a comment by me there
where I admit as much.

Does the following line:

__thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch;

explicitly set `_fast_pthread_once_per_thread_epoch' to zero? If so,
is it guaranteed?

The algorithm assumes it does, but it depends on which compiler you
use. In the Boost implementation, the value is explicitly
initialized (to ~0 --- I found it worked better with exception
handling to count backwards).
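
For illustration, here is a minimal sketch of the two choices discussed above, written with C++11 `thread_local' rather than the `__thread' extension (an assumption of this sketch, not what the quoted code uses). The type `epoch_t' and the variable and function names are hypothetical stand-ins; this is not the Boost source. C++11 zero-initializes variables with thread storage duration, whereas the explicit initializer makes the starting value independent of the compiler:

```cpp
#include <cassert>

// Hypothetical stand-in for the fast_pthread_once_t type in the quoted line.
typedef unsigned long epoch_t;

// No initializer: C++11 zero-initializes thread-storage variables, so each
// thread's copy starts at 0. A compiler-specific __thread extension is only
// as reliable as that compiler's documentation says it is.
thread_local epoch_t implicit_epoch;

// Explicit initializer: each thread's copy starts at ~0 (all bits set),
// which also allows the epoch to be counted backwards.
thread_local epoch_t explicit_epoch = ~0UL;

// Returns true if the calling thread's copies still hold their initial values.
bool epochs_have_initial_values() {
    return implicit_epoch == 0 && explicit_epoch == ~0UL;
}
```

Calling `epochs_have_initial_values()' from a thread that has not yet written to either variable should return true.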

It uses a technique similar to the one that a certain distributed
reference counting algorithm I created claims:

I wasn't aware that you were using something similar in vZOOM.

Humm, now that I think about it, it seems like I am totally
mistaken. The "most portable" version of vZOOM relies on an assumption
that pointer loads/stores are atomic, that the unlocking of a mutex
executes at least a release-barrier, and that the loading of a shared
variable executes at least a data-dependent load-barrier; very similar
to RCU without the explicit #LoadStore | #StoreStore before storing
into a shared pointer location... Something like:

// single producer thread {
foo* local_f = new foo;
pthread_mutex_t* lock = get_per_thread_mutex();
pthread_mutex_lock(lock);
local_f->a = 666;
pthread_mutex_unlock(lock);
shared_f = local_f;

So you're using the lock just for the barrier properties. Interesting
idea.

Yes. Actually, I did not show the whole algorithm. The code above is busted
because I forgot to show it all; STUPID ME!!! It's busted because the store
to shared_f can legally be hoisted up above the unlock. Here is the whole
picture... Each thread has a special dedicated mutex which is locked from
its birth... Here is exactly how production of an object can occur:

static foo* volatile shared_f = NULL;

// single producer thread {
00: foo* local_f;
01: pthread_mutex_t* const mem_mutex = get_per_thread_mem_mutex();
02: local_f = new foo;
03: local_f->a = 666;
04: pthread_mutex_unlock(mem_mutex);
05: pthread_mutex_lock(mem_mutex);
06: shared_f = local_f;
}

Here are the production rules wrt POSIX:

1. Steps 02-03 CANNOT sink below step 04
2. Step 06 CANNOT rise above step 05
3. vZOOM assumes that step 04 has a release barrier

Those __two__guarantees__and__single__assumption__ ensure that the ordering
and visibility of the operations are correct. After that, the consumer can do:

// single consumer thread {
00: foo* local_f;
01: while (! (local_f = shared_f)) {
02: sched_yield();
}
03: assert(local_f->a == 666);
04: delete local_f;
}

Consumption rules:

01: vZOOM assumes that the load from `shared_f' will have an implied
data-dependent load-barrier.
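
For comparison only, the ordering that the production and consumption rules are meant to guarantee can be sketched with C++11 atomics. This is not how vZOOM does it: the sketch swaps the per-thread mutex trick for a release store, and swaps the assumed data-dependent load-barrier for an acquire load (the nearest C++11 notion is `memory_order_consume', which compilers generally treat as acquire). The `std::atomic' wrapper and the function names `produce' and `consume' are assumptions of this sketch:

```cpp
#include <atomic>
#include <cassert>
#include <thread>  // std::this_thread::yield

struct foo { int a; };

// Stands in for `static foo* volatile shared_f'; std::atomic is this
// sketch's assumption, not part of the original code.
static std::atomic<foo*> shared_f(nullptr);

// Producer: initialize the object, then publish the pointer.
void produce() {
    foo* local_f = new foo;
    local_f->a = 666;
    // The release store keeps the write to local_f->a from sinking below
    // the publication, which is the job of steps 04-06 in the post.
    shared_f.store(local_f, std::memory_order_release);
}

// Consumer: wait for the pointer, then read through it.
int consume() {
    foo* local_f;
    // The acquire load plays the role of the assumed data-dependent
    // load-barrier on the read of shared_f.
    while (!(local_f = shared_f.load(std::memory_order_acquire))) {
        std::this_thread::yield();
    }
    int a = local_f->a;  // sees 666 once the pointer is visible
    delete local_f;
    return a;
}
```

In real use `produce()' and `consume()' would each run on their own thread, one producer and one consumer, as in the post.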

BTW, here is a brief outline of how the "most portable" version of vZOOM
distributed reference counting works with the above idea:

http://groups.google.ru/group/comp.programming.threads/msg/59e9b6e427b4a144

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/fe24fe99f742ce6e
(an __excellent__ question from Dmitriy...)

What do you think, Anthony?