Re: Compiler ordering barriers in C++0x

From:
Szabolcs Ferenczi <szabolcs.ferenczi@gmail.com>
Newsgroups:
comp.lang.c++.moderated,comp.lang.c++
Date:
Mon, 5 May 2008 12:35:31 CST
Message-ID:
<7e2dd9ea-7a1b-42a9-8e4a-c3bb91aa6d20@27g2000hsf.googlegroups.com>
On May 5, 3:08 pm, Anthony Williams <anthony_w....@yahoo.com> wrote:

Szabolcs Ferenczi <szabolcs.feren...@gmail.com> writes:

On May 3, 2:13 pm, Anthony Williams <anthony_w....@yahoo.com> wrote:

Szabolcs Ferenczi <szabolcs.feren...@gmail.com> writes:

On May 2, 12:43 pm, Anthony Williams <anthony_w....@yahoo.com> wrote:

[...]
All accesses to shared data MUST be synchronized with atomics: [...]


Can you elaborate on this point, please? How can you synchronise N
processes in general with the help of atomics? Or do you mean only two
processes under certain circumstances?


If any thread modifies shared data that is not of type atomic_xxx, the
developer must ensure appropriate synchronization with any other thread that
accesses that shared data in order to avoid a data race (and the undefined
behaviour that comes with that).


It is clear that you must synchronise access to a shared variable.
Normally you use a Critical Region for that.

I was curious how you synchronise access to shared data with atomics.
Note that atomics only provide this synchronisation for access to the
atomics themselves, but you claimed something like: with atomics you
can synchronise access to non-atomic shared data. How? Can you provide
an example, please?


If you use acquire/release pairs or seq_cst atomics, then they introduce a
synchronizes-with relation between threads. This in turn introduces a
happens-before relation, which makes the data modified by the thread that did
the store/release visible to the thread that did the load/acquire. This allows
the use of atomics to build mutexes and other synchronization primitives.

std::atomic_flag f=ATOMIC_FLAG_INIT;
std::deque<std::string> shared_data; // deque: std::vector has no pop_front()

void thread_a()
{
     while(f.test_and_set(std::memory_order_acq_rel)); // spin lock
     shared_data.push_back("hello");
     f.clear(std::memory_order_release); // unlock
}

void thread_b()
{
     while(f.test_and_set(std::memory_order_acq_rel)); // spin lock
     if(!shared_data.empty())
     {
         std::cout<<shared_data.front()<<std::endl;
         shared_data.pop_front();
     }
     f.clear(std::memory_order_release); // unlock
}

The clear() from thread_a is a release operation, so synchronizes with the
test_and_set (which is an acquire) in thread_b, which can therefore see the
modification to shared_data, since the push_back happens-before the clear,
which happens-before the test-and-set, which happens-before the accesses in
thread_b.


Thank you for the clarifying code fragment. That makes it clear what
you meant. I think it works and is correct (if we ignore the timing
issue that, if thread_b wins the race, you will see no output).

It may be personal taste, but I do not like the way you explain it,
since your explanation misses the main reason why it works. I would
say it is not about any `visibility' issues, nor about the so-called
happens-before relation in the first place; rather, there are two
factors that make it work:

1) the atomic test-and-set
2) the waiting loop

These two important points were not clear from your initial statement:
"All accesses to shared data MUST be synchronized with atomics".
Actually, I would not put it this way either, since it is the waiting
loop that synchronises and not the atomic operation alone, but the
example is clear at least.

I would say: based on atomics, you can make a hand-built spin lock
and, in turn, from the spin lock you can make a hand-built Critical
Region to synchronise access to shared data. Actually, the code
example illustrates this very well, no matter how we explain it.

[...]
Of course, you could also just use a mutex lock or join with the thread doing
the modification.


That is correct. You can synchronise access with mutexes (implementing
a Critical Region by hand).

I think that by the phrase "join with the thread" you refer to the end
of a structured parallel block, where a shared variable becomes a
non-shared one. Again, an example could help. Please give an example
illustrating what you mean.


std::thread t(thread_a); // thread_a from above
t.join();

assert(shared_data.back()=="hello");


This was what I meant as well, based on your wording. Thanks for the
clarification.

Best Regards,
Szabolcs

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
