Re: C++0x: release sequence
"Dmitriy V'jukov" <dvyukov@gmail.com> writes:
On Jun 16, 2:13 pm, Anthony Williams <anthony....@gmail.com> wrote:
Relaxed operations can read values from other threads
out-of-order. Consider the following:
atomic_int x=0;
atomic_int y=0;
Processor 1 does store-release:
A: x.store(1,memory_order_relaxed)
B: y.store(1,memory_order_release)
Processor 2 does relaxed RMW op:
int expected=1;
C: while(!y.compare_swap(expected,2,memory_order_relaxed)) expected=1;
Processor 3 does load-acquire:
D: a=y.load(memory_order_acquire);
E: b=x.load(memory_order_relaxed);
If a is 2, what is b?
On most common systems (e.g. x86, PowerPC, Sparc), b will be 1. The
standard does not guarantee this, though, because some NUMA systems
may not provide that ordering.
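For anyone who wants to experiment, the three-processor example can be written out as a small harness. This is only a sketch under stated assumptions: it uses the std::atomic<int> and compare_exchange_weak spellings from published C++11 in place of the working paper's compare_swap, it resets expected after a failed exchange so that the RMW can only turn 1 into 2, and run_once is a hypothetical helper name. A test run on one machine cannot settle what the memory model guarantees, so the harness only reports the (a, b) pair it observed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

std::atomic<int> x{0};
std::atomic<int> y{0};

// Processor 1: relaxed store to x (A), then store-release to y (B).
void processor1() {
    x.store(1, std::memory_order_relaxed);
    y.store(1, std::memory_order_release);
}

// Processor 2: relaxed RMW that changes y from 1 to 2 (C).
void processor2() {
    int expected = 1;
    while (!y.compare_exchange_weak(expected, 2, std::memory_order_relaxed)) {
        expected = 1;  // a failed exchange overwrites expected with the value it saw
    }
}

// Processor 3: load-acquire of y (D), then relaxed load of x (E).
std::pair<int, int> processor3() {
    int a = y.load(std::memory_order_acquire);
    int b = x.load(std::memory_order_relaxed);
    return {a, b};
}

// Hypothetical helper: resets the variables, runs the three "processors"
// as threads, and returns the (a, b) pair seen by processor 3.
std::pair<int, int> run_once() {
    x.store(0);
    y.store(0);
    std::pair<int, int> result;
    std::thread t1(processor1);
    std::thread t2(processor2);
    std::thread t3([&result] { result = processor3(); });
    t1.join();
    t2.join();
    t3.join();
    assert(y.load() == 2);  // processor 2 only finishes after its 1 -> 2 exchange
    return result;
}
```

Calling run_once() in a loop and counting how often a == 2 gives a feel for how rarely the interesting interleaving shows up on a given machine.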
The problem is that this prohibits using a relaxed fetch_add in the
acquire operation of reference counting with basic thread-safety:
struct rc_t
{
    std::atomic<int> rc;
};

void acquire(rc_t* obj)
{
    obj->rc.fetch_add(1, std::memory_order_relaxed);
}
This implementation can lead to data races in some usage patterns
according to C++0x. Is it intended?
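To make the concern concrete, here is a minimal sketch of the whole reference count. The post shows only acquire(), so everything else is an assumption: the initial count of 1, the plain data member, and the release() function with its acq_rel fetch_sub are hypothetical additions for illustration.

```cpp
#include <atomic>
#include <cassert>

struct rc_t
{
    std::atomic<int> rc{1};  // assumption: the creator holds the first reference
    int data = 0;            // assumption: plain, non-atomic payload
};

// As in the post: taking another reference is a relaxed increment.
void acquire(rc_t* obj)
{
    obj->rc.fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical counterpart: drops a reference and deletes the object when
// the last one goes away. Returns true if this call deleted the object.
bool release(rc_t* obj)
{
    int old = obj->rc.fetch_sub(1, std::memory_order_acq_rel);
    assert(old > 0);  // releasing more often than acquiring is a caller bug
    if (old == 1) {
        delete obj;
        return true;
    }
    return false;
}
```

One usage pattern that seems to fit the description, again as an assumption: thread 1 writes obj->data and calls release(); thread 2 then calls acquire() and later performs the final release(), which deletes the object. The final fetch_sub reads a count that passed through the relaxed fetch_add, so the question is whether the delete is still ordered after thread 1's write to data.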
I'm fairly sure it was intentional. If you don't want data races,
specify a non-relaxed ordering: I'd guess that
std::memory_order_acquire would be good for your example. If you don't
want the sync on the fetch_add, you could use a fence. Note that the
use of fences in the C++0x WP has changed this week from
object-specific fences to global fences. See Peter Dimov's paper
N2633:
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2633.html
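One way to read the fence suggestion is sketched below. It assumes std::atomic_thread_fence, which is the spelling of the global fence in published C++11; the name in the working paper of the time may have differed. The acquire_with_fence name, the initial count of 1, and the returned previous count are hypothetical additions so that the sketch can be checked.

```cpp
#include <atomic>
#include <cassert>

struct rc_t
{
    std::atomic<int> rc{1};  // assumption: the creator holds the first reference
};

// Relaxed increment followed by a global acquire fence, instead of putting
// memory_order_acquire on the fetch_add itself.
// Returns the count observed before the increment.
int acquire_with_fence(rc_t* obj)
{
    int old = obj->rc.fetch_add(1, std::memory_order_relaxed);
    assert(old > 0);  // the caller must already own a reference
    std::atomic_thread_fence(std::memory_order_acquire);
    return old;
}
```

Whether this placement provides the ordering a particular usage pattern needs depends on the memory model rules under discussion, so treat it as a starting point rather than a fix.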
Relaxed ordering is intended to have minimal overhead on all systems,
so it provides no ordering guarantees. On systems that always provide
the ordering guarantees, putting memory_order_acquire on the fetch_add
probably adds minimal overhead. On systems that truly exhibit relaxed
ordering, requiring that the relaxed fetch_add participate in the
release sequence could add considerable overhead.
Consider my example above on a distributed system where the processors
are conceptually "a long way" apart, and data synchronization is
explicit.
With the current WP, processor 2 only needs to synchronize access to
y. If the relaxed op featured in the release sequence, it would need
to also handle the synchronization data for x, so that processor 3 got
the "right" values for x and y.
Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk