Re: C++0x: release sequence

From: Anthony Williams <anthony.ajw@gmail.com>
Newsgroups: comp.programming.threads,comp.lang.c++
Date: Mon, 16 Jun 2008 12:09:06 +0100
Message-ID: <uabhlhc19.fsf@gmail.com>

"Dmitriy V'jukov" <dvyukov@gmail.com> writes:

On Jun 16, 2:13 pm, Anthony Williams <anthony....@gmail.com> wrote:

Relaxed operations can read values from other threads
out-of-order. Consider the following:

atomic_int x=0;
atomic_int y=0;

Processor 1 does store-release:

A: x.store(1,memory_order_relaxed);
B: y.store(1,memory_order_release);

Processor 2 does relaxed RMW op:
int expected=1;
C: while(!y.compare_swap(expected,2,memory_order_relaxed)) expected=1;

Processor 3 does load-acquire:
D: a=y.load(memory_order_acquire);
E: b=x.load(memory_order_relaxed);

If a is 2, what is b?

On most common systems (e.g. x86, PowerPC, Sparc), b will be 1. The
standard does not guarantee this, though, because NUMA systems may
not provide that ordering.
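
For anyone who wants to try the three-processor example, here is a minimal sketch of it as runnable code. It is an illustration under assumptions, not part of the original exchange: it uses std::thread and the spelling compare_exchange_strong from the published C++11 standard in place of the working paper's compare_swap, and the names Observed and run_once are hypothetical. The code only records what one run observes; it makes no claim about which results are permitted. Note that the published C++11 standard defines a release sequence to include atomic read-modify-write operations, so the guarantee for a == 2 there differs from the working paper discussed in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper type: what processor 3 saw in one run.
struct Observed {
    int a;  // value read from y at step D
    int b;  // value read from x at step E
};

// Hypothetical helper: runs steps A-E once on three threads.
Observed run_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    Observed seen{0, 0};

    std::thread p1([&] {
        x.store(1, std::memory_order_relaxed);  // A
        y.store(1, std::memory_order_release);  // B
    });

    std::thread p2([&] {
        // C: retry until y is changed from 1 to 2. A failed attempt
        // overwrites `expected` with the value it saw, so reset it.
        int expected = 1;
        while (!y.compare_exchange_strong(expected, 2,
                                          std::memory_order_relaxed)) {
            expected = 1;
            std::this_thread::yield();
        }
    });

    std::thread p3([&] {
        seen.a = y.load(std::memory_order_acquire);  // D
        seen.b = x.load(std::memory_order_relaxed);  // E
    });

    p1.join();
    p2.join();
    p3.join();

    // After all threads finish, the only possible final value of y is 2.
    assert(y.load() == 2);
    return seen;
}
```

Calling run_once() in a loop and tallying the (a, b) pairs shows which outcomes a given machine actually produces; on the common systems named above one would expect never to see a == 2 with b == 0.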

The problem is that this prohibits using a relaxed fetch_add in the
acquire operation of reference counting with basic thread safety:

struct rc_t
{
    std::atomic<int> rc;
};

void acquire(rc_t* obj)
{
    obj->rc.fetch_add(1, std::memory_order_relaxed);
}

According to C++0x, this implementation can lead to data races in
some usage patterns. Is that intended?

I'm fairly sure it was intentional. If you don't want data races,
specify a non-relaxed ordering: I'd guess that
std::memory_order_acquire would be good for your example. If you don't
want the sync on the fetch_add, you could use a fence. Note that the
use of fences in the C++0x WP has changed this week from
object-specific fences to global fences. See Peter Dimov's paper
N2633:
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2633.html
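
The two alternatives just mentioned might look roughly like this. This is only a sketch under assumptions: the global fence is spelled std::atomic_thread_fence as in the published C++11 standard, the function names acquire_ordered and acquire_fenced are hypothetical, and the initial count of 1, the returned previous count, and the null check are additions for illustration that the original rc_t and acquire() do not have.

```cpp
#include <atomic>
#include <cassert>

struct rc_t {
    std::atomic<int> rc{1};  // assumption: starts with one owner
};

// Option 1: put the acquire ordering on the increment itself.
// Returns the previous count (illustrative addition).
int acquire_ordered(rc_t* obj) {
    assert(obj != nullptr);
    return obj->rc.fetch_add(1, std::memory_order_acquire);
}

// Option 2: keep the increment relaxed and follow it with a global
// acquire fence, so the synchronization is not tied to the fetch_add.
int acquire_fenced(rc_t* obj) {
    assert(obj != nullptr);
    int previous = obj->rc.fetch_add(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return previous;
}
```

Whether either form is sufficient still depends on the matching release operation on the other side; the sketch shows only the acquire half.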

Relaxed ordering is intended to have minimal overhead on all systems,
so it provides no ordering guarantees. On systems that always provide
the ordering guarantees, putting memory_order_acquire on the fetch_add
probably adds minimal overhead. On systems that truly exhibit relaxed
ordering, requiring that the relaxed fetch_add participate in the
release sequence could add considerable overhead.

Consider my example above on a distributed system where the processors
are conceptually "a long way" apart, and data synchronization is
explicit.

With the current WP, processor 2 only needs to synchronize access to
y. If the relaxed op featured in the release sequence, it would need
to also handle the synchronization data for x, so that processor 3 got
the "right" values for x and y.

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL