Re: atomic memory_order with command or with fence

From:
itaj sherman <itajsherman@gmail.com>
Newsgroups:
comp.lang.c++.moderated
Date:
Tue, 29 May 2012 17:39:27 -0700 (PDT)
Message-ID:
<8fb0650a-0e30-40e9-a5bf-7026661f1246@6g2000vbv.googlegroups.com>
I will give this one example, where I'm pretty sure I can demonstrate
what I mean. However, my question remains whether it's always so.

On May 29, 10:54 pm, Zoltan Juhasz <zoltan.juh...@gmail.com> wrote:

On Sunday, 27 May 2012 17:08:33 UTC-4, itaj sherman wrote:

I'll put this code in functions, to clarify the context of x and r:

template< typename T >
void my_store_release_1( std::atomic<T>& x, T r )
{
   x.store( r, memory_order_release );
}

template< typename T >
void my_store_release_2( std::atomic<T>& x, T r )
{
   std::atomic_thread_fence( memory_order_release );
   x.store( r, memory_order_relaxed );
}


Disclaimer: I am most certainly not an expert on this area, but based
on my current understanding on the topic, I believe these are not
the same. Hopefully someone, who has more experience, will clarify.

A fence or atomic store operation that is marked with
'memory_order_release' introduces an inter-thread happens-before
relationship for store operations that appear before the
'memory_order_release' fence or atomic store operation - given that
it is paired with an acquire counterpart.


But in order to synchronize a release fence with an acquire fence,
you need an atomic variable and a store to it sequenced after the
release fence, whose value is read by a load that is sequenced
before the acquire fence.

standard 29.8 p2:
  A release fence A synchronizes with an acquire fence B if there
  exist atomic operations X and Y, both operating on some atomic
  object M, such that A is sequenced before X, X modifies M, Y is
  sequenced before B, and Y reads the value written by X or a value
  written by any side effect in the hypothetical release sequence X
  would head if it were a release operation.

Operations X and Y in my code were meant to be x.store and x.load.
That is why I deliberately ordered them inside the functions before
or after the fence as I did.

Conversely, it introduces no happens-before relationship for
operations that appear after the store / fence marked with
'memory_order_release', as regards their visibility in another
thread.

In this case the fence, marked with 'memory_order_release',
introduces no happens-before relationship for the store to x with
regard to the visibility of that store in another thread, since
the store appears after the fence.


Right, it doesn't order x, and I didn't mean it to. The point was for
x to cause a synchronization (an optional one) between the fences, so
that stores that were sequenced before the release fence are certainly
visible to loads that are sequenced after the acquire fence.

So I can show the following usage example, in which I think versions
1 and 2 are equivalent. But I'm looking for an answer as to whether
it is always true.

std::atomic<int> atomic_data( 0 );
std::atomic<int> atomic_flag( 0 ); //change flag to 1 when data can be read

//thread#1
int data;
std::cin >> data;
atomic_data.store( data, memory_order_relaxed );
my_store_release_XXX( atomic_flag, 1 ); //XXX is one of the above versions

//thread#2
int const current_flag = my_load_acquire_XXX( atomic_flag );
int const current_data = atomic_data.load( memory_order_relaxed );
if( current_flag == 1 ) {
  //the atomic_flag store_release synchronizes with the load_acquire,
  //therefore the atomic_data store happens before the load.
  std::cout << "data arrived " << current_data; //must be what came in std::cin
} else {
  //no certain synchronization
  std::cout << "no flag for data arrived "; //data may be 0, may be already changed
}

So, I expect we should agree, without further explanation, that when
using my_store_release_1/my_load_acquire_1 this example works as
expected (per standard 1.10).

Now regarding 29.8 p2, I assert that using my versions
my_store_release_2/my_load_acquire_2 this should work just the same -
just in this example - because the code would reduce to:

//inlining the functions of versions 2:

//thread#1
int data;
std::cin >> data;
atomic_data.store( data, memory_order_relaxed );
std::atomic_thread_fence( memory_order_release ); // <-- fence A
atomic_flag.store( 1, memory_order_relaxed ); // <-- store operation X

//thread#2
int const current_flag = atomic_flag.load( memory_order_relaxed ); // <-- load operation Y
std::atomic_thread_fence( memory_order_acquire ); // <-- fence B
int const current_data = atomic_data.load( memory_order_relaxed );
if( current_flag == 1 ) {
  //in this case, the value of current_flag implies that fence A
  //synchronized with fence B per 29.8 p2
  std::cout << "data arrived " << current_data; //must be what came in std::cin
} else {
  //no certain synchronization
  std::cout << "no flag for data arrived "; //data may be 0, may be already changed
}

I will also assert that it will work (in this example) when changing
just one of the function versions, thus mixing
my_store_release_1/my_load_acquire_2 or
my_store_release_2/my_load_acquire_1.

But this example is just one case; I want to know whether they are
always equivalent.

I believe if you write:

template< typename T >
void my_store_release_3( std::atomic<T>& x, T r )
{
   x.store( r, memory_order_relaxed );
   std::atomic_thread_fence( memory_order_release );
}

Then 1 and 3 are equivalent, as far as the introduced inter-thread
happens-before relationship is concerned.

The same I would ask about load and acquire:
template< typename T >
T my_load_acquire_1( std::atomic<T>& x )
{
   T const r = x.load( memory_order_relaxed );
   std::atomic_thread_fence( memory_order_acquire );
   return r;
}

template< typename T >
T my_load_acquire_2( std::atomic<T>& x )
{
   T const r = x.load( memory_order_acquire );
   return r;
}


The situation is similar here: the 'memory_order_acquire' fence does
not impose a happens-before relationship on the load of x, since the
load appears before the fence.

The correct way is:

template< typename T >
T my_load_acquire_3( std::atomic<T>& x )
{
   std::atomic_thread_fence( memory_order_acquire );
   T const r = x.load( memory_order_relaxed );
   return r;
}


On the other hand, I don't see how it would work with your version 3.
It actually seems like a counter-example.

//inlining the functions of versions 3:

//thread#1
int data;
std::cin >> data;
atomic_data.store( data, memory_order_relaxed );
atomic_flag.store( 1, memory_order_relaxed ); // <-- store operation X
std::atomic_thread_fence( memory_order_release ); // <-- fence A

//thread#2
std::atomic_thread_fence( memory_order_acquire ); // <-- fence B
int const current_flag = atomic_flag.load( memory_order_relaxed ); // <-- load operation Y
int const current_data = atomic_data.load( memory_order_relaxed );
if( current_flag == 1 ) {
  //it might be possible to load the value of store operation X even
  //when fence A has not occurred yet. in such a case, it is uncertain
  //which value of atomic_data is loaded.
  std::cout << "data arrived " << current_data;
} else {
  std::cout << "no flag for data arrived ";
}

itaj

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
