Re: Subtle difference between C++0x MM and other MMs
On Aug 24, 11:53 pm, Peter Dimov <pdi...@gmail.com> wrote:
On Aug 24, 9:44 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
I mean not every C++0x implementation of Peterson's algorithm, but
particular implementation which uses store-release, load-acquire + 1
seq_cst fence.
Why do you think that this implementation doesn't work?
I think I see your point. Getting back to
P0: flag[0] = 1
turn = 1
memory_barrier()
while( flag[1] && turn == 1 );
// do nothing
// critical section
...
// end of critical section
flag[0] = 0
P1: flag[1] = 1
turn = 0
memory_barrier()
while( flag[0] && turn == 0 );
// do nothing
// critical section
...
// end of critical section
flag[1] = 0
It's easy to show that P0 and P1 can't block each other forever;
eventually they will agree on the value of 'turn' and one of them will
proceed.
The case where P0 sees flag[1] == 0 and P1 sees flag[0] == 0 is a
classic SC violation example and every reasonable definition of
memory_barrier rules it out.
The interesting case you must have had in mind is the sequence
P1:flag[1] = 1
P1:turn = 0
P0:flag[0] = 1
P0:turn = 1
P0:memory_barrier
Can P0 now see flag[1] == 0? (P1 will later see turn == 1 and enter
the critical section.)
I wonder whether the formal CLR memory model (or even the current
formal x86 memory model) disallows this. (XCHG for turn instead of a
fence should work.)
I think that the C++MM does, if the condition is while( turn == 1 &&
flag[1] ). P0 seeing its own turn=1 doesn't break the release sequence
started by P1:turn=0 because turn=1 is executed by the same thread
(first bullet in 1.10/6). So P1:turn=0 synchronizes-with the read from
'turn' in P0 and ensures that P1:flag[1]=1 is seen.