Re: Synchronization Algorithm Verificator for C++0x

From:

"Chris M. Thomasson" <no@spam.invalid>

Newsgroups:

comp.programming.threads,comp.lang.c++

Date:

Tue, 5 Aug 2008 20:22:18 -0700

Message-ID:

<4K8mk.6560$3l5.4555@newsfe06.iad>

"Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
news:8488efa1-9383-4232-b5a8-482852cee1f3@a1g2000hsb.googlegroups.com...
On 6 ???, 02:54, "Chris M. Thomasson" <n...@spam.invalid> wrote:

I would be interested to see how long it takes to detect the following
bug
you found in an older version of AppCore:

http://groups.google.com/group/comp.programming.threads/browse_frm/th...

Ok, let's see. I've just finished basic mutex and condvar
implementation.

Can you e-mail the new version to me please?

Here is eventcount:

class eventcount
{
[...]
void wait(unsigned cmp)
{
  unsigned ec = count($).load(std::memory_order_seq_cst);
  if (cmp == (ec & 0x7FFFFFFF))
  {
   guard.lock($);
   ec = count($).load(std::memory_order_seq_cst);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   I don't think that you need a memory barrier under the lock because the
only mutation which will make the following compare succeed is also
performed under the same lock. The signal-slow path is under the lock, and
the waiter slow-path is under the lock. However, I don't know if this
violates some rule in C++ 0x.

   if (cmp == (ec & 0x7FFFFFFF))
   {
    waiters($) += 1;
    cv.wait(guard, $);
   }
   guard.unlock($);
  }
}
};

Here is the queue:

[...]

Here is the test:

[...]

It takes around 40 minutes.

Let's run it!
Here is output (note error detected on iteration 3):

struct spsc_queue_test
DEADLOCK
iteration: 3

Cool; it sure does detect the race-condition in the old AppCore! Very nice
indeed.

[...]

Operations on condition variables are not yet in execution history.

Let's replace 'ec.signal_relaxed()' with 'ec.signal()'. Here is the
output:

struct spsc_queue_test

[...]

iterations: 1000000
total time: 2422
throughput: 412881

Test passed. Throughput is around 400'000 test executions per second.

Beautiful. My eventcount algorihtm works!

=^D

Also test reveals some errors with memory fences.
In signaling function you are using acquire fence in cas, but relaxed
is enough here:
void signal_impl(unsigned cmp)
{
  if (cmp & 0x80000000)
  {
  guard.lock($);
  while (false == count($).compare_swap(cmp,
    (cmp + 1) & 0x7FFFFFFF, std::memory_order_relaxed));
    unsigned w = waiters($);
    waiters($) = 0;
    guard.unlock($);
    if (w)
       cv.notify_all($);
    }
}

Right. The current public AppCore atomic API is not fine-grain enough to
allow for relaxed operations. I need to change that.

Also. Following naive approach doesn't work (at least in C++0x):
void signal()
{
std::atomic_thread_fence(std::memory_order_seq_cst);
signal_relaxed();
}

Good thing the MFENCE works in x86!

You have to execute sequentially consistent atomic RMW on 'ec'
variable here. Because sequentially consistent fence and sequentially
consistent atomic RMW operation doesn' t synchronize-with each other.
Hmmm... This is strange. But I can't find anything about this in
draft... Anthony Williams said that committee made some changes to
initial proposal on fences, but they are not yet published...

This is weird. a seq_cst fence does not work with seq_cst RMW!? Do you
happen to know the rational for this? IMVHO, it makes no sense at all... I
must be missing something.

:^/