Re: Threading in new C++ standard
On May 3, 2:41 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:
On May 2, 10:05 pm, "Boehm, Hans" <hans.bo...@hp.com> wrote:
I assumed a convention here that I should have been clearer about. In
particular, no other threads execute code relevant to this, and this
is the entire code executed in these two threads. That's admittedly a
simplification, but a convenient one that's generally used in
presenting such examples. A more realistic setting would involve
multiple threads that behave like thread 2, but only a single thread
that sets x_init, and it is set only once and never reset. Very
similar cases occur with double-checked locking, or when passing
"ownership" to an object between threads through some sort of queue.
In the latter case, the queue is implemented using something like
critical regions, but the object is accessed outside of the critical
region, because it's only accessed by one thread at a time.
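A minimal sketch of that setting, assuming the std::atomic and
std::thread facilities proposed for C++0x; the names writer and reader
and the thread-creation boilerplate are mine, purely for illustration:

    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    int x;                            // ordinary, non-atomic data
    std::atomic<bool> x_init(false);  // set exactly once, never reset

    void writer() {                   // the single "thread 1"
        x = 42;                                          // initialize the data
        x_init.store(true, std::memory_order_release);   // then publish it
    }

    void reader() {                   // any number of "thread 2"s
        if (x_init.load(std::memory_order_acquire))      // saw the flag?
            assert(x == 42);          // then the initialization is visible
    }

    int main() {
        std::thread w(writer);
        std::vector<std::thread> readers;
        for (int i = 0; i != 4; ++i)
            readers.emplace_back(reader);
        w.join();
        for (auto &r : readers)
            r.join();
    }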
Thank you for the clarification.
I really did not think that you could mean strictly two processes, and
under such strict conditions as well, i.e. latch behaviour for the
variables. I thought that four decades have passed since it was shown
and commonly agreed that if all you have is variables with atomic read
or write, you cannot derive a general synchronisation between N
processes for critical sections.
I'm not sure what you mean by that. There are certainly solutions,
see, for example: http://portal.acm.org/citation.cfm?doid=176454.176479 .
They are probably even somewhat practical with reasonable back-off
strategies. But that's all beside the point: We are not trying to
use atomics to provide mutual exclusion. When you want mutual
exclusion, we give you ways to do that. Since this is a
standardization effort, we give you fairly conservative ways to do
it, ones that are widely used in practice; that's how the process works.
But in practice, not everything is best expressed using mutual
exclusion. Lazy initialization, in various forms, is a
common exception. Lock-based (or critical-region-based) solutions
carry a real cost there, and few people are willing to pay that
when the alternative is much cheaper, and often nearly free.
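For contrast, a sketch of the conservative lock-based form of lazy
initialization (the names Widget, widget_mutex and get_widget are
illustrative only): it is correct, but every call pays for a mutex
acquisition long after the object has been built:

    #include <mutex>

    struct Widget { int value = 42; };

    Widget* widget = nullptr;      // lazily created instance
    std::mutex widget_mutex;       // guards both the test and the creation

    Widget* get_widget() {
        std::lock_guard<std::mutex> lock(widget_mutex);  // paid on every call
        if (!widget)
            widget = new Widget;   // first caller initializes
        return widget;
    }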
You mention the so-called double-checked locking as one of the
justifications for this. Double-checked locking is
considered to be an anti-pattern by a lot of people. Besides,
double-checked locking can be applied only to some latch-like
behaviour. It is mainly used in the implementation of the multi-
threaded version of the singleton pattern, but the singleton pattern
itself is an anti-pattern. Even the inventor of the singleton pattern
claims he would never introduce it again as a pattern. Singleton is
considered harmful.
You often want to initialize objects on first use. The debate about
the singleton pattern is irrelevant here. The reason double-checked
locking has a legitimately bad reputation is that it cannot be written
correctly without atomic variables. That doesn't make it useless.
Unfortunately, it is still written incorrectly in large numbers of
cases.
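A sketch of what a correct version can look like once atomic variables
are available; the memory orderings and the names Widget, widget and
get_widget are my own choices, not a quotation from the draft:

    #include <atomic>
    #include <mutex>

    struct Widget { int value = 42; };

    std::atomic<Widget*> widget(nullptr);  // null until fully constructed
    std::mutex widget_mutex;               // serializes only the first-time path

    Widget* get_widget() {
        Widget* w = widget.load(std::memory_order_acquire);  // fast path, no lock
        if (!w) {
            std::lock_guard<std::mutex> lock(widget_mutex);
            w = widget.load(std::memory_order_relaxed);       // re-check under the lock
            if (!w) {
                w = new Widget;                                // build the object first
                widget.store(w, std::memory_order_release);    // then publish it
            }
        }
        return w;
    }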
Building a language concept around such a questionable and very
limited construction does not seem to be a very good idea, to me at
least.
Now back to your example:
Thread 1:
x = 42;
x_init = true;
From the programming (language) point of view, what did you express
here? You have expressed that you have two independent actions which
you intend to carry out strictly one at a time and one after the
other. You might have had in mind that `x_init' serves as a flag to
signal the event of assigning a value to `x', but the intention does
not appear in the notation in any way except in the sequencing. (The
sequencing, however, can be `optimised' by your compiler.)
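For what it is worth, in the proposed C++0x atomics the intention can
be made to appear in the notation by giving `x_init' an atomic type;
the default sequentially consistent assignment then rules out the
reordering in question. A sketch only, not the wording of the draft:

    #include <atomic>
    #include <cassert>

    int x;
    std::atomic<bool> x_init(false);   // the type itself carries the intent

    void thread1() {
        x = 42;
        x_init = true;        // seq. consistent store: not moved before x = 42
    }

    void thread2() {
        if (x_init)           // seq. consistent load
            assert(x == 42);  // guaranteed: the store to x happens-before this
    }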
<footnote>
Just out of curiosity, in the OCCAM language, where there is no semicolon
nor default ordering of actions, you would have expressed this as
follows:
SEQ
  x := 42
  x_init := TRUE
meaning that you want these two actions to happen strictly one after
the other. On the other hand, if you did not care about the
execution order, you would put it differently, such as:
PAR
  x := 42
  x_init := TRUE
The OCCAM compiler, seeing this, knows that the two actions can be
carried out either simultaneously or one after another in any order,
just because you as the programmer expressed that the order of
execution does not matter. (I have shown the code only to illustrate
the combining of the actions; please note that there is no shared-
memory communication in OCCAM.)
</footnote>
Now, you can question whether it is correct for the C/C++ compiler to
override your intention and carry out the two operations in the
reversed order, but in practice the compilers take that freedom and
assume a single sequential automaton in which the two operations can be
swapped. If you consider them together as a unit, the postcondition
is indeed satisfied irrespective of the order of execution. However,
you are no longer writing a program for a single sequential automaton,
so the optimisation becomes wrong.
Nevertheless, if you move from the one sequential automaton to the
many cooperating sequential automata, you better make your intention
clear and say that you really want to regard the two operations as a
unit. The conventional language means for this is the (Conditional)
Critical Region:
Thread 1:
  with (x, x_init) {
    x = 42;
    x_init = true;
  }

Thread 2:
  with (x, x_init) when (x_init) {
    assert(x == 42);
  }
which hides the fact that x = 42 and assert(x == 42) do not need
mutual exclusion. In real cases those tend to be complex object
initializations and accesses, which may be 100 or 1000 times more
expensive than the x_init accesses. Without read-sharing, this
solution is completely impractical when you have multiple readers like
thread 2. If you use something like a rwlock, you are often still left
with a large slowdown in the reader versus the atomics-based
implementation. It basically has to compile to one or two rwlock
acquisitions, vs. an ordinary load instruction on x86.
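A sketch of the reader path that comparison has in mind (the names and
initial values are illustrative only): the acquire load needs no
special instruction on x86, whereas an rwlock-protected reader would
pay a locked read-modify-write to take the lock, and again to release
it, on every call:

    #include <atomic>
    #include <cassert>

    int x = 42;                       // stands in for the expensively initialized data
    std::atomic<bool> x_init(true);   // stands in for the already-set flag

    bool try_read() {
        if (x_init.load(std::memory_order_acquire)) {  // an ordinary MOV on x86
            assert(x == 42);          // the initialization is guaranteed visible
            return true;
        }
        return false;                 // not initialized yet; fall back to the slow path
    }

    int main() { return try_read() ? 0 : 1; }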
Hans