Re: C++ Threads, what's the status quo?

From:

"James Kanze" <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++.moderated

Date:

14 Jan 2007 14:02:41 -0500

Message-ID:

<1168783071.409234.108290@q2g2000cwa.googlegroups.com>

Zeljko Vrba wrote:

On 2007-01-13, JohnQ <johnqREMOVETHISprogrammer@yahoo.com> wrote:

locks? Are you saying that your 'atomic' keyword would allow getting rid of
all the synchronization issues and therefor primitives everywhere all the
time? Enquiring mind wants to know.

Yes, in a certain limited (and very useful!) set of cases.
I'm not sure what you refer to with Volatile<T>, so I'll try
to exemplify it more concretely.

Consider the following small program [it's useless, but enough to illustrate
concepts; there are many _useful_ examples where atomicty guarantee would
be nice].

Like reference counts:-).

int x;

void thread_i() {
   while(1) {
   // do something
   if(--x == 0)
     break;
   // do something else
}

Now, in the situation as it is now, the compiler may generate code for
'--x == 0' as [for those who know x86 ASM, I write it by the side]

   1. load x into register (movl x, %eax)
   2. decrement register (decl %eax)
   3. store register into x (movl %eax, x)
   4. if result == 0 goto exit (jz exit)

More likely, the compiler will put x in a register at the start
of the loop, and never look at memory again. At least, that's
what my compilers do (both for IA-32 and Sparc).

If you have several copies of thread_i, there is a race condition between
steps 1-3. Some architectures, like x86, offer atomic memory decrement.
*IF* the compiler could be forced to generate atomic operation for
'--x == 0', the generated code would look like:

1. [decrement memory location x and set flags] (decl x)
2. if the result was zero, goto exit (jz exit)

Just a nit, but on Intel 32 bits, the operation is only atomic
if the decl instruction is preceded by a lock prefix. One
instruction or not, the processor still has to read the value
from memory, decrement it and write it back. Which results in a
race condition.

Now 1. is single atomic operation, where the hardware serializes parallel
accesses to memory.

Not according the the specifications I've seen. Even the
original 8086 didn't guarantee the absense of a race condition
unless the lock prefix was used.

Note too that the IA-32 is probably the only widespread
architecture today that supports incrementation and
decrementation of a memory value in a single instruction.

No race conditions are possible. Yes, wrapping x
into something Volatile<int> x _would_ work

I'm sceptical.

The problem here is that it goes beyond the data. What you need
isn't atomic access to the data, but atomic operations on the
data.

with the disadvantages that

   - implicit wrapping in a mutex is suboptimal if the architecture
     supports atomic operation
   - you have to use inline assembly to code optimized Volatile<int>
     that uses atomic instructions, because you currently have _no way_
    to force the compiler to any particular choice of instructions

If the variable were declared as 'atomic int x;' the compiler would be
obliged to generate the 2nd instruction sequence *OR* report an error
if the operation "--x == 0" is not supported by the target architecture.
Thus, you would not need locks for simple operations that are supported
by the target CPU. On x86, this includes all simple arithmetic and logic
operations. [the wonderful world of CISC :)]

In fact, there is a proposal for some atomic operations;
presumably adding and subtracting a value is amongst them. The
interface is that of a function, but of course, an
implementation is free to make it an inline function using
inline assembler, or some special compiler built-in. And of
course, it is require to work. Most architectures do provide
some sort of low level synchronization primitives so that it can
be made to work (albeit using several machine instructions). Of
course, given the speed of acquiring a non-contested mutex, I'm
not sure that you'd always gain that much.

My motive for proposing the atomic keyword is that someone mentioned
"objects" and bitfields and that C++ doesn't have a "memory model".

It's getting one. (It actually has one. A very simple one,
however, and certainly not sufficient for multithreading.)

In
my opinion, atomic access is a more fundamental concept than "memory model".

You can't define atomic access until you have a memory model.
What does it mean, otherwise?

The example was whether it was safe to concurrently access x and y in
something like:

struct {
[atomic] unsigned x:1;
unsigned y:1;
} z;

On some architectures z.x = 1; cannot be executed atomically and a
multithreaded program would break. If there were allowed atomic in front,
the compiler would either generate the correct code or report an error.

Or more likely, the standard would forbid making bit fields
atomic. (The current proposal implements atomic operations as
functions, which means a pointer or a reference to anything
which will be modified. Since you cannot get a pointer or a
reference to a bit field, the issue doesn't come up.)

I guess the question becomes: what is "everything else" and what did
'atomic' do away with and what is still needed?

"Everything else" are high-level synchronization algorithms: spinlocks,
mutexes, semaphores, etc. The 'atomic' does not "do away", it _adds_ the
guarantee of atomicity of memory operations. These (along with preventing
arbitrary statement reordering) are the basic guarantees needed to write
safe multithreaded code without resorting to inline ASM.

Wouldn't it be nice if you could instead of

--
pthread_mutex_lock(&lk);
x += 3; (x86 ASM: addl $3, x)
pthread_mutex_unlock(&lk);
--

write just 'x += 3;' and be *sure* that the compiler has generated correct
code (ie. used atomic instructions) without overheads of function call,
mutexes, and potential sleeping and kernel entry.

There are two open issues: What about other accesses to the
variable? You typically need some sort of special instructions
in both threads for modifications in one thread to be
necessarily visible in another. (Under Posix, for example, ALL
accesses to x must be protected by the lock.) And the other is
other memory; do you expect all modifications to y which occured
before this point to be effective, or not? (If not, it's of
dubious utility.)

--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orient?e objet/
                    Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]