Re: C++ Threads, what's the status quo?

From: "James Kanze" <james.kanze@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: 14 Jan 2007 14:04:20 -0500
Message-ID: <1168784010.828648.253880@11g2000cwr.googlegroups.com>

Le Chaud Lapin wrote:

Zeljko Vrba wrote:

int x;

void thread_i() {
   while(1) {
      // do something
      if(--x == 0)
         break;
      // do something else
   }
}

Now, in the situation as it is now, the compiler may generate code for
'--x == 0' as [for those who know x86 ASM, I write it by the side]

   1. load x into register (movl x, %eax)
   2. decrement register (decl %eax)
   3. store register into x (movl %eax, x)
   4. if result == 0 goto exit (jz exit)

Now 1. is a single atomic operation, where the hardware serializes
parallel accesses to memory. No race conditions are possible. Yes,
wrapping x into something like Volatile<int> x _would_ work, with the
disadvantage that

   - implicit wrapping in a mutex is suboptimal if the architecture
     supports atomic operations
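
To make the difference between the load/decrement/store sequence and
a single atomic decrement concrete, here is a minimal sketch. It
assumes C++11 std::atomic and std::thread, which postdate this thread,
and the helper name run_countdown is hypothetical. Because each --x is
one indivisible read-modify-write, every decrement returns a distinct
value, so exactly one of them observes zero.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each decrement a shared counter
// `per_thread` times. Returns how many decrements observed the value 0.
// Assumes C++11 std::atomic, which did not exist when this was posted.
int run_countdown(int threads, int per_thread) {
    std::atomic<int> x(threads * per_thread);
    std::atomic<int> zero_seen(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&x, &zero_seen, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // --x on std::atomic<int> is one atomic read-modify-write,
                // so no other thread can slip in between load and store.
                if (--x == 0)
                    ++zero_seen;
            }
        });
    }
    for (auto& th : pool) th.join();
    return zero_seen.load();
}
```

With a plain int x the same loop is a data race, and the zero could be
seen by several threads or by none.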

I like your creativity with new keywords, but note that a user-mode
atomic operation will often obviate a full user-to-kernel-mode
transition.

Most implementations of pthread_mutex_lock do not make a
user-to-kernel transition unless there is contention.

Also, as hinted by John Q, there will be many cases (probably most)
where there is no choice but to make one thread wait while another
thread operates on a shared global variable, so a compiler-supplied
atomic operation will only get you up to the spin-lock, and after that,
it is back to kernel-mode synchronization primitives.

Which is obviously false. Zeljko showed how to solve the
problem for a single processor system: a thread switch cannot
occur between the two operations. Add a lock prefix for
multiprocessors, and the processor should hold the memory bus
for the total length of the instruction, so no other processor
can access the variable. (I'm not sure that this holds with
more modern versions of the x86 architecture, but it was the
case when I wrote my OS.)
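
As a rough illustration of the lock prefix mentioned above, the
following sketch wraps a locked xadd in a function. It assumes GCC or
Clang inline assembly on x86 and falls back to the GCC builtin
__sync_fetch_and_add elsewhere; the names locked_fetch_add and hammer
are hypothetical and not taken from any library.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Atomically add `delta` to `*p` and return the previous value.
// Assumption: GCC or Clang. On x86 this uses `lock xadd`, which makes
// the read-modify-write indivisible with respect to other processors.
inline int locked_fetch_add(volatile int* p, int delta) {
#if defined(__i386__) || defined(__x86_64__)
    int old = delta;
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(old), "+m"(*p)
                         :
                         : "memory", "cc");
    return old;  // xadd leaves the old memory value in the register
#else
    return __sync_fetch_and_add(p, delta);  // non-x86 fallback
#endif
}

// Hypothetical driver: several threads increment one counter with no
// mutex and no kernel call; returns the final count.
int hammer(int threads, int iters) {
    volatile int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i)
                locked_fetch_add(&counter, 1);
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

No increment is lost, and nothing in the loop enters the kernel.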

This is actually the motivation of most of my posts: to remind us that
all trickery will ultimately lead to kernel-mode synchronization
primitives.

In which case, you're arguing something which is provably false.
Lock-free non-blocking algorithms are known for some operations,
provided the hardware provides the right instructions. You
don't typically need kernel-mode synchronization for things like
adding to or subtracting from atomic integral types. You certainly
don't for Intel, nor for Sparc from version 9 on. (Earlier
versions of the Sparc architecture did require some sort of
system level synchronization.)
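
For operations beyond plain addition, the usual lock-free building
block is a compare-and-swap retry loop. The sketch below is one
example under the assumption of C++11 std::atomic (not available at
the time of this post); try_decrement is a hypothetical name. It
decrements a counter only while it is positive, without ever blocking
or calling into the kernel.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: decrement `n` only if it is currently > 0.
// Returns true if this call performed the decrement.
bool try_decrement(std::atomic<int>& n) {
    int cur = n.load(std::memory_order_relaxed);
    while (cur > 0) {
        // Succeeds only if n still equals cur; otherwise cur is
        // refreshed with the current value and the loop re-checks it.
        if (n.compare_exchange_weak(cur, cur - 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed))
            return true;
    }
    return false;  // counter was already zero (or negative)
}
```

A thread that loses the race simply retries with the fresh value; it
never waits on another thread, which is what makes the loop
non-blocking.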

You might care to look at the code in atomic.h in the g++
implementation (which still uses COW for basic_string). As far
as I know, it is correct for current Intel 32 bit architectures.
(Regrettably, their versions of atomic_read and atomic_set are
broken, although they'll work most of the time. The Sparc 32
bit version uses a spin-lock, and is completely broken, since it
can hang because of priority inversion. The Sparc 64 bit
version is missing some membar instructions, and will not work
correctly in the loosest memory ordering models; most programs
don't use these, so it is probably OK. All of which just goes
to show that these things are difficult to get right, not that
they are impossible.)
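
To show the kind of ordering guarantee that barrier instructions
such as membar exist to provide, here is a small sketch of publishing
data through an atomic flag. It is not the g++ atomic.h code; it
assumes C++11 std::atomic with release/acquire ordering, and
publish_and_read is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread writes a payload and sets a flag;
// another waits for the flag and reads the payload. Returns what the
// reader saw.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready(false);
    int seen = -1;
    std::thread producer([&] {
        payload = value;
        // Release: the payload write may not be reordered after this.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the payload read may not be reordered before this.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Without the release/acquire pair, a weakly ordered machine would be
free to let the reader see the flag before the payload.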

--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
