Re: volatile and sequence points in (GASP) multithreaded programs

From:
Dave Rahardja <drahardja_atsign_pobox_dot_com@pobox.com>
Newsgroups:
comp.lang.c++
Date:
Thu, 05 Jul 2007 00:18:25 -0500
Message-ID:
<s6uo83tv78766qe9m2gjs2q2tgl0s9u7j7@4ax.com>
First of all, thank you for all the responses I received on this probably
off-topic rant of mine.

On Tue, 03 Jul 2007 04:54:04 -0700, James Kanze <james.kanze@gmail.com> wrote:

On Jul 3, 5:55 am, Dave Rahardja
<drahardja_atsign_pobox_dot_...@pobox.com> wrote:

   [...]

The fact that there is no standards-driven way to suppress
optimization of access to a variable is unfortunate.


That's certainly the intent of "volatile". But it doesn't help
much where threads are concerned. The standard doesn't (and for
various reasons, can't) define what it means by access---for
that matter, you don't say what you mean either. It leaves it
up to the implementation to provide a useful definition (which
many don't) of access, and the expression of intent.


For threading purposes, my definition of access depends on whether I am on a
multiprocessor system. On a uniprocessor, it is sufficient to hit the cache.
On a multiprocessor system, it is necessary to hit shared main memory for
communications to occur.
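
For example (just a sketch, assuming a GCC-style full barrier,
__sync_synchronize(), and that int-sized loads and stores are atomic on the
target; the names are made up):

static volatile int payload = 0;
static volatile int ready   = 0;

// Runs on processor A.
void publish(int value)
{
    payload = value;
    __sync_synchronize();   // push the payload out before raising the flag
    ready = 1;
}

// Runs on processor B.
int consume()
{
    while (ready == 0)
    {
        // spin until the flag becomes visible
    }
    __sync_synchronize();   // don't read a stale payload
    return payload;
}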

How do people even begin to write
multithreaded code with C++?


By using the guarantees of the system. The C++ compilers I use
under Solaris claim (guarantee) Posix conformance. Under Linux,
more or less, too. (It's difficult to claim Posix conformance
when the underlying system isn't Posix conformant, but recent
versions of Linux, with recent versions of g++, are Posix
conformant with regard to threading and the thread primitives.)


This is unfortunate, because many embedded compiler vendors don't guarantee
anything of this nature. Perhaps a call to tech support will help.

And optimization really doesn't have much to do with the issue.
You can break thread safety with some optimizations, but that's
a question of specification: Posix forbids certain optimizations
in certain circumstances, not per se, but by giving a certain
number of guarantees which a Posix conformant system must
respect. And turning off all optimization doesn't guarantee
thread safety either. Given something like:

   // Global variables...
   int a = 0 ;
   int b = 1 ;

   // In some thread...
   a = 2 ;
   b = 3 ;

even without the slightest optimization (the compiler
strictly implementing the semantics of the abstract machine,
with no application of the "as if" rule), another thread can see
a == 0 then b == 3.


Yes. Obviously no compiler guarantee can eliminate logic errors or race
conditions built into the code. However, in certain embedded systems it is
often possible to implement race-free, non-interlocked interthread
communication by carefully sequencing shared memory accesses. For instance, a
high-speed ISR or thread can push values onto a queue while a slower polling
thread pops values off the queue without requiring synchronization, as long as
we are careful about pointer access and make certain operations atomic
(usually by momentarily disabling interrupts).
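
A minimal sketch of the kind of queue I mean (single producer, single
consumer; the names are invented, and it assumes that loads and stores of the
index variables are atomic on the target):

const unsigned QUEUE_SIZE = 16;

static volatile int      queueData[QUEUE_SIZE];
static volatile unsigned head = 0;   // written only by the ISR
static volatile unsigned tail = 0;   // written only by the polling thread

// Called from the high-speed ISR: push a value if there is room.
bool isrPush(int value)
{
    unsigned next = (head + 1) % QUEUE_SIZE;
    if (next == tail)            // queue full; drop the sample
    {
        return false;
    }
    queueData[head] = value;     // store the payload first...
    head = next;                 // ...then publish it by advancing head
    return true;
}

// Called from the slower polling thread: pop a value if one is available.
bool pollPop(int* value)
{
    if (tail == head)            // queue empty
    {
        return false;
    }
    *value = queueData[tail];
    tail = (tail + 1) % QUEUE_SIZE;
    return true;
}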

Especially embedded code, where OS
synchronization is relatively expensive, and often unnecessary, given a
well-designed sequence of reads and writes?


Compilers for simple embedded processors will typically
(hopefully?) define volatile so that it does something useful.
Simple embedded processors also usually have simple hardware
memory models---not the five or six levels of
pipeline/cache/etc. that a general purpose processor has---so
that a store instruction directly generates a write cycle on the
bus (and thus simply not optimizing is sufficient).
Nevertheless: when communicating between interrupt routines and
the main thread, or between processors, you need to study the
documentation of both the compiler and the hardware very, very
carefully.


I figured that out in a hurry! However, some embedded compiler vendors
notoriously omit the guarantees that are required for such fine-grained
control of access.

So here begin my musings and observations, based on compilers I have used:

<offtopic>
I suspect "most" compilers (and certainly all of the ones that I've worked
with) will guarantee that reads and writes to a volatile POD variable will
result in an actual machine-code access to that variable's memory location,
and in the order shown in the source code (we'll ignore pipelined memory
access for now).


You still haven't defined "access"---does an access have to
reach "main memory" (whatever that is), or are bus cycles to the
processor's local cache sufficient? And how can you ignore
something as universal as pipelined memory?


Pipelined memory introduces latency that must be managed only for access to
physical hardware. For instance, on certain PowerPC implementations the order
in which read and write instructions are executed can be reversed in hardware
due to the separate read and write pipelines. Such behavior will usually ruin
device drivers that hit hardware registers, and it requires an additional
synchronization operator between instructions that depend on sequenced access
to physical hardware. However, for non-hardware access, pipelined memory
access becomes a non-issue, as the CPU hardware will usually guarantee that
the core sees accesses "as written", even when the actual hardware operations
are delayed.
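
For example, on one of those PowerPC targets the synchronization operator
might look something like this (GCC-style inline assembly; the register
addresses and names are invented for illustration):

#define DEVICE_DATA  (*(volatile unsigned long*)0xFFFF0004)
#define DEVICE_CTRL  (*(volatile unsigned long*)0xFFFF0000)

inline void ioBarrier()
{
    // eieio orders accesses to guarded/cache-inhibited (device) memory so
    // they reach the bus in program order.
    asm volatile ("eieio" ::: "memory");
}

void startTransfer(unsigned long word)
{
    DEVICE_DATA = word;   // load the data register first
    ioBarrier();          // make sure the data write reaches the device...
    DEVICE_CTRL = 0x1;    // ...before the control write that starts the transfer
}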

Access to a volatile non-POD object does nothing special,
except that member function calls through it bind to the
overloads marked volatile.

I suspect "most" compilers will not reorder variable access
across non-inlined function calls, because "most" compilers do
not perform multi-unit optimization (or such optimizations can
be turned off), and have no idea what the side effects of
those functions are.


I think you're wrong here. Offhand, I don't know of any
compiler that doesn't do multi-unit optimization when requested.
I think even g++ supports this; Sun CC and VC++ certainly do.


You're right. I'd have to defeat multi-unit optimization to guarantee my
assumption.

It certainly would make C++ a multithreaded (and
multiprocessor) language. Besides, most compilers already
support some sort of multithreading facility.


Yes. But not necessarily at the granularity you seem to be
requesting. The ones I know are Posix compliant, but as I said,
Posix says that behavior is undefined unless either 1) only one
thread ever accesses the object, 2) no thread ever modifies the
object, or 3) the correct synchronization primitives (e.g.
pthread_mutex_lock) are used. None of the compilers I know have
any support for lower level synchronization.


However, consider this code:

#include <pthread.h>   // pthread_mutex_t, pthread_mutex_lock/unlock
#include <unistd.h>    // sleep()

volatile int sharedData = 0;

extern pthread_mutex_t* mutex;

struct MutexLock
{
    MutexLock(pthread_mutex_t* mutex) :
        mutex(mutex)
    {
        pthread_mutex_lock(mutex);
    }

    ~MutexLock()
    {
        pthread_mutex_unlock(mutex);
    }

private:
    pthread_mutex_t* const mutex;
};
    
void threadA()
{
    for (;;)
    {
        {
            MutexLock lock(mutex);  // named object; a bare "MutexLock(mutex);" declares a variable instead
            sharedData = 1;
        }
        sleep(10); // OS call to cause thread to sleep
    }
}

void threadB()
{
    for (;;)
    {
        {
            MutexLock lock(mutex);  // named object, as above
            if (sharedData == 1)
            {
                break;
            }
        }
        sleep(10);
    }
    // Code continues...
}

Let's say threadA runs in one thread while threadB runs in another. How do
Posix semantics guarantee that the compiler does not elide the reads or writes
to sharedData? After all, relative to each thread, sharedData cannot possibly
affect the flow of control.

-dr
