Re: volatile and sequence points in (GASP) multithreaded programs
On Jul 3, 5:55 am, Dave Rahardja wrote:
The fact that there is no standards-driven way to suppress
optimization of access to a variable is unfortunate.
That's certainly the intent of "volatile". But it doesn't help
much where threads are concerned. The standard doesn't (and for
various reasons, can't) define what it means by access---for
that matter, you don't say what you mean either. It leaves it
up to the implementation to provide a useful definition of
access (which many don't), and with it, the expression of intent.
How do people even begin to write
multithreaded code with C++?
By using the guarantees of the system. The C++ compilers I use
under Solaris claim (guarantee) Posix conformance. Under Linux,
more or less, too. (It's difficult to claim Posix conformance
when the underlying system isn't Posix conformant, but recent
versions of Linux, with recent versions of g++, are Posix
conformant with regards to threading and the thread primitives.)
And optimization really doesn't have much to do with the issue.
You can break thread safety with some optimizations, but that's
a question of specification: Posix forbids certain optimizations
in certain circumstances, not per se, but by giving a certain
number of guarantees which a Posix conformant system must
respect. And turning off all optimization doesn't guarantee
thread safety either. Given something like:
// Global variables...
int a = 0 ;
int b = 1 ;
// In some thread...
a = 2 ;
b = 3 ;
even without the slightest optimization (the compiler
strictly implementing the semantics of the abstract machine,
with no application of the "as if" rule), another thread can see
a == 0 then b == 3.
IMHO, the intent of volatile would be that if both variables
were declared volatile, then this could not happen. In
practice, however, that's not how modern compilers define an
"access"---at least for Sun CC and g++ (and the current version
of VC++), an access only means the emission of a machine
instruction to read or write the object, and says nothing about
when or even whether the cycles occur on the bus, or when they
become visible to another thread.
Especially embedded code, where OS
synchronization is relatively expensive, and often unnecessary, given a
well-designed sequence of reads and writes?
Compilers for simple embedded processors will typically
(hopefully?) define volatile so that it does something useful.
Simple embedded processors also usually have simple hardware
memory models---not the five or six levels of
pipeline/cache/etc. that a general purpose processor has, so
that a store instruction directly generates a write cycle on the
bus (and thus, that simply not optimizing is sufficient).
Nevertheless: when communicating between interrupt routines and
the main thread, or between processors, you need to study the
documentation of both the compiler and the hardware very, very
carefully.
So here begin my musings and observations based on compilers I have used:
I suspect "most" compilers (and certainly all of the ones that I've worked
with) will guarantee that reads and writes to a volatile POD variable will
result in an actual machine-code access to that variable's memory location,
and in the order shown in the source code (we'll ignore pipelined memory
access for now).
You still haven't defined "access"---does an access have to
reach "main memory" (whatever that is), or are bus cycles to
the processor-local cache sufficient? And how can you ignore
something as universal as pipelined memory access?
Access of a volatile non-POD object does nothing special,
except to bind calls through it to member functions marked
volatile.
I suspect "most" compilers will not reorder variable access
across non-inlined function calls, because "most" compilers do
not perform multi-unit optimization (or such optimizations can
be turned off), and have no idea what the side effects of
those functions are.
I think you're wrong here. Offhand, I don't know of any
compiler that doesn't do multi-unit optimization when requested.
I think even g++ supports this; Sun CC and VC++ certainly do.
I suspect "most" compilers will cause all side-effects of a
non-inlined function to be actually performed (memory actually
changed) by the time the function returns.
That's certainly not the case for any of the Sparc compilers I
use, even without cross-module optimization. As soon as they
can prove that the variable can't be accessed from another
translation unit, they leave it in a register. Intel compilers
are less aggressive here, because they have fewer registers to
play with, and all functions share all of the registers.
In summary, it seems that "most" compilers treat volatile POD
access and non-inlined function calls as observable behavior
relative to the thread it's running on.
That's the key. Relative to the thread it's running on. That's
required by the standard.
Of course all of these suspicions are purely figments of my
imagination, which any compiler vendor is free to ignore.
I hope they address this issue in the next standard.
Threading and atomic access are being studied, and it's a pretty
safe bet that the next version of the standard will define
program behavior in a multithreaded environment.
It certainly would make C++ a multithreaded (and
multiprocessor) language. Besides, most compilers already
support some sort of multithreading facility.
Yes. But not necessarily at the granularity you seem to be
requesting. The ones I know are Posix compliant, but as I said,
Posix says that behavior is undefined unless either 1) only one
thread ever accesses the object, 2) no thread ever modifies the
object, or 3) the correct synchronization primitives (e.g.
pthread_mutex_lock) are used. None of the compilers I know have
any support for lower level synchronization.
James Kanze (GABI Software) email:email@example.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34