Re: Named shared memory without synchronization
"Dan Schwartz" <Dan Schwartz@discussions.microsoft.com> wrote in
message news:53F9CAA2-C0C9-455B-A246-ED6577070C87@microsoft.com
I have the following scenario:
A primitive (integer) lives in a shared memory page.
Process 1: reads it every once in a while
Process 2: writes to it fairly often
Since this is an int, my assumption was that this scenario would not
benefit from synchronization, as an int in the C language is defined
(if I'm not mistaken) as the
width of the CPU register, thus being atomic and uncorruptible.
The C standard does not say anything about threads or processes, shared
memory, or atomic access (other than in terms of signals, where it
defines sig_atomic_t as the only type guaranteed to provide atomic
access even in the presence of interrupts). It never mentions CPUs or
registers. For example, the "register" keyword is defined in the C99 standard as
follows: "A declaration of an identifier for an object with
storage-class specifier register suggests that access to the object be
as fast as possible. The extent to which such suggestions are effective
is implementation-defined."
Any reasoning about the behavior of a program in the presence of
multiple threads and such is necessarily specific to a particular
implementation, not the C language in general.
For the record, here's the C99 definition of type int: "A 'plain' int
object has the natural size suggested by the architecture of the
execution environment (large enough to contain any value in the range
INT_MIN to INT_MAX as defined in the header <limits.h>)."
Several articles I've seen on the net however, maintained that I need
to declare the primitive as volatile (which makes sense to me) and
aligned to a word boundary (which doesn't make sense to me).
Some processors only support loading a register from a word-aligned
memory address with a single machine instruction. Reading an unaligned
word then requires several instructions - reading two aligned words that
the word of interest overlaps, then doing some shifts and bitwise
operations to bring two halves together in the register. The CPU can of
course be interrupted after reading one half and before reading the
other, which could change right under it.
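
To make the two-step read concrete, here is a rough sketch in C of what such a CPU effectively has to do. The helper name load_unaligned_le is made up for illustration, and the sketch assumes 32-bit words and little-endian byte order; real hardware or compiler-generated code will differ in detail.

```c
#include <stdint.h>

/* Hypothetical helper (not from any real API): assemble the 32-bit value
 * that starts `offset` bytes (0..3) into aligned_words[0], assuming a
 * little-endian layout. */
uint32_t load_unaligned_le(const uint32_t *aligned_words, unsigned offset)
{
    if (offset == 0)
        return aligned_words[0];      /* already aligned: one read suffices */

    uint32_t lo = aligned_words[0];   /* first aligned read */
    /* A writer on another CPU, or an interrupt, may modify memory here,
     * between the two reads, so the halves can belong to different values. */
    uint32_t hi = aligned_words[1];   /* second aligned read */

    unsigned shift = offset * 8;      /* 8, 16 or 24 bits */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The point is only that two separate loads are involved, so the result is not guaranteed to be a value that was ever actually stored.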
After implementing the above scenario, it became apparent that the
compiler doesn't need the volatile declaration.
Yes and no. Before VC8, volatile just meant that the compiler could not
cache the value in the register, but had to read it from memory on every
access. However, a compiler can only rarely keep non-local variables in
a register anyway (e.g. any function call, unless inlined, could
potentially modify a global variable as far as the compiler can tell, so
it has to be conservative).
VC8 gives volatile additional semantics - those of memory barriers.
Modern multiprocessor architectures are weakly ordered: a write by one
CPU may be observed by another CPU with a delay; moreover, writes may be
observed in an order different from the one they occurred in. Consider:
int x = 0;
int y = 0;
// One CPU repeatedly does
x++;
y++;
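// Another CPU does (note: y is read first, then x; reading x first
// could fail even without reordering, if the writer runs in between)
int yy = y;
int xx = x;
assert(xx >= yy);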
One could naively imagine that the xx >= yy condition should always hold,
regardless of how the execution of the two fragments is interleaved. This
may not be the case, however: it is possible for the second CPU to read
a newer value of y from memory while an older value of x is read from
CPU cache, and end up with xx < yy. To a programmer, it appears as if
writes are observed in the wrong order (which is a useful abstraction to
describe and reason about the phenomenon).
CPUs that exhibit this behavior provide machine instructions called
memory barriers. Writes issued after the barrier are guaranteed to be
observed after the writes issued before the barrier. For more details,
see
http://msdn2.microsoft.com/en-us/library/ms686355.aspx
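
As an illustration only, here is how the x/y example could be written with explicit ordering, using C11 <stdatomic.h>. That header postdates this discussion and is not what VC8 offers, so treat it as a portable stand-in for the barrier instructions. The function names writer_step and reader_check are invented for the sketch, and it assumes a single writer thread.

```c
#include <stdatomic.h>

static atomic_int x;   /* static storage: starts at 0 */
static atomic_int y;

/* Writer (one thread only): bump x, then publish by bumping y. */
void writer_step(void)
{
    atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
    /* release: the x increment above becomes visible no later than this */
    atomic_fetch_add_explicit(&y, 1, memory_order_release);
}

/* Reader: returns 1 if the invariant xx >= yy was observed. */
int reader_check(void)
{
    /* acquire: pairs with the writer's release on y */
    int yy = atomic_load_explicit(&y, memory_order_acquire);
    /* read x only after y, so x is at least as new as the y we saw */
    int xx = atomic_load_explicit(&x, memory_order_relaxed);
    return xx >= yy;
}
```

Note that the reader loads y before x. With the release/acquire pair in place, every x increment that preceded the observed y increment is guaranteed to be visible, so the check cannot fail.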
So again, as of VC8, a read of a volatile variable has acquire semantics
and a write has release semantics, i.e. every volatile access acts as a
memory barrier, which allows one to avoid certain "interesting" effects
on some SMP architectures. Earlier compiler versions (and many non-MS
compilers) don't do that, so there volatile does not help at all with
multithreaded synchronization.
Of course the OS-provided synchronization functions (WaitForSingleObject
et al) and Interlocked* functions issue all the necessary memory
barriers, so one doesn't need to worry about such arcane details when
using them.
The integer seems to
be fetched again for every access.
That wouldn't be the case if you were to, say, read it in a tight loop
without any intervening function calls.
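
A tight loop of that kind might look like the sketch below. The helper poll_until_nonzero and its max_spins parameter are hypothetical, purely for illustration. If the pointer were not volatile-qualified, an optimizing compiler would be free to load the value once before the loop and never look at memory again.

```c
/* Hypothetical polling helper: spin until *shared becomes non-zero or
 * max_spins iterations have passed. Returns the value seen, or 0 on
 * giving up. */
int poll_until_nonzero(const volatile int *shared, long max_spins)
{
    for (long i = 0; i < max_spins; ++i) {
        int v = *shared;   /* volatile: a fresh load on every iteration */
        if (v != 0)
            return v;
    }
    return 0;
}
```

Keep in mind that volatile here only forces the repeated load; outside of the VC8 semantics discussed above it promises nothing about ordering relative to other variables.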
In general, after substantial
testing, I haven't found drawbacks to this no-lock strategy.
That's the problem with multithreaded programming: the bugs have the
tendency not to come up during testing. We are talking extremely subtle
effects here: consider the task of debugging a system that only fails
seemingly randomly, approximately once a week, at the customer's site
when running on a 4-way server under heavy load. This task will be yours
if you don't take the time to understand the issues (or else, switch to
a safer implementation utilizing locks or interlocked instructions).
My question(s):
Is this really a stable solution?
This depends on exact details of what you are doing. From what little
you disclosed, I have my doubts.
Does the compiler know that the integer lives in a shared page and
becomes 'implicitly' volatile?
No. How can it? Besides, in the presence of multiple threads, all the
memory in the process is, in essence, "shared".
If not, does this mean that the same strategy would work for thread
synchronization of an integral global variable?
What strategy? What are you synchronizing? Show some code.
What about alignment, what would be the reason to worry about it for
primitive types?
If you simply declare a variable, it will be given an appropriate
alignment. You need to work hard to produce an unaligned access - e.g.
using #pragma pack(1) on structures, or doing pointer arithmetic on byte
arrays and casting (which would technically exhibit undefined behavior
anyway). Be aware that some CPUs outright crash (raise a hardware
exception, to be exact) when asked to read a memory address that's not
properly aligned (see also the __unaligned keyword).
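
If you ever do have to pull an integer out of a byte buffer at an arbitrary offset, a common portable approach is to copy the bytes with memcpy rather than cast the pointer. The helper name read_u32_at below is invented for this sketch; the value comes back in the machine's native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value stored at any byte offset.
 * Unlike *(const uint32_t *)(buf + offset), this has defined behavior
 * even when buf + offset is not suitably aligned. */
uint32_t read_u32_at(const unsigned char *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);  /* byte-wise copy */
    return value;
}
```

This avoids the alignment fault, but it is still not atomic: another thread can change the bytes while they are being copied.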
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925