Re: single producer, single consumer

Joshua Maurice <>
Tue, 6 Oct 2009 15:35:15 -0700 (PDT)
On Oct 6, 3:10 pm, goodfella <> wrote:

On Oct 6, 1:52 pm, Joshua Maurice <> wrote:

On Oct 6, 7:00 am, goodfella <> wrote:

Here is some code I came up with for a lock free single producer
single consumer algorithm. I have done some tests using GCC with
optimizations and all seems well. I would like some constructive
criticism of the code as I am not yet an expert on concurrency.

I'm assuming that reading and writing the value of a pointer is
atomic. Although I have read that on some architectures that is a bad
assumption, so to ensure correctness across all platforms, I may have
to wait for C++ atomic types. Any suggestions and criticisms are
welcome; thanks. The code is below.

[Snip code]

As already mentioned, multi-processor systems will probably not run
your code as you intended because of cache coherency. I'd strongly
suggest looking it up and getting some good references on threading.
The short version is that BOTH the reader and the writer threads must
execute some kind of synchronization primitive to have any guarantee
WHATSOEVER on the order that writes become visible to another thread
(where a synchronization primitive might be a full blown lock, acquire
and release memory barriers, read and write memory barriers, etc.).

#include <string>
#include <fstream>
using namespace std;

//insert threading library
template <typename callable_t>
void startThread(callable_t callable);

int x = 0;
int y = 0;

struct foo
{ foo(string const& f) : filename(f) {}
    string filename;
    void operator() ()
    { ofstream fout(filename.c_str());
        fout << x << " " << y << endl;
    }
};

int main()
{ startThread(foo("a"));
    x = 1;
    y = 2;
}

This program, on a *single* execution, can produce a file with any one
of the following four contents (though some are highly unlikely for
this contrived example):
    0 0
    0 2
    1 0
    1 2

This is the cache coherency problem. Let me emphasize again: two
different threads may see the same writes occur in different orders.
On modern machines, there is no single view of main memory. Everyone
has their own local copy because of their own independent caches. One
thread may execute the instructions "store register to x" then "store
register to y", but another thread may see the writes happen in the
opposite order.
Synchronization instructions make the view consistent when used
properly. Atomic is insufficient. The pointer might have been updated
but what it points to has not been updated. See "C++ and the Perils of
Double-Checked Locking" (Meyers and Alexandrescu) for an excellent
description of modern threading and common pitfalls.

This is of course ignoring the other two big "culprits" of making non-
conformant code do what the user does not expect: compiler
optimizations and single core pipeline reorderings / optimizations.

Also, even on a single core single processor machine, you might still
hit related issues due to compiler optimizations and hardware pipeline
optimizations, especially with signal handlers.

The machine I'm testing this on has a dual core processor, and I have
yet to hit a problem like that.

Probably just luck. No one said it's likely to fail. Depending on the
application it could be quite rare. You don't want such a Heisen-bug
to show up only once a month on a customer's machine.

I would expect the process to hang
because the producer would constantly be getting a false value
returned from the push function and the consumer would constantly be
getting a null pointer from the pop function due to the fact that the
pointer value would be stored in their respective caches and a write
to the pointer in one cache would not propagate to the pointer value
in the other cache.

Hardware, from the perspective of a software programmer, is a black
art. You seem to have a very poor understanding of how modern
operating systems work, and how basic processors and caches work. Just
follow the rules and you should be OK. Here are some examples from
real machines.

Caches are not infinite in size. Eventually they get full and things
must be evicted, aka spilled, to main memory. Generally they use a
heuristic of least recently referenced, which means first in is not
guaranteed to be first out. (To answer your question, surely you have
other processes running on your machine, and swapping between these
processes many times a second means that the cache will be cleared
many times a second from this context switching.)

The reader's cache may contain a stale entry for one but not the
other, which means it'll fetch the new version from main memory for
one but not the other.

The infamous DEC Alpha had a split cache, meaning that (if I recall
correctly) all even addresses went to one cache, and all odd addresses
went to the other cache. This is much more likely to produce effects
like above, as something much more recently referenced will be flushed
to main memory first because that subportion of the cache is full.
(The DEC Alpha also has crazy look ahead, where **x may be the newest
value in main memory, but *x may be a stale value. At least, that's
how it's been explained to me. I have no clue how or why you would
implement that.)

I'd assume there exist some caches which use other heuristics, like
locality heuristics. Maybe some caches only contain ~20 byte units of
main memory, in effect like a look-ahead optimization, pulling data
from main memory before it's required, resulting in possibly stale
data for some data if it's near other used data, but if it's far away
in main memory, then it will be much more likely to be the newest
version of main memory.

I'm sure there are even "weirder" optimizations which will mess with
you if you don't follow the rules. The programming language gives you
a contract, and that contract is both threads must execute the proper
synchronization primitives to give any guarantee of the order that
writes become visible from one thread to the other. The programming
language, hardware, and the operating system fulfill this contract,
and they will do their best to optimize within the confines of this
contract. If you do not follow the contract, then you have violated
the assumptions made by the implementation, in which case you're SOL.

Are there any guarantees in regard to cache
coherency on certain platforms?

Probably. I don't know offhand.
