Re: atomic counter
On 2011-10-12 20:58:06 +0000, Marc said:
Pete Becker wrote:
On 2011-10-09 12:27:42 +0000, Marc said:
I am trying to make our reference counting implementation thread-safe
(for a very basic definition of thread-safe), which essentially means
making the counter atomic. However, there are many options to atomic
operations, in particular concerning the memory model, and I want to
make sure I get it right.
The operations I use are increment (no return), decrement (and check
if the return value is 0), store (to initialize) and load (to check if
the object is not shared and thus safe to write into).
It looks to me like memory_order_relaxed should be good enough for
this purpose, as I don't see what synchronization would be needed with
the rest of memory, but I may be missing something fundamental there.
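(For reference, the four operations described above might look like this sketch on an intrusive counter; the type and names are hypothetical, not Marc's actual code, and the orderings are left at the default since the right choice is exactly what's under debate:)

```cpp
#include <atomic>

// Hypothetical intrusive reference count illustrating the four
// operations: store (initialize), increment (no return), decrement
// (check for zero), and load (check whether the object is shared).
struct RefCounted {
    std::atomic<long> refs;

    RefCounted() { refs.store(1); }                   // store, to initialize

    void add_ref() { refs.fetch_add(1); }             // increment, no return

    bool release() { return refs.fetch_sub(1) == 1; } // decrement, check for 0

    bool unique() const { return refs.load() == 1; }  // load: unshared, safe to write?
};
```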
Suppose there are two references to the object, in two different
threads. One thread decrements the reference count, then the other
does. If the decrement from the first thread isn't seen by the second
thread, the second thread won't see that the count has become zero, and
the object won't get destroyed. So memory_order_relaxed won't work: you
need to ensure that the result of a decrement is visible to another
thread that also needs to decrement the count.
Uh? I am still calling an atomic decrement function. The standard says:
"Note: Atomic operations specifying memory_order_relaxed are relaxed
with respect to memory ordering. Implementations must still guarantee
that any given atomic access to a particular atomic object be
indivisible with respect to all other atomic accesses to that object."
I thought the memory order was mostly concerned with what happened to
the rest of the memory.
Assuming your interpretation is correct, what is memory_order_relaxed
good for?
Advanced threading. See Alexander Terekhov's message.
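(For reference, the idiom most often cited as the answer to "what is memory_order_relaxed good for?" is the reference-count increment itself, paired with a release decrement and an acquire fence before destruction; the sketch below shows the pattern with hypothetical names, not anyone's production code:)

```cpp
#include <atomic>

// Relaxed increment / release decrement idiom for intrusive counting.
struct Counted {
    std::atomic<long> refs{1};

    void add_ref() {
        // Relaxed suffices: nothing is decided on the result, we only
        // need the increment itself to be indivisible.
        refs.fetch_add(1, std::memory_order_relaxed);
    }

    bool release() {
        // Release makes this thread's writes to the object visible to
        // whichever thread performs the final decrement...
        if (refs.fetch_sub(1, std::memory_order_release) == 1) {
            // ...and the acquire fence makes every other thread's
            // writes visible here, before the object is destroyed.
            std::atomic_thread_fence(std::memory_order_acquire);
            return true; // caller may now destroy the object
        }
        return false;
    }
};
```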
Atomic operations get completed without interruption. That ensures that
a different thread doesn't see a value that's not valid. For example,
suppose that storing a pointer takes two bus operations. If the pointer
starts out null, storing a value into it has two steps: store one half
of the pointer, then store the other half. If a context switch occurs
between those two steps, the thread that's switched to might see half
the pointer. Atomic operations ensure that that sort of tearing doesn't
occur.
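(That no-tearing guarantee is what std::atomic promises even at memory_order_relaxed: a reader observes either the old value or the complete new one, never half of each. A minimal illustration, with made-up values:)

```cpp
#include <atomic>
#include <cstdint>

// A reader of this atomic can only ever observe 0 or the complete
// published value -- never a half-written mixture -- even with
// memory_order_relaxed, which asks for atomicity and nothing more.
std::atomic<std::uint64_t> ptr_bits{0};

void publish() {
    ptr_bits.store(0x1122334455667788ULL, std::memory_order_relaxed);
}

std::uint64_t observe() {
    return ptr_bits.load(std::memory_order_relaxed);
}
```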
The other aspect of threaded programming is visibility of changes.
Here's where you have to abandon the single-processor analogies; think
multiple processors. For example, suppose the system has two
processors, and each processor has its own data cache. Each processor
is noodling around with the same variable, so each cache has a copy of
the value of that variable. Writing a new value, even when done
atomically, only directly affects the value in the cache that belongs
to the processor that wrote the value. Unless the new value is copied
to the other processor's cache, the other processor will still see the
old value. memory_order_relaxed says "don't worry, be happy". It's okay
that the values are inconsistent.
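(One caveat to the cache picture: even relaxed operations on a single atomic object are indivisible and form one modification order, so concurrent relaxed increments lose no updates; what relaxed gives up is any ordering between that variable and the rest of memory. A sketch:)

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Concurrent relaxed increments on one atomic never lose an update:
// each fetch_add is indivisible and all threads agree on the counter's
// modification order. Relaxed only abandons ordering with respect to
// other memory locations.
long relaxed_count(int threads, int per_thread) {
    std::atomic<long> n{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&n, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                n.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return n.load();
}
```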
It's easier to get the code right when it's sequentially consistent. In
general, unless you can demonstrate that synchronization is a
bottleneck, don't mess with it.
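(That advice lines up with the library defaults: every std::atomic operation defaults to memory_order_seq_cst, so the straightforward spelling is already the sequentially consistent one. A free-standing counter for illustration:)

```cpp
#include <atomic>

// Hypothetical counter; every operation below is seq_cst by default,
// the easiest variant to reason about and the one to start from until
// profiling says otherwise.
std::atomic<long> refs{1};

void add_ref() { ++refs; }                        // same as fetch_add(1)
bool release() { return refs.fetch_sub(1) == 1; } // seq_cst by default
```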
Well yes, of course. I did some experiments (using boost::shared_ptr
or the libstdc++ std::atomic (just a place-holder implementation, I
know)), and the slow-down was unacceptable, which led me to
reimplement it. The performance hit is now acceptable, but still
noticeable enough that I am not sure about enabling it by default for
MT programs (some programs have several threads that don't share any
ref-counted objects and would pay the price for nothing).
Our main target is x86/x86_64, where, as far as I understand (please
correct me if I am wrong), the memory barrier is unavoidable (implied
by any atomic operation), but I am still interested in not penalizing
our users on other platforms if I don't have to.
On the x86 architecture, pretty much everything is sequentially
consistent. So there's no difference in the generated code between
sequentially consistent visibility and any of the others. Which, in
turn, means that code that uses less than sequentially consistent
visibility and works just fine on x86 systems may fail miserably if it's
just ported to other systems.
And it is intellectually satisfying to understand things ;-)
<g> This is tricky stuff. There's not much out there that describes it
in an approachable manner unless you're an expert on hardware
architecture. One good source is Anthony Williams' book, "C++
Concurrency in Action", from Manning Publications. www.manning.com.
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]