Re: why boost:shared_ptr so slower?
On Aug 23, 6:47 am, Sam <s...@email-scan.com> wrote:
Pete Becker writes:
Juha Nieminen wrote:
Pete Becker wrote:
Juha Nieminen wrote:
Increments and decrements are in no way guaranteed to be atomic, and in some architectures they may well not be. Even if they were, there's still a huge mutual exclusion problem here:
    if (!--m_ptr->m_count) {
        delete m_ptr;
    }
Guess what happens if another thread executes this same code in between the decrement and the comparison to null in this thread, and the counter happened to be 2 to begin with.
If the decrement is atomic (not an atomic CPU instruction, but atomic in the sense of not tearing and producing a result that's visible to all threads that use the variable) then this works just fine. Of course, all the other manipulations of this variable must also be similarly atomic.
I don't understand how that can work if the result of the decrement is not immediately visible to all threads.
Which is why I said "producing a result that's visible to all threads...".
The gcc manual, section 5.47, under the description of the atomic functions, states the following:
In most cases, these builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward. Further, instructions will be issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.
I interpret this as stating that the results of these atomic functions will be immediately visible to all other threads.
(Now, the GCC manual is not as clear as I'd like, so I'm not the most
comfortable posting this, but I think I'm right. Correct me if I'm
wrong.)
I'm not sure if you're misspeaking or actually misunderstanding. When the term "barrier" is used in the context of threading, the results are not immediately visible to other threads; they are not even guaranteed to be visible at the next matching barrier. Barriers provide conditional visibility. Ex:
    // static init
    int a = 0;
    int b = 0;

    int main()
    {
        // start thread 1
        // start thread 2
    }

    // thread 1
    a = 1;
    write_barrier();
    b = 2;

    // thread 2
    cout << b << " ";
    read_barrier();
    cout << a << endl;
Without the barriers, you may see any of the four possible outputs:
0 0
0 1
2 0
2 1
With the barriers in place, only one possible output is removed, leaving three possibilities:
0 0
0 1
2 1
The definition of visibility semantics is effectively: "If a read
before a read_barrier sees a write after a write_barrier, then all
reads after that read_barrier see all writes before that
write_barrier."
To nitpick your quote:
I interpret this as stating that the results of these atomic functions will be immediately visible to all other threads.
It is not the case that the write will be immediately visible to all other threads. Moreover, even if the other thread executes the correct barrier instruction(s), that write may still not be visible.
If you want guaranteed visibility, use mutexes. However, even a mutex
in one thread does not guarantee that the write becomes immediately
visible to other threads. The other threads still need to execute the
matching "mutex lock" instruction(s).
In other words, for portable C++, and for assembly on most modern desktop processors, to have any guarantee whatsoever of the order of visibility of writes from one thread to another, *both* threads must each execute a synchronization primitive. No exceptions.