Re: Double-Checked Locking pattern issue
"George" <George@discussions.microsoft.com> wrote in message
news:66ABF0F7-18AA-4E6B-9F9A-512F0DC5914A@microsoft.com...
Thanks for your correction, Ben!
I agree with you. :-)
I understand that, in general, reordering instructions to make full use of the
pipeline is a good idea. But I want to know why, in my specific case, swapping
step 2 and step 3 is faster. Could you provide more description please?
It may only be faster for some code and not others. It all depends on what
other code is in the function and what parts of the CPU are being used. It may
also depend on which CPU model. The out-of-order execution engine has to have
some rules for rearranging; it probably doesn't choose the fastest order 100%
of the time, but if it reorders in a way that makes more code faster than
slower, it is still worthwhile.
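To make that concrete, here is a minimal sketch of what I assume the numbered
steps stand for, i.e. the usual decomposition of a pointer assignment from new
(Widget, g_instance and Create are made-up names for illustration):

#include <new>

struct Widget { int value; Widget() : value(42) {} };

Widget* g_instance = 0;

void Create()
{
    // What "g_instance = new Widget();" roughly breaks down into:
    void* raw = operator new(sizeof(Widget));    // step 1: allocate raw memory
    Widget* p = new (raw) Widget;                // step 2: run the constructor
    g_instance = p;                              // step 3: publish the pointer

    // Without a barrier, the CPU may let the step-3 store become visible
    // before the step-2 writes, e.g. so the store to g_instance can start
    // earlier and overlap with the constructor's work; another thread can
    // then see a non-null pointer to a not-yet-constructed Widget.
}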
So there is no way of knowing, short of signing a non-disclosure agreement
with Intel and inspecting the out-of-order logic against one particular piece of
compiled code, what order things will be swapped into. Even then it might
depend on other factors, like what other threads are executing on a
hyperthreading processor.
All that the programmer can rely on is that the CPU may, for its own
reasons, reorder memory reads and writes between memory barriers, as long as
each result is the same, considering only the single thread. When you need
to maintain consistency as seen from threads on other processors, you need a
memory barrier. In VS2005, volatile inserts memory barriers.
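As a sketch of how that looks in code (assuming VC2005's extended volatile
semantics, where a volatile read has acquire semantics and a volatile write has
release semantics; Widget, g_lock and GetInstance are made-up names, and g_lock
is assumed to be initialized at startup):

#include <windows.h>

struct Widget { int value; Widget() : value(42) {} };

CRITICAL_SECTION g_lock;            // assume InitializeCriticalSection(&g_lock) ran at startup
Widget* volatile g_instance = 0;    // the pointer itself is volatile

Widget* GetInstance()
{
    Widget* p = g_instance;         // volatile read: acquire semantics
    if (p == 0)
    {
        EnterCriticalSection(&g_lock);
        p = g_instance;             // re-check under the lock
        if (p == 0)
        {
            p = new Widget();       // construct first...
            g_instance = p;         // ...then publish; the volatile write has release
                                    // semantics, so the construction is not reordered
                                    // past the store as seen by other processors
        }
        LeaveCriticalSection(&g_lock);
    }
    return p;
}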
Of course, I think none of this applies to multiple threads on a single core,
because although memory reads and writes may be reordered, the pipeline
would be fully flushed before a context switch. Only true concurrency, with
multiple CPUs/cores, will see the intermediate states of a reordering.
regards,
George
"Ben Voigt [C++ MVP]" wrote:
"George" <George@discussions.microsoft.com> wrote in message
news:D8AE73B1-06F4-4A4B-9986-11DF04C9165B@microsoft.com...
Thanks Ben,
I understand generally how pipeline works. :-)
But I am not sure how the general rules apply to my specific case (if the
compiler does the reordering for the sake of the pipeline). Any ideas?
Out-of-order execution is reordering in the CPU, not the compiler, to make
more efficient use of the pipeline.
regards,
George
"Ben Voigt [C++ MVP]" wrote:
"George" <George@discussions.microsoft.com> wrote in message
news:AFF9AD5D-1564-4099-AA56-C36A23EA124B@microsoft.com...
Thanks Igor,
to memory is expensive, it is conceivable the CPU may reorder the write to the
pointer as early as possible.
The only reason I could think of is that writing to memory earlier could free
the register for later use.
What are your points about why writing early will improve performance? Could
you show more description or some pseudo code please?
You aren't considering pipelining, cache effects, speculative branching, or
any of the other things that new CPUs use to retire multiple instructions per
clock.
In short, while some CPUs can retire four instructions per clock, there aren't
four copies of every unit, and certainly they can't transfer that much to/from
memory at once. So the CPU reorders instructions to get different instructions
that use different parts of the CPU executing together. In essence, this is
what hyperthreading also does, except it interleaves a separate flow of
execution instead of reordering a single flow.
All of this is part of the reason that function calls are so very expensive.
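As a toy illustration of the idea, not a benchmark: the first loop below forces
every addition to wait for the previous result, while the second gives the
out-of-order core four independent additions it can keep in flight at once (the
function names and the unrolling factor are just made up):

double dependent_chain(const double* a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += a[i];                 // each add depends on the previous sum
    return sum;
}

double independent_sums(const double* a, int n)  // assumes n is a multiple of 4
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (int i = 0; i < n; i += 4)
    {
        s0 += a[i];                  // four independent accumulators:
        s1 += a[i + 1];              // the CPU can overlap these additions
        s2 += a[i + 2];              // because none of them waits on the others
        s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}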
regards,
George