Re: Singleton_pattern and Thread Safety

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Mon, 13 Dec 2010 02:44:43 -0800 (PST)

Message-ID:

<88ae426c-807e-438b-863e-ebe0a3f3e1ba@p11g2000vbn.googlegroups.com>

On Dec 11, 4:08 am, Leigh Johnston <le...@i42.co.uk> wrote:

On 11/12/2010 03:40, Joshua Maurice wrote:

On Dec 10, 7:17 pm, Leigh Johnston<le...@i42.co.uk> wrote:

On 11/12/2010 03:12, Joshua Maurice wrote:

Normally I instantiate all my singletons up front
(before threading) but I decided to quickly roll a new
singleton template class just for the fun of it
(thread-safe Meyers Singleton):

namespace lib
{
template <typename T>
class singleton
{
public:
    static T& instance()
    {
        if (sInstancePtr != 0)
            return static_cast<T&>(*sInstancePtr);
        { // locked scope
            lib::lock lock1(sLock);
            static T sInstance;
            { // locked scope
                lib::lock lock2(sLock); // second lock should emit memory barrier here
                sInstancePtr = &sInstance;
            }
        }
        return static_cast<T&>(*sInstancePtr);
    }
private:
    static lib::lockable sLock;
    static singleton* sInstancePtr;
};

template <typename T>
lib::lockable singleton<T>::sLock;
template <typename T>
singleton<T>* singleton<T>::sInstancePtr;
}

Even though a memory barrier is emitted for a specific
implementation of my lockable class it obviously still
relies on the C++ compiler not re-ordering stores across
a library I/O call (acquiring the lock) but it works fine
for me at least (VC++). I could mention volatile but
I better not as that would start a long argument. Roll
on C++0x.

If I'm reading your code right, on the fast path, you
don't have a barrier, a lock, or any other kind of
synchronization, right? If yes, you realize you've coded
the naive implementation of double checked? You realize
that it's broken, right? Have you even read
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
? To be clear, this has undefined behavior according to
the C++0x standard as well.

I am aware of double checked locking pattern yes and this
is not the double checked locking pattern (there is only
one check of the pointer if you look). If a pointer
read/write is atomic is should be fine (on the
implementation I use it is at least).

You've hidden the second check with the static keyword.

Example: Consider:

   SomeType& foo()
   {
     static SomeType foo;
     return foo;
   }

For a C++03 implementation, it's likely implemented with something
like:

   SomeType& foo()
   {
     static bool b = false; /*done before any runtime execution, stored
in the executable image */
     static char alignedStorage[sizeof(SomeType)]; /*with some magic
for alignment */
     if ( ! b)
       new (alignedStorage) SomeType();
     return * reinterpret_cast<SomeType*>(alignedStorage);
   }

That's your double check.

For C++0x, it will not be implemented like that. Instead, it
will be implemented in a thread-safe way that makes your
example entirely redundant.

The problem with the traditional double checked locking
pattern is twofold:

1) The "checks" are straight pointer comparisons and for the second
check the pointer may not be re-read after the first check due to
compiler optimization.

That's not correct. Since there is a lock between the two
reads, the pointer must be reread.

One major problem with both the traditional double checked
locking and your example is that a branch which finds the
pointer not null will never execute any synchronization
primitives. Which means that there is no guarantee that it will
see a constructed object---in the absense of synchronization
primitives, the order of writes in another thread is not
preserved (and in practice will vary on most high end modern
processors).

You've added to the problem by not reading the pointer a second
time. This means that two threads may actually try to construct
the static object. Which doesn't work with most compilers today
(but will be guaranteed in C++0x, I think).

Finally, of course, if the instance function is called from the
constructor of a static object, there's a very good chance that
sLock won't have been constructed. (Unix supports static
construction of mutexes, but as far as I know, Windows doesn't.)

2) The initialization of the pointer may be re-ordered by the CPU to
happen before the initialization of the singleton object is complete.

I think you are confusing the checking issue. I am acquiring
a lock before this hidden check of which you speak is made and
this check is not the same as the initial fast pointer check
so issue 1 is not a problem.

I think you're missing the point that order is only preserved if
*both* threads synchronize correctly. You're lock guarantees
the order the writes are emitted in the writing thread, but does
nothing to ensure the order in which the writes become visible
in other threads.

As far as issue 2 is concerned my version (on VC++ at least) is solved
via my lock primitive which should emit a barrier on RAII construction
and destruction and cause VC++ *compiler* to not re-order stores across
a library I/O call (if I am wrong about this a liberal sprinkling of
volatile would solve it).

I should have stated in the original post that my solution is not
portable as-is but it is a solution for a particular implementation
(which doesn't preclude porting to other implementations). :)

There are definitly implementations on which it will work: any
single core machine, for example. And it's definitely not
portable: among the implementions where it will not work, today,
are Windows, Linux and Solaris, at least when running on high
end platforms.

--
James Kanze