On 11/12/2010 18:47, Chris M. Thomasson wrote:
"Leigh Johnston"<leigh@i42.co.uk> wrote in message
news:m4adnexxGJ1aMZ7QnZ2dnUVZ8jOdnZ2d@giganews.com...
On 11/12/2010 16:05, Chris M. Thomasson wrote:
"Leigh Johnston"<leigh@i42.co.uk> wrote in message
news:kY-dnahdNL25H57QnZ2dnUVZ7vOdnZ2d@giganews.com...
[...]
Hmm, I think I see why I might need the first barrier: is it due to loads
being made from the singleton object before the pointer check causing
problems for *clients* of the function? Any threading experts care to
explain?
http://lwn.net/Articles/5159
http://mirror.linux.org.au/linux-mandocs/2.6.4-cset-20040312_2111/read_barrier_depends.html
http://groups.google.com/group/comp.lang.c++.moderated/msg/e500c3b8b6254f35
Basically, the only architecture out there which requires a data-dependent
acquire barrier after the initial atomic load of the shared instance pointer
is a DEC Alpha...
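To make the producer/consumer pairing concrete, here is a minimal sketch using C++11 `std::atomic`, which is an assumption on my part since it is not what the code in this thread is written against. The names `payload`, `g_shared`, `producer` and `consumer` are hypothetical. The consumer uses an acquire load, which is stronger than the data-dependent barrier described above but is correct on every architecture, Alpha included.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type, for illustration only.
struct payload { int value; };

// Shared pointer through which the object is published.
std::atomic<payload*> g_shared{nullptr};

// Producer: fill in the object, then publish the pointer with release
// semantics so the write to `value` cannot be reordered after the store.
void producer(payload* p)
{
    assert(p != nullptr);
    p->value = 42;
    g_shared.store(p, std::memory_order_release);
}

// Consumer: spin until the pointer is visible. The acquire load orders the
// later read of p->value after the pointer load on every architecture.
int consumer()
{
    payload* p;
    while ((p = g_shared.load(std::memory_order_acquire)) == nullptr)
        std::this_thread::yield();
    return p->value;
}
```

Running `producer` on one thread and `consumer` on another, the consumer can only ever observe the fully initialized `value`.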
[...]
Thanks, so in summary my version should work on my implementation
(IA-32/VC++) and probably would work on other implementations except DEC
Alpha for which an extra barrier would be required.
Are you referring to this one:
http://groups.google.com/group/comp.lang.c++/msg/547148077c2245e2
I believe I have kind of solved the problem. I definitely see what you are
doing here and agree that you can get a sort of "portable" acquire/release
memory barriers by using locks. However, the lock portion of a mutex only
has to contain an acquire barrier, which happens to be the _wrong_ type for
producing an object. I am referring to the following snippet of your code:
<Leigh Johnston thread-safe version of Meyers singleton>
___________________________________________________________
static T& instance()
{
00:     if (sInstancePtr != 0)
01:         return static_cast<T&>(*sInstancePtr);
02:     { // locked scope
03:         lib::lock lock1(sLock);
04:         static T sInstance;
05:         { // locked scope
06:             lib::lock lock2(sLock); // second lock should emit memory barrier here
07:             sInstancePtr = &sInstance;
08:         }
09:     }
10:     return static_cast<T&>(*sInstancePtr);
}
___________________________________________________________
Line `06' does not produce the correct memory barrier. Instead, you can try
something like this:
<pseudo-code and exception safety aside for a moment>
___________________________________________________________
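struct thread
{
    pthread_mutex_t m_acquire;
    pthread_mutex_t m_release;

    virtual void user_thread_entry() = 0;

    // Entry point handed to pthread_create(); it is static, so the
    // per-thread mutexes must be reached through `self`.
    static void* thread_entry_stub(void* x)
    {
        thread* const self = static_cast<thread*>(x);
        // Each thread holds its own m_release for its whole lifetime.
        pthread_mutex_lock(&self->m_release);
        self->user_thread_entry();
        pthread_mutex_unlock(&self->m_release);
        return NULL;
    }
};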
template<typename T>
T& meyers_singleton()
{
    static T* g_global = NULL;
    T* local = ATOMIC_LOAD_DEPENDS(&g_global);
    if (! local)
    {
        static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
        thread* const self_thread = static_cast<thread*>(pthread_getspecific(...));
        pthread_mutex_lock(&g_mutex);
00:     static T g_instance;
        // simulated memory release barrier
01:     pthread_mutex_lock(&self_thread->m_acquire);
02:     pthread_mutex_unlock(&self_thread->m_release);
03:     pthread_mutex_lock(&self_thread->m_release);
04:     pthread_mutex_unlock(&self_thread->m_acquire);
        // atomically produce the object
05:     ATOMIC_STORE_NAKED(&g_global, &g_instance);
06:     local = &g_instance;
        pthread_mutex_unlock(&g_mutex);
    }
    return *local; // the function returns T&, so dereference the pointer
}
___________________________________________________________
The code "should work" under very many existing POSIX implementations. Here
is why...
- The implied release barrier contained in line `02' cannot rise above line `01'.
- The implied release barrier in line `02' cannot sink below line `03'.
- Line `04' cannot rise above line `03'.
- Line `03' cannot sink below line `04'.
- Line `00' cannot sink below line `02'.
- Lines `05, 06' cannot rise above line `03'.
Therefore the implied release barrier contained in line `02' will always
execute _after_ line `00' and _before_ lines `05, 06'.
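For comparison, the same ordering (construct the object, release barrier, then publish the pointer) can be written directly with C++11 atomics. This is only a sketch under that assumption, not the code discussed above: `meyers_singleton_sketch` is a hypothetical name, and a release store stands in for the simulated barrier of lines `01'-`04' plus the naked store of line `05'.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Sketch of double-checked initialization with explicit acquire/release.
// Assumes a C++11 compiler; all names here are illustrative.
template<typename T>
T& meyers_singleton_sketch()
{
    static std::atomic<T*> g_global{nullptr};
    static std::mutex g_mutex;

    // Fast path: acquire load pairs with the release store below.
    T* local = g_global.load(std::memory_order_acquire);
    if (!local)
    {
        std::lock_guard<std::mutex> guard(g_mutex);
        // Re-check under the lock; another thread may have published already.
        local = g_global.load(std::memory_order_relaxed);
        if (!local)
        {
            static T g_instance; // constructed while holding the lock
            // Release store: construction cannot be reordered after it.
            g_global.store(&g_instance, std::memory_order_release);
            local = &g_instance;
        }
    }
    assert(local != nullptr);
    return *local;
}
```

Every call, e.g. `meyers_singleton_sketch<int>()`, yields a reference to the same object. The release store keeps construction from being reordered after the pointer becomes visible, which is the guarantee the lock/unlock sequence above is trying to obtain from mutexes alone.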
Keep in mind that there are some fairly clever mutex implementations that do
not necessarily have to execute any memory barriers for a lock/unlock pair.
Think exotic asymmetric mutex impl. I have not seen any in POSIX
implementations yet, but I have seen them used for implementing internals of
a Java VM...
[...]
Thanks for the info. At the moment I am only concerned with IA-32/VC++
implementation which should be safe. I could add specific barriers to
plan on doing any time soon).