Re: Do you use a garbage collector?

From:

Jerry Coffin <jcoffin@taeus.com>

Newsgroups:

comp.lang.c++

Date:

Thu, 10 Apr 2008 20:14:28 -0600

Message-ID:

<MPG.226875e660c8657b989c6f@news.sunsite.dk>

In article <KKqdncjvUO8DJGPanZ2dnUVZ_sqinZ2d@comcast.com>,
cristom@comcast.net says...

[ ... ]

You are incorrect. Each call into "new" will "most-likely" be "fast-pathed"
into a "local" cache. On multi-threaded systems, this "cache-hit" can even
be accomplished without using _any_ interlocked RMW or memory barrier
instruction(s). That is, the cache would be per-thread. Some may think that
if a cache is per-thread then a thread will not be able to free something
unless it allocated it. This is false line of thinking. There are different
algorithms that can solve this, one of them can be found here:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855

Hoard and StreamFlow also use per-thread heaps.

Most of the problem seems to be that people think in terms of a heap
having a single free list, so all other threads have to contend for
access to that free list.

A trivially simple design is for each thread to have its own heap, and
each heap to have one free list for every other thread in the system.
Each other thread can then free blocks to any heap with minimal
possibility of contention (specifically, with the thread whose heap it's
freeing memory TO, and then only if it happens while that thread is
grabbing that free list to coalesce its memory with the other free
blocks.

This contention can normally be reduced to a single atomic exchange to
free memory and another to grab memory for coalescing. Specifically,
consider a structure something like:

struct free_block {
    void *next;
    size_t size;
    size_t origin_threadID;
    char payload[0];
};

Technically, that's not legal in C++, but I supect you get the idea.
Let's also assume each thread's heap has a vector of free-lists,
something like:

typedef std:vector<free_block *> free_list;

And we have a vector of free lists, one per thread:

std::vector<free_list> free_lists;

In that case, freeing a block looks something like:

void free_a_block(void *block) {
    free_block *f = ((char *)block)-sizeof(free_block);
    f->next = f;
    size_t heap = f->origin_threadID;
    size_t my_threadID = get_current_thread_ID();

    atomic_exchange(f->next, free_lists[heap][my_thread_ID]);
}

I.e. we put this block's address into its own 'next' field, and then
exchange that with the pointer to the head of our free list in the
appropriate heap.

If you want to eliminate even the last vestige of a wait, you can go one
step further: have each free list contain _two_ free-list heads instead
of one. At any given time, the owning thread may be using only one of
those. You use a mutex that waits on both, and one of them will always
be free. You update it, and free the mutex. The design isn't lock free,
but it guarantees that you'll never have to wait for the lock to take
place -- both mutexes will be free most of the time, but at least one
will be all the time. The wait for the mutex just determines _which_ one
is free right now.

Of course, in reality that design is generally overkill -- the chances
that all threads will be freeing memory on a single thread's heap at any
given time is minmal to say the least. If we don't mind introducing the
possibility of some waiting, it's pretty trivial to automatically
balance between speed and memory usage -- we start with a single free
list in each heap (and a mutex associated with each free list).

When we want to free a block, we wait on the set of mutexes for a heap,
with a fairly short timeout. If the wait succeeds, we free the memory
and go on our way. If the wait fails, we signal the thread that a wait
failed. After waits fail some number of times, the thread reacts by
increasing the number of available free lists. Both the length of the
wait and the number of failures can be adjusted to balance between
performance and memory usage.

--
    Later,
    Jerry.

The universe is a figment of its own imagination.