Re: multithreading.

From:

"Chris Thomasson" <cristom@comcast.net>

Newsgroups:

comp.lang.c++

Date:

Sun, 30 Mar 2008 18:33:58 -0700

Message-ID:

<nOedncvAJsfHom3anZ2dnUVZ_tWtnZ2d@comcast.com>

"Jon Harrop" <usenet@jdh30.plus.com> wrote in message
news:13uvuuv2uqoqt8a@corp.supernews.com...

Chris Thomasson wrote:

"Jon Harrop" <usenet@jdh30.plus.com> wrote in message
news:13uu0sgjhurvlf9@corp.supernews.com...

You cannot have your cake and eat it: you said that the programmer must
be aware of the low-level details of their data structures in order to
work around the pathological performance problems inherent with
reference
counting. That is a form of fragility. Fragmentation is one specific
example.

Wrong. You completely misunderstood me. I was talking about the
implementers of refcounting. The implementation of reference counting
should try and avoid making calls into atomic instructions and memory
barriers. This is transparent to the user.

A data structure implementor using reference counting must be aware of the
overhead of reference counting and work around it by amortizing reference
counts whenever possible. You yourself only just described this in the
context of performance.

A data-structure implementer should always be aware of how he is going to
safely reclaim memory in the presence of multi-threading. Garbage collection
does not get you a free lunch here. Neither does reference counting.

Memory overhead? No, a reference count releases the object when the
reference drops to zero which is more accurate than a traditional GC
could ever be.

That is commonly claimed but actually wrong.

Please show me a 100% accurate GC; good luck finding one...

That is irrelevant. Your argument in favor of reference counting was
completely fallacious.

Your incorrect. You need to open your eyes here:

You claimed was that reference counting is "more accurate than a
traditional
GC could ever be". Consider:

{
   Bar bar()
   f(bar);
   g()
}

Reference counting will keep "bar" alive until its reference count happens
to be zeroed when it drops out of scope even though it is not reachable
during the call to "g()". Real garbage collectors can collect "bar" during
the call to "g" because it is no longer reachable.

So GCs can clearly collect sooner than reference counters.

Now your just trolling. Why are you thinking in terms of scope here? Did you
know that C++ allows the programmer to do something like:

{
  {
    Bar bar()
    f(bar);
  }
  g()
}

See? We can go in circles for ever here. Your posing non-sense.

Scope can keep reference counts above zero and values alive
unnecessarily
when a real GC can collect them because they are unreferenced even
though
they are still in scope.

It will only collect anything when the GC decides to run!

Which may well be before the scope happens to end.

Probably not. A GC only runs once in a while. Its nowhere near as accurate
as general forms of reference counting. Does that make reference counting
better? Na. It's just another tool in our box.

Anyway, traditional GC is NOT about performance, its about helping some
programmers that do not know how, or want, to manage their memory
manually.

By the same logic: C++ is for programmers who don't know, or want, to
write
assembler manually.

You comparing creating efficient manual memory management schemes with
programming assembly language? Are you serious, or just trolling again?

Are you talking about the extra word of space per-object?

Yes.

How is that different than GC meta-data?

Reference counts consume a lot more space.

Oh boy. Here we go again. I ask you: What reference counting algorithm? You
make blanket statements that are completely false!

:^/

BTW, there are refcounting
algorithms that do not need any per-object meta-data. For instance, vZOOM
keeps this information on a per-thread basis. There is GC meta-data that
keeps track of objects. There has to be. Think about it.

Yes. For example, OCaml hides two bits inside each pointer totalling zero
overhead. In contrast, a reference counting system is likely to add a
machine word, bloating each and every value by 8 bytes unnecessarily on
modern hardware.

There are some counting algorithms that use pointer stealing. Anyway, your
make false assumptions based on blanket statements. Drill down on some
specific reference counting algorithms please.

Did you know that many garbage collectors use per-object meta-data as
well?

None that we use. Which GCs are you referring to?

Many GC languages use object headers; Java.

Just Java?

..NET. Object meta-data is required by collectors and reference counting
algorithms. Some keep the data per-object. Some keep it per-thread. Some
keep it in a separate chunk of memory. Ect, ect...

I benchmarked all the mainstream reference counters for C++ many years
ago. If you think the situation has improved, perhaps we could benchmark
C++ against some GC'd languages for collection-intensive tasks now?

C++ has no GC. Anyway, are you talking about a lock-free reader patterns?

Let's do a single-threaded benchmark first.

That's no good! Multi-threading is the way things are now. You don't get a
free lunch anymore.

What collection-intensive tasks?

Symbolic rewriting would make a good benchmark. Here is one:

http://www.lambdassociates.org/studies/study10.htm

Try implementing this using reference counting and we can see how it
compares (probably on a trickier rewrite). Replace the arbitrary-precision
ints with doubles.

I need to take a look at this, but, are you sure that this even needs memory
management? Could I just reserve a large block of memory and carve
data-structures out of it and use caching? I don't think this needs GC or
reference counting.

I know I can compete, and most likely beat, a traditional GC with
handcrafted implementations that C++ give me the freedom to work with.

I admire your faith but I would like to see some evidence to back up such
claims because they run contrary to common wisdom accumulated over the
past
few decades.

Here is a stupid contrived example:
________________________________________________________
struct object {
  object* cache_next;
  bool cached;
  [...];
};

#define OBJ_DEPTH() 100000

static object g_obj_buf[OBJ_DEPTH()] = { NULL, true };
static object* g_obj_cache = NULL;

void object_prime() {
  for(int i = 0; i < OBJ_DEPTH(); ++i) {
    g_obj_buf[i].cache_next = g_obj_cache;
    g_obj_cache = &g_obj_buf[i];
  }
}

object* object_pop() {
  object* obj = g_obj_cache;
  if (! obj) {
    if (obj = malloc(sizeof(*obj))) {
      obj->cached = false;
    }
  }
  return obj;
}

void object_push(object* obj) {
  if (obj->cached) {
    obj->cache_next = g_obj_cache;
    g_obj_cache = obj;
  } else {
    free(obj);
  }
}

void foo() {
  for (;;) {
    object* foo = object_pop();
    object_push(foo);
  }
}
________________________________________________________

This is a fast object cache that will likely perform better than doing it
the GC way where nobody wants to manage their own memory:
________________________________________________________
void foo() {
  for (;;) {
    object* foo = gc_malloc(sizeof(*foo));
  }
}
________________________________________________________

Yes manual memory management is more work and if you don't want to do that,
well, GC can be a life saver indeed.

Not incremental? Drill down on that one some more please.

You actually just described the problem: reference counting releases
values when their reference drops to zero.

Yeah. This is NOT true with a traditional GC.

Absolutely. GCs do not have to wait that long: they can collect sooner.

A garbage collector generally cannot be as accurate as a reference count.
Your contrived scope example is non-sense.

At the end of a scope, many reference
counts can be zeroed at the same time and the ensuing collections can
stall the program for an unacceptable amount of time. This can also be
extremely difficult to workaround without simply resorting to a real GC.

Stall? Are you sure you know how Proxy GC works? There is no stall. There
is no contrived introspection of thread stacks. There is no
collect-the-world. There is no mark-and-sweep. There is no thread
suspension. Ect, ect...

You've just made a series of irrelevant references here.

How are the implementation details of a GC irrelevant to a discussion on GC?

We had this exact problem on a 250kLOC C++ product line and eventually
fixed
it by rewriting everything from scratch in a more modern language. The
worst case performance is now 5x faster. We wasted a couple of months
trying to fix the problem in C++ before having the revelation that we
were just reinventing a garbage collector and doing a poor job of it at
that.

Well, from your false comments on reference counting, I would expect you
to not be able to beat a tranditional garbage collector in any way shape
or form.

Of course. I fully expect you to fail on the above benchmark but am
willing
to give you the benefit of the doubt.

Any benchmark on GC these days has to be able to use multiple processes or
threads. Ahh... Here is something we can do! Okay, we can create a
multi-threaded daemon that serves factory requests to multiple processes.
This would use multi-threading, multi-processing, and shared memory:

I want to implement the wait-free factory as a multi-threaded daemon process
that multiple producers can register the path and name of a shared library,
which concurrent consumer processes can lookup by name; dynamically link
with the resulting library and call a "common" instance function (e.g.,
define common api). I guess you could think of it as a highly concurrent
per-computer COM daemon. The factory consumer threads can use the lock-free
reader pattern, and the producer threads can use a form of mutual exclusion;
including, but not limited, to locks.

Or we can do a efficient observer pattern:

I want the wait-free observer to be a multi-threaded daemon that can allow
multiple threads to create/register delegates; message-types and allow
consumer threads to register with those delegates and receive their
messages; producer threads create messages with subsequently signal the
delegates that manages those messages to multicast to their registered
consumers.

Now those programs will definitely test a GC to the limits. BTW, do you know
of a GC that can collect across multi-process working with shared memory???

Which one do you want to do? The multi-thread/process factory or the
multi-thread observer?

;^)

You are simply restating that author's misconceptions. Read the
documentation.

I am asking you how to do it. Explain how to create effectively object
caches using "traditional" collect-the-world GC...

You use a weak hash table to cache data without keeping it alive.

You do not use a weak hash table to cache data that has no other
references
to keep it alive on the assumption that the GC will be inefficient
enough
for your cache to work.

Wrong, let me inform you that a cache keeps QUIESCENT objects for further
optimized allocations.

That would be a specific kind of cache. Caches do not inherently have
anything to do with allocation.

A cache is an aid to allocation.

Are you sure you know what a cache is? A cache keep objects that have no
references around to optimize further allocations.

Apparently what you wanted to ask me was how I would implement an
allocation
cache in a garbage collected language.

Yes.

The answer depends upon the language. If you look at OCaml, for example,
there is no need to because the run-time has already automated this.

Then pick a lanaguage that does do that.

Do you know about the Solaris slab allocator?

No.

You should take a look at it.