Re: Silver bullet for C++ memory management ?
{ Note: below Andrei is not referring to Stefan Ram but to the article "Java
theory and practice: Urban performance legends, revisited" by Brian Goetz at
<url: http://preview.tinyurl.com/cd8d7p>, earlier quoted by Stefan. -mod }
joshuamaurice@gmail.com wrote:
On Mar 25, 6:49 pm, r...@zedat.fu-berlin.de (Stefan Ram) wrote:
"[A]llocation in modern JVMs is far faster than the best
performing malloc implementations. The common code path
for new Object() in HotSpot 1.4.2 and later is
approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc
implementations in C require on average between 60 and 100
instructions per call (Detlefs, et al.; see Resources).
And allocation performance is not a trivial component of
overall performance -- benchmarks show that many
real-world C and C++ programs, such as Perl and
Ghostscript, spend 20 to 30 percent of their total
execution time in malloc and free -- far more than the
allocation and garbage collection overhead of a healthy
Java application (Zorn; see Resources)."
http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca...
As I see it, it must be one of:
1. the above studies ignored crucial details in arriving at such
blatantly wrong conclusions,
2. current malloc implementations really, really suck, or
3. there is something I'm unaware of that greatly pessimizes malloc
performance. (I don't think so.)
The article is not very competent and rather axe-grinding. What happens
is, you can always achieve faster allocation by consuming more memory -
that's old news even with malloc. Many custom allocators allocate faster
than e.g. Doug Lea's allocator (which is about the best of the breed and
the basis for the Linux allocator) but consume more memory, which in
turn may cause less cache-friendliness and so, ultimately, poorer
performance. So counting the number of instructions it takes to allocate
is not very relevant. What would be relevant, and what is conspicuously
missing, is real numbers on real programs. Here the disingenuousness of
the article comes to the fore. A conversation with the author would go
like so:
Author: "And allocation performance is not a trivial component of
overall performance -- benchmarks show that many real-world C and C++
programs, such as Perl and Ghostscript, spend 20 to 30 percent of their
total execution time in malloc and free -- far more than the allocation
and garbage collection overhead of a healthy Java application."
Reader: "Wow. That means Perl and Ghostscript would be prime candidates
for speed gains from Java! Since you mentioned these, I'm sure there are
numbers in the article that show the slam-dunk gains."
Author: "Nope. I just mentioned the numbers because they looked good for
my fabricated argument."
Simply put, malloc() must search some sort of data structure to find
an unused memory piece, just like new Object() in Java. In C and C++,
the call to free() occurs in the user thread in the middle of
application processing. It should be a simple update to that data
structure saying this piece of memory is no longer used. Java, even
with the best generational garbage collectors, must necessarily do
more computation to determine which pieces of memory are no longer in
use than free().
For equivalently optimized malloc implementations and JVM GC
implementations, malloc must be equal or faster.
I think that's simplifying a bit too much.
In a lot of the studies I see, they do things like compare Java user
time against C++ application time and claim victory. This nicely hides
the GC activity away in "JVM system time" rather than "user time". That
is dishonest and worthy of being called fraud. "Oh, but it's in a
separate thread, so the application thread is faster." Yeah, and the
equivalent C++ program written to use that other core with threads
would then outperform the Java application running on the multiple
cores. Also, it's relatively rare that your application is going to be
running on a multi-core system, be single threaded, and not have to
compete with other things for those cores (like other processes
running on a personal computer). Intellectual dishonesty at its finest
here.
I don't think that's fraud at all. Not all processing can be
parallelized, so there's no automatic gains from more processors in all
cases. For such algorithms, if one processor is dedicated to concurrent
GC and the overall execution is faster, there is a net gain.
Scaling up to manycores, it seems rather advantageous to dedicate one or
more processors to concurrent GC.
Andrei
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]