Re: Is there in memory tree structure faster than std::map?
Scott Lurndal <scott@slp53.sl.home> wrote:
Juha Nieminen <nospam@thanks.invalid> writes:
Leigh Johnston <leigh@i42.co.uk> wrote:
Three words: custom pool allocator.
It can be difficult to match the compiler's (well, the libc's) own allocator,
unless you are willing to make compromises, e.g. in terms of thread safety.
A custom pool allocator (defined: an allocator that returns fixed-size
elements from a pre-allocated array of elements) has a fixed O(1) cost for
both allocation and deallocation. One cannot get more efficient than that.
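For concreteness, a minimal sketch of such a pool (the name FixedPool and
all details are mine, not from the thread): free blocks are threaded into an
intrusive singly-linked list through the storage itself, so allocate() and
deallocate() are each a couple of pointer assignments.

// Minimal fixed-size pool sketch. blockSize must be at least
// sizeof(void*) and suitably aligned for the stored type; error
// handling is omitted.
#include <cstddef>
#include <vector>

class FixedPool
{
 public:
    FixedPool(std::size_t blockSize, std::size_t blockCount):
        mStorage(blockSize * blockCount)
    {
        // Thread the free list through the raw storage.
        for(std::size_t i = 0; i < blockCount; ++i)
        {
            Node* node = reinterpret_cast<Node*>(&mStorage[i * blockSize]);
            node->next = mFreeList;
            mFreeList = node;
        }
    }

    void* allocate()              // O(1): pop the free list head
    {
        if(!mFreeList) return nullptr;
        Node* node = mFreeList;
        mFreeList = node->next;
        return node;
    }

    void deallocate(void* p)      // O(1): push back onto the free list
    {
        Node* node = static_cast<Node*>(p);
        node->next = mFreeList;
        mFreeList = node;
    }

 private:
    struct Node { Node* next; };
    std::vector<unsigned char> mStorage;
    Node* mFreeList = nullptr;
};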
I can make an allocator that does allocations in O(1) and is a million
times slower than the standard allocator. "O(1)" by itself tells absolutely
nothing: it bounds how the cost grows, not how large the constant factor is.
Adding thread safety to a pool allocator is trivial and has no impact on
performance (in the non-contended case).
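What is being called trivial here is presumably something like wrapping the
pool in a mutex; a sketch, reusing the hypothetical FixedPool from above.
Whether this really costs nothing uncontended is exactly what is disputed
next.

#include <cstddef>
#include <mutex>

class LockingPool
{
 public:
    LockingPool(std::size_t blockSize, std::size_t blockCount):
        mPool(blockSize, blockCount) {}

    void* allocate()
    {
        std::lock_guard<std::mutex> lock(mMutex); // serialize access
        return mPool.allocate();
    }

    void deallocate(void* p)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mPool.deallocate(p);
    }

 private:
    FixedPool mPool;    // the hypothetical pool sketched earlier
    std::mutex mMutex;
};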
Adding thread safety is not trivial, and it will have a big impact on
performance (unless you manage to create a lock-free, wait-free allocator,
which is *far* from trivial).
What makes you think that locking has no impact on performance? The whole
industry is researching lock-free algorithms like mad precisely because
locking is a heavy operation. Why else would they bother?
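One way to put the disputed claim to the test (a sketch of my own
construction, not code from the thread): time a tight loop with and without
taking an uncontended std::mutex on each iteration.

#include <chrono>
#include <cstdio>
#include <mutex>

int main()
{
    const int kIters = 10000000;
    volatile long counter = 0;  // volatile so the loops aren't optimized away
    std::mutex mutex;

    auto t0 = std::chrono::steady_clock::now();
    for(int i = 0; i < kIters; ++i)
        counter = counter + 1;                    // bare increment
    auto t1 = std::chrono::steady_clock::now();

    for(int i = 0; i < kIters; ++i)
    {
        std::lock_guard<std::mutex> lock(mutex);  // lock/unlock each time
        counter = counter + 1;
    }
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("plain: %.1f ms, locked: %.1f ms\n",
                ms(t1 - t0).count(), ms(t2 - t1).count());
}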
From my experiments I think that the major reason (or one of the major
reasons) for the slowness of the standard allocator is that it's
thread-safe.
This seems unlikely.
"Seems unlikely." Well argumented. Care to back that gut feeling up with
something more concrete?
I have done some actual measurements.
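The thread doesn't show the measurements themselves; a sketch of the kind of
comparison being described, reusing the hypothetical FixedPool from above
(note that an allocate-then-free loop is the friendliest possible pattern
for both sides, so a realistic workload may behave differently):

#include <chrono>
#include <cstdio>

int main()
{
    const int kIters = 1000000;
    FixedPool pool(64, 1024);   // the pool sketched earlier, assumed in scope

    auto t0 = std::chrono::steady_clock::now();
    for(int i = 0; i < kIters; ++i)
        ::operator delete(::operator new(64));    // standard allocator
    auto t1 = std::chrono::steady_clock::now();

    for(int i = 0; i < kIters; ++i)
        pool.deallocate(pool.allocate());         // pool allocator
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("operator new: %.1f ms, pool: %.1f ms\n",
                ms(t1 - t0).count(), ms(t2 - t1).count());
}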
This is a cache-killer.
Can you elaborate on this? How, exactly, does this "kill" a cache?
It's just an expression. It means that operating on heavily fragmented
memory causes lots of cache misses, nullifying the benefits of the cache
and killing performance.
A modern set-associative cache is designed to operate efficiently with
random data placement. Modern processor prefetching logic handles strided
accesses quite well.
The processor cannot guess in advance where the memory allocator is going
to jump next. If it jumps to an uncached part of RAM, then it will be
slow.
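The effect both sides are arguing about can be approximated without an
allocator at all (a sketch, my own construction): walk the same array once
in address order, which the prefetcher handles well, and once in a shuffled
order, where each step is a potential cache miss.

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main()
{
    const std::size_t kElems = 1 << 22;           // ~4M elements
    std::vector<std::size_t> order(kElems);
    std::iota(order.begin(), order.end(), 0);     // 0, 1, 2, ...
    std::vector<std::size_t> data(kElems);

    auto walk = [&](const std::vector<std::size_t>& idx)
    {
        std::size_t sum = 0;
        auto t0 = std::chrono::steady_clock::now();
        for(std::size_t i: idx) sum += data[i];
        auto t1 = std::chrono::steady_clock::now();
        std::printf("sum=%zu in %.1f ms\n", sum,
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    };

    walk(order);                                  // sequential: prefetch-friendly
    std::shuffle(order.begin(), order.end(), std::mt19937(1234));
    walk(order);                                  // random: cache misses dominate
}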
It seems to me that you are saying the standard allocator's slowness is
caused neither by its thread safety nor by its cache-unfriendliness. If so,
then please explain to me your own hypothesis of why it is slow.
If you want to fix this, you'll have
to find some way of defragmenting and compacting the memory once in a
while, which can be really difficult due to how a typical C/C++ program
works.
C'est what?
If you don't understand what memory compaction is, or why it's so hard to
do in a C++ program, then perhaps you have no business discussing why
memory allocators are slow and how they could be improved.
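For what it's worth, the hardness being alluded to is easy to illustrate
(a sketch, my own construction): C++ lets raw pointers, interior pointers
and even plain integers refer into an allocation, so no compactor could
find and patch every reference before moving a block.

#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main()
{
    int* block = static_cast<int*>(std::malloc(100 * sizeof(int)));
    int* inside = block + 42;                    // interior pointer, untracked
    std::intptr_t hidden =
        reinterpret_cast<std::intptr_t>(block);  // pointer hidden in an integer

    // Moving *block would require updating 'block', 'inside' and even
    // 'hidden', but nothing in the language tracks where they live.
    // Compacting garbage collectors rely on precise pointer maps or
    // handles; standard C++ has neither.
    std::printf("%p %p %p\n", (void*)block, (void*)inside, (void*)hidden);
    std::free(block);
}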