Re: Is there in memory tree structure faster than std::map?
Scott Lurndal <scott@slp53.sl.home> wrote:
I can make an allocator that does allocations in O(1) and is a million
times slower than the standard allocator. "O(1)" by itself tells you
absolutely nothing.
Having written many highly efficient pool allocators for 64- to 1024-core
systems over the last 20 years, I'm not sure what you're attempting to
get at here. A simple fixed-size pool allocator delinks the top element
and returns it. The deallocator relinks the element at either the front
or the back of the list (each has benefits). In both cases, this is a
pair of memory accesses. Highly efficient.
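For the single-threaded case, a free-list pool of that sort is only a
few lines. A minimal sketch (the class name and layout are mine, blocks
must be at least pointer-sized, and alignment details are glossed over):

    #include <cstddef>
    #include <new>
    #include <vector>

    // Fixed-size pool: a singly-linked free list threaded through the
    // unused blocks themselves. Allocation delinks the head block;
    // deallocation relinks at the front (LIFO). Not thread-safe.
    class FixedPool {
    public:
        FixedPool(std::size_t block_size, std::size_t block_count)
            : storage_(block_size * block_count)
        {
            // Thread the free list through the raw storage.
            for (std::size_t i = 0; i < block_count; ++i) {
                void* block = storage_.data() + i * block_size;
                *static_cast<void**>(block) = head_;
                head_ = block;
            }
        }

        void* allocate() {
            if (!head_) throw std::bad_alloc();
            void* block = head_;
            head_ = *static_cast<void**>(block);  // delink the top element
            return block;
        }

        void deallocate(void* block) {
            *static_cast<void**>(block) = head_;  // relink at the front
            head_ = block;
        }

    private:
        std::vector<unsigned char> storage_;
        void* head_ = nullptr;
    };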
If the cause of the slowness is thread safety and memory fragmentation,
it doesn't matter how you manage the allocated blocks. Your "O(1)" tells
you absolutely nothing.
Even running a compaction sweep from time to time can make the allocator
significantly faster overall (because the sweep removes the randomness of
the free blocks), even though the sweep itself is linear-time. Yes, I have
experimented with this.
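A sweep of that sort can be as simple as relinking the free list in
address order, so that subsequent allocations hand out adjacent blocks
again. A sketch, assuming the usual free list threaded through the
blocks themselves (this sort-based version is O(n log n) rather than
strictly linear, but the effect is the same):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Compaction-style sweep: gather the free blocks, sort them by
    // address, and relink them in address order so that consecutive
    // allocations afterwards return adjacent memory.
    void* sweep_free_list(void* head) {
        std::vector<void*> blocks;
        for (void* p = head; p; p = *static_cast<void**>(p))
            blocks.push_back(p);

        std::sort(blocks.begin(), blocks.end());  // address order

        void* new_head = nullptr;
        for (std::size_t i = blocks.size(); i-- > 0; ) {
            *static_cast<void**>(blocks[i]) = new_head;
            new_head = blocks[i];
        }
        return new_head;  // lowest address first
    }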
What makes you think that locking has no impact on performance? The whole
industry is researching lock-free algorithms like mad precisely because
locking is a really heavy operation. Why else would they bother?
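For concreteness, the textbook example is the Treiber stack: a list
push becomes a compare-and-swap loop instead of a lock/unlock pair. A
minimal sketch of the push side only (the pop side suffers from the
well-known ABA problem, which is exactly what schemes like hazard
pointers and RCU exist to solve):

    #include <atomic>

    struct Node { Node* next; };

    std::atomic<Node*> free_list{nullptr};

    // Lock-free relink of a freed block: no mutex, just a CAS loop
    // on the list head.
    void push(Node* node) {
        Node* old_head = free_list.load(std::memory_order_relaxed);
        do {
            node->next = old_head;
        } while (!free_list.compare_exchange_weak(
                     old_head, node,
                     std::memory_order_release,
                     std::memory_order_relaxed));
    }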
The industry is researching many facets of parallel programming, and as
processor counts grow, formerly efficient algorithms no longer scale, so
those algorithms are being tweaked or supplanted (RCU, for example).
What does that have to do with what I said above?
First, who cares where the memory allocator is going to "jump next". The
interesting issue is when the allocated memory is used, not when it is
allocated. Note that a pool allocator need never touch the memory that
is being allocated (cf. a bitmap allocator), so the only cache issue occurs
when the allocated memory is being touched.
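To make "never touches the memory" concrete: a bitmap allocator keeps
all of its bookkeeping in a bitmap off to the side, so allocation reads
and writes only the bitmap words and never the arena itself. A rough,
hypothetical sketch (single-threaded and unoptimized; a real one would
use find-first-set rather than a bit loop):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Bitmap allocator over a fixed arena: one bit per block, with the
    // bookkeeping held entirely outside the arena.
    class BitmapPool {
    public:
        BitmapPool(unsigned char* arena, std::size_t block_size,
                   std::size_t block_count)
            : arena_(arena), block_size_(block_size),
              block_count_(block_count),
              bits_((block_count + 63) / 64, 0) {}

        // Find a clear bit, set it, and return the matching block.
        // The arena itself is never read or written.
        void* allocate() {
            for (std::size_t w = 0; w < bits_.size(); ++w) {
                if (bits_[w] == ~std::uint64_t(0)) continue;  // word full
                for (unsigned b = 0; b < 64; ++b) {
                    std::size_t i = w * 64 + b;
                    if (i >= block_count_) return nullptr;  // exhausted
                    if (!(bits_[w] & (std::uint64_t(1) << b))) {
                        bits_[w] |= std::uint64_t(1) << b;
                        return arena_ + i * block_size_;
                    }
                }
            }
            return nullptr;  // pool exhausted
        }

        void deallocate(void* p) {
            std::size_t i =
                (static_cast<unsigned char*>(p) - arena_) / block_size_;
            bits_[i / 64] &= ~(std::uint64_t(1) << (i % 64));
        }

    private:
        unsigned char* arena_;
        std::size_t block_size_;
        std::size_t block_count_;
        std::vector<std::uint64_t> bits_;
    };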
I really don't understand what you are saying here. Are you basing your
speed measurements on simply updating the allocation lists but never
touching the allocated memory? No wonder you are not seeing any efficiency
problems with memory fragmentation.
An actual program that uses the allocator is *always* going to write to
the allocated memory block immediately (especially in C++). If that
memory block is not in the cache, the write is going to be slow. The CPU
cannot magically know where the next write will happen and prepare for
it in advance.
When using the standard allocator, if you use much more memory than fits
in the caches, and you need to do lots of allocations and deallocations
at random, one of the major sources of inefficiency is precisely the
random distribution of free blocks that ensues. (Obviously if you only
use as much memory as will fit in the caches, then it will be much more
efficient.) When the list of free blocks is random, there's a high chance
that the next allocated block will not be cached.
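The effect is easy to measure: write one byte into each block of a
large arena, first in address order and then in shuffled order, and
time both passes. A crude sketch, not a rigorous benchmark (the sizes
are arbitrary; the shuffled pass mimics allocating from a randomized
free list, the sequential pass a freshly swept one):

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        constexpr std::size_t kBlocks = 1 << 20;  // 64 MB of 64-byte blocks
        constexpr std::size_t kBlockSize = 64;
        std::vector<unsigned char> arena(kBlocks * kBlockSize);

        std::vector<std::size_t> order(kBlocks);
        std::iota(order.begin(), order.end(), 0);

        auto touch_all = [&](const char* label) {
            auto t0 = std::chrono::steady_clock::now();
            for (std::size_t i : order)
                arena[i * kBlockSize] = 1;  // first write into each block
            auto t1 = std::chrono::steady_clock::now();
            std::printf("%s: %lld us\n", label,
                        static_cast<long long>(
                            std::chrono::duration_cast<
                                std::chrono::microseconds>(t1 - t0).count()));
        };

        touch_all("sequential");  // address-ordered "allocations"
        std::shuffle(order.begin(), order.end(), std::mt19937{42});
        touch_all("shuffled");    // randomized free-list order
    }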
You are telling me that the list of free blocks being randomized is actually
a good thing.