Re: Container performance
On 4/3/07 6:22 PM, in article MPG.207d122da54ca314989862@news.cc.tut.fi,
"Kaba" <REkalleMOunderscoreVErutanenME@hotmail.com> wrote:
> I am a bit worried about the performance of the current containers.
> While the implementations and algorithms are no doubt of the highest
> quality in today's STL libraries, the current memory allocation scheme
> seems to slow down the node-based containers dramatically.
> To back up this claim, here are the results of a simple test
> (Visual Studio 2005 SP1, with checked iterators turned off, release
> mode):
> <test_output>
> Inserting 1000000 (one million) unique elements.
> std::vector
> 0.047 s for insertion.
> 0 s for destruction.
> std::list
> 1.406 s for insertion.
> 13.734 s for destruction.
> std::set
> 1.984 s for insertion.
> 175.969 s for destruction.
> Pastel::FastList
> 0.078 s for insertion.
> 0.172 s for destruction.
> Pastel::unordered_set using std::list
> 39.219 s for insertion.
> 24.422 s for destruction.
> Pastel::unordered_set using Pastel::FastList
> 0.532 s for insertion.
> 0.172 s for destruction.
> </test_output>
Here are my comparable gcc timings for inserting and removing one
million items (one at a time) from a container:

                 add              remove
vector:            3                   0
list:             16                  18
set:              58 (28 w/hint)      68
unordered_set:    22                  31
The function timed in each case followed this pattern:
#include <ctime>
#include <iostream>
#include <list>

const int kMaxNumElements = 1000000;
std::list<int> gList;

void list_test()
{
    // Time inserting one element at a time at the end of the list.
    clock_t start = clock();
    for (int i = 0; i < kMaxNumElements; i++)
        gList.insert(gList.end(), i);
    clock_t end = clock();
    std::cout << "list add: " << end - start << "\n";

    // Time removing one element at a time from the end of the list.
    clock_t start2 = clock();
    for (int i = 0; i < kMaxNumElements; i++)
        gList.pop_back();
    clock_t end2 = clock();
    std::cout << "list remove: " << end2 - start2 << "\n";
}
> The containers in the Pastel:: namespace are our own implementations. The
> Pastel::unordered_set is a one-to-one implementation of the proposal for
> the upcoming hash table. When we had implemented unordered_set, we
> noticed that it was unacceptably slow. The rehashes took a _long_ time.
> This was quickly traced to the internally used std::list, and there
> particularly to allocation and deallocation.
One of the motivations for choosing a std::list in this case was presumably
that std::lists require relatively few memory allocations and
deallocations. Because a std::list iterator usually refers to the same
object for the entire life of the iterator, the only time memory needs to
be allocated or freed is upon insertion or removal of an item from a
std::list. All other std::list operations - including sorting, splicing,
and swapping elements - leave unchanged which item each iterator in the
list references. So I do not see why rehashing an unordered_set
(implemented with the help of a std::list) would require memory
allocations and deallocations frequent enough to degrade performance so
severely.
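To illustrate: an unordered_set that keeps all of its elements in a single
std::list can rebucket them during a rehash purely by splicing nodes from
one position to another. The following is a minimal sketch of that idea
(my own illustration, with an assumed bucket representation and a trivial
hash - not the Pastel or libstdc++ code):

#include <cstddef>
#include <list>
#include <vector>

typedef std::list<int>::iterator Iter;

// Rebucket every element for a new bucket count. Each bucket is
// represented by an iterator to the first element of its contiguous
// run within 'elements' (or elements.end() if the bucket is empty).
// All nodes are moved by splicing, so nothing is allocated, freed,
// or copied, and existing iterators keep referring to their elements.
void rehash_sketch(std::list<int>& elements,
                   std::vector<Iter>& buckets,
                   std::size_t newBucketCount)
{
    std::list<int> old;
    old.splice(old.begin(), elements);  // O(1): take every node
    buckets.assign(newBucketCount, elements.end());

    while (!old.empty())
    {
        Iter it = old.begin();
        // Trivial "hash" for int keys, standing in for the real one.
        std::size_t b = static_cast<std::size_t>(*it) % newBucketCount;
        // Splice the node in at the head of its bucket's run; this
        // relinks pointers only, and 'it' remains valid throughout.
        Iter pos = (buckets[b] == elements.end())
                 ? elements.begin() : buckets[b];
        elements.splice(pos, old, it);
        buckets[b] = it;
    }
}

(A real implementation would also need to know where each bucket's run
ends - say, by storing the hash value in each node - but the allocation
behavior would be the same: rehashing moves nodes, it does not create
them.)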
> Now that the splice was removed, we could implement a pool allocator
> similar to the one in the Loki library or in the book "Modern C++
> Design". This allocator is not compatible with the STL allocators. The
> allocator uses the knowledge that it only allocates and deallocates
> blocks of a certain size. The increase in performance is dramatic in
> FastList: on the order of 20x for insertion and 130x for destruction.
> The gain was even larger in unordered_set: 73x for insertion and 140x
> for destruction.
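A fixed-size pool allocator of the kind described above might look roughly
like the following sketch (an illustration of the general technique, with
assumed chunk sizes - not the actual Pastel or Loki code). It carves large
chunks from ::operator new and hands out fixed-size blocks from an
intrusive free list, so individual allocations and deallocations touch no
general-purpose heap machinery:

#include <cstddef>
#include <new>
#include <vector>

class FixedPool
{
public:
    explicit FixedPool(std::size_t blockSize,
                       std::size_t blocksPerChunk = 4096)
        : blockSize_(blockSize < sizeof(void*) ? sizeof(void*) : blockSize)
        , blocksPerChunk_(blocksPerChunk)
        , freeList_(0)
    {
        // For simplicity this sketch assumes blockSize_ is a multiple
        // of the platform's alignment requirement.
    }

    ~FixedPool()
    {
        for (std::size_t i = 0; i < chunks_.size(); ++i)
            ::operator delete(chunks_[i]);
    }

    void* allocate()
    {
        if (!freeList_)
            grow();
        void* block = freeList_;
        freeList_ = *static_cast<void**>(freeList_);  // pop the free list
        return block;
    }

    void deallocate(void* block)
    {
        *static_cast<void**>(block) = freeList_;  // push onto the free list
        freeList_ = block;
    }

private:
    void grow()
    {
        // One large allocation amortizes heap traffic over many blocks.
        char* chunk = static_cast<char*>(
            ::operator new(blockSize_ * blocksPerChunk_));
        chunks_.push_back(chunk);
        // Thread the fresh blocks onto the free list.
        for (std::size_t i = 0; i < blocksPerChunk_; ++i)
            deallocate(chunk + i * blockSize_);
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    void* freeList_;
    std::vector<char*> chunks_;
};

A node-based container can then draw all of its equally-sized nodes from
one such pool, so each insert or erase costs a couple of pointer writes
instead of a trip through the general-purpose allocator.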
It's also unclear to me why the splice operations would have any negative
performance implications. As far as I can tell, implementing std::list's
splice() should require no memory allocations or deallocations at all.
After all, splicing must neither invalidate nor change the referenced item
of any iterator involved in the operation.
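A small self-contained demonstration of that property (my example, not
taken from the original measurements):

#include <cassert>
#include <list>

int main()
{
    std::list<int> a;
    std::list<int> b;
    a.push_back(1);
    b.push_back(2);

    std::list<int>::iterator it = b.begin();  // refers to the element 2

    // Move the node from b into a: pointers are relinked, but nothing
    // is allocated, freed, or copied.
    a.splice(a.end(), b, it);

    assert(*it == 2);       // 'it' still refers to the same element,
    assert(b.empty());      // which now lives in list 'a'.
    assert(a.back() == 2);
    return 0;
}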
> Now to the point. The upcoming unordered_set is not going to be a very
> practical container if it uses the same naive allocation and
> deallocation techniques as before. What could we do about this?
There are already working implementations of std::tr1::unordered_set, so it
is not necessary to speculate about its performance. And the most
interesting measure of performance for a container is not listed above.
After all, no program places items in a container for the sole purpose of
emptying the container later on. Instead, a program places an item in a
container with the expectation of having to retrieve that item later on. So
the less time it takes a program to retrieve a particular item from a
container, the better the program's overall performance.
Since only two of the containers that appear in the above list support
key-based retrieval (std::set and std::tr1::unordered_set), it would make
the most sense to compare how long it takes to retrieve one million items
from each:
set retrieval: 25
unordered_set retrieval: 2
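A retrieval test in the same style as the timing function shown earlier
might look like this (a sketch for the std::set case; the pre-populated
global gSet is my assumption, mirroring gList above):

#include <ctime>
#include <iostream>
#include <set>

const int kMaxNumElements = 1000000;
std::set<int> gSet;  // assumed populated with 0..kMaxNumElements-1

void set_retrieval_test()
{
    // Time one key-based lookup per element.
    clock_t start = clock();
    for (int i = 0; i < kMaxNumElements; i++)
        if (gSet.find(i) == gSet.end())
            std::cout << "missing: " << i << "\n";
    clock_t end = clock();
    std::cout << "set retrieval: " << end - start << "\n";
}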
In all three container operations measured (adding, removing, and
retrieving items), gcc's unordered_set handily beats its own std::set - so
it would not seem that the current allocator requirements have impaired
gcc's implementation of unordered_set, or at least not to any appreciable
degree.
Greg
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]