Re: about new and delete

From:
"Alf P. Steinbach" <alfps@start.no>
Newsgroups:
comp.lang.c++
Date:
Sat, 26 Sep 2009 05:04:16 +0200
Message-ID:
<h9k07j$dsj$1@news.eternal-september.org>
* Sam:
> Alf P. Steinbach writes:
>
>> No.
>>
>> Can you figure out what's wrong with your measurement?
>
> Yes: nothing.


Well, OK, that may explain your apparent bafflement at the comments you're drawing.

The key to understanding this is to combine three pieces of information:

   * The C++03 standard requires a contiguous buffer for std::vector.

   * push_back has guaranteed amortized O(1) complexity.

   * Allocation is generally way more costly (time) than copying a handful
     of bytes.

Regarding the last point it seems that you have a fast allocator. This
influences the relative performance of std::list versus std::vector.

The first two points mean that when the buffer space is exhausted, a
std::vector /must/ increase the buffer size by a factor, not by a constant
amount. The most common strategy, as far as I know, is simply to double the
buffer size. Each such reallocation involves 1 allocation plus copying O(n)
bytes, where n is the current size.

When a std::vector that uses buffer doubling is created by repeated push_back
operations you therefore have two main contributions to the time: roughly C*n
time for copying (where C is some constant), and roughly A*(1+log2(n)) time for
allocations (where A is some other constant, and log2(x) is the number such that
2^(log2(x)) = x).

In contrast, for the std::list each push_back generally involves one allocation,
so you have roughly C*n+A*n (with possibly not quite the same values for C and
A, but whatever), anyway, linear in n.

If you draw the graph of log2(x) you'll see that from x=1 on it starts going
upward at a good incline, but flattens out rapidly. For example, log2(4) = 2,
but then you have to go up to log2(8) to get 3, and next all the way up to
log2(16) to get 4. Further on, log2(128) = 7, but then you have to go 128 units
more to increase the result to 8. And so on: it gets really flat very fast.

And so at the beginning, around n=10, which is what you chose, there is a
practical possibility that std::list can outperform std::vector, because (if
there is no small-buffer optimization in the std::vector implementation) you
have five allocations for the vector and ten for the list, and with a fast
allocator the data copying of the vector may make up for that.

But if you go up to n=15 you still have just five allocations for the vector,
but now 15 for the list! At this point std::vector will probably outperform
std::list.

At n=16 you should expect a possible apparent drop in std::vector efficiency,
because here (with a buffer-doubling vector) you incur an additional allocation.

But then it's a free ride all the way up to and including n=32, where for the
list you have 32 allocations, and for std::vector you have just 6. Given that
the assumption in the third point above holds, you should expect std::list to be
decidedly inferior efficiency-wise in this range of n. Which is still *small*.

In short, the main problem with your test was that you tested only a single n,
which was not just small but very small (namely 10). Instead, your test should
have checked various values of n within some reasonable small range. Better
yet, it should have systematically checked the general behavior of std::list
versus std::vector as n increases.

Cheers & hth.,

- Alf
