Re: vector::insert performance tip.

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

29 Apr 2007 11:13:23 -0700

Message-ID:

<1177870403.485991.291090@n76g2000hsh.googlegroups.com>

On Apr 28, 9:49 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:

kostas wrote:

Trying to dump a set<int> (s) in a vector (v) my guess was that
v.insert(v.end(),s.begin(),s.end())
would be at least as efficient as
copy(s.begin(), s.end(), back_inserter(v))
It turn out that it is almost two times slower (even without using
reserve() before copy) and that's because it traverses the set twice.

I don't see why it would need to traverse the set twice. Any hint
to why it was happening?

It's probably an "optimization":-). If the number of elements
in the set is large, and the elements are expensive to copy,
it's faster to travers the set a first time to calculate how
many elements are needed, ensure sufficient capacity, and then
travers the set a second time to copy them.

I'm posting it as an exception to the usual (and usually correct)
advise
"Prefer member functions to algorithms".

I don't think the advice has anything to do with performance. The
usual (and always correct) advice WRT performance is "measure first
then try to optimize".

And this turns out to be a good example of why:-). Off hand, I
would have imagined that traversing the set twice would be
faster, because it avoids unnecessary allocations. Apparently,
I'd have guessed wrong. And apparently, the author of the STL
he's using guessed wrong as well.

(I suspect which form is faster will depend on any number of
external factors, but if the number of elements is large...
std::set has very poor locality, so you're likely to get some
paging, or at least cache misses, in each pass.)

--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34