Re: Why is a vector find faster than a set with find()?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sun, 12 Aug 2012 11:02:12 -0700 (PDT)
Message-ID: <bb1913dc-5c49-4233-9e5f-58fd75f08352@googlegroups.com>
On Saturday, August 11, 2012 7:35:05 PM UTC+1, Leigh Johnston wrote:

On 11/08/2012 19:16, James Kanze wrote:

On Tuesday, August 7, 2012 2:04:41 PM UTC+1, (unknown) wrote:

I'm storing a large (100K) list of integer ids in a set which
I need to search frequently for a given id. When I try doing
this with a set using the standard find() function it takes 3
times longer than doing the same with a vector!

Why is this? Surely, given that it's ordered, a set could be
searched using a binary chop, whereas a vector must be searched
linearly. Even if the set is searched linearly, why does it take
3 times as long?


std::find does a linear search; that's part of its specification.
If performance is important and you're dealing with sorted data,
there are std::lower_bound and company.
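
To make the difference concrete, something along these lines (a
rough sketch with made-up function names, assuming the ids are
kept in an already sorted std::vector<int>):

    #include <algorithm>
    #include <vector>

    //  Linear search: looks at elements one by one until it finds
    //  a match (or reaches the end).
    bool containsLinear(std::vector<int> const& ids, int id)
    {
        return std::find(ids.begin(), ids.end(), id) != ids.end();
    }

    //  Binary search: requires ids to be sorted; does O(lg n)
    //  comparisons.
    bool containsSorted(std::vector<int> const& ids, int id)
    {
        std::vector<int>::const_iterator it
            = std::lower_bound(ids.begin(), ids.end(), id);
        return it != ids.end() && *it == id;
    }

With 100K ids, the linear version may look at all 100K elements;
the binary search needs about 17 comparisons.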

Practically speaking, neither std::find nor std::lower_bound has
any way of knowing whether the range it's iterating over is
sorted. It's up to you, the user, to say whether the range is
sorted, by choosing the appropriate function. If you lie, it's
undefined behavior, because the function typically has no way of
detecting it.
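
If you want the sortedness precondition verified during
development, something like the following is possible (the assert
is purely illustrative, and std::is_sorted requires C++11):

    #include <algorithm>
    #include <cassert>
    #include <vector>

    bool containsChecked(std::vector<int> const& ids, int id)
    {
        //  Debug-only check of the precondition; it's O(n), so you
        //  wouldn't leave it in a release build.
        assert(std::is_sorted(ids.begin(), ids.end()));
        return std::binary_search(ids.begin(), ids.end(), id);
    }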

Finally: using an std::vector, and keeping it sorted, is often
superior to using std::set, because of greater locality. I'm
currently working on some high performance software, and we've
banned the use of node based containers for critical data,
because the loss of locality kills performance. We've also
banned them in large parts of the code outside the critical area,
because memory is very, very tight, and they tend to use a lot
more memory than std::vector as well. (Note that this is *not* a
general recommendation; we only took these steps because we had
to.)
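
For what it's worth, keeping a vector sorted on insertion is only
a few lines (a rough sketch, not our actual code):

    #include <algorithm>
    #include <vector>

    //  Inserts id into an already sorted vector, keeping it sorted
    //  and free of duplicates.
    void insertSorted(std::vector<int>& ids, int id)
    {
        std::vector<int>::iterator it
            = std::lower_bound(ids.begin(), ids.end(), id);
        if (it == ids.end() || *it != id) {
            ids.insert(it, id);
        }
    }

Insertion is O(n) because of the moves, so this pays off when
lookups greatly outnumber insertions.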


You can improve the locality of the node based containers by using a
custom allocator; I got significant speed increases doing it this way
(but how much was down to improved locality and how much was down to not
using a general purpose allocator is unclear).


That would certainly help, but you still won't get the locality of
std::vector, because each node is larger than an entry in a vector.
And of course, there's no way of telling the allocator where the
element being allocated will end up in the set, so if you're
iterating through it sequentially, you could still end up with
relatively poor locality. (If you had some way of telling the
allocator the node's depth, you could really win by regrouping the
top nodes. Binary search on a vector has poor locality at the
start, since it's jumping all over the place.)
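
For concreteness, an arena style allocator along these lines is
the sort of thing in question (a very rough sketch, assuming a
standard library which fills in the missing allocator pieces
through std::allocator_traits; it isn't the code either of us
actually used):

    #include <cstddef>
    #include <memory>
    #include <set>
    #include <vector>

    //  Hands out memory from large contiguous blocks, so that nodes
    //  allocated one after the other end up close together.  Nothing
    //  is freed until the arena itself is destroyed.
    class Arena
    {
    public:
        explicit Arena(std::size_t blockSize = 64 * 1024)
            : myBlockSize(blockSize)
            , myUsed(blockSize)
        {
        }

        void* allocate(std::size_t bytes, std::size_t align)
        {
            //  Assumes bytes <= myBlockSize (true for container nodes).
            std::size_t offset = (myUsed + align - 1) / align * align;
            if (myBlocks.empty() || offset + bytes > myBlockSize) {
                myBlocks.push_back(
                    std::unique_ptr<char[]>(new char[myBlockSize]));
                offset = 0;
            }
            myUsed = offset + bytes;
            return myBlocks.back().get() + offset;
        }

    private:
        std::size_t myBlockSize;
        std::size_t myUsed;
        std::vector<std::unique_ptr<char[]> > myBlocks;
    };

    //  Minimal C++11 allocator; deallocate is a no-op, so memory
    //  only comes back when the arena goes away.
    template <typename T>
    class ArenaAllocator
    {
    public:
        typedef T value_type;

        explicit ArenaAllocator(Arena& arena) : myArena(&arena) {}
        template <typename U>
        ArenaAllocator(ArenaAllocator<U> const& other)
            : myArena(other.myArena) {}

        T* allocate(std::size_t n)
        {
            return static_cast<T*>(
                myArena->allocate(n * sizeof(T), alignof(T)));
        }
        void deallocate(T*, std::size_t) {}

        Arena* myArena;
    };

    template <typename T, typename U>
    bool operator==(ArenaAllocator<T> const& lhs,
                    ArenaAllocator<U> const& rhs)
    {
        return lhs.myArena == rhs.myArena;
    }
    template <typename T, typename U>
    bool operator!=(ArenaAllocator<T> const& lhs,
                    ArenaAllocator<U> const& rhs)
    {
        return !(lhs == rhs);
    }

    int main()
    {
        Arena arena;
        std::set<int, std::less<int>, ArenaAllocator<int> >
            ids((std::less<int>()), ArenaAllocator<int>(arena));
        for (int i = 0; i != 100000; ++i) {
            ids.insert(i);
        }
        return ids.count(50000) == 1 ? 0 : 1;
    }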

In the end, there is no one perfect solution. If the abstraction is
fundamental to your application, define a class for it yourself, and
implement it with std::vector, std::set, or whatever is simplest for
you. But give it a minimal interface (e.g. forward iterators, if
that's all that's needed, even if the implementation is std::vector),
so that you can easily switch the implementation later, when you know
where the performance bottlenecks are.
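
Purely as an illustration (not a recommendation of this exact
interface), such a class might look something like:

    #include <algorithm>
    #include <vector>

    class IdSet
    {
    public:
        //  Part of the interface; the fact that it's currently a
        //  random access iterator from std::vector is an
        //  implementation detail -- document it as a forward
        //  iterator and use it only as one.
        typedef std::vector<int>::const_iterator const_iterator;

        const_iterator begin() const { return myIds.begin(); }
        const_iterator end() const { return myIds.end(); }

        bool contains(int id) const
        {
            return std::binary_search(myIds.begin(), myIds.end(), id);
        }

        //  Keeps myIds sorted and free of duplicates.
        void insert(int id)
        {
            std::vector<int>::iterator it
                = std::lower_bound(myIds.begin(), myIds.end(), id);
            if (it == myIds.end() || *it != id) {
                myIds.insert(it, id);
            }
        }

    private:
        std::vector<int> myIds;
    };

If profiling later shows that std::set, a hash table, or something
else entirely fits better, only this one class has to change.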

--
James Kanze
