On Tuesday, August 7, 2012 2:04:41 PM UTC+1, (unknown) wrote:
> I'm storing a large (100K) list of integer IDs in a set which
> I need to frequently search for a given ID. When I try doing
> this with a set using the standard find() function, it takes 3
> times longer than doing the same with a vector!
> Why is this? Surely, given that it's ordered, a set could be
> searched using a binary chop, whereas a vector must be a linear
> search. Even if the set is searched linearly, why does it take
> 3 times as long?
By definition, std::find does a linear search; that's part of
its contract. If performance is important, and you're dealing
with sorted data, there are std::lower_bound and company.
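Something along these lines (a minimal sketch; the function and
variable names are mine, not from the original exchange):

    #include <algorithm>
    #include <vector>

    // O(log n), but only valid if sortedIds really is sorted.
    bool contains(std::vector<int> const& sortedIds, int id)
    {
        return std::binary_search(
            sortedIds.begin(), sortedIds.end(), id);
    }

    // O(n): std::find walks the range element by element,
    // sorted or not.
    bool containsLinear(std::vector<int> const& ids, int id)
    {
        return std::find(ids.begin(), ids.end(), id) != ids.end();
    }

(std::lower_bound returns the position as well, which is what
you want if you need the element itself, not just a yes/no
answer.)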
Practically speaking, neither std::find nor std::lower_bound
has any way of knowing whether the range it's iterating over
is sorted. It's up to you, the user, to say whether the range
is sorted, by choosing the appropriate function. If you lie,
it's undefined behavior, precisely because the function
typically cannot detect the lie.
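Note too, assuming the original poster really did call the free
std::find algorithm on the std::set: the container's own find
member does the logarithmic lookup, using the tree structure;
the free algorithm can't, because set iterators are only
bidirectional. A sketch:

    #include <set>

    bool containsInSet(std::set<int> const& ids, int id)
    {
        // The member function descends the tree: O(log n).
        // std::find(ids.begin(), ids.end(), id) would be O(n).
        return ids.find(id) != ids.end();
    }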
Finally: using an std::vector and keeping it sorted is often
superior to using std::set, because of the greater locality. I'm
currently working on some high-performance software, and we've
banned the use of node-based containers for critical data,
because the loss of locality kills performance. We've also
banned them in large parts of the code outside the critical
area, because memory is very, very tight, and node-based
containers tend to use a lot more memory than std::vector as
well. (Note that this is *not* a general recommendation. We
only took these steps because we had to.)
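To give a rough idea of what that looks like (again only a
sketch, not our actual code): a sorted std::vector used as a
set, with std::lower_bound doing the work for both lookup and
insertion:

    #include <algorithm>
    #include <vector>

    class FlatIntSet
    {
        std::vector<int> myData;    // invariant: always sorted
    public:
        bool contains(int id) const
        {
            return std::binary_search(
                myData.begin(), myData.end(), id);
        }

        void insert(int id)
        {
            // O(n) because of the shift, but the contiguous
            // storage keeps lookups fast; fine when inserts
            // are rare or batched.
            std::vector<int>::iterator pos
                = std::lower_bound(myData.begin(), myData.end(), id);
            if (pos == myData.end() || *pos != id) {
                myData.insert(pos, id);
            }
        }
    };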