On Mar 17, 2:58 am, "Matteo" <mah...@ncsa.uiuc.edu> wrote:
On Mar 16, 11:29 am, "coosa" <coos...@gmail.com> wrote:
Agree with you; a map or precisely a hashed map.
Trust me on not using vectors; I've worked on the netflix competition
with a dataset containing over 100 millions of ratings and I used Java
for it, but it won't play a big role if I used C++; for some
processing the code ran and lasted around 2-3 days to finish; when I
decided to switch to HashMap it took less than 6 hours to process.
Were you using sorted vectors with a binary search? Or were you just
scanning through the vector? True, a hash map would still be faster
(in the average case) than a sorted vector, but I would like to know
how you used vectors in this case before I take your advice.
Vectors vs Maps, I also have the book of scott meyers and it's a great
one and I don't see a contradiction for using Maps or hashed maps; you
worry much about retrieving the elements fast and I guess Maps are
best suited for that.
Let me repeat again:
You need to ensure sorted elements -> sets
you need uniqueness -> sets and maps
you need fast lookup -> maps and hashes
you need insertions and deletions from the beginning, ending or middle
of container -> lists
you need random access -> vectors
See what's most crucial for your application and decide for a
container.
Good luck
I think there is a little bit of misinformation here. Both std::map
and std::set are sorted, and they both have fairly fast (i.e. O(log
N)) lookup. Hash maps have even faster lookup (in the average case,
O(1)).
The reason that the OP might prefer a map over a set his case is that
he wants to modify the data in the records. While this can be done
with std::set (by storing a pointer to the record, and providing a
comparison operator), it isn't a natural fit for the problem.
However, as the data is already sorted, and if you will not be
inserting or deleting data, then using a std::vector is a fine idea.
To search a sorted vector, use std::lower_bound, which should do a
binary search (avoid std::binary_search, as it only returns a boolean
indicating containment).
Exactly my point Matt. The data I have is already sorted.
In short my operations involve
1) Put the sorted data into a container ( I dont have to sort the
data, its already in sorted form)
2) Provide a look up to the data to the user based on unique identity
3) May have to modify the data (the pointer is stored in the vector).
Modifying the data will not result in a resort as the identity will
retain the same, only the data pointed will be different)
4)As matt pointed out, i'm using a std::lower_bound for lookup with a
predicate logic that compares strings for equality.
Something like
lower_bound(vector_data.begin(),vector_data.end(),identity,DataCompare())
Where vector_data refers to vector containing a pair of
(identity,,pointer) and DataCompare is the predicate logic defining
equality of identity which is in this case a string
If you need a reference:http://www.sgi.com/tech/stl/lower_bound.html
-matt- Hide quoted text -
- Show quoted text -- Hide quoted text -
- Show quoted text -
A quick question on predicate logic used in lower_bound. Does it have
effectively. There always seems to be range check algorithms for STL