Re: hash_set: how to handle collisions?
On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.com> wrote:
Do I have to handle hash collisions in a hash_set myself?
Which hash_set? There is not, and almost certainly never will
be, a hash_set in C++.
Which is, of course, a technicality. There will be an
unordered_set, and most existing hash_set are only slightly
different (but in different and incompatible ways).
I did a test in which I use find() to look for objects in a
hash_set. These objects are definitely not contained, but
find() sometimes finds them anyway. See this code:
<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
// hash_set
If you're asking specifics about the GNU implementation, you
should probably ask an implementation specific group. But
supposing that the issue is more general, and concerns what
unordered_set will guarantee in the end.
class MyContainer {
std::vector<int> v;
public:
MyContainer(){}
std::size_t hashcode() const {
std::size_t hash = 0;
for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
hash = v[i] + (hash << 6) + (hash << 16) - hash;
}
return hash;
}
void add(int i){v.push_back(i);}
};
struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode(); // problematic b/c of
collisions?
}
};
You've defined two objects to be equal if their hash codes are
equal. Is that really what you want?
namespace gnu_namespace {
template<> struct hash<const MyContainer*> {
inline size_t operator()(const MyContainer* d) const {
return d->hashcode();
}
};
}
int getRand(int min, int max){
return ((rand() % (max-min+1)) + min);
}
int main(int argc, char** argv){
srand(time(0));
typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrs> MyMap;
MyMap myMap;
int repeat = 100000;
int size = 10;
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(0, 1000));
}
myMap.insert(h);
}
for(int i=0; i<repeat; ++i){
MyContainer* h = new MyContainer();
for(int j=0; j<size; ++j){
h->add(getRand(2000, 3000));
}
MyMap::const_iterator found = myMap.find(h);
assert(found == myMap.end()); // aborts!
I'm not sure I understand. You create a random object, and
assert that it is in the container. Why should you expect it to
be in the container?
}
// TODO: finally delete elements in myMap
return EXIT_SUCCESS;}
</code>
The solution seems to be to adapt the equality condition in
eqPtrs to also test for actual equality of the members, not
just equality of the hash codes:
struct eqPtrs {
bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
}
};
Is that the solution, or am I doing something wrong in general?
I don't know. It's up to you to define what equality should
mean. The only requirement is that if two elements are equal,
they have the same hash code.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34