Re: hash_set: how to handle collisions?
 
On Jul 11, 3:20 am, Markus Dehmann <markus.dehm...@gmail.com> wrote:
Do I have to handle hash collisions in a hash_set myself?
Which hash_set?  There is not, and almost certainly never will
be, a hash_set in C++.
Which is, of course, a technicality.  There will be an
unordered_set, and most existing hash_set are only slightly
different (but in different and incompatible ways).
I did a test in which I use find() to look for objects in a
hash_set.  These objects are definitely not contained, but
find() sometimes finds them anyway. See this code:
<code>
#include <iostream>
#include <stdexcept>
#include <ctime>
#include <set>
// also include header that defines gnu_namespace and includes
//  hash_set
If you're asking specifics about the GNU implementation, you
should probably ask an implementation specific group.  But
supposing that the issue is more general, and concerns what
unordered_set will guarantee in the end.
class MyContainer {
  std::vector<int> v;
public:
  MyContainer(){}
  std::size_t hashcode() const {
    std::size_t hash = 0;
    for(unsigned i=0; i<v.size(); ++i){ // sdbm function gives
collisions sometimes
      hash = v[i] + (hash << 6) + (hash << 16) - hash;
    }
    return hash;
  }
  void add(int i){v.push_back(i);}
};
struct eqPtrs {
  bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
    return h1->hashcode() == h2->hashcode(); // problematic b/c of
collisions?
  }
};
You've defined two objects to be equal if their hash codes are
equal.  Is that really what you want?
namespace gnu_namespace {
  template<> struct hash<const MyContainer*> {
    inline size_t operator()(const MyContainer* d) const {
      return d->hashcode();
    }
  };
}
int getRand(int min, int max){
  return ((rand() % (max-min+1)) + min);
}
int main(int argc, char** argv){
  srand(time(0));
  typedef hash_set<const MyContainer*, hash<const MyContainer*>,
eqPtrs> MyMap;
  MyMap myMap;
  int repeat = 100000;
  int size = 10;
  for(int i=0; i<repeat; ++i){
    MyContainer* h = new MyContainer();
    for(int j=0; j<size; ++j){
      h->add(getRand(0, 1000));
    }
    myMap.insert(h);
  }
  for(int i=0; i<repeat; ++i){
    MyContainer* h = new MyContainer();
    for(int j=0; j<size; ++j){
      h->add(getRand(2000, 3000));
    }
    MyMap::const_iterator found = myMap.find(h);
    assert(found == myMap.end()); // aborts!
I'm not sure I understand.  You create a random object, and
assert that it is in the container.  Why should you expect it to
be in the container?
  }
  // TODO: finally delete elements in myMap
  return EXIT_SUCCESS;}
</code>
The solution seems to be to adapt the equality condition in
eqPtrs to also test for actual equality of the members, not
just equality of the hash codes:
struct eqPtrs {
  bool operator()(const MyContainer* h1, const MyContainer* h2) const
{
    return h1->hashcode() == h2->hashcode() && haveSameElements(*h1,
*h2); // added condition
  }
};
Is that the solution, or am I doing something wrong in general?
I don't know.  It's up to you to define what equality should
mean.  The only requirement is that if two elements are equal,
they have the same hash code.
--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34