Re: Hash Map slower than Map.

From:

=?UTF-8?B?RXJpayBXaWtzdHLDtm0=?= <Erik-wikstrom@telia.com>

Newsgroups:

comp.lang.c++

Date:

Tue, 09 Oct 2007 09:33:06 GMT

Message-ID:

<mVHOi.10749$ZA.7089@newsb.telia.net>

On 2007-10-09 00:52, Amit Bhatia wrote:

Erik Wikstr??m wrote:

On 2007-10-08 23:36, Amit Bhatia wrote:

Hi,
I am trying to write a hash function that works on a pair of integers.

so I have a pair<int,int> of integers. Let the first element be called
"a" and second as "b".

Do you want to create a map from a to b, or from (a,b) to c, for some
other c. Or put another way, is a the key or is it a and b?

My key is the pair (a,b) which maps to some other object c.

I know before hand that:

0<=a<30
-2^a<=b<=2^a-1.

If all the b's were >0, then I could already think of a very simple
perfect hash function:

size_t value = 1<<a+b;

For all values of a > 4 that could result in 0 since 2^5 = 32.

From what I understand size_t is unsigned long (?), so it should have
atleast 32 bits available?

size_t is an unsigned integer type "large enough", the actual size is
platform dependent, on most 32-bit platforms it is probably 32 bits.
What I meant was that when you do 1 << (a + b), if a is 5 (or larger)
than b is 32 or larger, which means that you shift 1 left 32 times (or
more) which means that you have shifted the 1 out of the results. I was
wrong however, it will not result in 0,but undefined behaviour.

However, if b<0 is also possible, then, I can only think of

size_t value;
if(b>=0) value = 1<<a+b;
else value = 2^30 + 1<<a+-(b+1); // I know a<30 for sure.

I do not know what makes a good hash-value, but a fast operation that
will create something that is probably quite unique for each pair is to
simply xor them together. One op with no conditionals.

O.k. I am not clear though how it will ensure uniques mapping of the
pair most of the time. I will think more about it.

The value of a will probably not be unique very often since it is so
limited in range, however the value of b have a larger range and a much
greater chance of being unique, and it will certainly be unique for each
value of a.

Now I used this in my code, and did some profiling and I found that I am
spending almost 80 % of the time just looking up if a key exists in this
hash map. (there are no other major calculations in the program function
where this "find" operation is called).

So for example, in one particular execution of the program, the function
which calls this key lookup is called

158905316 times and takes 100 sec (~80%) of the time.

My question is: can I do much faster than this? I am not an expert, and
I am using g++ 4.2 on a pentium 4 2.4 Ghz with 512 MB RAM. But still
this should be much faster.

What is really surprising is that if I use maps instead, they are
running faster.

So for about, 83127320 calls, I take 16.06 sec (~47.26%) if I use a stl map.

If you create a map from a to b then std::map will probably always be
faster than, or at least competitive with, a hash map since the number
of mapped elements is so small. To make it even faster use a fixed size
array (std::vector).

Vectors are not a good idea, since I do not know apriori how many
objects I will end up creating to be put into the hash_map. It can be
true that the number of elements in hash_map is small. So for 5
dimensional grid, I end up with something like ~15000 elements inside
the hash_map in the end. But still potentially in the worst case (which
is very unlikely though), I could have to put every single region from
the grid into the hash_map, which means for this particular case at
resolution level l : 2^{5l} objects.

No, a vector would only work if you mapped from a to b.

--
Erik Wikstr??m