Re: STL Map uses hashing?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Tue, 3 Jun 2008 00:44:42 -0700 (PDT)

Message-ID:

<17728aed-0b6c-4d35-979e-eb2640df7f8f@79g2000hsk.googlegroups.com>

On Jun 2, 3:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:

In article
<f323e752-ab41-4697-9e74-635e235be...@l42g2000hsc.googlegroups.com>,
James Kanze <james.ka...@gmail.com> wrote:

On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:

A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).

I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

Ok, sorry about my lack of definition for "typically":

Assuming a non-perfect hash function that return you a bucket
identifier (almost necessary since a perfect hash function
would be too expensive for it to be worthwhile), this hash
function makes assumptions on the data it receives (e.g.
random). The hash table making use of this hash function will
be very fast when presented with "typical" data (i.e. data
that generally meet the assumptions used for creating the hash
function). Unfortunately, there may be some worse case
scenario under which the performance of the hash table will be
abyssimal.

That's not my point. The performance of a hash table depends
very much on the quality of the hash function. While a perfect
hash function is only possible if the exact set of data are
known in advance, hash functions do differ enormously in their
quality, and I've seen more than a few cases where even some
very competent programmers have come up with relatively poor
hash functions; the most obvious is in Java's hash map, where
Java was forced to change the hash function as a result. I
suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Where as in the case of a hash function, the
program will only be a little bit slower than it should be.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34