Re: code review / efficient lookup techniques...

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Mon, 23 Nov 2009 01:54:38 -0800 (PST)
Message-ID: <6ede0bae-3396-4794-9e03-07af132087ef@p32g2000vbi.googlegroups.com>

On Nov 23, 12:01 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

"James Kanze" <james.ka...@gmail.com> wrote in message

news:c5e99f7b-909a-4512-afa9-0fd956dd361c@t2g2000yqn.googlegroups.com...
On Nov 13, 5:54 am, "James" <n...@spam.invalid> wrote:
[...]

I am only using the hash table to take pressure off a
single tree. Is this a boneheaded line of thinking, James?

Sort of. Hash tables have a certain complexity. Done
correctly, they allow constant time access. You've paid for
the complexity of the hash table; there's no point in adding
in the complexity of a tree as well. One or the other.
(Generally, I've favored hash tables in the past. Compared
to std::map, they have two disadvantages, however. The
first is that you need a good hash function, or they don't
work well, and a naïve user won't necessarily know how to
provide such a function. The second is that std::map is
already written, and is part of the standard, so every other
C++ programmer who reads your code should understand its
use; the same isn't true of your hand written hash table.)
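To make the "good hash function" point concrete, here is a minimal sketch assuming string keys: 32-bit FNV-1a (Fowler-Noll-Vo), one widely used published string hash. It is offered as an illustration, not as the hash either poster used; the names `fnv1a` and `bucketIndex` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
// A general-purpose, non-cryptographic string hash.
std::uint32_t fnv1a(std::string const& key)
{
    std::uint32_t hash = 2166136261u;      // FNV-1a 32-bit offset basis
    for (unsigned char c : key) {
        hash ^= c;
        hash *= 16777619u;                 // FNV-1a 32-bit prime
    }
    return hash;
}

// Hypothetical helper: map a key onto one of bucketCount buckets.
std::size_t bucketIndex(std::string const& key, std::size_t bucketCount)
{
    assert(bucketCount > 0);
    return fnv1a(key) % bucketCount;
}
```

FNV-1a is cheap and spreads typical keys reasonably well, but it is not designed to resist deliberately constructed collisions.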

IMHO, one reason why it might be a good idea to use trees as
hash buckets is that it provides "natural" parallelism in the
context of multi-threading. Also, you don't need to worry
about expanding/contracting the size of the hash table.
Multiple threads can search for, and add/remove, items that
hash into different buckets in parallel.
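The tree-per-bucket design described above might look roughly like this. It is a sketch under assumptions the post does not state: string keys and int values, a bucket count fixed at construction (so the table never grows), `std::hash` as the hash function, and C++17 for `std::optional`. The class name `TreeBucketTable` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <vector>

// Fixed number of buckets, each a std::map guarded by its own mutex.
// Since the bucket array never changes size, no table-level lock exists;
// threads whose keys hash to different buckets proceed in parallel.
class TreeBucketTable {
public:
    explicit TreeBucketTable(std::size_t bucketCount) : buckets_(bucketCount)
    {
        assert(bucketCount > 0);
    }

    void insert(std::string const& key, int value)
    {
        Bucket& b = bucketFor(key);
        std::lock_guard<std::mutex> lock(b.mutex);
        b.tree[key] = value;               // O(lg n) within the bucket
    }

    std::optional<int> find(std::string const& key)
    {
        Bucket& b = bucketFor(key);
        std::lock_guard<std::mutex> lock(b.mutex);
        auto it = b.tree.find(key);
        if (it == b.tree.end()) return std::nullopt;
        return it->second;
    }

    bool erase(std::string const& key)
    {
        Bucket& b = bucketFor(key);
        std::lock_guard<std::mutex> lock(b.mutex);
        return b.tree.erase(key) != 0;
    }

private:
    struct Bucket {
        std::mutex mutex;
        std::map<std::string, int> tree;
    };

    Bucket& bucketFor(std::string const& key)
    {
        return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
    }

    std::vector<Bucket> buckets_;          // size fixed after construction
};
```

Because the bucket count never changes, the price of an undersized table is O(lg(n/B)) lookups per bucket rather than near-constant ones.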

True parallelization (multiple threads on multiple cores) does
introduce additional considerations. But I'm still not
convinced: if I understand you correctly, you'd put a lock (or
other synchronization) on the bucket, rather than on the
complete table. But how does this affect how you manage the
bucket? What I'm saying is that maintaining the bucket as a
classical array, with linear search, should be faster (and will
certainly require less memory) than maintaining it as a balanced
tree (with O(lg n) lookup inside the bucket) because there
should never be more than a couple of entries in each bucket.
I don't see where the algorithm used in managing the buckets
affects whether you need a lock at the table level, or just at
the bucket level---as far as I can tell, the only time you'd
need a lock at the table level is when increasing the number of
buckets.
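The alternative argued for here (small array buckets searched linearly, a lock per bucket, and a table-level lock needed only when the number of buckets increases) could be sketched as follows. Assumptions not in the post: C++17 (`std::shared_mutex`, `std::optional`), a table-level reader/writer lock taken in shared mode for ordinary operations and exclusively only for rehashing, growth once the average load exceeds two entries per bucket, and the hypothetical name `ArrayBucketTable`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Each bucket is a small unsorted vector searched linearly, guarded by its
// own mutex. Ordinary operations hold the table lock in shared mode, so they
// contend only on the bucket; growing takes the table lock exclusively.
class ArrayBucketTable {
public:
    explicit ArrayBucketTable(std::size_t initialBuckets = 8)
        : buckets_(initialBuckets)
    {
        assert(initialBuckets > 0);
    }

    void insert(std::string const& key, int value)
    {
        std::size_t observed = 0;
        bool needGrow = false;
        {
            std::shared_lock<std::shared_mutex> tableLock(tableMutex_);
            observed = buckets_.size();
            Bucket& b = bucketFor(key);
            std::lock_guard<std::mutex> bucketLock(b.mutex);
            for (auto& entry : b.entries) {          // linear search
                if (entry.first == key) { entry.second = value; return; }
            }
            b.entries.emplace_back(key, value);
            needGrow = ++size_ > observed * maxLoadFactor;
        }
        if (needGrow) grow(observed);  // all locks released before growing
    }

    std::optional<int> find(std::string const& key)
    {
        std::shared_lock<std::shared_mutex> tableLock(tableMutex_);
        Bucket& b = bucketFor(key);
        std::lock_guard<std::mutex> bucketLock(b.mutex);
        for (auto const& entry : b.entries) {
            if (entry.first == key) return entry.second;
        }
        return std::nullopt;
    }

    std::size_t bucketCount()
    {
        std::shared_lock<std::shared_mutex> tableLock(tableMutex_);
        return buckets_.size();
    }

private:
    struct Bucket {
        std::mutex mutex;
        std::vector<std::pair<std::string, int>> entries;
    };

    static constexpr std::size_t maxLoadFactor = 2;  // assumed threshold

    Bucket& bucketFor(std::string const& key)
    {
        return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
    }

    // The only place the table lock is taken exclusively. While it is held,
    // no other thread can be inside any bucket, so bucket mutexes are not
    // locked here.
    void grow(std::size_t observed)
    {
        std::unique_lock<std::shared_mutex> tableLock(tableMutex_);
        if (buckets_.size() != observed) return;  // another thread already grew it
        std::vector<Bucket> bigger(observed * 2);
        for (auto& bucket : buckets_) {
            for (auto& entry : bucket.entries) {
                std::size_t i = std::hash<std::string>{}(entry.first) % bigger.size();
                bigger[i].entries.push_back(std::move(entry));
            }
        }
        buckets_.swap(bigger);
    }

    std::shared_mutex tableMutex_;
    std::vector<Bucket> buckets_;
    std::atomic<std::size_t> size_{0};
};
```

Note that the choice of bucket representation is independent of the locking: replacing the `std::vector` inside `Bucket` with a `std::map` would not change which locks are taken or when.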

--
James Kanze