Re: hash two keys to one index

From:

"Mark" <mnbayazit@gmail.com>

Newsgroups:

comp.lang.java.programmer

Date:

21 Nov 2006 13:57:44 -0800

Message-ID:

<1164146264.041206.57990@b28g2000cwb.googlegroups.com>

Mark wrote:

Eric Sosman wrote:

Mark wrote:

Patricia Shanahan wrote:

Mark wrote:

Eric Sosman wrote:

Mark wrote:

Hi,

I need to figure out how to map two different keys (a string and an
integer) to the same index. I don't even know where to start... but I
don't want two copies of the the hash table.

Any ideas?

My only "idea" is that you should take a few steps back
from the immediate tactical issue and explain your overall
strategic goal. *Why* do you want "ABC" and new Integer(42)
to produce the same hashCode(), and what good do you think
it will do you? On first sight, at least, it seems daft.
Pray explain the well-concealed sanity that lies beneath.

Well, if I want to pull up all the data pertaining to a student, but I
only know his last name or his student number, not both...

That is a very good reason for having a HashMap with names as keys, and
a HashMap with student numbers as keys...

Note that every non-null non-primitive variable or expression in Java is
a pointer to some object, not the value of an object. That means that
the two HashMap instances can point to the same Student instances.

For the specific case of name and number, you could get away with a
single table. It might be slightly slower than two tables, but probably
only slightly. However, mixed key tables cannot take advantage of
generics, and you must be very sure that you never need e.g. to index by
two different numbers, where there could be a risk of hitting on the
wrong key.

Generally, two tables is much cleaner.

Well, two tables is okay I guess as long as there aren't actually two
instances of the data.
So basically... I have two arrays of references...

Almost: you have two Maps that associate keys with references.
But you seem to have grasped the main point, which is that the
references point to the objects but are not the object instances.
What goes into the map are pairs of (reference to key, reference
to target), not the keys and the targets themselves.

Faulty analogy: You have your best friends' phone numbers in
your Palm Pilot, and maybe written down in a few places, and for
that matter you could always look 'em up in the phone book. But
your friends' actual, physical phones are in none of these places;
the things you store and retrieve are "references" to those phones.
Through the magic of the phone network you can use a phone number
to make a friend's phone ring, just as through the magic of the JVM
you can use a reference to make an object instance do something.
But the friend's phone, like the object instance, is not "in your
possession;" all you're holding is the reference.

--
Eric Sosman
esosman@acm-dot-org.invalid

I've run into a bit of a snag.

When I insert an object into the hash table, I pass in (a reference to)
the object, and a hash code, which can either be based on the last name
or the student number. To deal with collisions, I use a quadratic
probing technique...

    void insert(Object obj, int hash) throws HashTableFull
    {
        if( count == table.length ) throw new HashTableFull("Cannot insert:
hash table is full");
        int key = probe(hash);
        table[key] = obj;
        ++count;
    }

    int sign(int i)
    {
        if( i < 0 ) return -1;
        return 1;
    }

    int probe(int hash)
    {
        int probe = 0;

        while(true)
        {
            int key = (hash + probe*probe*sign(probe)) % table.length;

            if( table[key] == null )
                return key;

            if( probe <= 0 )
                probe = -probe+1;
            else
                probe = -probe;
        }
    }

Which, in theory, should work nicely. However... when retrieving the
object, how do I know when I've found the right one?

    Object find(int hash) throws ObjectNotFound
    {
        int probe = 0;

        for(int i=0; i<table.length; i++)
        {
            int key = (hash + probe*probe*sign(probe)) % table.length;

            if( /* table[key] == ??? */ )
                return table[key];

            if( probe <= 0 )
                probe = -probe+1;
            else
                probe = -probe;
        }

        throw new ObjectNotFound("Object not found");
    }

Rereading what you wrote, you indicate that a reference to the key
should also be stored. I didn't really understand why a reference to
the key should be stored, if the key could easily be derived from
hashing the object. However... I suppose it could be useful when
trying to find an object...

For example, if one object has a hash "5" and another object has hash
"15", and the hash table capacity is 10... then 5%10 and 15%10 both
equal 5..so they have the same key, but different hashes.. then we can
check if we've found the right object like so...

    Object find(int hash) throws ObjectNotFound
    {
        int probe = 0;

        for(int i=0; i<m_table.length; i++)
        {
            int key = (hash + probe*probe*sign(probe)) % m_table.length;

            if( m_hash[key] == hash )
                return m_table[key];

            if( probe <= 0 )
                probe = -probe+1;
            else
                probe = -probe;
        }

        throw new ObjectNotFound("Object not found");
    }

using the modified insert...

    void insert(Object obj, int hash) throws HashTableFull
    {
        if( count == m_table.length ) throw new HashTableFull("Cannot insert:
hash m_table is full");
        int key = probe(hash);
        m_table[key] = obj;
        m_hash[key] = hash;
        ++count;
    }

am I correct?