Re: Additional logging questions

From:

Lew <noone@lewscanon.com>

Newsgroups:

comp.lang.java.programmer

Date:

Mon, 27 Feb 2012 23:51:37 -0800

Message-ID:

<jii126$df5$1@news.albasani.net>

On 02/27/2012 05:57 PM, Arne Vajh??j wrote:

On 2/27/2012 8:48 PM, Novice wrote:

Lew<noone@lewscanon.com> wrote in news:jigr8v$jhh$1@news.albasani.net:

But wait! 'hashCode()', which is a shortcut for 'equals()' (well,
really for "not equals()"), doesn't know anything about those fields
yet! It will think objects are not equal when they really are.

So you have to override 'hashCode()', too, using the same fields, so
that its result are consistent with 'equals()':

@Override public int hashCode()
{
return owner.hashCode() * 31 + color.hashCode();
}

Just curious: why multiply the owner.hashCode() by 31? There's no
significance to the 31 is there? Why not just add the two hashCodes
together without multiplying one of them first? That would be just as
good, right?

Actually not.

Some numbers are better than others (no multiply means 1).

You should go for a prime.

31 is often used.

But there are other good numbers.

Donald Knuth, _The Art of Computer Programming_, suggested 31 for modulo 32
hashes because it produces good distribution on the average for random inputs.

There are better hashes, but for most purposes the one illustrated suffices.

More generally (pseudocode):

@Override int hashCode()
{
  int hash = 17; // or 0 or some other decent starter
  for (Object field : instanceFields)
  {
    hash = hash * 31;
    hash += field.hashCode();
  }
  return hash;
}

I skipped over niceties like whether to use 0 for null or empty element hashes.

The Javadocs explain some of this, but the purpose of 'hashCode()' mostly is
to quickly tell if instances are not equal. Equal instances must have the same
hash - that's a hard rule. But the converse cannot apply - many instances will
have the same hash even if they're not equal.

The trick is not to have many all at once during the run of your application.

Then probability is your friend. With 2^^32 possible hashcodes, let's say for
a map of 'String' to 'Foo', you have a lot of room for many strings before you
get too many collisions. That means you need a hash that is about equally
likely to pick any number as any other, so that for any given set of actual
values at a time you probably will have different hashes for different objects.

In practice you get a few collisions, but as long as "a few" is within
acceptable limits you're good to go. Otherwise you have to play advanced
reindeer games with hash algorithms.

Why all this hash fooferol? Because once you have a hash value lookups are so
much faster on 'int' comparisons than between arbitrary 'String's of varying
length. So let's say you're trying to find an element in a 'Set' - you look
for the hashcode first, and if you find zero or one element you're done.
Otherwise you distinguish between the colliding elements with 'equals()', but
hopefully only for two or at worst three elements.

Hashes are also useful in cryptography.

It's an endless topic, full of fun. You need never be bored of a winter's eve.
I'd actually start with Wikipedia on this one.

Aside: I seem to recall an 'equals()' method builder in Apache Commons Lang
somewhere. It's got some sort of indicative name like 'EqualsBuilder'.

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg