Re: hashCode() for Custom classes

From: Lew <lew@lewscanon.com>
Newsgroups: comp.lang.java.programmer
Date: Sat, 19 Apr 2008 09:25:34 -0400
Message-ID: <hvidnerkw8xTb5TVnZ2dnUVZ_gmdnZ2d@comcast.com>

Roedy Green wrote:

On Sat, 19 Apr 2008 00:02:29 -0400, Lew <lew@lewscanon.com> wrote,
quoted or indirectly quoted someone who said :

Another point is that hashes are explicitly ints, not bitmaps. It's more
natural and self-documenting to use integer operations on them.

On the other hand, hashCodes are just patterns, not quantities, so you
could also argue it makes sense to use the integer bit manipulator
operators on them.

I was wondering if you get any better spread with + or ^ under some
circumstances.
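
One concrete difference can be sketched for a two-field class. Plain + and plain ^ are both symmetric, so swapping the field values gives the same hash. A multiplier-based arithmetic combination, in the style of the 31 * h + field idiom documented for java.util.List.hashCode(), is not symmetric. The Pair class and its three method names below are hypothetical, made up only for this comparison.

```java
// Hypothetical two-field class used only to compare combining strategies.
final class Pair {
    final int x;
    final int y;

    Pair(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Plain addition: symmetric in x and y.
    int sumHash() {
        return x + y;
    }

    // Plain xor: also symmetric in x and y.
    int xorHash() {
        return x ^ y;
    }

    // Multiplier-based combination: the order of the fields matters.
    int mulHash() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        Pair a = new Pair(1, 2);
        Pair b = new Pair(2, 1);
        System.out.println(a.sumHash() == b.sumHash()); // true (3 and 3)
        System.out.println(a.xorHash() == b.xorHash()); // true (3 and 3)
        System.out.println(a.mulHash() == b.mulHash()); // false (33 and 64)
    }
}
```

Whether that symmetry matters depends on whether swapped values are likely to show up together in the same input set.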

Assume your numbers were all multiples of 4. With +, the hashcode
would always end in 2 zeros. With xor it would also always end in 2
zeros, so in neither case would you get those lower 2 bits nicely
scrambled.
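
The multiples-of-4 case is easy to check by brute force. In the sketch below (the class and method names are illustrative, not from the thread), every pair of multiples of 4 below a limit is combined with + and with ^, and the low two bits of the result are inspected. Both operators leave those bits at zero, since neither operand has a bit set there and addition has nothing to carry into them.

```java
// Illustrative sketch; the class and method names are hypothetical.
class LowBitsDemo {
    // Returns the two lowest bits of a value, as a number from 0 to 3.
    static int lowTwoBits(int value) {
        return value & 3;
    }

    // True if every pair of non-negative multiples of 4 below 'limit'
    // combines, under both + and ^, to a value whose low two bits are zero.
    static boolean lowBitsAlwaysZero(int limit) {
        for (int a = 0; a < limit; a += 4) {
            for (int b = 0; b < limit; b += 4) {
                if (lowTwoBits(a + b) != 0 || lowTwoBits(a ^ b) != 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lowBitsAlwaysZero(1024)); // prints true
    }
}
```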

Nicely scrambled bits are not a requirement of hash codes, only a means to the
real requirement. If my numbers were all multiples of 4, that means some
other input probably is not all multiples of 4, so the likelihood of collision
for non-equal values would remain small. That's what matters.

Even a cryptographic hash function can have an output with the lower two bits
set to zero for some input. Even a completely random coin toss sequence can
result in one thousand consecutive head tosses. Don't obsess over the
appearance of a certain output for a certain input; seeking to suppress such
outputs could actually reduce "randomness". Look at the range of possible
outputs given an expected subset of domain values, and ensure that sets of
likely inputs are not prone to hash collisions among non-equal values. Who
cares if the hash values don't seem random, as long as that property holds?
For that matter, who cares if there are a gazillion values with the same hash,
as long as no two are likely to appear in a single input set?
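
One way to act on that advice is to count collisions over a sample of the inputs you actually expect, rather than eyeballing individual hash values. The following is a minimal sketch under that assumption; CollisionCheck and collisions() are made-up names, and the sample sets are arbitrary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative helper; not part of any library.
class CollisionCheck {
    // Counts the elements whose hashCode() was already produced by another
    // element of the set. The elements of a Set are pairwise non-equal, so
    // every repeat is a collision between non-equal values.
    static <T> int collisions(Set<T> inputs) {
        Set<Integer> seen = new HashSet<>();
        int count = 0;
        for (T item : inputs) {
            if (!seen.add(item.hashCode())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Multiples of 4: every hash ends in two zero bits, yet none collide,
        // because Integer.hashCode() is the int value itself.
        Set<Integer> multiplesOfFour = new HashSet<>();
        for (int i = 0; i < 400; i += 4) {
            multiplesOfFour.add(i);
        }
        System.out.println(collisions(multiplesOfFour)); // prints 0

        // "Aa" and "BB" are different strings with the same hash code (2112).
        Set<String> strings = new HashSet<>(Arrays.asList("Aa", "BB"));
        System.out.println(collisions(strings)); // prints 1
    }
}
```

The first sample shows hash values that look far from random but never collide. The second shows a collision that only matters if both strings are likely to turn up in the same input set.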

--
Lew