Tom Anderson wrote:
>> long bits = Double.doubleToLongBits( key );
>> int hash = (int)(bits ^ (bits >>> 32));
>> provides terrible performance.
>
> Interesting. I chose that function because it's what java.lang.Double
> does, rather than because I thought it'd be good, but I am surprised
> to hear it's terrible - doubles are quite complicated internally, so I
> would have thought that a parade of natural numbers would give
> reasonably well-distributed hashes this way (whereas longs wouldn't,
> of course). How did you conclude it's terrible?
Writing my own hash table implementation, I noticed that I was getting
terrible performance: a ton of collisions, with everything heaped up in
a tiny region of the table.
Inspecting the hashes in hexadecimal, I realized that Jon's data keys --
the natural counting numbers 1, 2, 3, etc. -- are represented in a
double using only a few of the uppermost bits. The lower bits are always
0, even after slicing the 64-bit double's bit pattern in half and XORing
the two halves.
This XORing results in regular hash bit patterns like:
0x20200000
0x40200000
0x40600000
0x60200000
etc. as the numbers count up
(bit patterns recalled from memory, but you get the idea; a quick check
of the actual values is sketched below.)
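Here is a minimal sketch (mine, not the benchmark code) that prints the
raw bits and the folded hash for the first few counting numbers. For
these keys the low 32 bits of the double are all zero, so the fold just
reproduces the upper 32 bits and the low bits of the hash stay zero:

public class FoldDemo {
    public static void main(String[] args) {
        for (double key = 1.0; key <= 8.0; key += 1.0) {
            long bits = Double.doubleToLongBits(key);
            int hash = (int)(bits ^ (bits >>> 32));
            // e.g. key=1.0 gives bits=3ff0000000000000 and hash=3ff00000;
            // only a few high bits change as the keys count up.
            System.out.printf("%3.1f  bits=%016x  hash=%08x%n", key, bits, hash);
        }
    }
}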
In other words, the hashes differ in only a few bits, all of them in the
uppermost bits of the hash. This is exactly the opposite of what you
want in a good hash, which is lots of randomness in the lower bits of
the hash code.
So I concluded: absent any other perturbation of the hash, it sucks.
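For what it's worth, one possible perturbation (my own sketch, not what
either implementation actually does) is to run a supplemental mix over
the folded value so the variation in the high bits gets spread down into
the low bits that a power-of-two table indexes on. The xor-shift mix
below is similar in spirit to the supplemental hash java.util.HashMap
has used:

static int mixedHash(double key) {
    long bits = Double.doubleToLongBits(key);
    int h = (int)(bits ^ (bits >>> 32));   // the original fold
    // Xor-shifts push the high-order entropy into the low-order bits.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

With a mix like that, the counting-number keys no longer all land in the
same few buckets of a power-of-two table.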
As an aside: integer remainder is *very* slow compared to masking. I
found that I got better overall lookup performance from masking with a
power-of-two table, purely because of the remainder cost; otherwise a
prime bucket count would have won.
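To make the trade-off concrete, here is a sketch of the two indexing
strategies being compared (names and signatures are mine):

static int maskIndex(int hash, int powerOfTwoCapacity) {
    // Power-of-two table: a single AND picks out the low bits.
    return hash & (powerOfTwoCapacity - 1);
}

static int remainderIndex(int hash, int primeCapacity) {
    // Prime-sized table: strip the sign bit, then pay for a division.
    return (hash & 0x7fffffff) % primeCapacity;
}

With the unperturbed fold above, maskIndex keeps only the low bits,
which for these keys never vary -- that is why the prime table
distributes better; the remainder uses all the bits, but costs an
integer division on every lookup.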