Lasse Reichstein Nielsen wrote:
Even if they are computed every time, you don't need to know the
hashCode
of elements in the HashSet to find them. [...]
To look up an element, you use the hash code of the key/element to
search for, find a bucket based on that, and traverse the linked
list
to find one that .equals() what you search for.
[...]
You don't *need* the hash code after the bucket is
located, but it's useful to store the hashes anyhow. As
you traverse the list, you can compare the key's hash to
the entry's hash and call .equals() only if they happen
to match. Comparing two already-computed int values is
likely to be faster than calling .equals(), so this can
save some time. HashMap uses this optimization, storing
each item's hash code in its HashMap.Entry.
<topicality level="marginal">
I once did some experiments to determine whether this
optimization was worth while in an "open addressed" hash
table (not the chained style used by HashMap). Storing a
hash value with every item pointer made the table twice as
big as one that contained just the unadorned pointers -- or
to look at it another way, it allowed for only half as many
table slots for a given amount of memory. I wondered whether
avoiding the .equals() calls was enough of a benefit to repay
doubling the table's "load factor" (entry count divided by
slot count); higher load factors lead to longer searches.
In my tests, at least, storing the hash was a clear
winner. Even when a search visited twice as many entries,
it almost never called .equals() in an unsuccessful search
or called it more than once in a successful search (when at
least one .equals() call is unavoidable). The pointers-only
table visited fewer entries, but had to call .equals() on
every single one of them, every time. Pointers-with-hashes
was so much faster that the memory investment to store the
hash values was clearly worth while.
</topicality>
Cool. I wonder, does String.equals() make the same optimization?
(Looks....) As of 1.5, it does not. It checks the length, and if
that matches, it immediately begins marching down the array of bytes.
codes unnecessarily.
hash code every times it's requested. I'd probably have defined
hashCode() for such a string to return -1, just to avoid that.
1. Say, one which contains all zero bytes.