Re: SortedMap question?
On 11/25/2012 7:36 PM, Arne Vajh??j wrote:
On 11/25/2012 7:25 PM, Knute Johnson wrote:
On 11/25/2012 09:47, Martin Gregorie wrote:
Or, if the key comparison is cheap, i.e. short strings or integer
comparisons, and the key population is very large TreeMap can become
attractive: with 8 million nodes you only need 23 comparisions to find a
miss. Growing the tree isn't too expensive either: rebalancing a
balanced
red-black tree, which is necessary from time to time as it grows, is
only
a matter of adjusting pointers.
That's what I thought, that the searching for a match would be very
quick once the data was in the TreeMap. The test code I wrote may not
have been any good. I created a map with Strings for the keys and
values. I could get about 2 million entries before I started running
into virtual memory issues slowing things down. I searched for
non-existent elements. Using both a HashMap and a TreeMap, the TreeMap
was significantly slower than the HashMap. I even tried a
ConcurrentHashMap and multiple threads to do the search to see if that
would speed things up. While that technique was better than TreeMap it
was still slower than a plain HashMap.
So for the actual case that I am working on, I have a collection of
about 5000 elements and am using an Integer as the key. Data is rarely
changed but often accessed. There should never be a case where the
searched for key will not exist. What would you use, a HashMap or a
TreeMap?
If it is important then you should measure!
Aaay-men!
But I would still expect HashMap to be faster than TreeMap.
Yes. A HashMap with 5000 keys and the default 0.75 load
factor will want 5000/0.75 ~= 6667 buckets, which it will round
up to 8192. With normal luck the 5000 keys will land in ~3743
buckets, about two-thirds of which will hold just one key. So
a typical search involves a few arithmetic operations to locate
the bucket, then about 5000/3743 ~= 1.3 Integer#equals() calls.
Searching a TreeMap of 5000 keys, on the other hand, will
use about lg(5000) ~= 11.3 Integer#compareTo() calls. Uses of
equals() and compareTo() have different costs (equals() needs to
work with Objects and check their classes before comparing, but
compareTo() needs to produce a three-way answer rather than a
simpler boolean ...), but any differences are almost certainly
swamped by the disparity in call counts.
If the key values are not too widely scattered, other
possibilities exist. A BitSet can answer "present or absent"
queries very quickly and without taking much space, or a simple
array can serve as a Map stand-in. But follow Arne's advice
when making your choice: Measure It!
--
Eric Sosman
esosman@comcast-dot-net.invalid