Re: "PriorityMap"
Eric Sosman wrote:
Andreas Leitgeb wrote:
I'm in search of a data structure for the followig usecase:
Some algorithm spits out *lots* of pairs (k,v) of type (int,long).
I only want either those ten(*) pairs with the lowest keys, or all
pairs with the lowest ten(*) keys. (whichever is easier to achieve)
The keys (k) are not necessarily unique, in fact I usually get a
couple of different values (v) for each k, and I'm interested only
in the v's, that came with those ten(*) smallest k's. The actual
k's do not matter afterwards.
There are much too many to save them all and filter afterwards.
PriorityQueue and TreeMap do not seem to make it easy to efficiently
prune them beyond the first ten(*) entries after each add().
Since it's called in a long loop, efficiency is an issue, but the
algorithm itself is also non-trivial, so I can afford some
extra cpu-cycles.
(*): I'd prefer if the "ten" does not need to be hardwired.
Since you have "*lots*" of (k,v) pairs, it would probably
be nice to avoid the overhead of creating an Integer and a Long
and a Map.Entry (or similar) for each pair. I'd suggest just
coding the thing up for the purpose at hand -- a simple heap
made out of an int[] and a long[] seems attractive. Should be
pretty fast, too: Once you've been through a few thousand pairs,
most of the remaining pairs will be bigger than the heap's root
and will be rejected on the first comparison.
Personally, I'd try three approaches, and benchmark to pick the best:
a) add every result to a SortedMap (or sorted set) and then do an
iteration to find the first ten.
b) Create a Pair class, keep a List of all the Pairs. Sort and then get
a sublist
c) Create a List, as you add things, utilize Collections.binarySearch,
and insert into the corresponding location, truncating the list if it
becomes too long.
All three of these approaches have different merits, depending on the
structure and amount of data. If I had to bet before testing, I'd put
my money on (c). but it is also the most complicated to code (though not
by much at all).
I'd be curious to see your results.
--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>