Re: Summating Strings
Hemal Pandya wrote:
Now I am confused.
Mike wrote:
[....]
I'd then like to sort this data so the items with the higher counts come
first in the list.
Neither of the suggested solutions would meet this condition. Or would
they? What am I missing?
Maybe you missed my comment "Once you have the set of keys, you can sort
it using a Comparator that orders based on the associated count."?
You need a collection of (word, frequency). When building the
collection, you want it searchable by the word. When reporting, you
need it sorted by frequency. One way is to sort the collection after
the scan. Would that be worse then maintaining two collections during
scan? I don't know...
A continuously sorted collection would be quite expensive to maintain,
especially considering the statistics stated in the base message: "The
list of source strings can be anywhere from 1 - 20000 strings having
anywhere from 1 - 500 unique terms. The average set will be roughly 300
strings with about 30 unique terms."
Maintaining sorted order requires adjustment of the structure for every
input, 300 times in the average case. A HashMap indexed by string has
one insert for each unique string, 30 times in the average case,
followed by a 30 element sort at the end. The remaining 270 operations
do a HashMap get followed by an add.
The OP did say "I'd then like to sort this data so the items with the
higher counts come first in the list." which suggests to me that
solutions that first collect the data and then sort meet the requirements.
If I remember right, the K&R C Programming Language book has a sample
code for word frequency.
I don't think K&R gave enough attention to the java.util package to be a
really good guide to Java programming in this area. :-)
Patricia