Re: lookup by EnumSet
On 2/28/12 5:55 AM, Roedy Green wrote:
I had a long and annoying dream that there was a Java Collection that
let you look up by EnumSet. It was not a simple Map.
It worked something like this: You could assign a set of binary
attributes to a Person, e.g. male/female, fat, thin, average, atheist,
Christian, Moslem, Jew, Buddhist, Asian, European, African, North
American, South American.
Then you could ask for all the fat or average females, Buddhist but
not Asian.
You might specify an EnumSet for what you want and one for what you
don't want. Anything not specified in either does not matter.
In the dream I was trying to write example code and an entry in the
Java glossary. When I woke, I could not think of such a class, and
further it was not obvious how one could be implemented.
I wondered how you would do it.
I thought you might extract the attributes into an array of longs and
check each one for compliance with your masks.
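A minimal sketch of that mask check, assuming no more than 64 attributes so each person fits in one long. The Attr enum, the class name MaskScan, and the extra "anyOf" mask (for "fat or average") are illustrative assumptions, not anything from an existing library.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class MaskScan {
    // Hypothetical attribute enum; the constants are made up for illustration.
    enum Attr { FEMALE, FAT, THIN, AVERAGE, BUDDHIST, ASIAN }

    // Pack an EnumSet into a long, one bit per ordinal (valid for up to 64 constants).
    static long toMask(EnumSet<Attr> set) {
        long mask = 0L;
        for (Attr a : set) {
            mask |= 1L << a.ordinal();
        }
        return mask;
    }

    // Return the array positions of people who have every bit in allOf,
    // at least one bit in anyOf (ignored when anyOf is 0), and no bit in noneOf.
    static List<Integer> scan(long[] people, long allOf, long anyOf, long noneOf) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < people.length; i++) {
            long p = people[i];
            boolean hasAll = (p & allOf) == allOf;
            boolean hasAny = anyOf == 0L || (p & anyOf) != 0L;
            boolean hasNone = (p & noneOf) == 0L;
            if (hasAll && hasAny && hasNone) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

For "fat or average females, Buddhist but not Asian", allOf would hold FEMALE and BUDDHIST, anyOf would hold FAT and AVERAGE, and noneOf would hold ASIAN. This is a linear scan, so it only suits collections small enough to walk on every query.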
If the sets were stable, you might extract a BitSet for each
attribute, and do logical operations on giant bit strings of the
relevant bits.
I vaguely recall SQL databases optimising queries of this type by
transparently building inverse lookup indexes.
As many have said indirectly, it sounds like an RDBMS problem. For a
small enough data set, you could fit the indexes in memory. One approach
is to keep each index as a Collection<Person>, or, if you're more
space-conscious, as a BitSet where the bit number corresponds to a
List<Person> index.
For example, Solr uses such a BitSet of "Document Index" to cache its
Lucene query results.
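A rough sketch of the BitSet-per-attribute idea, where bit i of each BitSet refers to position i in the backing list. The class AttributeIndex, the Attr enum, and the use of plain Strings in place of Person are all assumptions made for illustration; this is not how Solr itself is written.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

public class AttributeIndex {
    // Hypothetical attribute enum; the constants are made up for illustration.
    enum Attr { FEMALE, FAT, THIN, AVERAGE, BUDDHIST, ASIAN }

    // Stand-in for List<Person>; the list position is the bit number.
    private final List<String> people = new ArrayList<>();
    // One BitSet per attribute: bit i is set if people.get(i) has that attribute.
    private final Map<Attr, BitSet> index = new EnumMap<>(Attr.class);

    AttributeIndex() {
        for (Attr a : Attr.values()) {
            index.put(a, new BitSet());
        }
    }

    void add(String name, EnumSet<Attr> attrs) {
        int row = people.size();
        people.add(name);
        for (Attr a : attrs) {
            index.get(a).set(row);
        }
    }

    // Everyone with all of allOf, at least one of anyOf (if non-empty), and none of noneOf.
    List<String> query(EnumSet<Attr> allOf, EnumSet<Attr> anyOf, EnumSet<Attr> noneOf) {
        BitSet result = new BitSet();
        result.set(0, people.size());              // start with every row selected
        for (Attr a : allOf) {
            result.and(index.get(a));              // intersection
        }
        if (!anyOf.isEmpty()) {
            BitSet union = new BitSet();
            for (Attr a : anyOf) {
                union.or(index.get(a));            // union of the alternatives
            }
            result.and(union);
        }
        for (Attr a : noneOf) {
            result.andNot(index.get(a));           // exclusion
        }
        List<String> hits = new ArrayList<>();
        for (int i = result.nextSetBit(0); i >= 0; i = result.nextSetBit(i + 1)) {
            hits.add(people.get(i));
        }
        return hits;
    }
}
```

The sketch only supports appending. Removing a person would shift list positions and invalidate the bit numbers, which is one reason this approach fits stable sets best.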
If you need to find intersections or unions, BitSets are fairly cheap if
your data set is small enough. One technique RDBMS query optimization
algorithms use is to estimate which index-based query would result in
the smallest set, and then iterate through that set, filtering out
entries that don't match the other parts of the query.
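Here is a small sketch of that smallest-set-first idea applied to in-memory BitSet indexes. It assumes every condition is a required attribute, and it uses BitSet.cardinality() as an exact count where a real optimizer would rely on estimated statistics. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SmallestFirst {
    // Returns the row numbers that are set in every BitSet in 'required'.
    static List<Integer> intersect(List<BitSet> required) {
        List<Integer> hits = new ArrayList<>();
        if (required.isEmpty()) {
            return hits;
        }
        // Choose the most selective index, i.e. the one with the fewest set bits.
        BitSet smallest = required.get(0);
        for (BitSet b : required) {
            if (b.cardinality() < smallest.cardinality()) {
                smallest = b;
            }
        }
        // Walk only the rows in the smallest set and test them against the others.
        for (int row = smallest.nextSetBit(0); row >= 0; row = smallest.nextSetBit(row + 1)) {
            boolean matchesAll = true;
            for (BitSet b : required) {
                if (b != smallest && !b.get(row)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```

The work done is proportional to the size of the smallest set times the number of conditions, rather than to the total number of rows.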