Re: Patricia trie vs binary search.
On 5/27/12 6:44 PM, Gene Wirchenko wrote:
On Sat, 26 May 2012 17:30:17 -0700, Daniel Pitts
<newsgroup.nospam@virtualinfinity.net> wrote:
[snip]
I tend to use a Deterministic Finite State Automaton for this. You can
load the entire English dictionary fairly easily with that scheme. Yes,
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
you use a bit of memory, but unless you're doing this on an embedded
device, it's probably not enough memory to be concerned about.
Including all affixes?
I suppose it depends on the particular dictionary, but we're only
talking about a few hundred thousand entries, at least with the Moby
word-lists as a base:
cat compound-words.txt often-mispelled.txt english-most-frequent.txt
male-names.txt female-names.txt common-names.txt common-dictionary.txt
official-scrabble-* | sort | uniq | wc -lc
388997 4801599
That's roughly 4.8 MB of strings, if stored as naively as possible.
Certainly no problem for a modern-day computer. It can actually be
compressed quite a bit when converted to a DFSA, assuming a reasonably
optimized implementation.
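
For concreteness, here's a rough sketch of the automaton idea in Java
(the class name and "all-words.txt" file are made up for the example).
It's just a character trie whose nodes act as DFA states; a real
DFSA/DAWG builder would also merge states with identical suffix sets,
which is where most of the compression over a plain trie comes from:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordAutomaton {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<Character, Node>();
        boolean accepting; // true if a word ends in this state
    }

    private final Node root = new Node();

    /** Adds a word, creating one transition per character as needed. */
    public void add(String word) {
        Node state = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node child = state.next.get(c);
            if (child == null) {
                child = new Node();
                state.next.put(c, child);
            }
            state = child;
        }
        state.accepting = true;
    }

    /** Accepts iff every transition exists and the final state is accepting. */
    public boolean contains(String word) {
        Node state = root;
        for (int i = 0; i < word.length() && state != null; i++) {
            state = state.next.get(word.charAt(i));
        }
        return state != null && state.accepting;
    }

    public static void main(String[] args) throws IOException {
        // "all-words.txt" stands in for the merged word list built above.
        List<String> words = Files.readAllLines(
                Paths.get("all-words.txt"), StandardCharsets.UTF_8);
        WordAutomaton dict = new WordAutomaton();
        for (String w : words) {
            dict.add(w.trim().toLowerCase());
        }
        System.out.println(dict.contains("dictionary"));
    }
}

Lookups are O(word length) either way; minimizing the automaton by
merging equivalent states is purely a space optimization.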
[snip]
Sincerely,
Gene Wirchenko