Re: code review / efficient lookup techniques...
On Nov 13, 5:54 am, "James" <n...@spam.invalid> wrote:
"James Kanze" <james.ka...@gmail.com> wrote in message
news:d6e4e144-d30c-4772-8ca5-aaf70485746e@j4g2000yqe.googlegroups.com...
On Nov 12, 10:30 pm, "James" <n...@spam.invalid> wrote:
"James Kanze" <james.ka...@gmail.com> wrote in message
[...]
Please correct me if I am way off base here, but I think the
complexity of the simple and naive algorithm I posted
should be O(8) for insert and lookup operations.
There is no such thing as O(8). The complexity is O(1).
Yikes! I was erroneously counting each dimension as a separate
access:
    return m_table.m_fp
        [g_translator[name[0]]]
        [g_translator[name[1]]]
        [g_translator[name[2]]]
        [g_translator[name[3]]];
Hmm... Okay. I count 4 separate accesses into the
g_translator table. I also count a single access into the
m_table.m_fp table. Why would that not be O(5)?
Because there's no such thing as O(5). Big-O ignores any constant
multipliers, so O(5) would be the same thing as O(1).
Okay. So, if an operation takes a fixed number of steps to
solve a problem, you can label it as having O(1) complexity
even if the number of steps is a thousand. Am I correcting my
erroneous line of thinking wrt Big-O notation, or screwing it
all up over again?
You've got it this time. All that Big-O does is characterize
the way the complexity increases as the problem becomes bigger.
It doesn't take constant factors into consideration, because in
the end, they depend on the machine and the optimizer. Thus, in
the above, the compiler will generate a certain number of
multiplications, in order to implement the multi-dimensional
indexing. On some machines, particularly older ones,
multiplication was extremely slow, and the above access would be
dominated by the multiplications, rather than the memory
accesses. On a modern machine, the reverse is generally true.
But in either case, the access will take a constant time, even
if that constant isn't known, and varies from one machine to the
next, or one compiler to the next.
(On modern machines, that's not strictly true either. The
access time will vary depending on things like locality, which
the Big-O notation doesn't take into account.)
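To make the constant-factor point concrete, here's a minimal
sketch of the index arithmetic in question; the dimension size N
below is a placeholder, not the size of your actual table:

    #include <cstddef>

    // Rough sketch of what an access like m_table.m_fp[a][b][c][d]
    // compiles down to; N is a hypothetical dimension size, not the
    // real table's.
    inline std::size_t
    flatten(std::size_t a, std::size_t b, std::size_t c,
            std::size_t d, std::size_t N)
    {
        // Three multiplications and three additions, independent of
        // N: a fixed amount of work, which is why the access is O(1).
        return ((a * N + b) * N + c) * N + d;
    }

The element access itself is then a single load at a constant
offset from the start of the table.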
[...]
Binary search is O(lg n). For small n (less than a couple
of hundred), the difference is not significant. (In the
tests I've done, binary search on a sorted vector or
std::map beats hash coding up to somewhere between 150 and
200 entries, even though hash coding is O(1) and the binary
search or std::map is O(lg n).)
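For what it's worth, the sorted-vector lookup is nothing fancier
than std::lower_bound; a minimal sketch, with a placeholder
container of command names:

    #include <algorithm>
    #include <string>
    #include <vector>

    // O(lg n) lookup in a sorted std::vector of command names.
    bool contains(std::vector<std::string> const& sorted,
                  std::string const& key)
    {
        // std::lower_bound does the binary search.
        std::vector<std::string>::const_iterator it
            = std::lower_bound(sorted.begin(), sorted.end(), key);
        return it != sorted.end() && *it == key;
    }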
Have you tried using std::map as buckets in a static hash map?
struct my_map
{
    std::map<...> m_buckets[DEPTH];
};
?
What would that buy you? If you've got more than two or three
entries in a bucket, your hash function is no good; change it,
and not the structure of the hash table.
Well, AFAICT, the only thing it would buy you is that the
pressure could be taken off a single std::map.
Yes, but if you've implemented and are using the hash table,
there's no point in using the std::map as well. It doesn't buy
you anything.
Which one would be better:
1: a single std::map with 8192000 items in it
2: or 8192 std::maps with only 8192 items in each one
3: 9000000 or 10000000 std::vectors, with one or two items in
each, and a linear search in the vector.
This is probably really naive, but I think that distributing
the load across N std::maps should help out when the number
of items gets really huge.
I know I must be missing something here.
Maybe the fact that you already have a hash table, which should
be distributing the load over 9 or 10 million vectors (or maps,
if that's what you're using). And the resulting load is small
enough, or should be, that linear search in a vector or a
list-like structure should be faster than accessing into a map.
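Just so we're talking about the same structure, here's a minimal
sketch of that sort of table; the bucket count, the hash function
and the mapped type are all placeholders, not your code:

    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    // A fixed number of buckets, each a small std::vector searched
    // linearly.
    class FixedHashMap
    {
    public:
        explicit FixedHashMap(std::size_t bucketCount = 1 << 20)
            : m_buckets(bucketCount)
        {
        }

        void insert(std::string const& key, int value)
        {
            m_buckets[index(key)].push_back(std::make_pair(key, value));
        }

        int const* find(std::string const& key) const
        {
            Bucket const& b = m_buckets[index(key)];
            for (Bucket::const_iterator it = b.begin();
                 it != b.end(); ++it)
            {
                if (it->first == key)   // linear search: one or two entries
                    return &it->second;
            }
            return 0;
        }

    private:
        typedef std::vector<std::pair<std::string, int> > Bucket;

        std::size_t index(std::string const& key) const
        {
            // Simplistic hash; the quality of this function is what
            // determines whether the buckets stay small.
            std::size_t h = 0;
            for (std::size_t i = 0; i != key.size(); ++i)
                h = h * 31 + static_cast<unsigned char>(key[i]);
            return h % m_buckets.size();
        }

        std::vector<Bucket> m_buckets;
    };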
[...]
Yeah. I intend on filling up the command map with many random
strings, and assigning them to random functions. Something
like:
struct random_string
{
    static void generate(char* buf, size_t size)
    {
        for (size_t i = 0; i < size; ++i)
        {
            int c;
            do
            {
                c = rand() % (UCHAR_MAX + 1);
            }
            while (! isalpha(c));
Ouch. Why not something like:
c = "abcdefghijklmnopqrstuvwxyz"[rand() % 26];
and forego the inner loop?
            buf[i] = tolower(c);
        }
    }
};
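With that change, the generator reduces to something like the
following sketch (mine, not the code from your post):

    #include <cstddef>
    #include <cstdlib>

    struct random_string
    {
        static void generate(char* buf, std::size_t size)
        {
            static char const alphabet[] = "abcdefghijklmnopqrstuvwxyz";
            for (std::size_t i = 0; i < size; ++i)
            {
                // Pick directly from the alphabet: no rejection loop,
                // and the result is already lower case.
                buf[i] = alphabet[std::rand() % 26];
            }
        }
    };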
[...]
Do you think I should use it instead of the 4D array?
I don't know. I don't like hash tables with a fixed number of
buckets. It's not that hard to grow them (unless there are
iterators into them).
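Growing is just a question of reinserting everything with a
larger bucket count; a rough sketch, using the same sort of
placeholder bucket and hash as in the earlier sketch:

    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::vector<std::pair<std::string, int> > Bucket;

    // Placeholder hash function.
    std::size_t hashString(std::string const& key)
    {
        std::size_t h = 0;
        for (std::size_t i = 0; i != key.size(); ++i)
            h = h * 31 + static_cast<unsigned char>(key[i]);
        return h;
    }

    // Double the bucket count and reinsert every entry: each entry's
    // bucket depends on the bucket count, so everything moves.
    void grow(std::vector<Bucket>& buckets)
    {
        std::vector<Bucket> bigger(2 * buckets.size());
        for (std::size_t b = 0; b != buckets.size(); ++b)
        {
            for (Bucket::const_iterator it = buckets[b].begin();
                 it != buckets[b].end(); ++it)
            {
                bigger[hashString(it->first) % bigger.size()]
                    .push_back(*it);
            }
        }
        buckets.swap(bigger);
    }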
I am only using the hash table to take pressure off a single
tree. Is this a boneheaded line of thinking, James?
Sort of. Hash tables have a certain complexity. Done
correctly, they allow constant time access. You've paid for the
complexity of the hash table; there's no point in adding in the
complexity of a tree as well. One or the other. (Generally,
I've favored hash tables in the past. Compared to std::map,
they have two disadvantages, however. The first is that you
need a good hash function, or they don't work well, and a naïve
user won't necessarily know how to provide such a function. The
second is that std::map is already written, and is part of the
standard, so every other C++ programmer who reads your code
should understand its use; the same isn't true of your hand
written hash table.)
--
James Kanze