Fei Liu wrote:
unsigned int compute_steps(int n){
// shortcut to retrieve memorized steps[n]
if(steps[n])
return steps[n];
if(n == 1) return 1;
if(n%2)
n = 3*n+1;
else
n = n/2;
cout << ' ' << n;
// shortcut to memorize steps[n]
steps[n] = compute_steps(n);
return steps[n] + 1;
}
I presume that the "cout <<" is only there for debug / testing; it will
probably be slower than the rest of the code, so remove it.
Hopefully your compiler will do tail recursion optimisation. But maybe
it won't, for some reason. You might prefer to use explicit iteration
rather than recursion to be on the safe side.
You have two calls to hashmap::operator[] for each value of n. Maybe
the compiler can optimise the second away (CSE), but I would not rely on
that.
Most of the time will be spent in the hash-map operations. You might
try to re-design the algorithm so that there are fewer of them. For
example, you might choose to do the hash-map lookup only once every x
iterations. This means that you will do a few more cycles of the 3n+1
arithmetic than are necessary, but that is balanced against the smaller
number of hash-map lookups. Adjust x until you find a good value.
If n is odd, then 3*n+1 will be even (I think!). So after a 3*n+1 step
you will always do an n/2 step. By doing this explicitly you avoid one
odd-even test:
while (...) {
if (n%2) {
n = 3*n+1;
steps++;
}
n = n/2;
steps++;
...
}
I have a feeling that the algorithm can be unrolled to do more than one
step at a time as follows:
switch (n%4) {
case 0: n = n/4; break;
case 1: n = (3*n+1)/2; break;
case 2: n = 3*(n/2)+1; break;
case 3: n = (3*n+1)/2; break;
}
steps += 2;
I think that you can extend this to process many bits, but it's not
certain that it will actually be any faster. It's worth trying though;
I wrote some code to reverse the bits in a word and used an
8-bits-per-iteration lookup table, which was a lot faster than the naive
code.
Most of your hash-map lookups are actually just to test for the presence
of a value. For this some sort of bit-set or bit-vector might be
preferable. Consider using an int or int64 or larger to store 32 or 64
"steps known" flags, and store these ints in some sort of map.
You might also like to look inside your hashmap implementation and see
how it could be improved.
The performance will be influenced greatly by how the code is used,
especially how high the hit rate is in your hashmap.
Don't forget to use all of your compiler's optimisation settings,
including profile-driven optimisation, and to use profiling to see where
the time is actually spent.
Do let us know how you get on.
Phil.
1. Use iteration instead of recursion to avoid function call
2. Drop boost::filesystem, use simpler fstream interface
3. I like the idea of using an bitarray to represent known/unknown result
4. Prefer unordered_map to hash_map
5. Body of main is too long, split up to smaller pieces
6. Optimize the algorithm