Re: improving the performance of a large switch block
Mycroft Holmes wrote:
Hi to all,
after some profiling, we found that a hotspot in our code is a big object
factory function, which basically is a large (?) switch block with - say -
70 cases.
How much work does each case do? If it's non-trivial (e.g. a dynamic
memory allocation or similar is involved as you'd expect for a factory),
I'm surprised that the switch statement is taking a noticeable time.
Each case label is a large 64-bit random constant, which is generated by the
compiler via some template metaprogramming trick, so the values are
unpredictable, and thus in particular, the case labels are not sorted.
we *guess* that the compiler is generating code as a long "if-then-else if"
chain, because the switch execution time varies a lot, even if roughly all
the cases have a similar workload.
we made a couple of experiments to understand if it's possible to improve
the performance of the switch, in particular we tried the following:
1) create a global array of pairs (CODE[i], i)
2) sort once the array by code
3) before the switch, call std::lower_bound to find the code in the array,
return the SECOND of the pair.
4) switch on the second member
this would have the advantage of having a switch block with compact and
sorted case labels (say 0...69).
One would assume that the compiler will produce a jump table for that
(your pair.second and 0-69 are of type int I hope, to remove any
potential problems with the compiler not optimizing 64-bit switch
statements as well as 32-bit ones).
however the performance is STILL variable, and on the average, clearly
worse.
std::lower_bound obviously varies in runtime, but I'm quite surprised
you can measure the variation for such a small set of numbers.
we are inclined to trust the compiler and we hope it's able to squeeze
switch blocks far better than us...
is there something else we can do? did anyone perform similar experiments?
Have you tried dropping the switch entirely, replacing the switch blocks
of code with functions, and using a map of CODE -> function (e.g. sorted
array as you have tried, std::map or hash_map)? You loose inlining then,
of course, but performance might be better.
Tom