Re: Is a 1000x speed gain possible here?
On Oct 8, 10:12 am, Sid <siddharth.he...@gmail.com> wrote:
I am writing an article and want to be sure of some numbers before I
send the article out. The difference in the run time numbers between
the for loop and the template method seem to be a little unrealistic
to me. I did expect a gain in speed but not by such a big amount.
Is there something more coming into play here, like loop prediction on
the processor or something. In that case I'd expect it to work better
on the for loop.
I've reported the timing for each method, below each method's code.
//###### Sample code 1
// Begin timing code
int SomeData, nPos, nFind ;
// Initialize SomeData with random values
// Start timer 1
for (int i=0; i<1000000; ++i)
nFind = -1 ;
for (int j=0; j<200; ++j)
if (nFind == SomeData[j])
nFind = i ;
// End timing code
// Best case here 0.00502243 seconds
// Worst case 0.297686 seconds
//###### Sample code 2
[Same program implemented with templates]
Neither version of the program above has any "observable behavior". So
neither of the programs compiled from either set of sources - appears
to be doing anything when it runs. Except - the program compiled from
non-template source code, appears to do nothing even as it iterates a
loop one million times. Whereas the program built from template code
appears to do nothing - well, by actually doing no work at all. Now,
based on the reported timings, a program that does nothing by doing
nothing - can run up to 20,000 times faster than a program that
appears to do nothing by iterating a loop one million times.
A C++ compiler is required to produce a program with the same
"observable behavior" as a program that executes its source code
faithfully. In this case the program as written has no observable
behavior. So, even though a program that appears to perform one
million iterations in one millionth of a second may invite some
skepticism - there is no way for anyone observing the output of the
program to prove that the program did not execute the loop the one
million times as instructed. Of course, having the program print a
message, say every 100,000 iterations would force the program to
perform the loop - because someone might be watching.
Unfortunately very few "real-world" C++ programs can be optimized away
in their entirety (although I would be happy to nominate a few that I
think should be eligible). Therefore - as an example of the
performance gains likely to be realized from using templates - this
comparison has little value. But as an example of how templates often
lead to better optimizations, then this comparison is quite useful.
The use of templates effectively allowed the compiler to see more of
the program at one time, and this "bigger picture" led to the
conclusion that the for loop was pointless. Being able to reach this
same conclusion in the non template code would be much more difficult.
Most C++ compilers do not yet perform intra-function or "whole
program" analysis that I imagine would have been needed to optimize
the non-template version.
I'm using the QueryPerformanceCounter method to get time.
Please let me know if there is any mistake I've made with timing each
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]