Re: any performance difference?

From:

"Bruno van Dooren" <bruno_nos_pam_van_dooren@hotmail.com>

Newsgroups:

microsoft.public.vc.language

Date:

Mon, 11 Jun 2007 13:07:34 +0200

Message-ID:

<eivt5iBrHHA.1208@TK2MSFTNGP02.phx.gbl>

>> There is only 1 answer to questions like this and it is 'run a test'
>
> I appreciate the answer, but as I said, the problem is: run WHERE :)
> This code runs under Win32, Win64, Solaris, MacOS and some flavors of
> Linux... Each platform has at least a couple of processors (e.g. Mac
> Intel/PPC) and a couple of compilers, so you quickly deduce the number of
> tests is exponential...
> What we need (at the beginning) is some abstract remark, for example: is
> it true that calling a function via a pointer improves the branch
> prediction?

I have no idea. Branch prediction is CPU specific, so you'd have to answer
this question for each platform your program runs on.
Another problem is that each platform has its own compiler, which performs
different optimizations.

A couple of years ago, a couple of colleagues and I entered a coding
challenge where you got a buffer with consecutive 7-bit characters, so the
first bit of char 1 was the bit right after bit 6 of char 0.
The challenge was to calculate the parity bit of a very large buffer in as
short a time as possible. Code size and memory use were not evaluated.
We decided we wanted to win 1st prize so we spent a lot of time on it. The
problem was that once you have made an efficient algorithm, each
optimization becomes CPU dependent.
I went for a lookup based approach and my algorithm was blazing fast on a
PIII with its superior cache, but it sucked big time on a P4 which has a
sucky cache and memory interface.

My colleagues went for a calculation based approach which outperformed mine
by 50% on a P4, but was slow as molasses on my PIII laptop.
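
To make the two styles concrete: the actual contest entries are not shown
here, so this is only a minimal sketch under a simplifying assumption, namely
that the goal is the overall parity (XOR of all bits) of a byte buffer whose
padding bits, if any, are zero. Under that assumption the 7-bit packing does
not change the result. The names make_parity_table, parity_lookup,
parity_calc and check_parity_agreement are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Build a 256-entry table: entry b is 1 if byte b has an odd number of set bits.
inline std::array<std::uint8_t, 256> make_parity_table() {
    std::array<std::uint8_t, 256> table{};
    for (unsigned b = 0; b < 256; ++b) {
        unsigned v = b;
        v ^= v >> 4;  // fold the high nibble onto the low nibble
        v ^= v >> 2;
        v ^= v >> 1;
        table[b] = static_cast<std::uint8_t>(v & 1u);
    }
    return table;
}

// Lookup style: one table read per input byte (speed depends on the cache).
inline unsigned parity_lookup(const std::vector<std::uint8_t>& buf) {
    static const std::array<std::uint8_t, 256> table = make_parity_table();
    unsigned p = 0;
    for (std::uint8_t b : buf) p ^= table[b];
    return p;
}

// Calculation style: XOR all bytes together, then fold the result once.
inline unsigned parity_calc(const std::vector<std::uint8_t>& buf) {
    unsigned acc = 0;
    for (std::uint8_t b : buf) acc ^= b;
    acc ^= acc >> 4;
    acc ^= acc >> 2;
    acc ^= acc >> 1;
    return acc & 1u;
}

// Sanity check: both styles must give the same answer for the same buffer.
inline void check_parity_agreement(const std::vector<std::uint8_t>& buf) {
    assert(parity_lookup(buf) == parity_calc(buf));
}
```

Both functions return the same value; which one is faster is exactly the
kind of thing that differs between a PIII and a P4, so it has to be measured
on the target machine.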

In the end we were beaten by one of our customers who has a very good
relationship with the vendor that organized the challenge. He knew the exact
type of machine that was used by the judging committee, confiscated that
same machine somewhere within his company and optimized for that machine.

Moral of the story:
a) there is no good way to optimize across all possible platforms.
b) the only way to know what you will get for performance is to test.

>> unless these constructs are in tight loops that are executed millions of
>> times in succession
>
> yes, that's exactly the case: every clock cycle counts.

I appreciate your problem and I would really like to help you, but you
cannot micro-optimize a generic piece of code, and then hope that you will
squeeze every last bit of performance out of the code that is compiled with
different compilers and runs on different platforms.

To get a feel, create 2 test cases in plain C++. Your product is cross
platform anyway.
Compile for the different platforms and look at the results.
If 1 approach always outperforms the other, go for it.
If not, then decide which is most worthwhile.

Most of these proof-of-concept tests can be done in a single day and will
give you a solid basis to make decisions on.
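
As a rough idea of what such a pair of test cases could look like for the
direct call versus call-via-pointer question, here is a minimal sketch in
portable C++11. Everything in it is an assumption for illustration: the
workload step, the struct BenchResult and the functions run_direct and
run_via_pointer are hypothetical names, and the volatile pointer is only
there so the compiler cannot resolve the call target at compile time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical workload: a cheap step, so that call overhead is visible.
inline std::uint32_t step(std::uint32_t x) { return x * 1664525u + 1013904223u; }

struct BenchResult {
    std::uint32_t value;  // final value, returned so the loop is not optimized away
    double seconds;       // elapsed wall-clock time
};

// Test case A: direct call (the compiler is free to inline it).
inline BenchResult run_direct(std::uint64_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t x = 1;
    for (std::uint64_t i = 0; i < iterations; ++i) x = step(x);
    const auto stop = std::chrono::steady_clock::now();
    return {x, std::chrono::duration<double>(stop - start).count()};
}

// Test case B: the same work, called through a function pointer.
inline BenchResult run_via_pointer(std::uint64_t iterations) {
    assert(iterations > 0);
    // volatile: the pointer is re-read on every call, so the call stays indirect.
    std::uint32_t (*volatile fn)(std::uint32_t) = &step;
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t x = 1;
    for (std::uint64_t i = 0; i < iterations; ++i) x = fn(x);
    const auto stop = std::chrono::steady_clock::now();
    return {x, std::chrono::duration<double>(stop - start).count()};
}
```

Call both from main with a large iteration count, print the two seconds
fields, and repeat with each compiler and on each machine you care about.
The numbers from one platform say nothing about the others.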

I have been in a performance critical situation before where the performance
was part of the requirements (image processing on satellite data).
I tried several methods for optimizing my algorithms, but in the end I used
the production system for benchmarking test cases, and optimized for the
performance on that machine with those specific processors, cache and
memory.

--
Kind regards,
    Bruno van Dooren MVP - VC++
    http://msmvps.com/blogs/vanDooren
    bruno_nos_pam_van_dooren@hotmail.com