Re: Postfix is slower than postfix: how likely?

From:

"kanze" <kanze@gabi-soft.fr>

Newsgroups:

comp.lang.c++.moderated

Date:

13 May 2006 07:10:00 -0400

Message-ID:

<1147453590.093927.314750@v46g2000cwv.googlegroups.com>

Greg Herlihy wrote:

kanze wrote:

Tom?s wrote:

Seungbeom Kim posted:

Now I wonder: how likely are the cases where you suffer
significant performance loss because you use postfix
instead of prefix? Is there any benchmark that tells
postfix can be measurably slower than prefix?

If we're talking about primitive, built-in types, there's
no difference.

It depends on the compiler. And the machine -- in the C
world, postfix became preferred because on the earliest C
compilers for the PDP-11, it was faster than prefix. For
++; the reverse was true for --, but since -- was
considerably rarer...

If we're talking about user-defined types, then it depends
entirely on the class's definition of the operators.

And the compiler, and the machine.

Since the prefix operator "increments" and the postfix
operator "copies, then increments" a value, it should be clear
that the postfix operator will never be faster than the prefix
operator and is much more likely to be slower.

That sort of reasoning is typical of why we say that you don't
trust the programmer's intuition. Actual measurements using
Johnson's pcc on a PDP-11 showed that the postfix operator ++
was significantly faster than the prefix version. (If you know
the machine architecture, it's pretty obvious why.) Of course,
once you did anything in the loop other than increment, the
difference became negligible.

It's commonly said that the prefix is faster than the
postfix,

It's also been pointed out often enough that actual
measurements don't support what is commonly said. That this
is more a myth than anything else.

Unless the compiler can manage to optimize away the extra copy
operation required when postfix-incrementing a value,
prefix-incrementation is certain to be faster.

I'm talking about actual measurements, not wild speculation. I
measured. I know that there is no measurable difference for any
of the standard iterators using g++ on a Sparc. That's a
concrete fact (at least for the version of g++ which I tested --
but I doubt that their optimization has gotten worse since
then). All the rest is simply speculation.

[...]

This discussion has been done to death.

Never the less, Seungbeom is the first beside myself to post
actual facts, as opposed to speculation. To date, no one
has actually shown a real life iterator where it made a
measurable difference.

The speed difference between prefix and postfix incrementing
an iterator can readily be measured with, say, a std::vector
of std::string's:

     #include <iostream>
     #include <vector>
     #include <string>

     using std::vector;
     using std::string;

     void Output(std::vector<std::string>::const_iterator iter);

     int main()
     {
         vector<string> v;

         v.resize(10000, "abc");

         vector<string>::const_iterator iter = v.begin();

         do {
             Output( ++iter);
         } while( iter != v.end());
     }

     void Output(std::vector<string>::const_iterator iter)
     {
         std::cout << *iter;
     }

The user and system time needed to run this program are:

     user 0m0.009s
     sys 0m0.007s

Replacing the do loop with this while loop:

     while (iter != v.end())
     {
         Output( iter++);
     }

takes measurably more time to complete:

     user 0m0.016s
     sys 0m0.026s

You are you trying to fool? Those are two different programs.
And I can't begin to compare them, because the first one core
dumps on my system. Comparing a program which doesn't work with
one that does is easy.

But neither of them test anything we're talking about. Both use
the results -- and both need postfix in order to be correct.
When there's no choice, as in the above code, you use the one
that works. (And of course, given the granularity of time, at
least on my machine, your measures don't mean anything anyway.)
And the reason they need postfix is that you use the results of
the expression; the question is which is, or may be, faster when
you don't use the results.

In other words, due to the extra copying required, the
postfix-incrementing iteration is on the order of 2.5 times as
slow as the same iteration performed with a
prefix-incrementing iterator. A difference in time that should
not be at all surprising.

In other words, a program that core dumps may be faster than one
that doesn't. And in one particular measurement, the jitter in
your clock happened to favorize one.

FWIW: running my benchmarks on a Sun Sparc gives the following
results:

compiled with Sun CC (5.5):
     Testing std::vector, preincrement (1000 iterations, size =
10000)...
         0.014 microseconds per iteration
     Testing std::vector, postincrement (1000 iterations, size =
10000)...
         0.014 microseconds per iteration
     Testing std::deque, preincrement (1000 iterations, size = 10000)...
         0.038 microseconds per iteration
     Testing std::deque, postincrement (1000 iterations, size =
10000)...
         0.036 microseconds per iteration
     Testing std::list, preincrement (1000 iterations, size = 10000)...
         0.036 microseconds per iteration
     Testing std::list, postincrement (1000 iterations, size = 10000)...
         0.035 microseconds per iteration
     Testing std::set, preincrement (1000 iterations, size = 10000)...
         0.131 microseconds per iteration
     Testing std::set, postincrement (1000 iterations, size = 10000)...
         0.131 microseconds per iteration
     Testing std::vector, reverse preincrement (1000 iterations, size =
10000)...
         0.029 microseconds per iteration
     Testing std::vector, reverse postincrement (1000 iterations, size =
10000)...
         0.014 microseconds per iteration
     Testing std::deque, reverse preincrement (1000 iterations, size =
10000)...
         0.057 microseconds per iteration
     Testing std::deque, reverse postincrement (1000 iterations, size =
10000)...
         0.072 microseconds per iteration
     Testing std::list, reverse preincrement (1000 iterations, size =
10000)...
         0.041 microseconds per iteration
     Testing std::list, reverse postincrement (1000 iterations, size =
10000)...
         0.041 microseconds per iteration
     Testing std::set, reverse preincrement (1000 iterations, size =
10000)...
         0.28 microseconds per iteration
     Testing std::set, reverse postincrement (1000 iterations, size =
10000)...
         0.28 microseconds per iteration

compiled with G++ (4.0.1, I think):
     Testing std::vector, preincrement (1000 iterations, size =
10000)...
         0.016 microseconds per iteration
     Testing std::vector, postincrement (1000 iterations, size =
10000)...
         0.015 microseconds per iteration
     Testing std::deque, preincrement (1000 iterations, size = 10000)...
         0.031 microseconds per iteration
     Testing std::deque, postincrement (1000 iterations, size =
10000)...
         0.032 microseconds per iteration
     Testing std::list, preincrement (1000 iterations, size = 10000)...
         0.04 microseconds per iteration
     Testing std::list, postincrement (1000 iterations, size = 10000)...
         0.04 microseconds per iteration
     Testing std::set, preincrement (1000 iterations, size = 10000)...
         0.181 microseconds per iteration
     Testing std::set, postincrement (1000 iterations, size = 10000)...
         0.188 microseconds per iteration
     Testing std::vector, reverse preincrement (1000 iterations, size =
10000)...
         0.018 microseconds per iteration
     Testing std::vector, reverse postincrement (1000 iterations, size =
10000)...
         0.018 microseconds per iteration
     Testing std::deque, reverse preincrement (1000 iterations, size =
10000)...
         0.033 microseconds per iteration
     Testing std::deque, reverse postincrement (1000 iterations, size =
10000)...
         0.033 microseconds per iteration
     Testing std::list, reverse preincrement (1000 iterations, size =
10000)...
         0.047 microseconds per iteration
     Testing std::list, reverse postincrement (1000 iterations, size =
10000)...
         0.047 microseconds per iteration
     Testing std::set, reverse preincrement (1000 iterations, size =
10000)...
         0.329 microseconds per iteration
     Testing std::set, reverse postincrement (1000 iterations, size =
10000)...
         0.346 microseconds per iteration

The loop in each of my tests is:
     for ( typename Container::IterType ## _iterator iter =
myValues.ibegin() ;\
               iter != end ;
       \
               incrIter( iter ) ) {
       \
         myTotal += *iter ;
       \
There is a virtual function call between each call to the loop
to inhibit optimization. According to the macro invocation,
IterType is either const or const_reverse, ibegin/iend either
begin/end or rbegin/rend, and incrIter a macro which expands to
either ++ iter or iter ++. The benchmark harness also runs each
test once before starting measurements, to ensure that it is in
cache (although that's probably not relevant here).

From what I understand, that is what we are discussing. (I

couldn't leave the loop empty, of course, because then the
compiler would suppress it entirely.)

These are actual measurements, using a carefully developed
benchmark harness, designed to prevent optimization outside of
what is being tested. In every case, the differences are
smaller that the typical difference from one run to the next.

And as can plainly be seen, any claims that postfix is slower
than prefix are simply not supported by the data.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
                    Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]