Re: simple code performance question
Elias Salomão Helou Neto wrote:
: On 5 nov, 17:09, "Bo Persson" <b...@gmb.dk> wrote:
:: Elias Salomão Helou Neto wrote:
:::
::: According to you, Program 1 should run faster, right? But it is
::: just the opposite. Compiling both with no optimization (the
::: default) using gcc 4.1.2 20070502 Program 1 takes around 21
::: seconds to run against around 15 seconds for Program 2. Now, let
::: us turn optimization up to its highest level and see what happens.
::: With the -O3 flag used when compiling, Program 1's execution time
::: falls to around 19 seconds, while Program 2 goes down to an amazing
::: 12 seconds! Can you explain that to me?
::
:: Yes, you are benchmarking the memory allocation for std::string.
:
: Well, it is in fact easier to deal with memory allocation once than
: doing it in every loop iteration. But, as I said, my example is
: contrived.
:
:: On my machine, using another compiler, I get:
::
:: Program 1: 22.5 s
:: Program 2: 3.3 s
::
:: Then I notice that Program 2 reuses the same internal string buffer
:: for all calls, saving calls to the string growth code for the last
:: 99,999 calls.
:
: It happens all the time with this idiom.
The benefit is exaggerated by the fact that the string is the same
size for every call. Otherwise there would be reallocations here too.
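
A minimal sketch of that buffer reuse, for anyone who wants to check it
on their own library. The names fillByReference and countGrowingCalls
are made up for this illustration, and fillByReference is only a guess
at the shape of Program 2, which is not listed in this post. It also
assumes that clear() keeps the allocated buffer, which the standard
does not promise but the common implementations do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the by-reference idiom (not the original
// Program 2, whose listing is not shown here).
void fillByReference(std::string& str)
{
    str.clear();  // empties the string; the buffer is normally kept
    for (unsigned i = 0; i < 1000; ++i)
        str.append("supercalifragilisomethingidonotremebmberandd"
                   "donotwantotsearchintheinternet");
}

// Reuses one string for `calls` calls and counts how many of those
// calls ended with a different capacity than they started with.
std::size_t countGrowingCalls(unsigned calls)
{
    std::string str;
    std::size_t growing = 0;
    for (unsigned i = 0; i < calls; ++i) {
        const std::size_t before = str.capacity();
        fillByReference(str);
        if (str.capacity() != before)
            ++growing;
    }
    return growing;
}
```

If only the first call reports growth, every later call is running
without touching the allocator, which is the head start Program 2 gets.
With strings of varying size the count would go up.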
:
:: To even the score a bit, I add a "str.reserve(100000)"
:: to myFunction.
::
:: Program 1B: 3.5 s
:: Program 2B: 3.4 s
:
: Assuming also that reserving much more memory than needed is not a
: problem, yes, it should work,
It's not *much* more memory than needed, I just allocate enough to
hold 1000 appends of about 100 characters each. (Is it 74, if you
count?)
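
A small sketch to check that arithmetic, assuming the same literal as
in the listings. The names piece and appendsFitInReserve are invented
for this example. It reserves 100000 characters up front and verifies
that the 1000 appends never change the capacity, i.e. never reallocate.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// The literal appended in the listings, typos and all.
const char* const piece =
    "supercalifragilisomethingidonotremebmberandd"
    "donotwantotsearchintheinternet";

// Returns true if 1000 appends fit inside the reserved buffer, so that
// the capacity after the loop equals the capacity right after reserve.
bool appendsFitInReserve()
{
    std::string str;
    str.reserve(100000);
    const std::size_t reserved = str.capacity();
    for (unsigned i = 0; i < 1000; ++i)
        str.append(piece);
    return str.capacity() == reserved
        && str.size() == 1000 * std::strlen(piece);
}
```

With 74 characters per append the loop needs 74000 characters, so the
reserve overshoots by roughly a third rather than by some huge factor.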
: but 2 is still (marginally) faster; it
: would be fairer to say as fast as. No one has yet shown an
: opposite example, i.e., one where passing an object by
: reference degrades performance (although some claim that it is
: possible, and I believe them).
Being 0.1 s faster per 100,000 iterations is very marginally faster in
my book. :-)
:
::: It's time for another listing:
:::
::: //Program 3:
::: #include <string>
:::
::: std::string myFunction()
::: {
:::    std::string str;
:::    for ( unsigned i( 0 ); i < 1000; ++i )
:::       str.append( "supercalifragilisomethingidonotremebmberandd"
:::                   "donotwantotsearchintheinternet" );
:::
:::    return( str );
::: }
:::
::: int main()
::: {
:::    std::string str;
:::    for( unsigned i( 0 ); i < 100000; ++i )
:::       str = myFunction();
:::
:::    return( 0 );
::: }
:::
::: Program 3 takes little more than 17 seconds to run without
::: optimization turned on, explain it to me, please. When optimized,
::: it will take around 15 seconds to run.
::
:: On my machine it takes 24 s unmodified.
:: Adding the same "str.reserve(100000)" to myFunction.
:: Program 3B: 5.6 s
:
: I guess there is no copy on write on your compiler's std::string
Right.
: implementation, so that assignment to a temporary will actually move
: data around (whether this is a good design decision or not, I do not
: know), but this would not be needed with your idiom because the
: standard allows to optimize away the copy constructor (I am willing
: to bet that if you forbid optimization both will be equivalent).
I don't find it very interesting to compare the speed of unoptimized
compiles. If I want the code to be fast, I use a good compiler with
appropriate settings. If I don't care about (or need) the speed, it
doesn't really matter.
: Compiled with gcc, all of your versions run equally fast on my
: machine (actually equally slow when compared to your machine)
: whether optimized or not. Now I really want to know which compiler
: you are using.
It's the other free compiler, Visual C++ 2005 Express (using an
alternate version of the standard library).
:
: Well it is for your compiler, but what I would really love to know
: is why is your idiom so overhauled that no one can realize that
: passing the string as a reference (within tight loops, of course)
: is much less likely to suffer from performance penalties?
The argument was the other way around, that constructing a string
inside the loop was not killing performance.
:
: Also, try comparing 1B against 3B forbidding optimization to see
: what a non-optimizing compiler may be doing with your idiom.
: Please, do it or say which compiler you are using. I am curious.
Ok, without optimization (debug build) we get
Program 1B: 95 s
Program 2B: 87 s
Program 3B: 96 s
From earlier experiments I believe that the main effect here comes
from disabled inlining. Actually having to call a lot of accessor
functions out of line seems to cost between 10 and 100 times as much
in my code.
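
For anyone who wants to repeat the optimized versus debug comparison, a
rough timing sketch follows. It is not the harness used for the numbers
above. The names buildString, LoopResult and timeBuildLoop are invented
here, and it assumes a C++11 compiler for <chrono>. Build the same file
once with optimization off and once with it on (for example -O0 and -O2
with gcc) and compare the reported seconds.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical variant of myFunction with the reserve() call added.
std::string buildString()
{
    std::string str;
    str.reserve(100000);
    for (unsigned i = 0; i < 1000; ++i)
        str.append("supercalifragilisomethingidonotremebmberandd"
                   "donotwantotsearchintheinternet");
    return str;
}

struct LoopResult {
    double seconds;          // wall-clock time for the whole loop
    std::size_t totalChars;  // sum of the sizes of the built strings
};

// Calls buildString `iterations` times and measures the elapsed time.
// The sizes are summed so the work has an observable result.
LoopResult timeBuildLoop(unsigned iterations)
{
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    std::size_t total = 0;
    for (unsigned i = 0; i < iterations; ++i) {
        const std::string str = buildString();
        total += str.size();
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return LoopResult{elapsed.count(), total};
}
```

Calling timeBuildLoop(100000) matches the iteration count of the
listings. The absolute numbers will of course differ between machines
and libraries, so only the ratio between the two builds is interesting.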
Bo Persson