Re: simple code performance question
On 5 nov, 17:09, "Bo Persson" <b...@gmb.dk> wrote:
Elias Salomão Helou Neto wrote:
:
: Well, again you forget that your idiom has an implied destruction of
: the object at every loop iteration, resulting in the need to deal
: with exactly the same problem! How could that be different?
:
: I will give you an example. Take the following two simple programs:
:
: //Program 1:
: #include <string>
:
: std::string myFunction()
: {
:     std::string str;
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
:
:     return( str );
: }
:
: int main()
: {
:     for( unsigned i( 0 ); i < 100000; ++i )
:         std::string str( myFunction() );
:
:     return( 0 );
: }
:
: //Program 2:
: #include <string>
:
: void myFunction( std::string& str )
: {
:     str.clear();
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
: }
:
: int main()
: {
:     std::string str;
:     for( unsigned i( 0 ); i < 100000; ++i )
:         myFunction( str );
:
:     return( 0 );
: }
:
: According to you, Program 1 should run faster, right? But it is just
: the opposite. Compiling both with no optimization (the default)
: using gcc 4.1.2 20070502 Program 1 takes around 21 seconds to run
: against around 15 seconds for Program 2. Now, let us turn
: optimization up to its highest level and see what happens. With the -O3
: flag used when compiling, Program 1's execution time falls to
: around 19 seconds, while Program 2 goes down to an amazing 12 seconds!
: Can you explain that to me?
Yes, you are benchmarking the memory allocation for std::string.
Well, it is in fact cheaper to deal with memory allocation once than
to do it in every loop iteration. But, as I said, my example is
contrived.
On my machine, using another compiler, I get:
Program 1: 22.5 s
Program 2: 3.3 s
Then I notice that Program 2 reuses the same internal string buffer
for all calls, saving calls to the string growth code for the last
99,999 calls.
It happens all the time with this idiom.
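The buffer reuse described above can be observed directly. What follows is only a minimal sketch: fillAndCountGrowths and kPayload are hypothetical names that appear nowhere in the thread, and the sketch assumes that clear() keeps the allocated buffer, which is what the common std::string implementations do in practice but is not something I would claim the standard promises.

```cpp
#include <cstddef>
#include <string>

// Same payload the programs in this thread append 1000 times.
const char* const kPayload =
    "supercalifragilisomethingidonotremebmberandd"
    "donotwantotsearchintheinternet";

// Hypothetical helper (not from the thread): refills str the way
// Program 2's myFunction does and returns how many times the
// string's capacity changed while doing so.
std::size_t fillAndCountGrowths( std::string& str )
{
    std::size_t growths = 0;
    str.clear();  // assumption: the existing buffer is kept
    std::size_t lastCapacity = str.capacity();
    for ( unsigned i = 0; i < 1000; ++i )
    {
        str.append( kPayload );
        if ( str.capacity() != lastCapacity )
        {
            ++growths;  // the buffer had to grow
            lastCapacity = str.capacity();
        }
    }
    return growths;
}
```

Calling it twice on the same std::string should report several growths for the first call and, on typical implementations, none for the second, which is the saving the last 99,999 calls of Program 2 enjoy.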
To even the score a bit, I add a "str.reserve(100000)"
to myFunction.
Program 1B: 3.5 s
Program 2B: 3.4 s
Assuming that reserving much more memory than needed is not a
problem, yes, it should work. Program 2 is still (marginally) faster,
though it would be fairer to say it is as fast. No one has yet shown
an opposite example, i.e., one where passing an object by reference
degrades performance (although some claim that it is possible, and I
believe them).
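For reference, the "B" variants discussed above might look roughly like this. It is a sketch under assumptions: the thread only says that a "str.reserve(100000)" was added to myFunction, so the exact placement of the call, and the names myFunction1B, myFunction2B and kPayload, are my guesses rather than the code that was actually timed.

```cpp
#include <string>

const char* const kPayload =
    "supercalifragilisomethingidonotremebmberandd"
    "donotwantotsearchintheinternet";

// Assumed shape of Program 1B's function: return by value, with one
// up-front reservation replacing the repeated buffer growth.
std::string myFunction1B()
{
    std::string str;
    str.reserve( 100000 );
    for ( unsigned i = 0; i < 1000; ++i )
        str.append( kPayload );
    return str;
}

// Assumed shape of Program 2B's function: fill a caller-owned string.
// After the first call the capacity is already large enough, so the
// reserve should typically not allocate again.
void myFunction2B( std::string& str )
{
    str.clear();
    str.reserve( 100000 );
    for ( unsigned i = 0; i < 1000; ++i )
        str.append( kPayload );
}
```

With the reservation in place, 1B still pays for one allocation and one deallocation per call, while 2B pays for a single allocation over the whole loop, which would be consistent with the two timings being so close.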
I can imagine extremely contrived examples involving somewhat absurd
classes, but never when the class to which the object belongs allows
efficient manipulation of the data. If std::string did not allow such
manipulations it would be useless, since char[] already existed in C.
In fact, if a class provides no means of manipulating its data other
than through constructors, why should it exist at all when we could
have done just as well with a C struct? This seems to apply even more
to classes whose instances are supposed to hold large amounts of data.
: It's time for another listing:
:
: //Program 3:
: #include <string>
:
: std::string myFunction()
: {
:     std::string str;
:     for ( unsigned i( 0 ); i < 1000; ++i )
:         str.append( "supercalifragilisomethingidonotremebmberandd"
:                     "donotwantotsearchintheinternet" );
:
:     return( str );
: }
:
: int main()
: {
:     std::string str;
:     for( unsigned i( 0 ); i < 100000; ++i )
:         str = myFunction();
:
:     return( 0 );
: }
:
: Program 3 takes a little more than 17 seconds to run without
: optimization turned on. Explain that to me, please. When optimized, it
: takes around 15 seconds to run.
On my machine it takes 24 s unmodified.
Adding the same "str.reserve(100000)" to myFunction gives:
Program 3B: 5.6 s
I guess there is no copy on write in your compiler's std::string
implementation, so that assignment from a temporary will actually move
data around (whether this is a good design decision or not, I do not
know). This would not be needed with your idiom, because the
standard allows the copy constructor call to be optimized away (I am
willing to bet that if you forbid optimization both will be
equivalent). Compiled with gcc, all of your versions run equally fast
on my machine (actually equally slow when compared to your machine),
whether optimized or not. Now I really want to know which compiler you
are using.
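The difference between initializing from a returned temporary and assigning from one can be made visible with an instrumented class. This is a sketch only: Tracked, makeTracked, initFromCall and assignFromCall are hypothetical names, not anything from the thread, and the move operations assume a C++11 or later compiler. How many copies or moves get elided depends on the compiler and language version (with gcc, -fno-elide-constructors switches off the optional elision), so the sketch only relies on the counts that do not vary.

```cpp
// Hypothetical instrumented type: counts how its objects come to be.
struct Tracked
{
    static int constructions;  // default constructions
    static int copies;         // copy constructions
    static int moves;          // move constructions
    static int assignments;    // copy or move assignments

    Tracked() { ++constructions; }
    Tracked( const Tracked& ) { ++copies; }
    Tracked( Tracked&& ) noexcept { ++moves; }
    Tracked& operator=( const Tracked& ) { ++assignments; return *this; }
    Tracked& operator=( Tracked&& ) noexcept { ++assignments; return *this; }
};

int Tracked::constructions = 0;
int Tracked::copies = 0;
int Tracked::moves = 0;
int Tracked::assignments = 0;

void resetCounters()
{
    Tracked::constructions = Tracked::copies = 0;
    Tracked::moves = Tracked::assignments = 0;
}

// Stand-in for myFunction(): builds a local object and returns it.
Tracked makeTracked()
{
    Tracked t;
    return t;
}

// Program 1 style: a new object is initialized from the call. The
// copy or move may be elided entirely; no assignment ever happens.
void initFromCall()
{
    Tracked t = makeTracked();
    (void)t;
}

// Program 3 style: an existing object is assigned the result, so an
// assignment operator always runs on top of the construction.
void assignFromCall( Tracked& t )
{
    t = makeTracked();
}
```

After resetCounters(), initFromCall() leaves Tracked::assignments at zero, while assignFromCall() always bumps it by one. Whether that assignment is cheap or expensive for std::string depends on the implementation, which is exactly the point in dispute here.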
Rewriting main, making it equivalent to Program 1:
int main()
{
    for( unsigned i( 0 ); i < 100000; ++i )
        std::string str = myFunction();
    return( 0 );
}
Program 3C: 3.5 s
The last case shows that, in this test, constructing a new string on
each iteration is faster than assigning a new value to an existing
string.
This is just the same as std::string str( myFunction() ). We did not
even need this case to reach the conclusion, but the dramatic effect
is interesting. Are you a lawyer? Just kidding...
Well, it is for your compiler, but what I would really love to know is
why your idiom is so overrated that no one can see that passing
the string by reference (within tight loops, of course) is much less
likely to suffer from performance penalties.
Also, try comparing 1B against 3B with optimization disabled to see
what a non-optimizing compiler may be doing with your idiom. Please, do
it or say which compiler you are using. I am curious.
I argue that, when not optimized, 1B should be equivalent to 3B in
every realistic implementation of std::string. With optimization, 1B
should perform better on some implementations. But for really good
implementations (recent versions of gcc), both should be nearly the
same even with optimization turned on. The conclusion is that 1B is
more likely to perform well, so it should be preferred over 3B. But
we can go further and say that 2B is much more likely to beat both in
most cases.
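For anyone who wants to repeat the comparison on their own compiler, a small timing harness along these lines may help. It is a sketch under assumptions: secondsFor, buildByValue, buildInPlace and the three run...Style drivers are hypothetical names of mine, it uses std::chrono and so needs C++11 or later, and each driver returns the total length produced so that the loop has an observable result.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

const char* const kPayload =
    "supercalifragilisomethingidonotremebmberandd"
    "donotwantotsearchintheinternet";

// Hypothetical helper: wall-clock seconds taken by one call to f.
template <typename F>
double secondsFor( F f )
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>( stop - start ).count();
}

// Return-by-value builder, as in Programs 1 and 3.
std::string buildByValue()
{
    std::string str;
    for ( unsigned i = 0; i < 1000; ++i )
        str.append( kPayload );
    return str;
}

// Fill-by-reference builder, as in Program 2.
void buildInPlace( std::string& str )
{
    str.clear();
    for ( unsigned i = 0; i < 1000; ++i )
        str.append( kPayload );
}

// Program 1 style: a fresh string constructed on every iteration.
std::size_t runProgram1Style( unsigned iterations )
{
    std::size_t total = 0;
    for ( unsigned i = 0; i < iterations; ++i )
    {
        std::string str( buildByValue() );
        total += str.size();
    }
    return total;
}

// Program 2 style: one string passed by reference and refilled.
std::size_t runProgram2Style( unsigned iterations )
{
    std::size_t total = 0;
    std::string str;
    for ( unsigned i = 0; i < iterations; ++i )
    {
        buildInPlace( str );
        total += str.size();
    }
    return total;
}

// Program 3 style: one string assigned the returned value each time.
std::size_t runProgram3Style( unsigned iterations )
{
    std::size_t total = 0;
    std::string str;
    for ( unsigned i = 0; i < iterations; ++i )
    {
        str = buildByValue();
        total += str.size();
    }
    return total;
}
```

Something like secondsFor( []{ runProgram2Style( 100000 ); } ) then gives one measurement per style. The numbers will of course depend on the compiler, the flags and the std::string implementation, so they say nothing general on their own.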
Elias Salomão Helou Neto