Re: Why is there no input value optimization?

From: "Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Tue, 10 Apr 2012 11:14:08 -0700 (PDT)
Message-ID: <jm1ssq$pq4$1@dont-email.me>

On 10.04.2012 11:18, rossmpublic@gmail.com wrote:

> I have a very simple question to which I have been unable to find a
> satisfactory answer. The question is: why do I need to manually
> optimize my functions using const references?
>
> For example:
>
> // Optimized passing of string parameter
> Widget(std::string const& name);
> SetName(std::string const& name);
>
> // Non-optimized passing of string parameter
> Widget(std::string name);
> SetName(std::string name);
>
> I understand that with the latter notation there is an additional
> copy involved on most compilers, but why is that exactly?
> Why is it that the compiler (as smart as it is today) is unable to
> optimize away the additional copy?

The compiler is able to do it within the current language, but it's
constrained by

  * the problem of aliasing, i.e. the optimization can violate
    correctness,

  * depending on the solution, a combinatorial explosion, and

  * depending on the solution, the need for the linker to support the
    scheme.

A partial solution, to avoid all three of those problems, is to use
immutable types with reference semantics.
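
A minimal sketch of what such a type could look like, assuming C++11's `std::shared_ptr` is available. The names `ImmutableString`, `describe` and `demoImmutable` are made up for illustration and are not from any library. Copying the handle is cheap, and since nobody can modify the shared text, aliasing is harmless:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string with reference semantics: copying the
// handle copies a pointer (and bumps a reference count), never the text.
class ImmutableString
{
public:
    ImmutableString( std::string text )
        : data_( std::make_shared<std::string>( std::move( text ) ) )
    {}

    std::string const& str() const { return *data_; }

    // True if both handles refer to the very same character buffer.
    bool sharesBufferWith( ImmutableString const& other ) const
    {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string const> data_;   // The text is never modified.
};

// Pass by value: cheap, and the callee's text can't change under its feet.
std::string describe( ImmutableString name )
{
    return name.str() + " has " + std::to_string( name.str().size() ) + " characters";
}

void demoImmutable()
{
    ImmutableString a( "john" );
    ImmutableString b = a;                  // No character data is copied.
    assert( a.sharesBufferWith( b ) );

    a = ImmutableString( "the universe" );  // Rebinds a; b is unaffected.
    assert( b.str() == "john" );
}
```

Assignment only rebinds a handle to a different buffer, so a callee that holds its own handle keeps seeing the original text.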

Also, for the particular case of `std::string` another partial solution
is to use a COW (Copy On Write) implementation, which I believe is still
how the implementation for g++ works. It works in spite of the severe
shortcomings of the `std::string` class that theoretically should foil
its positive effect. It's like the magic of the horseshoe that Niels
Bohr had over his desk: theoretically it shouldn't work, but as Niels
remarked, "I am scarcely likely to believe in such foolish nonsense.
However, I am told that a horseshoe will bring you good luck whether you
believe in it or not".

   ---

Here is an example of aliasing at work:

<code>
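#include <iostream>
#include <string>
using namespace std;

void spoiler();

// Takes its argument by value, so it works on a private copy.
int ageOf( string name )
{
    return 0?0
        : name=="john"? 18
        : name=="mary"? 22
        : 0;
}

// Takes its argument by reference, so `name` may alias a variable that
// other code can modify while foo is running.
void foo( string const& name )
{
    int const age = ageOf( name );      // Computed while name is "john".
    spoiler();                          // Changes the aliased string.
    cout << name << " is " << age << " years old." << endl;
}

string bah = "john";

void spoiler() { bah = "the universe"; }

int main()
{
    foo( bah );     // Prints "the universe is 18 years old."
}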
</code>
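
To make the effect concrete, here is a sketch that puts a by-value and a by-reference version side by side. It uses hypothetical names (`greetByValue`, `greetByReference`, `g_name`, `meddle`, `demoAliasing`) and returns the text instead of printing it. If a compiler silently turned the first signature into the second, the result of the call would change:

```cpp
#include <cassert>
#include <string>

std::string g_name = "john";            // Hypothetical global that gets aliased.

void meddle() { g_name = "the universe"; }

// By value: the function owns a private copy, so meddle() can't affect it.
std::string greetByValue( std::string name )
{
    meddle();
    return "hello, " + name;
}

// By reference: `name` may alias g_name, so meddle() changes what we see.
std::string greetByReference( std::string const& name )
{
    meddle();
    return "hello, " + name;
}

void demoAliasing()
{
    g_name = "john";
    assert( greetByValue( g_name ) == "hello, john" );

    g_name = "john";
    assert( greetByReference( g_name ) == "hello, the universe" );
}
```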

   ---

In many if not most cases, however, the compiler can easily prove that
there is no possible aliasing. It can also emit information that makes
it better able to prove that for later compilations of other code. But
here's where both the combinatorial explosion and the possible need for
linker support enter the picture.

Consider a function declared like

  void foo( string s );

If that function is non-optimized, then machine code must be emitted to
/copy/ the actual argument, while if the function is optimized as if it
were declared like

  void foo( string const& s );

then machine code that just passes an address must be emitted instead.

Consider then that this binary choice is present for each sufficiently
large argument where the optimization is relevant, so that with n such
arguments we're talking about 2^n implementation variants: a
/combinatorial explosion/ akin to the one for perfect forwarding.

With the now most popular compilation model of C++, where each
translation unit is compiled separately, the compiler can't know which
variant it should assume, if there is only one. One possible solution
is to assume that /all/ 2^n variants exist, and to use all of them
freely, with different linkage level name mangling. But then the linker
has to remove all the unused function implementations, lest the final
program increase greatly in size, generally almost doubling (which
might counter any positive effect).
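
For n = 2 the explosion can be written out by hand. The sketch below is not something any compiler emits. The suffixed names (`join_vv`, `join_vr`, and so on) are hypothetical stand-ins for differently mangled linkage level names, with `v` meaning "callee receives a copy" and `r` meaning "callee receives an address":

```cpp
#include <cassert>
#include <string>

// What the programmer wrote, conceptually:
//     std::string join( std::string a, std::string b );
// The 2^2 = 4 variants that would have to be available to callers:

std::string join_vv( std::string a, std::string b )               { return a + b; }
std::string join_vr( std::string a, std::string const& b )        { return a + b; }
std::string join_rv( std::string const& a, std::string b )        { return a + b; }
std::string join_rr( std::string const& a, std::string const& b ) { return a + b; }

// Number of variants for n sufficiently large by-value parameters.
constexpr unsigned long variantCount( unsigned n ) { return 1UL << n; }

static_assert( variantCount( 2 ) == 4, "two parameters give four variants" );

void demoVariants()
{
    std::string const first = "foo";
    std::string const second = "bar";

    // Each call site would pick the cheapest variant it can prove safe.
    // When nothing aliases, all four must give the same result.
    assert( join_vv( first, second ) == "foobar" );
    assert( join_vr( first, second ) == "foobar" );
    assert( join_rv( first, second ) == "foobar" );
    assert( join_rr( first, second ) == "foobar" );
}
```

Any variant that no call site ends up using is dead weight in the object files, which is why the linker would have to strip it.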

   ---

Another possible solution, one that avoids both the combinatorial
explosion and the need for linker support, can be based on David
Wheeler's well known aphorism, "Any problem in computer science can be
solved by another level of indirection".

Since the reference optimization only makes sense for sufficiently large
arguments that anyway are handled via pointers/addresses, the caller can
simply, for each argument, pass a flag, e.g. in a processor register,
that tells the implementation /whether to copy/ that argument. If, in a
particular call, a particular argument is so flagged and is not of
primitive type, then the implementation must copy it and update its
pointer to point at the copy. Then it can just proceed normally.

This set of flags imposes a slight overhead on every call, in that the
implementation must check the flags, but it removes the need for linker
support.
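
A hand-written approximation of that flag scheme, assuming the flags travel as an ordinary bitmask parameter rather than in a processor register. All names here (`CopyFlags`, `greet`, `g_text`, `meddle`, `demoFlags`) are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical per-argument flags; bit i set means "callee must copy argument i".
enum CopyFlags : unsigned { copyNone = 0u, copyFirst = 1u << 0 };

std::string g_text = "john";            // Global that a caller's argument may alias.

void meddle() { g_text = "the universe"; }

// One implementation serves all callers. The argument arrives as an address;
// the flag bit tells the callee whether it must copy before using it.
std::string greet( std::string const* name, unsigned flags )
{
    std::string localCopy;
    if( flags & copyFirst )
    {
        localCopy = *name;              // Make the copy ...
        name = &localCopy;              // ... and repoint at it.
    }
    meddle();                           // May modify whatever name aliased.
    return "hello, " + *name;
}

void demoFlags()
{
    // Caller can't rule out aliasing of a global, so it requests a copy.
    g_text = "john";
    assert( greet( &g_text, copyFirst ) == "hello, john" );

    // Caller knows that a local constant can't be aliased, so no copy is made.
    std::string const local = "mary";
    assert( greet( &local, copyNone ) == "hello, mary" );
}
```

The flag check is the slight per-call overhead mentioned above. In exchange there is only one function body to compile and link.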

Perhaps, in order to let the programmer decide, functions that support
and need the flags could be marked with some attribute.

And even further, perhaps calls could also be annotated so that the
programmer could take responsibility for the arguments being non-aliased.

> Why should I be writing optimization code into my interfaces?? This
> seems very wrong to me.

From a purely idealistic point of view it is indeed very wrong to
hardcode optimization decisions into interfaces. Ideally there should be
"in", "out" and "in-out" designators as in Ada, and ideally the language
should then support proper Liskov substitution[1]. I.e., supporting
covariant "out" arguments, contravariant "in"-arguments, and enforcing
invariant "in-out" arguments.

C# is one step closer to that ideal than C++, by having an "out"
designator, but I'm not sure if it supports proper LSP for "out".

However, C++ is very much a language that's evolved to meet practical
needs. And apparently "in", "out" and "in-out" arguments have not been
very urgent practical needs. For if they had been, then they would
presumably have been supported already (of course this argument applies
to any desired feature, but I'm just sayin').

   ---

It is possible to attain the /appearance/ of "in", "out" and "in-out"
support by using e.g. empty macros with suggestive names.
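
As a sketch of such macros: the names `IN_`, `OUT_` and `INOUT_` below are made up for this example (with trailing underscores to stay clear of macros that existing headers may define), as are the functions that use them. Since the macros expand to nothing, the compiler sees only the ordinary reference parameters:

```cpp
#include <cassert>
#include <string>

// Hypothetical empty macros: pure documentation, checked by nobody.
#define IN_
#define OUT_
#define INOUT_

void splitName(
    IN_  std::string const& fullName,
    OUT_ std::string&       first,
    OUT_ std::string&       last
    )
{
    std::string::size_type const pos = fullName.find( ' ' );
    first = fullName.substr( 0, pos );      // Whole string if there's no space.
    last = (pos == std::string::npos? std::string() : fullName.substr( pos + 1 ));
}

void appendExclamation( INOUT_ std::string& text ) { text += '!'; }

void demoMacros()
{
    std::string first, last;
    splitName( "Niels Bohr", first, last );
    assert( first == "Niels" );
    assert( last == "Bohr" );

    std::string text = "hi";
    appendExclamation( text );
    assert( text == "hi!" );
}
```

Nothing here verifies that an `OUT_` parameter is actually assigned, or that an `INOUT_` parameter is read before it's written, so a wrong annotation compiles just as happily as a right one.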

That pure appearance effect is apparently the main idea of
Microsoft's[2] "Standard Annotation Language" (SAL) as far as the C++
programming language is concerned (for the C programming language the
annotations may have some slight advantage, but at an extreme,
mind-boggling cost). For example, the last time I checked the SAL
annotations for one of the most used Windows API functions, MessageBox,
they were still wrong. Which is what one can expect when it's just
comment-like annotations, and not a language-supported feature checked
by a compiler.

In my humble opinion such schemes, for C++, are worse than not having
the desired feature. It's just a lot of extra work. And as in the case
of Windows' MessageBox function, the annotations can mislead you.

[snip]

Cheers & hth.,

- Alf

Notes:
[1] See e.g. <url:
http://alfps.wordpress.com/2012/03/11/liskovs-substitution-principle-in-c/>
[2] Lest it appear that I'm bashing Microsoft here, no, that's not my
intention, but firstly, "SAL" is the only such annotation language that
I know of (i.e., I don't know very much!), and secondly, I'm a Microsoft
MVP, which if anything should make me biased pro Microsoft.
