Re: Techniques to avoid minimal scope inefficiency with complex

From:

brangdon@cix.compulink.co.uk (Dave Harris)

Newsgroups:

comp.lang.c++.moderated

Date:

Sat, 9 Jun 2012 11:36:03 -0700 (PDT)

Message-ID:

<memo.20120609155222.2708B@cix.co.uk>

0xCDCDCDCD@gmx.at (Martin B.) wrote (abridged):

I really asked this question because I though there might be a
technical, elegant solution out of this, but since noone posted
anything that I found particularly enlightening(*), I guess there
really isn't a technical solution to this, but it has to be
addressed on the pedagogical and psychological level :-)

Well, move semantics should help. Consider some straight-forward
code for expressing what you are doing:

    for (int i = 0; i != n; ++i)
        set_lower_text( i, to_lower( get_text( i ) ) );

I hope this passes your guidelines. Suppose the signatures are:

    string get_text( int i );
    string to_lower( string &&s );
    void set_lower_text( int i, &&s );

So get_text() allocates and returns a copy. This is passed to to_lower() as an
r-value reference, so it can modify it in-place and return it without copying the
characters or doing an allocation. Set_lower_text() receives another r-value
reference, which it can move to its final destination, again without allocating or
copying. So we have one allocation and one copy, both in get_text(). If instead the
signatures are:

    const string &get_text( int i );
    string to_lower( const string &s );
    void set_lower_text( int i, &&s );

then the count is the same but now the allocation and copy happens inside
to_lower(). We can provide both overloads of to_lower().

Now compare with your optimised code:

       string s;
       for (int i=0; i!=n; ++i) {
         get_text(i, s); // void get_text(int, string&);
         to_lower(s);
         set_lower_text(i, s);
       }

with signatures:

    void get_text( int i, string &s );
    void to_lower( string &s );
    void set_lower_text( int i, const string &s );

Your version necessarily copies the string in get_text(), and again in
set_lower_text(). Set_lower_text() will probably have to do an allocation. (If
instead it takes an r-value reference and pilfers its argument's memory, s is left
empty and we gain nothing by moving it outside the loop.) So we have 1 allocation
and 2 copies. You've turned two lines of code into five and not really gained
anything over the plain version. You might have saved an allocation, but you've
definitely copied the characters an extra time, so if the allocator is efficient
(as I've argued elsewhere) you might even be slower.

Just write plain, simple code in C++11 and it will usually be fine.
If you can use iterators, so much the better:

    std::transform( begin(), end(), lower_begin(), to_lower );

What's not to like?

If it really matters, I would look more at what get_text() and
set_lower_text() are actually doing. For example, supposing
set_lower_text() actually stores its argument in a vector called
lower_text:

     void set_lower_text( int i, const string &s ) {
         lower_text[i] = s;
         // Enforce invariants here, if any.
     }

You might consider using in-place modification explicitly. Add:

     void set_lower_text( int i,
             const std::function<void(string &)> fn ) {
         fn( lower_text[i] );
         // Enforce invariants here, if any.
     }

so the caller code looks like:

    for (int i = 0; i != n; ++i)
        set_lower_text( i, [=]( string &lower_text ) {
            get_text( i, lower_text );
            to_lower( lower_text );
        } );

This avoids the need for an intermediate buffer entirely. The work is
done directly in lower_text's own storage. If you can encapsulate the
loop, too, so much the better. You could present it as a kind-of
in-place transform:

    template<class Iterator, class Func>
    void set_all_lower_text( Iterator i, Func f ) {
        for (auto j = lower_text.begin(); j != lower_text.end();
                ++j, ++i)
            f( *j, *i );
            // Enforce invariants here...
        }
        // ... or here.
    }

Used like:

    set_all_lower_text( text_cbegin(),
      []( string &lower_text, const string &text ) {
            lower_text = text;
            to_lower( lower_text );
        } );

This version allows more control and economies of scale. For example,
if lower_text must be kept sorted, you can resort it once at the end,
instead of repeatedly after each string is changed. If you need to
notify listeners when the strings are changed, you can do so once
instead of once per string. If you need to lock the data against
multi-threaded access, you can do so with less overhead.

The key thing is for the lambda expression to express what we want to
do to each string in its purest and most efficient form, and then use
that with a custom algorithm that manages everything else. Separation
of concerns.

The reason I don't think this is a good solution for your particular
case is because, without measurements, it's premature optimisation
again. I don't advocate that every signature like:

    void set_lower_text( int i, const string &s );

also have an in-place version like:

    void set_lower_text( int i,
            const std::function<void(string &)> fn );

However, where there is a proven need for performance, the in-place
version may be worth considering. And just adding the r-value
reference version:

    void set_lower_text( int i, string &&s ) {
        lower_text[i] = std::move( s );
        // Enforce invariants here, if any.
    }

gains efficiency without disrupting caller code at all.

(I have reordered these quotes for clarity.)

Of course, iff I'm going to write an algorithm, it's gonna be
as efficient as it gets without contorting myself.

BUT -- that's really missing the point I think: The examples I
gave in the OP were just that: examples. People are *going to*
write loops that don't make much sense as algorithms (-> used
once).

We may be saying the same thing here. Perhaps set_all_lower_text() is
what you had in mind for the efficient algorithm.

On the other hand, perhaps you are saying that anything used only
once shouldn't be an algorithm? If so, then I disagree. Even if
set_all_lower_text() is only used in one place, it's still providing
benefits like abstraction, information hiding, de-coupling and
separation of concerns. We have three components here - the source
of get_text(), the destination for set_lower_text(), and the code
that transfers between them - and none of them needs to know
implementation details of the other two.

I would further say that thinking in algorithms often leads to code
that is both elegant and efficient. Elegant because it is more
abstract and higher level. Efficient because modern C++ is all
about zero-cost abstractions, and clean code is easier to optimise.

Where-as a signature like:
void get_text( int i, string &result );

leads to inelegant code. It uses a side-effect to return its result,
so it can't be composed in a functional way. It forces the string to
be copied; a copy that can't be eliminated. It uses an integer index
instead of iterators so it doesn't place nicely with standard
algorithms. It forces you to use five or six lines of code where only
one or two should be needed.

And summed up over the codebase, there will be a lot of these
(different) loops. And some (few?) of these loops will mess up
performance in one (minor) way or another.

And then I will have to mess around with the guidelines

How does my code break your guidelines?

because some took them too literally and now think they have to
micro-optimize everywhere.

This does sound like pedagogical problem.

-- Dave Harris, Nottingham, UK.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]