Re: Job Interview, Did I Mess Up?

From:

"Daniel T." <daniel_t@earthlink.net>

Newsgroups:

comp.lang.c++.moderated

Date:

Sun, 7 Mar 2010 15:51:19 CST

Message-ID:

<daniel_t-D74DD1.11100907032010@70-3-168-216.pools.spcsdns.net>

brangdon@cix.co.uk (Dave Harris) wrote:

daniel_t@earthlink.net (Daniel T.) wrote (abridged):

However you make an important point. If it were the case that
"tolower(a)" represented something other than simply "lower case a",
if tolower() had side effects for example, then we would be calling
the function, in part, to cause those side effects. If that were the
case, then tolower(a) and lo_a would *not* represent the same thing
and you wouldn't be violating DRY, that isn't the case though.

It sounds as though which version you think is clearer depends on what
tolower() does. That makes your argument a bit suspect, in my opinion.

You absolutely should treat functions that have side effects differently
than functions that don't, and you need to know which are which. If you
find treating them differently "suspect," we are so far different in our
ideas on how to program that discussing function use is pointless.

I find it interesting that on the one hand, I am being lambasted for not
adding extra variables in a function even though what they represent is
already expressed, while on the other (below,) I'm being lambasted for
putting in a variable that actually represents something that isn't
expressed any other way. On top of that, some insist that having
multiple representations of the same piece of information somehow
doesn't break DRY, while having a single representation of a piece of
information does. I'm boggled.

To put it another way, there is no design rule that says "prefer
multiple exits from blocks of code," yet that is what you seem to be
advocating. (If I am wrong here, then we are probably not that far
apart in our positions.)

In my experience we are frequently faced with a choice between three
evils:
1. Adding duplicate code.
2. Adding spurious mutable state.
3. Adding extra multiple exits.

For example,

     bool contains( const int *first, const int *last, int value {
         while (first != last && *first != value)
             ++first;
         return first != last;
     }

this duplicates the test. We can avoid that by adding a variable to
remember the result of the test:

         bool found = false;
         while (first != last && !found)
             found = (*first++ == value);
         return found;

Or we can use multiple exits:

         while (first != last)
             if (*first++ == value)
                 return true;
         return false;

So for me the dispute is over which of these is the lesser evil. I
dislike the first because it is duplicating code and because it is
doing unnecessary work. I dislike the second because it adds mutable
state, and also adds an extra test in the loop. The third sample I
prefer. It seems simplest to me: as soon as you know the answer, stop.
It's simple and efficient. It's only fault is that it offends the
single-exit rule, which is only an issue if you are dogmatic about it.

I imagine you'd prefer the first version. In this case the difference
is trivial because the test we are duplicating is trivial, but that
doesn't make it better; it just means it's worse by a trivial amount.
In real life the test to be duplicated may be more complex and
expensive, which leads us to the version with extra flags.

I like that you presented these three different ways to code an
algorithm and explained how you feel about each one, so I will return
the favor.

If I were required to write a function like your example (I actually did
write one recently,) my instinct would be to write it like example 2. In
fact, when I'm writing a (non-member) function that returns a result, my
first move it to write:

-function signature-
{
-ReturnType- result = -default-;
return result;
}

Then I fill in the guts... I can almost hear the groans from the people
reading this! :-D

I reject the first example because for most debuggers I use, putting a
breakpoint at the return in the function would be useless in trying to
figure out what the function is going to return, and I can't change the
result during a debug run. Generally, this example is harder to grock
than the other two.

I consider example 3 fine, but with it, I have to put two breakpoints in
the function to track what it will return, and I can't change the result
during a debug run. I think of example 3 as an optimization of example
2, I might switch to it if I'm confident that the function works and
there is some slowdown because of the assignment to the result, but
generally I don't bother.

As you say, for this trivial example my approach is kind of silly, I
accept that. A function like this would be almost impossible to write
incorrectly so testing isn't much of an issue. However, imagine
happening across a function that has 2 continues, 3 breaks (with no
switches,) 4 loops, 5 returns and no documentation. Do that enough times
(and it happened to me a lot in my early career,) and you would probably
come to loathe multiple block exits as much as I do.

_Postconditions and Where to Check For Them_

Again we come back to not repeating ourselves. If the post
condition of a function can be checked by code, then it should be.
I think we can agree on that, but where should we do the
checking? Your suggestion was that this checking should be done
"at a caller site (as normally done in unit tests)."

Most functions are complex enough that they require multiple tests,
so they are generally called from multiple sites. If we want
to put the postcondition checking code in only one place (so we
don't repeat ourselves,) then the obvious place to put it is
just before the return within the function. If there is more than
one return, then we still must repeat ourselves; which is the
one thing that we have both agreed we should avoid.

In general, if there is something to be done after the loop regardless of
how it terminates, that is not part of the loop or dependent on the loop,
then it's doubtful that it belongs in the same function at all. This can
apply to post-conditions. For example:

     bool contains( const int *first, const int *last, int value ) {
         assert( first <= last );
         bool result = contains_inner( first, last, value );
         assert( result == (std::find( first, last, value ) != last) );
         return result;
     }

Here you have written two functions (contains and contains_inner) that
have the exact same preconditions and postconditions. Two functions that
do exactly the same thing, with the same algorithm, in the same program
doesn't sound like a good idea to me.

Such an idiom makes some sense when dealing with virtual
member-functions (the private virtual idiom,) but not for global
functions.

In a language like C, without exceptions, this pattern works well for
freeing resources:

     bool process( const char *filename ) {
         FILE *fp = fopen( filename, "r" );
         if (!fp)
             return false;
         bool result = process_inner( fp );
         fclose( fp );
         return result;
     }

It's an example of separation of concerns, putting the open/close in one
place and the real work in another. A benefit is that we can reuse
process_inner(), for example passing stdin.

However, note that in this case "process" doesn't do the same thing as
"process_inner" unlike your previous example.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]