Re: Copy / Paste in software development

From:

"Alf P. Steinbach" <alfps@start.no>

Newsgroups:

comp.lang.c++

Date:

Sun, 15 Feb 2009 01:40:08 +0100

Message-ID:

<gn7o9a$9un$1@news.motzarella.org>

* Kai-Uwe Bux:

Alf P. Steinbach wrote:

* Ira Baxter:

[snip]

So, I have an FSA with 3 states:

A: if A1 then action1; goto B
     (otherwise) goto C

B: if B1 then action2; goto A
     if B2 then action3; goto B
    (otherwise) goto C

C: if C1 then ...
    ...

How would you propose that control pass from the otherwise
clauses in both A and B to C, without the goto, and no other overhead?

The goto is a consequence of having separate basic blocks
whose control flow merges, and a linear addressing
space (whether in source lines or machine memory),
and only being able to place *one* of the
basic blocks in front of the other. The other
one has to go somewhere else, and transfer
control to the shared successor.

The Bohm-Jacopini theorem from back in the early
70s says you can always build structured code,
if you don't mind adding flag variables, which I
count as "extra overhead". You may argue that isn't
expensive; I'd argue adding the flags make it not high-performance.
Some applications really care. Yours may not,
but that isn't the point.

Your argument is incredibly silly. Taking it as a given that it's
desirable to keep the sordid mess of of a spaghetti state machine (it's
not),

Whether it is a good idea to keep the mess, depends very much on context. I
find myself often in the position that some of the best algorithms
described in the literature are given in a somewhat messy way.

Then they're decidedly not, in general, the best: if the authors were any good,
they'd not make a mess of it, so any particular one would be best just by chance.

However,
that mess has the _huge_ advantage of being shown correct. In those cases,
I consider it a good strategy to make my code mimmick the paper as close as
possible so that it is easy to check that my code is nothing but a
transcription. Was I to depart from that pattern, I might have to argue
correctness again (which can be highly non-trivial). For instance, a
central piece for generaing Poisson distributed random variables in my
library is the following piece of spaghetti code:

        ...
        fhuge u;
        fhuge g;
        result_type k;
        fhuge e;
        fhuge t;
        fhuge dummy;
        if ( this->m < 10.0 ) {
          return( this->grow_table( uni_d( urng ) ) );
        } else {
        step_N :
          g = nor_d( urng, normal_param( this->m, this->s ) );
          k = floor(g);
          if ( g < 0.0 ) {
            goto step_P;
          }
        step_I :
          if ( fhuge(k) >= this->L ) {
            return ( k );
          }
        step_S:
          u = uni_d( urng );
          dummy = this->m - fhuge(k);
          if ( this->d * u >= dummy * dummy * dummy ) {
            return ( k );
          }
        step_P :
          if ( g >= 0.0 ) {
            this->procedure_F( k );
          } else {
            goto step_E;
          }
        step_Q :
          if ( this->fy * ( 1.0L - u )
               <=
               this->py * exp( this->px - this->fx ) ) {
            return ( k );
          }
        step_E :
          e = exp_d( urng );
          u = uni_d( urng, uniform_param( -1.0L, 1.0L ) );
          t = 1.8L + ( u > 0 ? e : -e );
          if ( t <= -0.6744L ) {
            goto step_E;
          }
          k = floor( this->m + this->s * t );
          this->procedure_F( k );
        step_H :
        if ( this->c * abs(u)
             >
             this->py * exp( this->px + e )
             -
             this->fy * exp( this->fx + e ) ) {
          goto step_E;
        }
        return ( k );
        }
        ...

Nobody is supposed to understand what is going on. However, if you have a
copy of the article

  J.H. Ahrens, U. Dieter: "Computer Generation of Poisson
  Deviates from Modified Normal Distributions", ACM Transactions
  on Mathematical Software 8 (1982) 163-179

you will be able to verify that it follows the document closely.

Hm. In effect you're just hoping for the best, taking some spaghetti code based
on spaghetti mathematics and hoping the authors have done it right at both
levels. Of course it *may happen* that the above is correct with respect to some
specification, but hey.

And no, I really don't care how well-respected or not the authors are.

If the king of Norway visits and spews on my carpet I kick him and yell at him,
king or whatever.

In
particular, names for case labels and variables are chosen with that in
mind. (There is of course a tricky part with this: There can be a typo in
the paper. In this particular case, it is in procedure_F :-)

Ah!

So, you had to correct their work.

I'm not surprised! :-)

Of course, the
file containing the spaghetti code gives the reference (and explains and
corrects the typo).

So, if you happen to have a proof of correctness for the messy FSA, it can
be a good business decision to keep it. Otherwise, you might ending up
wasting resources.

you're arguing that it should in turn be implemented as spaghetti
goto-based code in order to save /one/ integral variable. It's borderline
lunacy.

Well, _if_ (in the rare cases where appropriate) you keep the messy FSA,
then I think you have to deal with control flow jumping all over the place
whether you put in an integral variable or not. I don't see any advantage
to turning a

goto vertex_5;

into

next_vertex = 5;

You mean "current_vertex".

It really helps debugging and tracing, and it really helps separating those
cases into individual routines, which is a good idea when you're the one
creating and implementing the state machine.

For your case of sort of "inheriting" the mess from a pair of messymathicians
I'd just search for some other less messy Poisson distribution generator (e.g.
there is a simple one, just a few lines of Smalltalk, in the book nearest to me
right now).

Cheers & hth.,

- Alf