Re: C++ Frequently Questioned Answers

From: Yossi Kreinin <yossi.kreinin@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Sat, 10 Nov 2007 16:36:13 CST
Message-ID: <1194687542.610667.52280@d55g2000hsg.googlegroups.com>

On Nov 7, 12:38 am, Alex Shulgin <alex.shul...@gmail.com> wrote:

Sorry, I haven't stated my point clearly enough: there are cases where
designing an interface is harder (or is of comparable complexity) than
providing implementation, and from my experience they are the most
common cases. Of course, there are simple cases where it is not true
and your FQA shows some of them.

The FQA mentions examples like face recognition and file systems
implementing an existing (and typically quite small and clear) system
call API. These cases are not at all "simple" in terms of
implementation.

As to real world example: suppose you need to design an abstract class
(or class hierarchy) to provide a file system API. I'd expect
concrete methods to be easier to implement since they would be most
probably unrelated to each other (or related only weakly), whereas the
interface(s) are sort of a bigger picture issue--you have to think
thoroughly how things interact and influence each other in order to
design it properly. Add the need to support various different real
world file systems (FAT, ext2, NTFS, etc.) plus yet-to-be-known
systems and the high costs of later changes and you'll see the result.

You're talking about wrapper code: a class library working on top of
existing file systems and providing a uniform interface to them. It's
not an easy task, but arguably /implementing/ a file system (ext2,
NTFS, etc.) well at the system level is much harder. The FQA refers
to /this/ example when it talks about the interface of
write(fd,base,size) being much simpler than the implementation.

I think that lots and lots of cases where the interface is the hard
part are in fact cases of "middle management" code, which propagates
calls and computes nothing (sometimes people really need such code,
frequently they don't though and only do it for syntactic sugar). Of
course there are other cases where the interface is hard as well, I
just think that the opposite case is frequently overlooked by people
with aptitude in "programming linguistics".

Please stop. I hope you are aware we are talking about undefined
behaviour here? If it's all about complaining C++ is too low level
("unmanaged") then I do not see your point--isn't this what anyone
would expect?

Sorry if my example was overly verbose, and yes, this is what people
with even basic knowledge of the subject would expect. What I was
trying to do is to explain the exact phrase "every pointer can be used
to modify every object", and to show what undefined behavior normally
amounts to in real implementations.

I'm not saying anything here that's supposed to be new to many people;
my original phrase in the FQA was a response to the FAQ answer about
"C++ helping with the trade-off of safety vs usability".

In theory, a program that does things like out-of-bounds access is
malformed, so we don't care about it; in practice, /all/ software
shipped to customers or otherwise released to the world is malformed
(think about browsers and operating systems; everything I saw in this
department crashed and had security holes). And you have to debug the
problems, which is much, much easier when a program halts upon the
first violation of language rules compared to the case when it keeps
running and reading/modifying the wrong data, covers its traces by
deleting the objects involved in the error and possibly never crashes.
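
To make the contrast concrete, here is a minimal sketch (not taken from
the FQA) of a checked access next to its unchecked counterpart. The
helper name checked_read_fails is made up for illustration; the only
library facts relied upon are that std::vector::at throws
std::out_of_range on a bad index, while operator[] with the same index
is undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, for illustration only: reads v[i] through the
// checked accessor and reports whether the access was rejected.
bool checked_read_fails(const std::vector<int>& v, std::size_t i)
{
    try {
        (void)v.at(i);   // at() throws std::out_of_range on a bad index
        return false;    // index was valid
    } catch (const std::out_of_range&) {
        return true;     // violation reported at the point of the error
    }
}

// The unchecked counterpart is v[i]. For i >= v.size() that is
// undefined behavior: a typical implementation simply reads whatever
// happens to live past the end of the buffer and keeps running.
```

With a three-element vector, checked_read_fails(v, 2) is false and
checked_read_fails(v, 3) is true; the failure shows up exactly where
the bad index is used rather than somewhere downstream.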

Yep. But in practice, crashes are not the only bugs that creep into
shipped programs and you have to debug them too.

Sure. In many cases this is easier though, since the program corrupts
less data, and/or the chances that the problem reproduces when you run
the program on the same inputs are higher. Of course it's not always
true (the app can wipe out all objects relevant to the fault since
that's what should happen by design anyway). But IMO crashes/data
corruption due to undefined behavior are the number one problem for
complicated applications with serious uptime requirements.

And of course, I'd expect any seasoned programmer to use the
appropriate tool for the job, i.e. do not go too low-level where it's
unnecessary. However, if you need to go low-level what language will
you use--C? How is it any different from C++ with respect to crashes
due to UB?

I talk about this in Defective C++. The bottom line is, C is just as
unsafe as C++, however, shoveling through the core dumps is easier,
since the mapping between the source code and the executable program
is much simpler and "debugging-friendlier".

Example: a program crashes in a call to a constructor with a lengthy
initialization list. Many debuggers will attribute all assembly code
generated from this list to a single line. Good luck figuring out
which member misbehaved.

Another example: a program crashes inside STL code. The
source/assembly mapping in STL, having lots of inline template wrapper
code, is totally wedged. Worse, data structures have been overwritten,
so even the good debuggers capable of sensibly displaying STL
containers refuse to do it. Now you have to know how std::map is
implemented in this particular library, because of an explicit
decision to allow every implementation to do this differently for the
benefit of the client.

There are lots of other examples.

Yes, these are all implementation issues. What I care about is that C
doesn't have these issues. Everything is an implementation issue, for
example, the fact that I can't program in English is (I'm Turing-
complete and my PC is Turing-complete; there's no real reason for it
not to speak English). What I want is to get things done in finite
time, so I prefer to go for the interface which has a good
implementation. And almost always the quality of actual
implementations is dictated by the interface.

"C++ is extremely unsafe..." as well as C. And some tasks could be
done only in these languages today. So what's your point--do not go
too low-level if unnecessary? Thanks, I believe most of us have
learned this very well already.

Well, lots and lots of people, some of them in adjacent sub-threads,
prefer to use C++ for /everything/. I know people doing throw-away
parsing stuff in C++ and calling the binaries "scripts". Some people
/refuse to take a job/ unless it's guaranteed to them they're going to
do their work in C++. And many people overestimate the overheads of
run time checks, and underestimate the problems with undefined
behavior, until it's too late.

So "not going too low-level if unnecessary" is not at all universally
interpreted the way I (and you, apparently) interpret it.

Nope. I just suggest using "I believe", "from my experience", "IMHO"
or some other hint to point out that it's your opinion and not common
knowledge or universal truth. ;-)

I agree with the spirit, except then everything will be IMHO'd to
death :) I prefer to implicitly prefix with IMHO everything I say,
except the parts which can be relatively easily tested or proved/
refuted.

Point taken, but it's likely to be more closely related to the quality
of implementation: for example, VS2005 'knows' about standard
containers and shows their contents in the debugger in a readable way,
and VS2003 doesn't.

What happens when a container is partially overwritten in VS2005
though? What about when you use STLPort instead of its implementation
(which of course violates the standard since the standard library is
"built into the language"; those STLPort people should be prosecuted)?

The big problems start when the data is corrupted enough for debuggers
to get confused and fail to parse it.

Point taken, however, with the source code at hand it transforms into
a portability issue (see below). And no one forces you to use
proprietary third-party libraries, hopefully... :-)

Let me point out that C++ seems to be the only language where library
*interface and design* (not just implementation details) depend on
front-end features tricky enough to be implemented differently by
different compilers, to the extent of making it impossible to use the
library on some systems. See the Blitz++ home page, for example.

This isn't really about the ABI-related portability, it's about syntax-
related portability.

Sometimes this is "not a problem", except you have to deliver your own
libraries compiled and tested with 3 compilers, each with its own
front-end (grammar) and back-end (codegen) bugs.

So this is more of a portability issue than the real need to link
against code produced with another compiler.

In these cases, yes. But sometimes you do need to use closed-source
libraries. C++ is supposed to support it; show me one place where
anybody who worked on the development of C++ claims that it is a non-
goal. Well, it is "supported", but not very well.

Here's real code from a social network back-end :)

class Lamer { public: const LameComments& getLameComments() const; };

LOL. I hope there is nothing personal, however, could you please quit
that kind of attitude? To this point you have done more than enough
to "prevent people from falling asleep".

Rest assured that there is nothing personal; it was a shot at Web 2.0
posters and non-moderated forums :) I just think that the common kind
of example with "Employee" and "salary" is much more inhumane on many
levels :)

void getTheMostLameComments(
  vector<const LameComments*>& comments, //which type would you use?
  const vector<Lamer>& lamers
)
{
  //or we could use iterators...
  for(int i=0; i<(int)lamers.size(); ++i)
  {
    const LameComments& lc = lamers[i].getLameComments();
    if(lc.areTotallyPointless()) {
      //lamers write lots of comments, better copy
      //by reference... Has to be const - can't modify
      //the lamer's precious comments, and we can't
      //have vectors of references, so it's either a dumb
      //const pointer or some non-standard smart pointer
      comments.push_back(&lc);
    }
  }
}

What's the problem with copying `LameComments' objects? Pardon my
indent style:

As the comments say, lamers write lots of comments, so LameComments
objects are large.

As I said in the FQA and adjacent threads, copying is one common way
to achieve const correctness; people didn't seem to agree.

void
getTheMostLameComments(vector< LameComments >& comments,
     vector< Lamer > const& lamers)
{
     for (size_t i = 0; i < lamers.size(); ++i)
     {
         LameComments const& lc = lamers[i].getLameComments();
         if (lc.areTotallyPointless())
             comments.push_back(lc);
     }
}

This way you do not need `const*' _and_ have less error-prone code.
Please note, that in your code the caller must ensure that:

1. Lifetime of `comments' vector does not exceed the `lamers' one.
2. Lifetime of every element of `comments' vector does not exceed the
lifetime of a corresponding `lamer' object for which the LameComments
were obtained.

/Of course/ it's more error prone because of lifetime issues! It
wouldn't be in reference-based languages with garbage collection. In
C++, I'm really supposed to have a smart pointer class and hope that
lamers don't have cyclic references in their comments (pretty
unlikely, that). This is another can of worms, and copying is a good
way out of this one, too, except for the performance penalty. I think
that your solution (copying) is in many cases /better/ than the smart
pointer solution; people in adjacent threads will disagree.
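
A small sketch of the cyclic reference problem, for readers who haven't
been bitten by it. Everything here is an assumption made for
illustration: the Comment type and the counter are made up, and
std::shared_ptr/std::weak_ptr are the C++11 standard versions of what
was a non-standard (Boost/TR1) smart pointer when this was posted.

```cpp
#include <memory>

// Hypothetical type, for illustration only.
static int live_comments = 0;   // number of Comment objects currently alive

struct Comment {
    std::shared_ptr<Comment> strong_reply_to; // owning link: can form a cycle
    std::weak_ptr<Comment>   weak_reply_to;   // non-owning link: cannot
    Comment()  { ++live_comments; }
    ~Comment() { --live_comments; }
};

// Two comments referring to each other through owning pointers.
// Returns how many Comment objects are still alive after the last
// outside reference to them is gone.
int leak_with_strong_cycle()
{
    const int before = live_comments;
    {
        std::shared_ptr<Comment> a = std::make_shared<Comment>();
        std::shared_ptr<Comment> b = std::make_shared<Comment>();
        a->strong_reply_to = b;
        b->strong_reply_to = a;   // cycle: neither use count reaches zero
    }
    return live_comments - before;
}

// Same shape, but the back link does not own its target.
int leak_with_weak_link()
{
    const int before = live_comments;
    {
        std::shared_ptr<Comment> a = std::make_shared<Comment>();
        std::shared_ptr<Comment> b = std::make_shared<Comment>();
        a->strong_reply_to = b;
        b->weak_reply_to = a;     // breaks the ownership cycle
    }
    return live_comments - before;
}
```

The first function leaves both objects alive forever, the second leaves
none. Somebody has to decide, for every link, which direction owns;
copying sidesteps that decision at the cost of the copy.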

What if at some point you decide to change the container type from
`vector' to `list'? What if you need to work with both types of
containers? See, here you store Lamers in a vector, and there--in a
list. Would you end up copying elements from the list to a vector and
then calling the function?

1. I normally /don't/ make these decisions ("change vector to list"),
and over the years I stopped caring about this argument. If the data
structure affects performance, changing it is a royal pain since the
performance of all code touching it will be changed and you'll have to
do lots of rewriting to fix the parts that are now slow; touching all
places where you used random access instead of iterators is the easy
part (and if you typedefed the container and used iterators, many
things will only have to be recompiled). And if the data structure
doesn't affect performance, why change it?

2. Sure, when one place expects a list of Lamers and another one
expects a vector of them, templates are a way out of the problem. But
I wish there was a standard way to use virtual functions to iterate
over different containers instead, so that I could compile the
function code separately from the calling code. It isn't hard to write
something like this on top of STL, but it's non-standard, which in
this case matters since you lose the opportunity for binary
interoperability between functions traversing containers and the
calling code, and it's sort of frowned upon in the C++ community since
you're supposed to use templates here (but I can live with the
latter :) )
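
A rough sketch of what "virtual functions to iterate over different
containers" could look like on top of STL. The names Enumerator,
ContainerEnumerator and sum are invented for this example and are not
part of any standard or existing library; it is written in C++11 for
brevity (override, nullptr).

```cpp
#include <list>
#include <vector>

// Hypothetical interface: one virtual call per element.
template <class T>
class Enumerator {
public:
    virtual ~Enumerator() {}
    // Returns the next element, or a null pointer when the sequence ends.
    virtual const T* next() = 0;
};

// Adapter over any STL-style container. Only this thin wrapper depends
// on the container type.
template <class Container>
class ContainerEnumerator
    : public Enumerator<typename Container::value_type> {
public:
    explicit ContainerEnumerator(const Container& c)
        : it_(c.begin()), end_(c.end()) {}

    const typename Container::value_type* next() override
    {
        if (it_ == end_) return nullptr;
        const typename Container::value_type* p = &*it_;
        ++it_;
        return p;
    }
private:
    typename Container::const_iterator it_, end_;
};

// Not a template: this could be compiled separately from its callers,
// since it sees only the abstract interface.
int sum(Enumerator<int>& e)
{
    int total = 0;
    while (const int* p = e.next()) total += *p;
    return total;
}
```

A caller wraps its vector<int> or list<int> in a ContainerEnumerator
and passes it to sum; the price is a virtual call per element instead
of inlined iterator code.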

Do you know how much time the previous real example took me to invent?

So probably this is a sign of how far that example is from the real
world? ;-)

C'mon :) It's a trivial filter, of the kind implemented by the Python
generator expression

  (x.m for x in seq if pred(x.m))

It should be /completely trivial/ to implement, since it's the bread
and butter of programming.
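
For comparison, a sketch of the same filter in C++ with the container
type as a template parameter, so that it accepts both vector<Lamer>
and list<Lamer>. The struct bodies are stand-ins invented so the
example compiles on its own (the real classes were never shown), the
function name pointlessComments is made up, and the range-based for
loop assumes C++11.

```cpp
#include <list>
#include <vector>

// Stand-in definitions, assumed for illustration only.
struct LameComments {
    bool pointless;
    bool areTotallyPointless() const { return pointless; }
};

struct Lamer {
    LameComments comments;
    const LameComments& getLameComments() const { return comments; }
};

// Collects pointers to the pointless comments of every lamer in any
// STL-style container of Lamer objects.
template <class Container>
std::vector<const LameComments*> pointlessComments(const Container& lamers)
{
    std::vector<const LameComments*> out;
    for (const Lamer& l : lamers) {
        const LameComments& lc = l.getLameComments();
        if (lc.areTotallyPointless())
            out.push_back(&lc);   // valid only while `lamers` is alive
                                  // and its elements are not moved
    }
    return out;
}
```

The comment on push_back is the whole point: the returned pointers are
only as good as the container they point into.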

As to the nodes of the 3D model example - again, copying has a
performance penalty you don't pay in garbage-collected languages. You
can live with this in practice (you can even live with the speed of
Python in practice, so it's not surprising :) ), and in most contexts
I like this advice more than the smart pointer advice.
