Re: C++ Frequently Questioned Answers

From: Yossi Kreinin <yossi.kreinin@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Sun, 4 Nov 2007 17:26:39 CST
Message-ID: <1194215198.606998.7320@k79g2000hse.googlegroups.com>

On Nov 4, 12:45 pm, Alex Shulgin <alex.shul...@gmail.com> wrote:

http://yosefk.com/c++fqa/inheritance-abstract.html#fqa-22.1 (abstract classes)

I think you have severely misunderstood the "Defining interfaces is
hard, throwing together an implementation is easy." part of the
original FAQ item. In my experience, it really is harder to define
interfaces than to provide an implementation, since changing an
interface costs much more than changing the implementation. Designing
interfaces so that you never need to change them, or at least so that
future changes are minimal, is exactly what makes it hard (to the
point of being harder than implementing them).

I mentioned this item to show that the FQA refers to abstract classes
and acknowledges their benefits related to separating interface from
implementation with respect to compilation dependencies (again, if
that's all the decoupling you need, incomplete types are a cheaper way
to get it since there's no wrapper code).
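
To make the incomplete-type remark concrete, here is a minimal sketch.
The names Parser, parser_create, parser_feed, parser_count and
parser_destroy are made up for illustration; the first part is what a
client header would contain, the rest would live in a single .cpp file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ---- what the header would expose (hypothetical names) ----
struct Parser;                         // incomplete type: layout is hidden
Parser* parser_create();
void parser_feed(Parser* p, const std::string& token);
std::size_t parser_count(const Parser* p);
void parser_destroy(Parser* p);

// ---- what a single .cpp file would contain ----
struct Parser {
    std::size_t tokens_seen;           // can change without recompiling clients
};

Parser* parser_create() {
    Parser* p = new Parser;
    p->tokens_seen = 0;
    return p;
}

void parser_feed(Parser* p, const std::string& token) {
    if (!token.empty()) ++p->tokens_seen;   // counts non-empty tokens only
}

std::size_t parser_count(const Parser* p) { return p->tokens_seen; }

void parser_destroy(Parser* p) { delete p; }

// Client code: relies on the declarations only, never on the definition.
std::size_t count_two_tokens() {
    Parser* p = parser_create();
    parser_feed(p, "int");
    parser_feed(p, "main");
    std::size_t n = parser_count(p);
    parser_destroy(p);
    assert(n == 2);
    return n;
}
```

Clients that see only the declarations do not need to be recompiled
when the members of Parser change, and there are no forwarding
wrappers as there would be with a pimpl.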

As to the "interface is easy / implementation is hard" issue - the FQA
gives examples where the interface is easy and the implementation is
hard. A related example: a compiler. Here, the interface /visible at
the level of class definitions/ is easy
("compile(source_stream,asm_stream)"); the hard part - the grammar and
meaning of the source language and the assemblers of the different
targets - is not visible at the level of classes. There's lots of
structure in what looks like unstructured data to a compiler - images,
text, etc.

If you want each and every interface /inside/ the compiler to be
stable because you never want to modify them, then you may have a hard
problem. However, the stability of the interfaces inside the compiler
is only important for some of them (for example, the way intermediate
code is represented and the way a "single pass" is represented); the
other parts can be changed easily, because such changes barely affect
the rest of the system. On the other hand, the algorithms of each pass
may be pretty complicated.

http://yosefk.com/c++fqa/heap.html#fqa-16.21 (pimpl)

Have never used pimpls, so no comments this time. :-)

http://yosefk.com/c++fqa/class.html#fqa-7.5 (incomplete types)

This is interesting, and I agree that the original FAQ item is
(possibly intentionally) wrong.

Hope this helps in some practical way :)

"C++ is extremely unsafe because every pointer can be used to modify
every piece of memory from any point in code."

An example (i.e. some real code) would be very helpful.

Consider this snippet (I use iterators to skip the part where we talk
about the problems of C arrays, etc.):

template<class Iter> void inc_range(Iter b, Iter e) {
  for(; b!=e; ++b) {
    *b = *b + 1;
  }
}

Now, b & e could have been obtained with vec.begin() and vec.end(),
and vec could have been resized or cleared since then. inc_range()
will modify objects which happen to be allocated where the vector
storage used to be (or it may crash the program, but the first
possibility is the harder one to debug).
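
A minimal sketch of that scenario follows. The function
resize_then_increment is a made-up name, and the dangerous call is
left commented out so that the sketch itself stays well-defined.

```cpp
#include <cassert>
#include <vector>

template<class Iter> void inc_range(Iter b, Iter e) {
    for (; b != e; ++b) {
        *b = *b + 1;
    }
}

// Returns the vector contents after a resize followed by inc_range().
std::vector<int> resize_then_increment() {
    std::vector<int> vec(4, 10);
    std::vector<int>::iterator b = vec.begin(), e = vec.end();

    vec.resize(1000);   // may reallocate: b and e are invalid from here on
    // inc_range(b, e); // undefined behavior: writes through stale iterators

    // Re-acquire the iterators after any operation that may invalidate them.
    b = vec.begin();
    e = vec.begin() + 4;
    inc_range(b, e);    // increments the first four elements only
    assert(vec.size() == 1000);
    return vec;
}
```

Nothing in the language stops the commented-out call from compiling;
keeping the iterators valid is entirely up to the caller.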

In other languages, this either can't happen or (more frequently)
doesn't happen unless you "work hard" to make it possible (by
explicitly "going unchecked"). That is, "every pointer can /not/ be
used to modify every piece of memory from any point in code": a piece
of code knows that it's supposed to modify an array, so it checks the
array's length; or it knows that it's supposed to modify a member aaa
of an object of some class CCC, so it will refuse to modify the member
bbb of a DDD object. (The latter can happen in C++ with the example in
the next link; this is one case where I'd easily agree that "seasoned"
C++ programmers won't write this code - it's more of a beginner's
pitfall):

http://yosefk.com/c++fqa/web-vs-c++.html#misfeature-1

Now, I realize that run time checks are not always acceptable; I work
on real time apps, among other things. What I claim, based on lots of
experience with debugging "unmanaged" applications, is that people
underestimate the problems with having no checks, and overestimate the
cost of these checks.

In theory, a program that does things like out-of-bounds access is
malformed, so we don't care about it; in practice, /all/ software
shipped to customers or otherwise released to the world is malformed
(think about browsers and operating systems; everything I saw in this
department crashed and had security holes). And you have to debug the
problems, which is much, much easier when a program halts upon the
first violation of language rules compared to the case when it keeps
running and reading/modifying the wrong data, covers its traces by
deleting the objects involved in the error and possibly never crashes.
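
For a rough idea of what "halting upon the first violation" looks like
within C++ itself, std::vector::at() is specified to throw
std::out_of_range, while operator[] with the same index is undefined
behavior. The helper name below is made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if the checked access reported an out-of-bounds index.
// Writing v[index] = 1 with index >= 3 would be undefined behavior instead.
bool checked_access_reports_error(std::size_t index) {
    std::vector<int> v(3, 0);
    try {
        v.at(index) = 1;   // at() checks the index against size()
        return false;      // the index was valid, nothing to report
    } catch (const std::out_of_range&) {
        return true;       // the violation is reported where it happens
    }
}

// Usage: the first invalid index is 3 for a vector of three elements.
void demo_checked_access() {
    assert(!checked_access_reports_error(2));
    assert(checked_access_reports_error(3));
}
```

The check costs one comparison per access, which is the kind of cost
the paragraphs above argue is usually overestimated.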

Agreed, except for the "most" part... How do you know they really
are? ;-)

You mean "most" in "most classes are 'straight' C++ classes without
pimpls/ABCs/incomplete types involved"? Well, if they aren't, this
sort of proves my point about the problems with "straight" C++
classes... Of course I don't have numeric data here, either :) Do you
really think it's wrong though?

http://yosefk.com/c++fqa/defective.html#defect-4

Personally, I don't recall having that sort of problem... But maybe
you can clarify what exactly makes it harder compared to C
dumps? I can think of inlining and templates...

Inlining - yes, for example, inlining of implicitly-generated
operator= of large classes (I mentioned it in Defective C++ using the
example of a gargantuan class; with reasonably large classes, figuring
out the offset, then the name of the offending member from the huge
disassembly listing all attributed to the single source line where
assignment was done is no picnic, either).

Templates - especially in the context of inlining (people tend to
write all that code inline because it's in the header file anyway - a
"social" problem if you like; I try not to discuss such things too
much in the FQA). Then, debuggers are not very good with things like
placing breakpoints in templates, parsing type names involving
templates in object view windows, etc.

But I meant another thing in that item - you don't know what objects
will look like, from the layout of classes with virtual functions to
the memory layout of std::map. Some debuggers will know how to display
those - unless too much memory is corrupted, at which point you have
to step in yourself. In C, a (custom) hashtable would look practically
the same in the memory of all targets; the standard C++ types look
different everywhere.
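
One part of this can be checked from within the language, assuming a
C++11 (or later) compiler: a plain struct is "standard-layout", so its
first member is guaranteed to sit at the start of the object, while a
class with virtual functions gets no such guarantee. PlainNode and
VirtualNode are made-up types for this sketch.

```cpp
#include <cassert>
#include <type_traits>

struct PlainNode { int key; int value; };   // C-style struct

struct VirtualNode {
    virtual ~VirtualNode() {}               // typically adds a hidden vtable pointer
    int key;
    int value;
};

static_assert(std::is_standard_layout<PlainNode>::value,
              "a plain struct has a layout you can reason about");
static_assert(!std::is_standard_layout<VirtualNode>::value,
              "with virtual functions the layout is up to the implementation");

// For a standard-layout type, the object and its first member share an address.
bool first_member_is_at_offset_zero() {
    PlainNode n = {1, 2};
    bool same = static_cast<void*>(&n) == static_cast<void*>(&n.key);
    assert(same);
    return same;
}
```

Where exactly the hidden data of VirtualNode goes, and how large it
is, differs between compilers and targets.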

Anyway, that's quite natural and expected, since the language itself
is way more complex than C.

Ah, but that's my point. If the language is "managed" or mostly safe,
then I don't care about the complexity of internal representations
that much, since I don't get to shovel through them. If it's unsafe
though, I'd rather have it simple enough for me to understand what the
pieces mean - I mean the little pieces into which programs will
invariably break from time to time :)

Well, it was never a problem for me, and I hope I never have this
one. Is sticking to a single compiler a problem for anyone?

Of course it is - sometimes you have compiled third-party libraries,
and sometimes you ship your own libraries to someone using a compiler
which won't compile your C++ code at the front-end level (for example,
I saw pretty simple code using templates, 3 levels below the
complexity of boost, crashing VC++ 2005; gcc 3 and 4 happened to work
with it - that's one reason they bothered to implement mingw; of
course gcc will crash elsewhere, and I saw that, too). Sometimes a
compiler supports things that another one doesn't (platform-specific
intrinsics, special tricks for ISA subsets like floating point, high
quality debug info) but you want to link to libraries compiled with
something else.

Sometimes this is "not a problem", except you have to deliver your own
libraries compiled and tested with 3 compilers, each with its own
front-end (grammar) and back-end (codegen) bugs.

What about when you fill your vector with objects obtained by calling
a method returning a const reference in a loop?

As others have noted it would be extremely nice to have an example for
this one. Let's get to some real code already. :-)

Here's real code from a social network back-end :)

#include <vector>
using std::vector;

//stand-in declaration (not in the original post) so the snippet compiles
class LameComments { public: bool areTotallyPointless() const; };

//getLameComments() has to be a const member, since it is called
//through a const vector<Lamer>& below
class Lamer { public: const LameComments& getLameComments() const; };

void getTheMostLameComments(
  vector<const LameComments*>& comments, //which type would you use?
  const vector<Lamer>& lamers
)
{
  //or we could use iterators...
  for(int i=0; i<(int)lamers.size(); ++i)
  {
    const LameComments& lc = lamers[i].getLameComments();
    if(lc.areTotallyPointless()) {
      //lamers write lots of comments, better copy
      //by reference... Has to be const - can't modify
      //the lamer's precious comments, and we can't
      //have vectors of references, so it's either a dumb
      //const pointer or some non-standard smart pointer
      comments.push_back(&lc);
    }
  }
}

I think this is a pretty common pattern with code massaging data
structures with even minor levels of nesting/indirection.

What about when you need a subset of objects kept in std::vector or
std::list, and the owning collection is about to be deallocated,
because you only need some of the objects, but not all of them
anymore?

http://yosefk.com/c++fqa/dtor.html#fqa-11.1

Another example please? From what I can see it's all about storing
raw pointers in the containers... very suspicious.

Do you know how much time the previous real example took me to invent?
What do you want me to do - grab a complete snapshot of an app I
worked on with memory management bugs in it and post it to Usenet? :)

What's wrong with the "English" example? You iterate over the nodes of
a 3D model, wishing to carve out the interesting ones and dispose of
the model. With garbage collection, the unused parts of the model
would become garbage; with RAII, you'd have to ask the model to
"forget" that it owns the objects. Some "owners" (like std::list) can
do it, some can't. Or you could use reference counting; simple B-rep
3D models are not unlikely to have cyclic references in them.
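
A rough sketch of the std::list case, with made-up names (Node,
carve_out): splice() relinks nodes from one list into another without
copying them, and as of C++11 pointers and references to the moved
elements are guaranteed to stay valid.

```cpp
#include <cassert>
#include <list>

struct Node { int id; bool interesting; };

// Moves the interesting nodes out of `model` into `kept` without copying.
void carve_out(std::list<Node>& model, std::list<Node>& kept) {
    for (std::list<Node>::iterator it = model.begin(); it != model.end(); ) {
        std::list<Node>::iterator next = it;
        ++next;                                   // remember the successor first
        if (it->interesting) {
            kept.splice(kept.end(), model, it);   // relinks the node, no copy
        }
        it = next;
    }
}

// Returns the id seen through a pointer taken before the model was destroyed.
int id_after_model_is_gone() {
    std::list<Node> kept;
    const Node* second_node = 0;
    {
        std::list<Node> model;
        Node a = {1, false}, b = {2, true}, c = {3, true};
        model.push_back(a);
        model.push_back(b);
        model.push_back(c);
        std::list<Node>::iterator second = model.begin();
        ++second;
        second_node = &*second;                   // points at node 2 in the model
        carve_out(model, kept);
        assert(model.size() == 1);                // only the boring node is left
    }   // the model is destroyed here; the spliced nodes live on in `kept`
    assert(second_node == &kept.front());
    return second_node->id;
}
```

An owner such as std::vector offers no comparable operation: its
elements live in one block that is freed together with the vector.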

I wrote a (lame) PE executable parser in D a couple of days ago,
where you carve out sections of the byte array; in D, you do it with
slicing, and don't think about the life cycles. With std::vector, you
have to explicitly keep the full vector around so that the references
to sub-chunks won't become dangling references, and you can't use
std::vector to represent the sub-chunks - you need a different,
custom, non-owning container; or you have to copy from the large
vector into smaller ones. It's a really tame example in terms of data
structure complexity; the only good thing about it is that it is real
and I might publish it someday soon :)
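
The "custom, non-owning container" could look roughly like the sketch
below, assuming C++11 for vector::data(). ByteSlice and slice are
made-up names, not code from the parser mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of a part of a byte buffer.
struct ByteSlice {
    const unsigned char* data;
    std::size_t size;
    unsigned char operator[](std::size_t i) const {
        assert(i < size);
        return data[i];
    }
};

// The caller must keep `bytes` alive and unresized while the slice is in use.
ByteSlice slice(const std::vector<unsigned char>& bytes,
                std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= bytes.size());
    ByteSlice s = { bytes.data() + begin, end - begin };
    return s;
}

// Sums the bytes of one "section" of a ten-byte buffer holding 0..9.
unsigned sum_of_middle_section() {
    std::vector<unsigned char> file;
    for (unsigned char b = 0; b < 10; ++b) file.push_back(b);
    ByteSlice section = slice(file, 2, 5);        // views bytes 2, 3, 4
    unsigned sum = 0;
    for (std::size_t i = 0; i < section.size; ++i) sum += section[i];
    return sum;
}
```

The slice is cheap to copy, but nothing ties its lifetime to that of
the vector; keeping the full vector around remains the caller's job.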

Cheers,
Yossi
