Re: A few questions on C++
James Kanze wrote:
Kai-Uwe Bux wrote:
James Kanze wrote:
On Sep 21, 10:58 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
James Kanze wrote:
On Sep 19, 3:10 pm, "Phlip" <phlip...@yahoo.com> wrote:
D. Susman wrote:
[snip]
2) Should one check a pointer for NULL before deleting it?
No, you should use a smart pointer that wraps all such checks
up for you.
Why? What does a smart pointer buy you, if all it does is an
unnecessary test?
Don't forget, too, that most deletes are in fact "delete this".
And "this" cannot be a smart pointer.
Are you serious?
Yes. Most (not all) objects are either values or entity
objects. Value objects aren't normally allocated dynamically,
so the question doesn't occur. And entity objects usually (but
not always) manage their own lifetime.
Most of my dynamically allocated objects are used to implement container
like classes (like a matrix class), wrappers like tr1::function, or other
classes providing value semantics on the outside, but where the value is
encoded in something like a decorated graph.
The internally allocated nodes do not manage their own lifetime: they are
owned by the ambient container/wrapper/graph.
That is one of the cases where "delete this" would not be used.
But it accounts for how many deletes, in all? (Of course, in a
numerics application, there might not be any "entity" objects,
in the classical sense, and these would be the only deletes,
even if they aren't very numerous.)
It's not just numerics. But numerics applications are definitely a very good
example of what I had in mind. I think that a lot of scientific computing
looks like this.
I venture the conjecture that this heavily depends on your
code base and on the problem domain.
And your style. If you're trying to write Java in C++, and
dynamically allocating value objects, then it obviously won't be
true.
I have no idea about Java. My code is heavily template based,
uses value semantics 95% of the time, and new/delete is rather
rare (about one delete in 500 lines of code).
Curious. Not necessarily about the value semantics; if you're
working on numerical applications, that might be the rule. But
templates at the application level? Without export, it's
unmanageable for anything but the smallest project. (The
companies I work for tend to ban them, for a number of reasons.)
Indeed, my programs are _very_ small, something between 50000 and 100000
lines after preprocessing. Moreover, all an application usually does is
read data from stdin, perform some (highly involved and complicated)
computation, and write the results to stdout. Here is an example of a
complete application:
// scx_homology.cc (C) Kai-Uwe Bux [2006]
// ======================================
#include "kubux/sequenceio"
#include "kubux/set_of_set"
#include "kubux/matrix"
#include "kubux/homology"
// #include "kubux/integer"
#include <iostream>
#include <iterator>  // for std::inserter
#include <vector>
// typedef kubux::integer Integer;
typedef long Integer;
typedef kubux::matrix< Integer > IntMatrix;
typedef std::vector< IntMatrix > ChainComplex;
typedef kubux::set_of_set< int > SimplicialComplex;
typedef std::vector< kubux::AbelianGroup< Integer > > Homology;
int main ( void ) {
    SimplicialComplex cx;
    while ( std::cin >> cx ) {
        ChainComplex ch =
            kubux::chain_complex< ChainComplex >
            ( kubux::homotopy_simplify( cx ) );
        Homology hom;
        kubux::copy_homology( ch.begin(), ch.end(),
                              std::inserter( hom, hom.begin() ) );
        while ( ( ! hom.empty() )
                && ( hom.back().first == 0 )
                && ( hom.back().second.empty() ) ) {
            hom.pop_back();
        }
        std::cout << hom << '\n';
    }
}
// end of file
As you can see, it's just a trivial filter; and all the real code is in the
library. That, in turn, is templated for flexibility. E.g., the matrix
class is supposed to work just as nicely with infinite precision integers,
and an algorithm picking out the maximal elements (with respect to some
partial order) from a sequence should be generic.
As you have figured, it is somewhat like number crunching (except that I am
dealing more with topological and combinatorial algorithms, so enumerating
all objects of a given size and type is a typical thing that happens in my
code).
Now, with respect to huge applications, I see that templates are an issue.
On the other hand, I thought, that is what nightly builds are for: You have
a bug to fix, you locate it, you add a unit test for the failing component
that displays the bug without using all the unrelated crap from the huge
ambient application; and then you work on that component until it passes
all tests. After a commit to the code base, the huge application is rebuilt
overnight and all automatic tests are run. Working on your component in
isolation, you still have short edit-compile-test cycles.
In my codebase, the lifetime of an object is managed
by the creator, not by the object itself.
There are cases where it is appropriate. There are also cases
where the lifetime will be managed by some external entity such
as the transaction manager.
Ownership almost never is transferred. The reason that the
object is created dynamically is, e.g., that its size was
unknown (in the case of an array) or that a client asked for
an entry to be added to a data structure.
O.K. That's a case that is almost always handled by a standard
container in my code. I have entity objects which react to
external events.
If you're writing well designed, idiomatic C++, then a
large percentage of your deletes probably will be "delete this".
I disagree. Could it be that you are thinking of object oriented designs?
More to the point, I'm thinking of commercial applications. I
sort of think you may be right with regards to numerical
applications.
By "commercial", do you mean "software for sale" or "software used in the
sales department" :-)
I agree that programs that act in complex environments and have to respond
to a thousand different kinds of events will use objects to model the world
they operate in (I am thinking of transactions between banks, simulations,
GUI, games, etc). On the other hand, programs that perform highly
complicated transformations in batch mode are likely to be different. That
would go for most of number crunching, scientific programming, compilers,
symbolic computation, combinatorial optimization, and so on. I expect the
code for those to be more similar to mine than to yours. There are programs
for sale in all these categories (but I would not expect a typical sales
department to make heavy use of a PDE solver).
I think the difference is not commercial versus non-commercial, but more
whether your application is event-driven or has the classical (ancient?)
parse_input....write_output format.
[...]
So here is a question: given that use-case frequencies can differ
dramatically, can one give rational general advice concerning smart
pointers? And if so, what would that advice be?
The only "rational" advice would be to use them when
appropriate:-). Which depends a lot on context---if you're
using the Boehm collector, for example, you'll probably need
them less than if you aren't. But on the whole, even without
the Boehm collector, I doubt that they'd represent more than
about 10% of your pointers in a well designed application.
That depends on what you count as a smart pointer. E.g.,
tr1::function or boost::any are very close to smart pointers
with copy semantics. However, they clearly do not compete with
raw pointers.
I'm not sure I agree. I'd be tempted to say that if you can't
dereference it, it isn't a smart pointer.
That's why I said "close". I agree that the term smart pointer should be
reserved for something you can dereference.
STL iterators are
smart pointers because they support dereferencing. Still, in a
commercial application, *most* pointers are used for navigation
between entity objects. You rarely iterate; you recover the
real pointer from the return value of std::map<>::find almost
immediately, etc.
That just says that you need different smart pointers :-)
Think of a smart pointer that does not interfere with lifetime but helps
with the typical problems when pointers are used for navigation. E.g., you
can wrap the observer pattern into a smart pointer so that all those
objects that have a handle to a potentially suicidal one get notified just
before it jumps off the cliff.
However, by and large, I also found that (smart) pointers
rarely ever make it into client code. When I put a class in my
library, it usually provides value semantics, and in fact,
most of my classes do not have virtual functions or virtual
destructors.[1] Thus, client code has no reason to use dynamic
allocation.
Are you writing libraries?
Well, that is the way the code organizes naturally. Most of it goes into
libraries that provide an abstract (usually templated) data type or some
transformation from one type to another. The actual applications are
trivial filters.
Obviously, something like
std::vector<> won't use delete this for the memory it manages.
Something that primitive probably won't use a classical smart
pointer, either, but I guess more complex containers might.
I don't really like smart pointers there either. However, they are really
handy in getting a prototype up and running, which is a good thing during
the design phase when you are experimenting with the interface and whip up
the initial test cases. When the design is stabilizing, I tend to first
replace smart pointers (and raw pointers) by pointer wrappers that support
hunting down double deletions and memory leaks, and finally by pointer wrappers
that wrap new and delete and provide hooks for an allocator to be specified
by the client code.
In the applications I work on, of course, such low level library
code represents something like 1% or 2% of the total code base.
And for the most part, we don't write it; the standard
containers are sufficient (with wrappers, in general, to provide
a more convenient interface).
They obviously don't apply to entity objects, whose lifetime
must be explicitly managed. And how many other things would you
allocate dynamically?
I forgot to mention one other reason to use T* instead of T: in template
programming, the first is available for incomplete T. For instance, there
are two obvious implementations of the box container (a box can be empty or
contain a single item; I think such a container is sometimes called
fallible or optional). One implementation has a T* data field and the other
has a T data field. The first will work with incomplete T; the second won't.
When doing template programming, one has to be aware of the conceptual
requirements created by an implementation approach. Sometimes, that forces
or suggests dynamic allocation.
A whole lot. E.g., very often in math programming, I find myself dealing
with _values_ that are best represented by trees, pairs of trees, trees
with some decoration, or graphs. Implementing those classes requires a
whole lot of dynamic allocation, but in the end that is just some means
to realize a class that has value semantics from the outside. The objects
are then in charge of destroying the internal nodes whose graph structure
encodes the mathematical value of the object. Leaving that to smart
pointers is very helpful in prototyping.
I think that's the difference. I guess you could say that my
code also contains a lot of trees or graphs, but we don't think
of them as such; we consider it navigating between entity
objects---the objects have actual behavior in the business
logic. And the variable sized objects (tables, etc.) are all
handled by standard containers.
Spot on. That is the difference. My objects usually do not have any behavior
at all. They just have values, which can change.
Most of the time I see a lot of smart
pointers, it's for things that shouldn't have been allocated
dynamically to begin with.
I cannot refute that observation. However, that is a function
of the code you are looking at.
Certainly. I also see a lot of code in which there is only one
or two deletes in a million lines of code; the value types are
all copied (and either have fixed size, or contain something
like std::string), and the entity types are managed by a single
object manager. In many cases, the architecture was designed
like this just to avoid "delete this", but the delete request to
the object manager is invoked from the object that is to be
deleted---which means that it's really a delete this as well.
I did not want to argue for or against "delete this". I can see how this
idiom is useful. I was just flabbergasted by your claim that most deletes
are of this form. But now, I can see where you were coming from.
However, it is somewhat funny that "delete this" looks scary enough that
people invent roundabout ways to avoid it.
[snip]
Best
Kai-Uwe Bux