Re: Are throwing default constructors bad style, and if so, why?

From: Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>
Newsgroups: comp.lang.c++.moderated
Date: Tue, 23 Sep 2008 19:15:27 CST
Message-ID: <K7MsKr.zq@beaver.cs.washington.edu>
David Abrahams wrote:

on Fri Sep 19 2008, Andrei Alexandrescu <SeeWebsiteForEmail-AT-erdani.org> wrote:

I asked the question because I'm coming from a slightly unusual angle.
I've been using a garbage-collected environment for a while. When a GC
is in the equation *together* with deterministic destruction,
Interesting Things(TM) happen. One of them is that it's a great idea to
separate teardown (destruction) from deallocation. For example:

Foo * p = new Foo(...);
...
delete p;

In a GC environment, it makes a lot of sense if the delete statement
only invokes Foo's destructor against *p and then... leaves the object
alone in a "valid but non-resource-consuming" state.


E.g. all bits zero?


The object defines that state. As a simplification, I suggested that
parameterless initialization would be that state. But for that to work,
parameterless initialization should not throw.
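To make this concrete, here's what a teardown-only "delete" might do
under the hood. The helper below is hypothetical; I'm not describing any
real collector's API:

#include <new>

// Hypothetical teardown-only "delete": run the destructor, then
// re-establish the default-constructed, resource-free state. This only
// works if T's default constructor cannot throw.
template <typename T>
void tear_down(T* p) {
    p->~T();       // release non-memory resources deterministically
    new (p) T();   // *p is now a valid but resource-free "zombie"
    // No deallocation here: the collector reclaims the memory once
    // no pointers to *p remain.
}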

Such an approach is not foreign to many on this group; there's been
many a discussion about what was descriptively called "zombie" object
states.


To accommodate "zombie" states you may need to expand the definition of
what constitutes a "valid" state for the object, or the ways zombie-ness
can be reached, weakening the class's invariants.


Again, that only discusses the empty half of the glass. The full part is
that we get rid of dangling pointers. Plus, the way I was thinking of
it, the zombie state would actually be the default-constructed state,
which in many cases does not introduce a new state and does not weaken
invariants.
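For instance (a sketch around a made-up POSIX file wrapper), the
default-constructed "not open" state is already a legal state, so
tearing down to it adds no new case to the invariant:

#include <unistd.h>

class File {
    int fd_;                          // -1 means "no file attached"
public:
    File() : fd_(-1) {}               // nonthrowing: claims no resources
    explicit File(const char* path);  // may throw; definition omitted
    ~File() {
        if (fd_ != -1) { ::close(fd_); fd_ = -1; }
        // Teardown leaves exactly the default-constructed state,
        // so no member function needs a special zombie code path.
    }
    bool is_open() const { return fd_ != -1; }
};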

The advantage of separating teardown from deallocation is that there
are no more dangling pointers,


Not so fast. If you're willing to define a pointer to a zombie as
non-dangling, then yeah, OK. But a pointer to a zombie is distinctly
less safe to use than a pointer to a regular object.


I am willing to define a pointer to a zombie as non-dangling. The zombie
pointer will not have its resource released, and that indeed is a
potential danger.

You also have to be aware of the limits of the guarantees you can get
from that. Sure, the object's immediate bits do not get replaced by
those of some other live object until all the pointers to it are gone
and GC comes along to sweep up the memory. But you can still walk off
either edge of the object and read/write.

and consequently even incorrect programs could achieve predictable,
reproducible, and hence debuggable behavior.


Provided they eschew pointer arithmetic, yeah. There are probably other
caveats I haven't thought of.


Ah, that brings up the discussion of why built-in arrays are superior to
built-in pointers as the sole memory-manipulation primitive. I'll save
that for a different time.

If you are going to use GC in this way, you need to decide whether you
want it to be part of the language/system definition or simply an
expression of undefined behavior you can choose (e.g. for your debug
builds). I strongly favor the latter, for several reasons:

1. I don't want to be forced to accept the performance cost of these
   checks every time a pointer is dereferenced.


I am not advocating that.

2. If the compiler is generating the checks, I don't want to be forced
   to accept the storage cost of representing a "torn-down" state so
   that it can be detected (in general you need one more bit for that,
   and I doubt the compiler can reliably figure out how to hide that
   bit).


I am not advocating that. I'm letting objects choose the best way to
define their interfaces. All I'm offering is a simple framework in which
resource lifetime is distinct from object lifetime.

3. If the user is writing the checks, I don't want to be forced to
   accept the programming and maintenance cost of representing and
   detecting the zombie state throughout my code. It amounts to
   defensive programming, and IMO it's much easier to write correct code
   without such checking in the way.


Since we already expend effort on ensuring there are no dangling
pointers, no extra effort is needed.

4. You need to choose a defined behavior when the checks fail. I
   suppose aborting is OK, but most language designers are inclined to
   throw exceptions from these operations, a choice which is fraught
   with peril for reason-ability and correctness
   (http://groups.google.com/group/comp.lang.c++.moderated/msg/00b0361b15ae2d1e).

   Finally, if you choose to throw, then in order to make the default
   ctor nonthrowing you've just made all the other operations
   potentially throwing. That's a very bad tradeoff.


I don't need to choose a defined behavior when the checks fail, nor are
there any checks to start with. I think you attribute more complexity to
my scheme than it actually has.

So, IMO, GC makes good sense as a debugging tool for C++, but not as a
core language feature.

That way you get to use GC for what it's best at (recycling memory)
and deterministic teardown for what it's best at (timely release of
resources). In fact I'd go as far as saying that I think separating
teardown from deallocation is an essential ingredient in successfully
marrying deterministic teardown with garbage collection.


I think it's known and well-accepted by now that GC finalizers cannot in
general call destructors, which basically leads one directly to the same
conclusion.

Once we accept the reality of "destroyed but not deallocated" objects,
we need to define that state appropriately.


Not necessarily ;-). Leaving certain things undefined in the language
can have advantages, not least that the implementation is free to be
efficient rather than slow and "checked."

That state claims no more resources than the bits of Foo alone, and
must be detected by all member functions of the object such that they
have defined behavior for that state.


You're scaring me again. Sounds.... slooooow.


Most objects use memory-only resources, so only a minority would ever
have a destructor. Those that need one usually manipulate resources that
are (almost by definition) expensive enough that manipulating them
dwarfs any in-core overhead of managing resource lifetime.

d) Non-throwing default constructors occasionally facilitate writing
algorithms. For example, in Hoare's partition it's convenient to store
the pivot in a local temporary. The temporary is default-constructed and
then only swapped around. In other words, non-throwing default
constructors allow writing "conservative" algorithms on collections that
do not create any new value, but do use temporary memory to shuffle
values around.
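Roughly, substituting a Lomuto-style partition for brevity (a sketch; it
assumes a nonempty range and a cheap, nonthrowing default constructor):

#include <algorithm>
#include <iterator>
#include <utility>

// Partition [first, last) around the value initially at *first. Every
// data movement is a swap; the only object created is the
// default-constructed temporary that ferries the pivot around.
template <typename It>
It swap_only_partition(It first, It last) {
    typedef typename std::iterator_traits<It>::value_type T;
    T pivot;                      // the nonthrowing default construction at issue
    std::swap(pivot, *first);     // hoist the pivot into the temporary
    It store = first;
    It i = first;
    for (++i; i != last; ++i)
        if (*i < pivot) {
            ++store;
            std::iter_swap(i, store);
        }
    std::iter_swap(first, store); // a small element fills the first slot
    std::swap(pivot, *store);     // drop the pivot into its final place
    return store;
}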


Yes, it can be convenient, and aside from the fact that uninitialized
built-ins are also zombies ("extending the semantics of C to
user-defined types"), I understand that to be Stepanov's reason for
insisting on nonthrowing default-constructibility in Regular Types.


Again, my zombie isn't scary-looking. It's a default-initialized object.

However, default-constructibility is not a *huge* convenience in any
algorithm implementation I've seen, especially not partition, and I
value the correctness guarantees I can get from the compiler over this
convenience.


Try stable_sort.
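A merge sort needs scratch space, and in a swap-only world that buffer
is a run of default-constructed objects that values get swapped through,
never copied into. A sketch of the merge step, under the same
assumptions as the partition above:

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Stably merge the sorted runs [first, mid) and [mid, last) using a
// default-constructed scratch buffer; values are only ever swapped.
template <typename It>
void merge_with_buffer(It first, It mid, It last) {
    typedef typename std::iterator_traits<It>::value_type T;
    std::vector<T> buf(std::distance(first, mid)); // default-constructed slots
    It p = first;
    for (std::size_t k = 0; k != buf.size(); ++k, ++p)
        std::swap(buf[k], *p);                     // hoist the left run out
    std::size_t i = 0;
    It j = mid, out = first;
    while (i != buf.size() && j != last) {
        if (*j < buf[i]) { std::iter_swap(j, out); ++j; }
        else             { std::swap(buf[i], *out); ++i; }  // ties favor the left run
        ++out;
    }
    while (i != buf.size()) { std::swap(buf[i], *out); ++i; ++out; }
    // Any tail of the right run is already in place; buf now holds
    // nothing but default-constructed values.
}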

There are disadvantages too; for example, it becomes impossible to
distinguish a legitimate default-constructed object from one that has
had a life and kicked the bucket. In the mutex case, it may make sense
to define a default-constructed mutex as one on which initialization did
succeed, and a zombie as a mutex obliterated with an illegal bit
pattern. (A more portable solution would add a bool to the object state,
as sketched below.)
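To illustrate the bool variant (a sketch only; POSIX threads used for
concreteness, error handling elided):

#include <pthread.h>

class Mutex {
    pthread_mutex_t m_;
    bool alive_;                 // the extra bit of state mentioned above
public:
    Mutex() : alive_(true) { pthread_mutex_init(&m_, 0); }
    ~Mutex() {
        if (alive_) pthread_mutex_destroy(&m_);
        alive_ = false;          // the bits persist, so this marks the zombie
    }
    void lock()   { if (alive_) pthread_mutex_lock(&m_); }
    void unlock() { if (alive_) pthread_mutex_unlock(&m_); }
};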

I'd be interested in hearing further opinions on how garbage collection
interacts with defining object states.


IMO the most obvious interactions have to do with people's
expectations. People in general expect to be able to forget about
object lifetime when they have GC, and you just can't, in general. If
you do forget about it completely, you undermine class authors who
reasonably expect their dtors to be called. If I write a generic
algorithm that leaks a vector<T> and T turns out to be mutex, the
program is broken.
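Concretely (a sketch; the names are invented):

#include <vector>

// Under "the collector will take care of it" thinking, this is fine for
// T = int but broken for T = Mutex: the memory is eventually reclaimed,
// yet ~T() never runs and the OS handles leak.
template <typename T>
void process_batch() {
    std::vector<T>* scratch = new std::vector<T>(100);
    // ... use *scratch ...
    // no delete: the vector and its elements are never torn down
}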

As far as I can tell, you need to get the fact that an object manages
non-memory resources into the type system in order to fulfill people's
expectation that they can actually leak stuff, and the real
opportunities to do so will be far fewer than many people expect.
reliable programming model for doing it in C++ as defined today, which
is the main reason I opposed the recent proposals for adding GC to the
language.

This problem also crops up when you have true shared ownership.
You still need to know when to do teardown, so you need to duplicate the
reference counting that most people think is eliminated by GC.
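In other words (a sketch; not thread-safe, and assignment is omitted),
the count that GC supposedly eliminated reappears, except that now it
gates teardown rather than deallocation:

#include <new>

template <typename T>
class shared_teardown {
    T* p_;
    int* count_;                    // the "eliminated" reference count
public:
    explicit shared_teardown(T* p) : p_(p), count_(new int(1)) {}
    shared_teardown(const shared_teardown& rhs)
        : p_(rhs.p_), count_(rhs.count_) { ++*count_; }
    ~shared_teardown() {
        if (--*count_ == 0) {
            p_->~T();               // teardown when the last owner goes...
            new (p_) T();           // ...leaving the zombie state; the GC
        }                           // reclaims both *p_ and the count later
    }
    T* operator->() const { return p_; }
};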

In sum:

* I find the correctness benefits of a deterministic-teardown GC system
  with resourceless default construction to be dubious at best -- yes,
  there are runtime safety gains, but they may be offset by compile-time
  and code-reasoning costs.


But this doesn't sum up anything that precedes it. You mostly raised
efficiency concerns throughout, so the "reasoning costs" now come as a
surprise.

* I fear that such a system will fail miserably to fulfill people's
  expectations for what a GC'd language can provide. If we're lucky
  that would lead to almost nobody taking advantage of the system. If
  we're unlucky, people will write code as though the language can
  fulfill their expectations, when really it can't. Think: exception
  specifications.


Exception specifications are bad anyway, aren't they :o).

Andrei

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
