Re: Call virtual function in constructor

From: "Alf P. Steinbach" <alfps@start.no>
Newsgroups: comp.lang.c++
Date: Mon, 18 Feb 2008 03:35:45 +0100
Message-ID: <13rhro5ahlmpacd@corp.supernews.com>

* Pavel:

...

class FooConnection {
protected:
    virtual void init(const ConnectionParameters &pars) = 0;
public:
    FooConnection(const ConnectionParameters &pars) {
        init(pars);
        validateConnection();
    }
private:
    void validateConnection()
        throw(FooConnectionException /*defined elsewhere*/)
    {
        /* perform some uniform validation here, for example
            some select from "FOO_MAIN_TABLE" */
    }
};
class OracleFooConnection : public FooConnection {
protected:
    void init(const ConnectionParameters &pars) {
        // .. do Oracle-specific initialization
    }
};
class MySqlFooConnection : public FooConnection {
protected:
    void init(const ConnectionParameters & pars) {
        // .. do MySql-specific initialization
    }
};

...

I think you meant to write "It can hardly be argued that
initialization is only supposed to operate on initialized objects".

No, I meant what I said -- that initialization is what makes an object
initialized, so it, by design, is supposed to operate on the objects that
are not yet initialized (ok, maybe not necessarily "only").

OK.

In C++ this is dealt with by constructors, which, as opposed to other
member functions, operate on not-yet-initialized objects.

This is not true. The Standard allows calling "other member functions"
from the constructor and these functions operate on not-yet-initialized
objects with pre-determined results -- even virtual functions -- unless
the virtual call "uses an explicit class member access". So, the
Standard is OK with the member functions operating on
not-yet-initialized object.

Well here you have misunderstood completely. The sentence you're
quoting from (in §12.7/3) has to do with technically undefined behavior
for use of a possibly not-yet-constructed sub-object within the current
rules, where the example is multiple inheritance. First and most
important, technical UB is /not/ the problem we're discussing. Second,
even within that context, "uses an explicit class member access" is a
misleadingly incomplete quote, because what's important about that
sentence, with respect to UB (which is not what we're discussing), is
the qualification that the sentence continues with. C++ does absolutely
not, in general, forbid a constructor calling a virtual function that
uses an explicit member access. So when you write "This is not true"
it's a kind of Hofstadter'esque paradoxical self-reference... :-)

I'm amazed that you chose to interpret my paragraph as if I were unaware
of e.g. init-functions, given that the article you responded to referred
you to my own discussion of them.

I think choosing such a silly interpretation is a bit adversarial.

If I had to rewrite my paragraph above so that it would withstand an
adversarial attack, then it would at the end repeat much of the standard
and add about just as much or more about rationales, plus just about as
much about general OO theory and engineering practice, properly peppered
with references to disclaimers "we don't know the standard's rationales"
every fifth character or so -- and it would also involve much
weasel-language and be totally unreadable and ungrokkable, which is
generally what results when one adopts the thinking mode of lawyers.

Suffice it to say, (1) constructors are the tools that C++ gives you to
handle the problem of operating on uninitialized objects in order to
initialize them, and (2) in other member functions you can generally
assume that the object is initialized, that the class invariant has been
established.

The weasel language "generally" is for the case of e.g. an init-function
or other construction helper.

And the importance of (2) is not technical UB, although ignoring (2) can
in the end result in UB, but it is that other member functions in
general have much stronger assumptions, that those assumptions would
easily be violated if virtual calls down to derived classes were allowed
in constructors, and that this type safety aspect is the rationale
(insert disclaimer about knowing rationales) for the C++ rules.
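
A minimal sketch of the rule under discussion, using hypothetical classes
Base and Derived that are not from this thread: while the Base constructor
runs, a virtual call resolves to Base's own implementation, so the Derived
override never gets to read its not-yet-initialized member.

```cpp
#include <string>

// Hypothetical classes, for illustration only.
struct Base {
    std::string seen_in_ctor;   // records which implementation the constructor reached

    Base() { seen_in_ctor = name(); }   // virtual call during Base construction
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string label = "Derived";      // initialized only after Base() has finished

    // Reads a Derived data member; safe only once Derived is being constructed
    // or has been constructed.
    std::string name() const override { return label; }
};
```

After `Derived d;` the recorded value is "Base", while a later `d.name()`
yields "Derived". Under Java-style dispatch the constructor call would have
reached the override before `label` was set.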

E.g. in the words of FAQ item 23.5, "C++ is protecting you from serious
and subtle bugs", "if the above rule were different, you could easily
use objects before they were initialized, and that would cause no end of
grief and havoc", or, read Bjarne's discussion about class invariants,
at <url: http://www.research.att.com/~bs/3rd_safe0.html>.

What is not allowed is "referring to a nonstatic member before the
constructor begins execution" and that's what I would like to see
relaxed to allow at least an access to non-static member *functions*,
because, contrary to its name, constructor does not "construct" an
object in memory, but initializes it. Member functions do not require
initialization in constructor

They do. In most implementations, calling a virtual member function
requires that a proper vtable pointer has been established. And that's
the constructor's responsibility -- it just happens under the hood.
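
The under-the-hood part can't be observed portably, but its effect can. A
small sketch, with hypothetical classes Shape and Circle: the dynamic type
reported by typeid changes as each constructor in the chain takes over.

```cpp
#include <typeinfo>

// Hypothetical classes, for illustration only.
struct Shape {
    bool was_shape_in_ctor = false;

    Shape() {
        // During Shape's constructor the object's dynamic type is Shape.
        was_shape_in_ctor = (typeid(*this) == typeid(Shape));
    }
    virtual ~Shape() = default;
};

struct Circle : Shape {
    bool was_circle_in_ctor = false;

    Circle() {
        // Only now has the dynamic type become Circle.
        was_circle_in_ctor = (typeid(*this) == typeid(Circle));
    }
};
```

Both flags end up true for a `Circle` object, which is the same mechanism
that decides where a virtual call from a constructor lands.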

in fact, nothing of a member function can
be changed in the constructor; therefore, unless it reads, directly or
indirectly, some uninitialized *data* members, its call would do no harm.

I think what you mean is that you'd like the ability, some mechanism, to
call a virtual member function, from a class X constructor, with *this
treated as an object of the most derived type (a class derived from X),
if that member function's definition would have been legitimate in X and
ditto for all member functions that it calls directly or indirectly.

And I think that would be very hard to specify in detail (to enforce).

However, if a member function doesn't access any member data at all, and
only calls functions that don't access member data, then we're talking.

That would be the often wished for "static virtual", a member function
that can be called virtually but doesn't have /access/ to a 'this'
pointer. Except for the ability to be called virtually on derived class
from a constructor, it can be emulated by a pair of member functions,
namely one virtual member function that (only) calls a static member
function doing the work. It would be nice with special syntax for that.
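
A sketch of that emulation, assuming hypothetical names (Connection,
OracleConnection, driverName, staticDriverName) chosen only for
illustration: the virtual function is a thin forwarder, and the static
function that does the work has no 'this' to misuse.

```cpp
#include <string>

// Hypothetical classes, for illustration only.
struct Connection {
    virtual ~Connection() = default;

    // Virtual half: dispatches on the dynamic type, then just forwards.
    virtual std::string driverName() const { return staticDriverName(); }

    // Static half: does the work, cannot touch any member data.
    static std::string staticDriverName() { return "generic"; }
};

struct OracleConnection : Connection {
    std::string driverName() const override { return staticDriverName(); }
    static std::string staticDriverName() { return "oracle"; }
};

// Calls through the base interface and still reaches the derived static.
std::string describe(const Connection& c) { return c.driverName(); }
```

The static half can also be called with no object at all, e.g.
`OracleConnection::staticDriverName()`. What the emulation can't do is be
reached virtually from the Connection constructor.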

One way of resolving what should happen if such a function itself calls
a "static virtual" function is that the member function calls it makes
in turn will always be non-virtual.

The problem with Java's virtual calls from constructors can be
restated in these terms, that that mechanism does not deal with that
problem.

Specifically, that it causes member functions other than constructors
to operate on not-yet-initialized objects (or more precisely, for
Java, on objects that have not yet had their class invariants
established).

Just replace the word "causes" with "allows" and I will agree with the
facts in your statement.

"Allows" implies in-practice "causes". :-)

We're talking about practical implications.

As for your conclusion ("the problem"),
however, it may or may not be the problem in each particular case of
using it but it is definitely not the problem of the language. It is a
feature, sometimes useful (not very often but not extremely rarely,
either) and dangerous when misused at the same time.

Java's rules for virtual calls in constructors are a language problem,
because (1) the problem can easily be prevented by suitable language
rules, such as in C++, and (2) without that type safety, the language
encourages the practice of non-type-safe design and coding.

What's wrong is the earlier "This way...", the virtual call (in Java
and some other languages) in the constructor invoking a function
implementation in a derived class.

See above

That is not necessary in order to keep all validation in the
constructor, nor is it necessary in order to ensure that client code
only has access to valid objects.

It is one way of making sure the client code always accesses
the valid object -- which is the "best practice" I referred to. I have
never stated it was the only way, so I do not think we have a
disagreement here.

I think I begin to understand why "final" classes are so popular in Java.

For if client code could derive from any such class, then the code would
not ensure that client code only had access to valid objects.

Deriving from a class using that non-type-safe idiom is a very easy way
to gain access to a non-initialized object.

As it happens that's also a problem with the init-function solution in
C++, e.g., as used by Microsoft's ATL library -- you can easily end up
with a call of a virtual function where the object isn't yet properly
initialized.

The difference is that with the C++ init-function the programmer has
intentionally refrained from using the proper language mechanism,
presumably in order to avoid its type safety (poor programmers often do
that, hey this thing doesn't let me do what I want to!), whereas with
the Java constructor's virtual call it is the language mechanism that
otherwise would be the proper one, that commits this novice error.
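
A sketch of the init-function hazard, using a hypothetical class Widget
(this is not ATL's actual interface): nothing stops client code from
calling a virtual function between construction and init().

```cpp
// Hypothetical two-phase class, for illustration only.
class Widget {
public:
    virtual ~Widget() = default;

    // Second phase: the object is fully constructed here, so the virtual
    // call reaches the most derived override.
    void init() {
        onInit();
        initialized_ = true;
    }

    // A virtual function that client code can call at any time,
    // including before init() has run.
    virtual bool usable() const { return initialized_; }

protected:
    virtual void onInit() {}

private:
    bool initialized_ = false;   // stands in for the class invariant
};
```

Between `Widget w;` and `w.init();` the object exists but `w.usable()` is
false, and the compiler has no way to flag code that forgets the second
step.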

...

It is an anti-pattern.

I did not call the code above a pattern but "anti-pattern" seems
a little "out of whack" to me :-). Why don't we try to refrain from
tagging or rubber-stamping each other's examples?

The above was a precise (well, OK, not that precise!) technical
description.

Coming from rural Northern Norway, you know, fishermen and such, I can
assure you that when I resort to name calling, you'll know it... :-).

See <url: http://en.wikipedia.org/wiki/Anti-pattern> for a general
introduction to antipatterns.

Well, I agree they give a reasonable definition. It is more or less in
line with the GoF's own definition of a pattern. According to Wikipedia,
to be an anti-pattern:

1. A pattern of actions must be "repeated" -- compare to GoF's involving
a solution for a "general design problem" in their problem definition.

2. It must "ultimately produce" the "bad consequences outweighing the
hoped-for advantages"

3. A refactored solution must be "clearly documented, proven in actual
practice and repeatable"

My problem does not fit a single bit of the above definition. It is:

1. Specific, just a case to address Kira's question to the original
poster "why you would want to invoke a method that your object wishes to
override"

Java's virtual-call-from-constructor is, in your own words quoted above,
"not extremely rarely, either".

So yes, it is a repeated pattern.

So often repeated that evidently Java tools such as Eclipse can detect
that automatically.

2. Does not produce (in Java) or would not produce (in the hypothetical
C++ example) any bad consequences.

Ending up with a call of a virtual function on a not-yet-initialized
object is very common, and the abundance of bugs in Java programs
resulting from that really does count as bad consequences.

3. The suggested alternatives (including my own for C++) are worse than
the original course of actions. They add unnecessary complexity and do
not address an issue in the original solution (because, IMHO, there is
no issue).

If you deny that there is any issue, then of course the little
superficial complexity to avoid that issue seems unnecessary.

However, the complexity is inherent in the problem: glossing over by
using member variables for communication (hiding the communication) and
using unexpressed assumptions about what can be safely accessed (hiding
the uninitialized issue and order of operations issues), does not make
that inherent complexity go away, it's just a glossing over, hiding.

You here choose superficial simplicity over addressing underlying
problems and exposing actual, inherent complexity (even if there's not
much of it!). It's much like not doing unit-tests or not writing any
documentation whatsoever, as one person I talked with proudly explained
that his company did. It goes only to the zeroth level of perception,
less work right now for me, and therefore obviously less work in total.
It's like marrying a girl because one is infatuated with her glorious
cosmetics and delightful perfume, ignoring what comes after that.

Doing a Google search for a name of this particular antipattern didn't
turn up any hits.

However, since it is an antipattern it's called an antipattern here &
there on the net, e.g. <url:
http://mehranikoo.net/CS/archive/2006/11/28/InstanceConstructors.aspx>
and <url: http://debasishg.blogspot.com/2006_11_01_archive.html>
(which indicates that Eclipse can detect this antipattern automatically).

The first referenced article states that the "Template Method" pattern
becomes an anti-pattern if used in Constructors. I was far from stating
the opposite, my context is much more narrow -- how to re-factor the
constructor code to address the particular valid business requirement.
Once again, we are discussing a particular problem and whether or not
the tool (C++) is helpful enough to solve it.

Your second reference is really far afield. It demonstrates how
Java aspects fire a thread that would access an incompletely constructed
object. Not sure how it is relevant -- the class constructor does not
have to call any virtual methods of its class to create such a problem.
Again, this demonstrates the misuse of a language feature -- explicit
thread support in Java. If C++ supported threads, same misuse would be
possible in C++. I hope nobody suggests to ban the Thread support due to
the possibility of this misuse (I admit this case is much more extreme
than in our case).

Both articles use the term antipattern for the general notion of virtual
call from constructor in Java. And presumably that's also what Eclipse
detects, not whether there is a template pattern or threads involved...

...

Perhaps you may find my original sketch for that item more clear,
<url:
http://home.no.net/alfps/cpp/faq_proposal/@virtual-functions.html#faq-20.7>

I read it, thank you. Your part-creator solution is probably the best way
to solve my sample case and is similar to my own alternative solution
(in both design and, unfortunately, the complexity)

It would go like (off the cuff)

class ConnectionFactory

...
Handle create( Params const& params ) const
Add one more class for Handle
...
...
class FooConnection
...
class OracleFooConnection : public FooConnection
...

class Factory: public ConnectionFactory

...
> It splits things up very nicely in terms of responsibility, the
> communication lines are very clear (as opposed to communication via
> member variables, which is almost the same as global variables), and
> there is no call of non-constructor function on uninitialized object.

Your solution illustrates the point I am trying to make really well --
thank you, no irony here. We ended up with 5 non-trivial communicating
classes (ConnectionFactory, Handle, FooConnection, OracleFooConnection,
Factory), because our requirement was:

"I want to factor out some code that is common for all classes in my
class hierarchy and is supposed to be called *after* the class-specific
code when I initialize my objects"

If only our requirement had the word *before* in place of *after* above,
we would undoubtedly have to write only 2 classes (FooConnection and
OracleFooConnection) and the communication would be really trivial,
nothing to talk about.

Isn't it obvious that our tool of choice (C++) stands in our way in this
particular case?

On the contrary, it forces you to at least think about the problem and
choose some solution intentionally, instead of blindly doing the
equivalent of non-typed assembly language programming, very happy that
hey, the code "works". The above exposes the notions inherent in the
problem. I think that's much better than hiding them.
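
For concreteness, one way the arrangement sketched above might be fleshed
out. This is only a guess at the details: the bodies, the std::string
payload in Handle, the host field of ConnectionParameters and the
validation check are all assumptions made for illustration. The
class-specific step is passed in as an argument, so the base constructor
can run it first and the common validation afterwards, with no virtual call
on the object under construction.

```cpp
#include <stdexcept>
#include <string>

struct ConnectionParameters { std::string host; };   // stand-in, assumed

struct Handle { std::string description; };           // result of the db-specific step

struct ConnectionFactory {
    virtual ~ConnectionFactory() = default;
    virtual Handle create(const ConnectionParameters& params) const = 0;
};

class FooConnection {
public:
    virtual ~FooConnection() = default;
    const Handle& handle() const { return handle_; }

protected:
    FooConnection(const ConnectionParameters& params,
                  const ConnectionFactory& factory)
        : handle_(factory.create(params))    // class-specific step first
    {
        validateConnection();                // common step afterwards
    }

private:
    void validateConnection() const {
        if (handle_.description.empty())
            throw std::runtime_error("connection failed validation");
    }

    Handle handle_;
};

class OracleFooConnection : public FooConnection {
    // Virtual dispatch happens on the factory, a separate, complete object.
    struct Factory : ConnectionFactory {
        Handle create(const ConnectionParameters& params) const override {
            return Handle{"oracle://" + params.host};
        }
    };

public:
    explicit OracleFooConnection(const ConnectionParameters& params)
        : FooConnection(params, Factory()) {}
};
```

Client code just writes `OracleFooConnection c(params);` and either gets a
validated object or an exception.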

Unfortunately C++ does not force you or guide you to a good solution.

However, a special language mechanism for this would further complicate
an already quite complicated language.

Of course we can appease ourselves that we accomplished
more than just solving the original problem (implemented the Factory and
Handle "mini-frameworks" in your solution and implemented Factory
mini-framework and reduced the dependence of the client code on the
implementation in my solution -- I threw in some Bridge) but..

- who asked us to do all that?

The problem itself has all this in it.

- who is going to pay for all that (in money or project time) if we
don't need to re-use all that and it was not asked for?

On the contrary, who pays for the consequences of all of those
countless Java bugs resulting from virtual calls from constructors, and
for the bugs resulting from the general practice of not expressing
design or problem level types as types in the code?

There isn't much cost up-front for doing things properly. Those costs
(which for the virtual call thing itself amount to three or four extra
lines) are negligible.

Your itemization of classes leads me to suspect that in your preferred
solution there wouldn't even be a class or type Handle, i.e. an as much
as possible un-typed solution, which means not expressing restrictions.
Not expressing restrictions means that the compiler can't help deduce
violations of such restrictions. That means more bugs and higher costs,
but it may of course not be blindingly obvious where they stem from.

- who is going to test all that if it was not required by the business
and pay for that, too?

Again, on the contrary: who's paying for the extra work involved in
testing code that doesn't express design level restrictions? With such
code proper testing must check that the design level restrictions aren't
violated. In practice that means complete coverage testing and still
only having a vague probability that the code might be OK.

- who is going to document all those clear communication lines and then
talk every newcomer to the team into following our "right ways"? They
may be right but they surely will not be most intuitive for him/her. And
then, s/he has to write a separate Factory for every new FooConnection
and not forget to create that Handle, not a connection itself..

Again, on the contrary, who is going to document the communication lines
in your code, with communication via member variables (effectively about
the same as communication via global variables)? I'm pretty sure that
these communication lines, ordering issues and responsibilities are /not/
documented at all, but if they are, then that documentation must of
necessity be much more verbose and detailed than for the case where it's
expressed directly in the code, and then it amounts to a non-enforceable
comment, instead of as with proper design, enforced by compiler. I.e.,
you're here requiring a much higher standard of documentation for the
clear code where that documentation isn't needed, than for the
hide-the-issues code where the documentation is very much needed.

Long story short, is this ban of our little language feature (which we
would know how to use safely) worth the trouble?

There is no trouble with the C++ rules, as far as I can see.

The trouble is with not enforcing type safety, as in Java and some other
languages.

And the up-front cost of doing things properly is negligible.

Cheers, & hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?