Re: Call virtual function in constructor

From: Pavel <dot_com_yahoo@paultolk_reverse.yourself>
Newsgroups: comp.lang.c++
Date: Mon, 18 Feb 2008 06:45:25 GMT
Message-ID: <9Q9uj.234876$MJ6.59758@bgtnsc05-news.ops.worldnet.att.net>

In C++ this is dealt with by constructors, which, as opposed to other
member functions, operate on not-yet-initialized objects.

This is not true. The Standard allows calling "other member functions"
from the constructor and these functions operate on
not-yet-initialized objects with pre-determined results -- even
virtual functions -- unless the virtual call "uses an explicit class
member access". So, the Standard is OK with member functions
operating on a not-yet-initialized object.

I think choosing such a silly interpretation is a bit adversarial.

Again, no pun was intended. I simply do not know how else to interpret
your "constructors, as opposed to other member functions, operate on
not-yet-initialized objects". If a constructor calls another member
function, why can that function not operate on a "not-yet-initialized
object"? Does that not make your statement untrue?
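
To make the "pre-determined results" concrete, here is a minimal sketch of what C++ does with a virtual call made from a constructor. Base, Derived and name() are hypothetical names invented for this illustration; they are not from the thread. While the Base constructor runs, the object's dynamic type is still Base, so the call resolves to Base::name() even though a Derived is being built.

```cpp
#include <string>

// Hypothetical classes, for illustration only.
struct Base {
    std::string seen;   // records which override ran during construction

    Base() { seen = name(); }   // virtual call from a constructor
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    virtual std::string name() const { return "Derived"; }
};

// Returns what the Base constructor observed while a Derived was being built.
std::string observed_during_construction() {
    Derived d;
    return d.seen;
}

// Returns what the same virtual call yields once construction has finished.
std::string observed_after_construction() {
    Derived d;
    const Base& b = d;
    return b.name();
}
```

Under the Java rule described in this thread, the call made inside the constructor would reach the derived override instead.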

I am deleting most of your other post because I agree with it -- I simply
do not understand how it is related to our subject.

E.g. in the words of FAQ item 23.5, "C++ is protecting you from serious
and subtle bugs", "if the above rule were different, you could easily
use objects before they were initialized, and that would cause no end of
grief and havoc", or, read Bjarne's discussion about class invariants,
at <url: http://www.research.att.com/~bs/3rd_safe0.html>.

Well, please agree there are much easier ways to "cause no end of grief
and havoc" in C++. And it was Bjarne who promised that the language would
let me blow off my whole leg if I try a little harder.

On a more serious note, the (vaguely) relevant part of the text in your
reference (I believe you meant
http://www.research.att.com/~bs/3rd_safe.pdf, didn't you?) states that
it is the responsibility of the constructor to either establish the
invariants or throw an exception if the invariant cannot be established.
I could not agree more, and that is exactly what my validate() method
(called from the constructor) does. I hope you do not read this idiom as
requiring the invariant to be established literally in the constructor's
own code, with no way to break the constructor's logic down into smaller
functions even if establishing the invariant takes 10000 lines of source
code. Your solution delegates these responsibilities to doCreate(). (By
the way, to do anything useful in doCreate(), your Factory must be a
friend of Handle or something similar -- not too clear a line.)
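
As a sketch of the idiom being argued for here (a constructor that establishes the invariant or throws, but delegates the work to smaller member functions), assume a hypothetical Range class whose invariant is lo <= hi. Range, assign() and construction_fails() are illustration-only names; validate() merely borrows the name used in this thread.

```cpp
#include <stdexcept>

// Hypothetical class; its invariant is lo_ <= hi_.
class Range {
public:
    Range(int lo, int hi) : lo_(0), hi_(0) {
        assign(lo, hi);   // step 1: write the members
        validate();       // step 2: establish the invariant or throw
    }
    int width() const { return hi_ - lo_; }

private:
    // Non-virtual helpers called from the constructor. They operate on an
    // object whose invariant has not been established yet.
    void assign(int lo, int hi) { lo_ = lo; hi_ = hi; }
    void validate() const {
        if (lo_ > hi_) throw std::invalid_argument("Range: lo > hi");
    }

    int lo_;
    int hi_;
};

// Returns true if constructing Range(lo, hi) throws.
bool construction_fails(int lo, int hi) {
    try {
        Range r(lo, hi);
        (void)r;   // silence the unused-variable warning
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

No client ever sees a Range with lo > hi, yet none of the checking code sits literally in the constructor body.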

Frankly, if you agree that it is ok for a constructor to call other
member functions (virtual or not) for the purpose of initialization of
some or all of the object's data members, we do not have a disagreement
here -- otherwise, please suggest an alternative.

What is not allowed is "referring to a nonstatic member before the
constructor begins execution" and that's what I would like to see
relaxed to allow at least an access to non-static member *functions*,
because, contrary to its name, a constructor does not "construct" an
object in memory, but initializes it. Member functions do not require
initialization in the constructor.

They do. In most implementations, calling a virtual member function
requires that a proper vtable pointer has been established. And that's
the constructor's responsibility -- it just happens under the hood.

Member functions do *not* require it. That some implementations choose
to initialize the vtable this way does not constitute a need to do it
(except maybe for that very rule of selecting which virtual function is
called in a constructor, which we are discussing and which, I am arguing,
is not useful). The compiler has complete knowledge of the class
hierarchy of an object, including its most derived type, at the point in
the code where the object is created, so the compiler has all the
information necessary to insert the code constructing the object's
storage layout in general and vtables (if needed) in particular before
calling the first base class's constructor. I think I am repeating
myself though.

I think what you mean is that you'd like the ability, some mechanism, to
call a virtual member function, from a class X constructor, with *this
treated as an object of the most derived type (a class derived from X),
if that member function's definition would have been legitimate in X and
ditto for all member functions that it calls directly or indirectly.

True.

And I think that would be very hard to specify in detail (to enforce).

Java does it. Not difficult at all: first build the object's storage
layout completely, then enter the code for the very first constructor.
As for the requirement language, I am as unwilling to cite pages of the
Java Language Specification as you are to cite pages of the C++
Standard. Suffice it to say, it has been done before, with little trouble.

However, if a member function doesn't access any member data at all, and
only calls functions that don't access member data, then we're talking.

No, that is not what I meant, although the concept you mention is
useful, just not for this example. In this example, the member function
must access members (namely, to initialize them) -- it does not have to
read them though, only write... I am deleting some more of your post
related to the above.

Specifically, that it causes member functions other than constructors
to operate on not-yet-initialized objects (or more precisely, for
Java, on objects that have not yet had their class invariants
established).

Just replace the word "causes" with "allows" and I will agree with the
facts in your statement.

"Allows" implies in-practice "causes". :-)

We're talking about practical implications.

Again you try to make me look as if I were defending some "patterns" or
a similar general concept. I am just trying to solve a particular problem
whose solution is clearly overcomplicated by the language's rules. No, in
my pseudo-code there is nothing wrong: the operation on
not-yet-initialized objects is correct, namely, it makes them
already-initialized :-).

As for your conclusion ("the problem"), however, it may or may not be
the problem in each particular case of using it but it is definitely
not the problem of the language. It is a feature, sometimes useful
(not very often but not extremely rarely, either) and dangerous when
misused at the same time.

Java's rules for virtual calls in constructors are a language problem,
because (1) the problem can easily be prevented by suitable language
rules, such as in C++, and (2) without that type safety, the language
encourages the practice of non-type-safe design and coding.

What do reinterpret_cast and pointer arithmetic encourage :-)? Let's be
serious, Java type safety is enforced (as opposed to that of C++).
Let's not make assumptions either: it is the rule you call reasonable
that makes me write 5 classes instead of 2 (in my solution, actually, 6
:-( -- you are better at that, but I achieve better readability and some
side advantages, imho). I will paste my solution at the end of this post
to avoid being pointless.

What's wrong is the earlier "This way...", the virtual call (in Java
and some other languages) in the constructor invoking a function
implementation in a derived class.

See above

That is not necessary in order to keep all validation in the
constructor, nor is it necessary in order to ensure that client code
only has access to valid objects.

It is one way of making sure the client code always accesses
the valid object -- which is the "best practice" I referred to. I have
never stated it was the only way, so I do not think we have a
disagreement here.

I think I begin to understand why "final" classes are so popular in Java.

Are they? I do not meet them often.

For if client code could derive from any such class, then the code would
not ensure that client code only had access to valid objects.

Well, this is one effect that is achieved by `final'. Also, the compiler
is free to inline final function calls (in the extreme case of a final
class, all of them if it feels like it; a C++ compiler does not have
this luxury when one member function calls another, virtual member
function of the same class). And certainly there are other uses.

Deriving from a class using that non-type-safe idiom is a very easy way
to gain access to a non-initialized object.

Doesn't any member function of a class called from its constructor have
such access?

As it happens that's also a problem with the init-function solution in
C++, e.g., as used by Microsoft's ATL library -- you can easily end up
with a call of a virtual function where the object isn't yet properly
initialized.

The difference is that with the C++ init-function the programmer has
intentionally refrained from using the proper language mechanism,
presumably in order to avoid its type safety (poor programmers often do
that, hey this thing doesn't let me do what I want to!), whereas with
the Java constructor's virtual call it is the language mechanism that
otherwise would be the proper one, that commits this novice error.
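
The init-function hazard described above can be sketched as follows. Widget, Button, init() and makeLabel() are hypothetical names and this is not ATL code; it only assumes the general two-phase shape (construct first, then call an init function).

```cpp
#include <string>

// Hypothetical two-phase ("init-function") class.
class Widget {
public:
    Widget() : ready_(false) {}
    virtual ~Widget() {}

    // Second phase: the client must remember to call this after construction.
    void init() {
        label_ = makeLabel();   // virtual call on a fully constructed object
        ready_ = true;
    }

    bool ready() const { return ready_; }
    const std::string& label() const { return label_; }

protected:
    virtual std::string makeLabel() const { return "widget"; }

private:
    bool ready_;
    std::string label_;
};

class Button : public Widget {
protected:
    virtual std::string makeLabel() const { return "button"; }
};
```

Between construction and the init() call the object is reachable but not initialized (ready() is false). After init(), the virtual call has reached Button::makeLabel(), because construction had already finished when it was made.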

Again, I am trying to get away from generalizing. In my example, the
programmer just wants to factor the initialization code according to its
responsibilities -- and has a problem doing that.

It is an anti-pattern.

I did not call the code above a pattern but "anti-pattern" seems
a little "out of whack" to me :-). Why don't we try to refrain from
tagging or rubber-stamping each other's examples?

The above was a precise (well, OK, not that precise!) technical
description.

Coming from rural Northern Norway, you know, fishermen and such, I
can assure you that when I resort to name calling, you'll know it...
:-).

See <url: http://en.wikipedia.org/wiki/Anti-pattern> for a general
introduction to antipatterns.

Well, I agree they give a reasonable definition. It is more or less in
line with the GoF's own definition of a pattern. According to Wikipedia,
to be an anti-pattern:

1. A pattern of actions must be "repeated" -- compare to GoF's
involving a solution for a "general design problem" in their problem
definition.

2. It must "ultimately produce" the "bad consequences outweighing the
hoped-for advantages"

3. A refactored solution must be "clearly documented, proven in actual
practice and repeatable"

My problem does not fit a single bit of the above definition. It is:

1. Specific, just a case to address Kira's question to the
original poster "why you would want to invoke a method that your
object wishes to override"

Java's virtual-call-from-constructor is, in your own words quoted above,
"not extremely rarely, either".

So yes, it is a repeated pattern.

No, I am saying "the language feature is sometimes useful, not extremely
rarely". A language feature is not a pattern -- it is like a hammer. It
can build a palace if used in a nice "nail driving" pattern (course of
actions) or it can kill if used in an ugly "murdering" anti-pattern.
But the hammer is not a pattern or course of action by itself.

So often repeated that evidently Java tools such as Eclipse can detect
that automatically.

Not really that often. There are historical reasons why the Java crowd is
so loud about this particular issue but the discussion would take us too
far away.

2. Does not produce (in Java) or would not produce (in the
hypothetical C++ example) any bad consequences.

Ending up with a call of a virtual function on a not-yet-initialized
object is very common, and the abundance of bugs in Java programs
resulting from that really does count as bad consequences.

Please.. I find it really ironic that I am put in the position where I
almost started defending Java safety vs C++. It would start another holy
war and Java would prevail (and no, I am not a Java adept, I started
with Fortran 66 and I can always prove it is safer and in general beats
any other PL hands down :-) ).

But please please please do not make me responsible for all bad Java (or
C++, for that matter) code around. Just see the hammer metaphor above.

3. The suggested alternatives (including my own for C++) are worse
than the original course of actions. They add unnecessary complexity
and do not address an issue in the original solution (because, IMHO,
there is no issue).

If you deny that there is any issue, then of course the little
superficial complexity to avoid that issue seems unnecessary.

However, the complexity is inherent in the problem: glossing over by
using member variables for communication (hiding the communication) and
using unexpressed assumptions about what can be safely accessed (hiding
the uninitialized issue and order of operations issues), does not make
that inherent complexity go away, it's just a glossing over, hiding.

I am probably blind but I do not see this complexity -- and usually I am
quite good at it. All that is required is to execute some common code
every time right after the class-specific part of initialization. How
complex can that be?

You here choose superficial simplicity over addressing underlying
problems and exposing actual, inherent complexity (even if there's not
much of it!).

There is none in the Java solution. You attack the valid solution from
the position that the language feature it uses is misused by some other
code. Every non-trivial C++ feature is misused by some code floating
around (and maybe every trivial one, too). What does it prove to you?
Not that all non-trivial C++ features should be eliminated, I hope.

It's much like not doing unit-tests or not writing any
documentation whatsoever, as one person I talked with proudly explained
that his company did. It goes only to the zeroth level of perception,
less work right now for me, and therefore obviously less work in total.

....
Other way around. I anticipate (from experience, not just pure logic)
how many times people will not read my documentation, how many times
they will abuse every line of the code they get to support, especially
if they did not write it, especially if the code does not seem to be
related to the problem at hand, how many times they will stumble upon
the non-obvious code and, instead of asking questions, start
re-inventing their own wheels or making assumptions, etc. Extra code ==
extra liability, believe me. After your project is 2-3 years old you
will see the ugly remnants of 24 different "frameworks" and "right ways"
in the code, half of which do one and the same thing in 12 different
ways while the other half do a second thing.

Doing a Google search for a name of this particular antipattern
didn't turn up any hits.

However, since it is an antipattern it's called an antipattern here &
there on the net, e.g. <url:
http://mehranikoo.net/CS/archive/2006/11/28/InstanceConstructors.aspx>
and <url: http://debasishg.blogspot.com/2006_11_01_archive.html>
(which indicates the Eclipse can detect this antipattern automatically).

The first referenced article states that the "Template Method" pattern
becomes an anti-pattern if used in Constructors. I was far from
stating the opposite, my context is much more narrow -- how to
re-factor the constructor code to address the particular valid
business requirement. Once again, we are discussing a particular
problem and whether or not the tool (C++) is helpful enough to solve it.

Your second reference is from a really distant field. It demonstrates how
Java aspects fire a thread that would access an incompletely
constructed object. Not sure how it is relevant -- the class
constructor does not have to call any virtual methods of its class to
create such a problem. Again, this demonstrates the misuse of a
language feature -- explicit thread support in Java. If C++ supported
threads, the same misuse would be possible in C++. I hope nobody suggests
banning thread support due to the possibility of this misuse (I
admit this case is much more extreme than ours).

Both articles use the term antipattern for the general notion of virtual
call from constructor in Java.

BTW, the second article is not even about Java at all (not sure why the
author says "Java") -- see the code snippet -- it is some
aspect-oriented JVM-based language (AspectJ, maybe). But if you insist
on using the pattern terminology, why don't you start with the
rationale? My little example is far from the rationales behind the
[anti]patterns in those cited articles; and what can be an anti-pattern
for a generic problem, can be a valid solution (not a pattern, though)
for another specific problem.

And presumably that's also what Eclipse
detects, not whether there is a template pattern or threads involved...

I am unsure, too. Would be interested to know though.

Perhaps you may find my original sketch for that item more clear,
<url:
http://home.no.net/alfps/cpp/faq_proposal/@virtual-functions.html#faq-20.7>

I read it, thank you. Your part-creator solution is probably the best
way to solve my sample case and is similar to my own alternative
solution (in both design and, unfortunately, complexity).

It would go like (off the cuff)

class ConnectionFactory

...
Handle create( Params const& params ) const
Add one more class for Handle
...
...
class FooConnection
...
class OracleFooConnection : public FooConnection
...

class Factory: public ConnectionFactory

...
> It splits things up very nicely in terms of responsibility, the
> communication lines are very clear (as opposed to communication via
> member variables, which is almost the same as global variables), and
> there is no call of non-constructor function on uninitialized object.

Your solution illustrates the point I am trying to make really well --
thank you, no irony here. We ended up with 5 non-trivial communicating
classes (ConnectionFactory, Handle, FooConnection,
OracleFooConnection, Factory), because our requirement was:

"I want to factor out some code that is common for all classes in my
class hierarchy and is supposed to be called *after* the
class-specific code when I initialize my objects"

If only our requirement had the word *before* in place of *after*
above, we would undoubtedly have to write only 2 classes
(FooConnection and OracleFooConnection) and the communication would be
really trivial, nothing to talk about.

Isn't it obvious that our tool of choice (C++) stands in our way in
this particular case?
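
One conventional way to get "common code after the class-specific code" in current C++ is to run it from a creation function once construction is complete. The sketch below assumes hypothetical members (fooVersion(), version_) and a hypothetical helper makeValidated(); only the class names and validate() are borrowed from this thread, and it is not either poster's actual solution.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the classes discussed in the thread.
class FooConnection {
public:
    virtual ~FooConnection() {}
    virtual std::string fooVersion() const = 0;   // class-specific part

    // Common code that must run after the class-specific initialization.
    void validate() const {
        if (fooVersion().empty())
            throw std::runtime_error("no foo version");
    }
};

class OracleFooConnection : public FooConnection {
public:
    explicit OracleFooConnection(const std::string& v) : version_(v) {}
    virtual std::string fooVersion() const { return version_; }
private:
    std::string version_;
};

// Constructs a T completely, then runs the common validation. The virtual
// call inside validate() now reaches the most derived override.
template <class T, class Arg>
std::unique_ptr<T> makeValidated(const Arg& arg) {
    std::unique_ptr<T> p(new T(arg));
    p->validate();   // if this throws, unique_ptr destroys the object
    return p;
}
```

Nothing here stops a client from constructing OracleFooConnection directly and skipping validate(). Enforcing that needs non-public constructors plus friendship or something similar, which is where the extra classes discussed in this thread come from.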

On the contrary, it forces you to at least think about the problem

Agree. And much more than absolutely necessary at that.. System 360
Assembler or a Turing machine would make me think even more.. about
things irrelevant to my business requirements.. what good would it do me?

and choose some solution intentionally, instead of blindly doing the
equivalent of non-typed assembly language programming,

any specifics?

very happy that hey, the code "works". The above exposes the notions
inherent in the problem. I think that's much better than hiding them.

IMHO, Handle and Factory are equally inherent (or not) to any problem
involving any object creation/operation. I cannot see how they are more
inherent to this particular problem.

Unfortunately C++ does not force you or guide you to a good solution.

However, a special language mechanism for this would further complicate
an already quite complicated language.

Well, how a virtual function call in a constructor is routed seems a
generic enough mechanism to me. The feature is generic; it is the
problem that is specific. As for the relative complexity of the 2
approaches, I doubt we can agree here because we are, in fact, trading
opinions, not measuring. IMHO, the current C++ behavior here is more
complicated and error-prone than that of Java, honestly, but I will not
even try to change your viewpoint on this. Unless we agree on some
methodology of comparison, there is no chance I will change mine,
either.

Of course we can appease ourselves that we accomplished more than just
solving the original problem (implemented the Factory and Handle
"mini-frameworks" in your solution and implemented Factory
mini-framework and reduced the dependence of the client code on the
implementation in my solution -- I threw in some Bridge) but..

- who asked us to do all that?

The problem itself has all this in it.

The problem of uniform validation of a freshly constructed object has
Factory and Handle in it? Can you be a little more specific and explain
where they were hiding?

- who is going to pay for all that (in money or project time) if we
don't need to re-use all that and it was not asked for?

On the contrary, who pays for the consequences of all of those
countless Java bugs

Not our imagined client of our OracleFooConnection -- we did not put in
those bugs, did we? Again, let's stop those groundless "countless" etc
-- these are just opinions. Objective representative statistics on bugs
are expensive to obtain and interpret -- but I doubt C++ code has fewer
bugs to offer. Why don't we stick to the point? Fixing the world is not
an enterprise I would invest in.

resulting from virtual calls from constructors, and
for the bugs resulting from the general practice of not expressing
design or problem level types as types in the code?

There isn't much cost up-front for doing things properly. Those costs
(which for the virtual call thing itself amounts to three or four extra
lines) are negligible.

See above about upfront cost (after my "Other way around"). Maintenance
is the key for me -- but, BTW, it is usually more difficult to get a
budget for the upfront costs, so they are "costlier" per dollar or per
hour, so to speak.

Your itemization of classes leads me to suspect that in your preferred
solution there wouldn't even be a class or type Handle, i.e. an as much
as possible un-typed solution, which means not expressing restrictions.

I am confused again. I would not use Handle, true. Un-typed solution? My
"not-C++" solution was in my first post and my real solution (working in
C++ as it is now) is at the bottom of this post. I am not sure how any
of them is untyped. I feel the second one is even "over-typed" a little.
But it does not use Handle, what's true is true.

Not expressing restrictions means that the compiler can't help deduce
violations of such restrictions. That means more bugs and higher costs,
but it may of course not be blindingly obvious where they stem from.

Agree, but again, how is it relevant? What specific restrictions do you
mean that the 2-class solution would break and the 5-class solution does
not?

- who is going to test all that if it was not required by the business
and pay for that, too?

Again, on the contrary: who's paying for the extra work involved in
testing code that doesn't express design level restrictions? With such
code proper testing must check that the design level restrictions aren't
violated. In practice that means complete coverage testing and still
only having a vague probability that the code might be OK.

Can you please enumerate the restrictions you mean in our case? I beg
you to be more specific, it will spare both of us a lot of time and effort.

- who is going to document all those clear communication lines and
then talk every newcomer to the team into following our "right ways"?
They may be right but they surely will not be most intuitive for
him/her. And then, s/he has to write a separate Factory for every new
FooConnection and not forget to create that Handle, not a connection
itself..

Again, on the contrary, who is going to document the communication lines
in your code, with communication via member variables (effectively about
the same as communication via global variables)? I'm pretty sure that
these communication lines, ordering issues and responsibilities are /not/
documented at all, but if they are, then that documentation must of
necessity be much more verbose and detailed than for the case where it's
expressed directly in the code, and then it amounts to a non-enforceable
comment, instead of as with proper design, enforced by compiler. I.e.,
you're here requiring a much higher standard of documentation for the
clear code where that documentation isn't needed, than for the
hide-the-issues code where the documentation is very much needed.

Again, please could you be more specific? What code do we refer to? What
member variables?

Long story short, is this ban of our little language feature (which we
would know how to use safely) worth the trouble?

There is no trouble with the C++ rules, as far as I can see.

I hoped I was able to demonstrate the trouble (not of C++ but of its
user) above; it is the necessity to write a lot of code tangential to
the problem at hand, which could be avoided under rules that are more
logical (IMHO only) and broadly used in practice (Java).

The trouble is with not enforcing type safety, as in Java and some other
languages.

No comment. Had I not, on a whim, just deleted what I wrote in response,
I would be responsible for inflaming another holy war. Please
let's stop comparing the merits of different languages. I am guilty of
bringing in Java but I did it only as a working example of how objects
could be built differently in C++, not to praise Java or attack C++.

And the up-front cost of doing things properly is negligible.

We both are repeating ourselves, aren't we? See above..

The solution "in current C++" I promised earlier (6 classes, you do not
have to count. My "communication lines" seem to be a little clearer than
yours but I would be the first to agree it is an opinion. I would gladly
give up all those lines for a more straightforward mapping from the
business to the implementation language, as in the former "would-be C++"
example):

// ---------- public code (supplied with the binary distribution)
typedef int Param; // just for example; some data type
typedef int FooException; // just for example; some exception type
struct FooConnectionImpl;
struct FooConnectionImplFactory;
class FooConnection {
public:
    virtual void doSomethingUseful() = 0;
protected:
    void validate() const throw(FooException &);
    virtual ~FooConnection();
    FooConnection(const Param &, const FooConnectionImplFactory *);
    FooConnectionImpl *impl;
};

class OracleFooConnection : public FooConnection {
public:
    void doSomethingUseful(); // at last..
public:
    OracleFooConnection(const Param &);
};

// --------- client code
int main(int argc, char *argv[]) {
    OracleFooConnection c(25);
    c.doSomethingUseful();
    return 0;
}

// --------- private code (the client's compiler does not need to see
// this code)
#include <iostream>
using namespace std; // just for the sake of brevity, don't shoot
struct FooConnectionImpl { // the parallel hierarchy root
    virtual ~FooConnectionImpl() {}
};

void FooConnection::validate() const throw(FooException &) {
    // SELECT foo_version from FOO_MAIN_TABLE here for example
    // or some other common code. In practice will use FooConnectionImpl
    // interface.
}

FooConnection::~FooConnection() { delete impl; }

struct FooConnectionImplFactory {
    virtual FooConnectionImpl* create(const Param &) const = 0;
};

FooConnection::FooConnection(const Param &param, const
FooConnectionImplFactory *factory) {
    impl = factory->create(param);
}

void OracleFooConnection::doSomethingUseful() {
    cout << "do something useful already\n";
}

struct OracleFooConnectionImpl : public FooConnectionImpl
{
    OracleFooConnectionImpl(const Param &p) { /*...*/ }
    /*Oracle-specific stuff here*/
};

struct OracleFooConnectionImplFactory : public FooConnectionImplFactory
{
    FooConnectionImpl *create(const Param &p) const
      { return new OracleFooConnectionImpl(p); }
    // just a shortcut here, do not want to go all the way w. real
    // singletons etc
    static OracleFooConnectionImplFactory instance;
};

OracleFooConnectionImplFactory OracleFooConnectionImplFactory::instance;

OracleFooConnection::OracleFooConnection(const Param &p)
        : FooConnection(p, &OracleFooConnectionImplFactory::instance)
{ /* Oracle-specific initialization here */ }

Cheers, & hth.,

- Alf

Regards,
-Pavel