Re: The C++ Object Model: Good? Bad? Ugly?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sun, 9 Nov 2008 01:45:24 -0800 (PST)
Message-ID: <01e16d1a-8911-450f-88df-7404f9d86e20@v16g2000prc.googlegroups.com>

On Nov 8, 1:08 pm, tonytech08 <tonytec...@gmail.com> wrote:

On Nov 8, 3:12 am, James Kanze <james.ka...@gmail.com> wrote:

On Nov 7, 10:20 pm, tonytech08 <tonytec...@gmail.com> wrote:

On Nov 7, 4:40 am, James Kanze <james.ka...@gmail.com> wrote:

On Nov 6, 11:29 pm, tonytech08 <tonytec...@gmail.com> wrote:

[...]

Thanks for reiterating my thought: C++ has more support
for OO with "full OO type objects".

More support than what?

More support for OO with "heavyweight" classes than for
POD classes.

You're not making sense. How does C++ have more support for
OO than for other idioms?

Why are you asking that when I said nothing of the sort? I
said that once you put a vptr into the data portion of an
object (for example), it's a different animal than a class
without a vptr (for example!). I distinguished these
fundamentally different animals by calling them "heavyweight"
and "lightweight" classes/objects (and, apparently wrongly,
POD classes).

Well, OO is often used to signify the presence of polymorphism,
and the only time you'll get a vptr is if the class is
polymorphic.

More so, I was concerned that other things one can do with a
class, such as defining overloaded constructors, may make code
fragile against some future or other current implementation of
the language. Who's to say (not me) that someone won't make a
compiler that tacks on or into a class object some other
"hidden" ptr or something to implement "overloaded
constructors"? I don't care if code is generated but I do care
if the compiler starts aberrating the data portion.

Well, that's C++. And C. And Fortran, and just about every
other language I'm aware of. The only language I know which
specifies the exact format of any types is Java, and it only
does so for the built-in types.

So what's your point? Data layout is implementation defined,
period. That was the case in C, and C++ didn't introduce any
additional restrictions.

C++ has support for "full OO type objects", if that's what
you need. Most of my objects aren't "full OO type objects",
in the sense that they don't support polymorphism. C++
supports them just as well.

I think I may be OK without polymorphism in "lightweight"
classes, but overloaded constructors sure would be nice. And
conversion operators. Can a POD class derive from a pure
abstract base class? That would be nice also if not.

And C++ supports all of that.

But am I guaranteed that my class will stay lightweight if I
do that, or is it implementation defined?

Everything is implementation defined. C++ inherits this from C,
and it was pretty much standard practice at the time C was
invented.

[...]

It's just an abstract way of looking at it. It's hardly a
stretch either, since the C++ object model or at least
most implementations use that as the foundation upon which
to implement polymorphism: tacking a vptr onto "the thing
part" (noun) of "the object".

C++ supports dynamic typing, if that's what you mean. In
other words, the type of an object can vary at runtime. But I
don't see your point. It is the designer of the class who
decides whether to use dynamic typing or not. The language
doesn't impose it.

It imposes "a penalty" the second you introduce the vptr. The
class becomes fundamentally and categorically different in a
major way. (Read: turns a lightweight class into a
heavyweight one).

You're throwing around meaningless adjectives again. The
compiler has to implement dynamic typing somehow. You don't pay
for it unless you use it, and using a vptr is about the cheapest
implementation known.
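
To put a number on "you don't pay for it unless you use it", here is a minimal sketch. The type and function names are hypothetical, and the sizes it reports are whatever the implementation at hand chooses; the standard fixes none of them. On the common implementations that use a vptr, the difference is one pointer plus any alignment padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical types, for illustration only.
struct PlainPoint               // no virtual functions: nothing extra to store
{
    int x;
    int y;
};

struct PolyPoint                // one virtual function: the dynamic type
{                               // has to be recorded somewhere
    virtual ~PolyPoint() {}
    int x;
    int y;
};

// Extra bytes per object spent on dynamic typing. Implementation defined;
// typically the size of one pointer plus alignment padding.
inline std::size_t dynamicTypingOverhead()
{
    assert(sizeof(PolyPoint) >= sizeof(PlainPoint));   // guard the subtraction
    return sizeof(PolyPoint) - sizeof(PlainPoint);
}

// Prints the sizes chosen by the current compiler and options.
inline void reportSizes()
{
    std::printf("PlainPoint: %lu bytes\n",
                static_cast<unsigned long>(sizeof(PlainPoint)));
    std::printf("PolyPoint:  %lu bytes\n",
                static_cast<unsigned long>(sizeof(PolyPoint)));
    std::printf("overhead:   %lu bytes\n",
                static_cast<unsigned long>(dynamicTypingOverhead()));
}
```

Calling `reportSizes()` from `main` under different compilers or options shows how much of this is the implementation's choice; `PlainPoint` is untouched by the existence of `PolyPoint`.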

[...]

I agree that there are other hindrances to having an elegant
programming model. Sigh. That's not to say that one can't get
around them to a large degree. (Not the least of which is:
define your platform as narrowly as possible).

That's a route C and C++ intentionally don't take. If there
exists a platform on which the language is not implementable,
it's pretty much considered a defect in the language.

I wasn't trying to be implementation literal about it.
Yes, data+behavior= class, but when the implementation
starts adding things to the data portion, that defines a
different animal than a POD class.

But the implementation *always* adds things to the data
portion, or controls how the data portion is interpreted.
It defines a sign bit in an int, for example (but not in an
unsigned int). If you want to support signed arithmetic,
then you need some way of representing the sign. If you
want to support polymorphism, then you need some way of
representing the type. I don't see your point. (The point
of POD, in the standard, is C compatibility; anything in a
POD will be interpretable by a C compiler, and will be
interpreted in the same way as in C++.)

Well maybe I'm breaking new ground then in suggesting that
there should be a duality in the definition of what a class
object is. There are "heavyweight" classes and "lightweight"
ones.

There's no strict binary division.

A class with a vptr is fundamentally different than one
without, for example.

And a class with private data members is fundamentally different
from one with public data members. And a class with user
defined constructors is fundamentally different from one
without.

There are a number of different classifications possible.

The only ones I'm considering in this thread's topic, though, are
the lightweight/heavyweight ones.

Without defining it or showing its relevance to anything at all.

I use C++ with that paradigm today, but it could be more
effective if there was more support for "object-ness" with
"lightweight" classes.

Again: what support do you want? You've yet to point out
anything that isn't supported in C++.

(Deriving from interface classes and maintaining the size of
the implementation (derived) class would be nice (but maybe
impossible?)).

Not necessarily impossible, but it would make the cost of
resolving a virtual function call significantly higher. And
what does it buy you? You say it would be "nice", but you don't
explain why; I don't see any real advantage.

I am just trying to understand where the line of demarcation
between lightweight and heavyweight classes is and how that
can potentially change in the future and hence break code.

There is no line of demarcation because there isn't really such
a distinction. It's whatever you want it to mean, which puts it
wherever you want.

The limitation appears to be backward compatibility with C.
If so, maybe there should be structs, lightweight classes,
heavyweight classes.

And maybe there should be value types and entity types. Or
maybe some other classification is relevant to your
application. The particularity of C++ is that it lets you
choose. The designer is free to develop the categories he
wants. (If I'm not mistaken, in some circles, these types of
categories are called stereotypes.)

I'm only talking about the two categories based upon the C++
mechanisms that change the data portion of the object.

Which in turn depends on the implementation, just as it did in
C.

Why do you care about the layout of the data anyway? There's
nothing you can really do with it.

Deriving a simple struct from a pure abstract base class will
get you a beast that is the size of the struct plus the size
of a vptr. IOW: an aberrated struct or heavyweight object.
Call it what you want, it's still fundamentally different.

A C style struct is different from a polymorphic class, yes.
Otherwise, there wouldn't be any point in having polymorphic
classes. The difference isn't any more fundamental than making
the data members private would be, however, or providing a user
defined constructor; in fact, I'd say that both of those were
even more fundamental differences.

[...]

The change occurs when you do something to a POD
("lightweight") class that turns the data portion of the
class into something else than just a data struct, as when
a vptr is added. Hence then, you have 2 distinct types of
class objects that are dictated by the implementation of
the C++ object model.

The concept of a POD was introduced mainly for reasons of
interfacing with C. Forget it for the moment. You have as
many types of class objects as the designer wishes. If you
want just a data struct, fine; I use them from time to time
(and they aren't necessarily POD's---it's not rare for my
data struct's to contain an std::string). If you want
polymorphism, that's fine too. If you want something in
between, say a value type with deep copy semantics, no
problem.

There is NO restriction in C++ with regards to what you can
do.

Yes there is, if you don't want the size of your struct to be
its size plus the size of a vptr. If maintaining that size is
what you want, then you can't have polymorphism. Hence,
restriction.

What on earth are you talking about? C++ doesn't guarantee the
size of anything. (Nor does any other language.) If you need
added behavior which requires additional memory, then you need
added behavior which requires additional memory. That's not C++
talking; that's just physical reality.

You seem to be saying that POD classes are not supported
or at least not encouraged.

Where do I say that? POD classes are definitely supported,
and are very useful in certain contexts. They aren't
appropriate for what most people would understand by OO, but
so what. Not everything has to be rigorously OO.

You seemed to imply that the "supported" ("encouraged" would
probably be a better word to use) paradigms were: A. data
structs with non-trivial member functions and built-in
"behavior" and B. "full OO type objects".

Not at all. You define what you need.

There are the limitations though: you can't have overloaded
constructors, for example, without losing POD-ness.

Obviously, given the particular role of PODs. So? What's your
point?

My point is that I'm worried about defining some overloaded
constructors and then finding (now or in the future) that my
class object is not "struct-like" anymore (read, has some
bizarre representation in memory).

I'm not sure what you mean by "struct-like" or "some bizarre
representation in memory". The representation is whatever the
implementation decides it to be. Both in C and in C++. I've
had the representation of a long change when upgrading a C
compiler. On the platforms I generally work on, the
representation of a pointer depends on compiler options. And on
*ALL* of the platforms I'm familiar with, the layout of a struct
depends on compiler options, both in C and in C++.

The whole point of using a high level language, like C++ (or C,
or even Fortran) is that you're isolated from this
representation.

Or derivation from "interfaces" (?).

How is a C program going to deal with derivation? For that
matter, an interface supposes virtual functions and dynamic
typing; it's conceptually impossible to create a dynamically
typed object without executing some code.

Code generation/execution is not what I'm worried about.

There's not much point in defining something that can't be
implemented.

[...]

I'm not sure what you mean by "the data portion to remain
intact".

Derive a class and you have compiler baggage attached to the
data portion.

Or you don't. Even without derivation, you've got "compiler
baggage" attached to the data portion. Both in C and in C++.

If I ever instantiate a class object that has overloaded
constructors and find that the size of the object is different
from the expected size of all the data members (please don't
bring up padding and alignment etc), I'm going to be unhappy.

Be unhappy. First, there is no "expected" size. The size of an
object varies from implementation to implementation, and depends
on compiler version and options within an implementation. And
second, I've yet to find anything to be gained by changing this.
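
For the specific worry about overloaded constructors, a minimal sketch (hypothetical type names; `static_assert` needs C++11 or later). Constructors and other non-virtual member functions are ordinary functions, and on the implementations in common use they add nothing to the object. Since the standard does not promise a particular size, code that depends on one can state the assumption so that a differing compiler stops the build instead of silently changing the data:

```cpp
#include <cassert>

// Hypothetical types, for illustration only.
struct RawPair                  // C-style struct, no member functions
{
    int first;
    int second;
};

struct Pair                     // same data, plus overloaded constructors
{
    Pair() : first(0), second(0) {}
    explicit Pair(int both) : first(both), second(both) {}
    Pair(int f, int s) : first(f), second(s) {}

    int sum() const { return first + second; }

    int first;
    int second;
};

// The assumption, written down. If some implementation ever laid Pair out
// differently from RawPair, compilation would fail here.
static_assert(sizeof(Pair) == sizeof(RawPair),
              "Pair does not have the size this code assumes");

// Small run-time check that the overloads behave as intended.
inline void checkPair()
{
    assert(Pair().sum() == 0);
    assert(Pair(3).sum() == 6);
    assert(Pair(1, 2).sum() == 3);
}
```

The check says nothing about what the size is, only that the two types agree on the implementation being used.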

Taken literally, the data portion had better remain intact
for all types of objects. If you mean contiguous, that's a
different issue: not even POD's are guaranteed to have
contiguous data (since C doesn't guarantee it)---on many
machines (e.g. Sparcs, IBM mainframes...) that would
introduce totally unacceptable performance costs.

If a platform is so brain-damaged that I can't do things to
have a high degree of confidence that the size of a struct is
what I expect it to be, then I won't be targeting that
platform. Other people can program "the exotics".

So what do you expect it to be? You can't reasonably expect
anything.

If anything, C++ specifies the structure of the data too
much. A compiler is not allowed to reorder data if there is
no intervening change of access, for example. If a
programmer writes:

    struct S
    {
        char c1 ;
        int i1 ;
        char c2 ;
        int i2 ;
    } ;

for example, the compiler is not allowed to place the i1 and
i2 elements in front of c1 and c2, despite the fact that
this would improve memory use and optimization.
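
A minimal sketch of the ordering rule, using `offsetof`. `SReordered` and the two helper functions are hypothetical additions. The ordering check holds on any conforming implementation; the size comparison is only what one would expect in practice. With a 4-byte `int` aligned on 4 bytes, `S` would typically occupy 16 bytes and the hand-reordered version 12.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

struct S
{
    char c1;
    int  i1;
    char c2;
    int  i2;
};

// The same members, reordered by hand, since the compiler may not do it.
struct SReordered
{
    int  i1;
    int  i2;
    char c1;
    char c2;
};

// True if the members of S appear at increasing addresses, in declaration
// order, with the first one at the very start of the object.
inline bool membersInDeclarationOrder()
{
    return offsetof(S, c1) == 0
        && offsetof(S, c1) < offsetof(S, i1)
        && offsetof(S, i1) < offsetof(S, c2)
        && offsetof(S, c2) < offsetof(S, i2);
}

// True if grouping the ints first did not make the object any larger.
inline bool reorderingDoesNotGrowTheObject()
{
    return sizeof(SReordered) <= sizeof(S);
}
```

Both functions can be checked with `assert` from `main`; the first one never fails on a conforming compiler, whatever padding it inserts.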

And I think I have control over most of those things on a
given platform. Which is all fine with me, as long as I HAVE
that control (via compiler pragmas or switches or careful
coding or whatever).

And compilers have always been free (and always will be free) to
provide such controls. I've never found the slightest use for
them, but they're there. The standard intentionally doesn't
specify how to invoke the compiler, or what pragmas are
available, to achieve this, since there's nothing you can really
say which would make sense for all possible platforms.

Anything else would be a contradiction: are you saying you
want to provide a constructor for a class, but that it won't
be called?

Of course I want it to be called. By "POD-ness" I just meant I
want a struct-like consistency of the object data (with no
addition such as a vptr, for example).

I don't understand all this business of vptr. Do you want
polymorphism, or not?

Yes, but without the vptr please (coffee without cream
please).

You mean café au lait without any milk. If you have
polymorphism, it has to be implemented.

If you want polymorphism, the compiler must memorize the
type of the object (each object) somewhere, when the object
is created; C++ doesn't require it to be in the object
itself, but in practice, this is by far the most effective
solution.

But what if just pure ABC derived classes were handled
differently? Then maybe the situation would be less bad.

Propose a solution. If you want polymorphism, the compiler must
maintain information about the dynamic type somehow. Whether
the base class is abstract or not doesn't change anything; an
object must somehow contain additional information. Additional
information means additional bits, which have to be stored
somewhere. The only alternative to storing them in the object
itself is somehow being able to recover them from the address of
the object. A solution which would seem off hand considerably
more expensive in terms of run-time. For practically no
advantage in return.
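
A minimal sketch of that alternative: the type information is kept in a side table keyed on the object's address. It is a deliberately naive illustration, not a proposal. `typeTable`, `registerSquare`, `unregisterObject` and `area` are hypothetical names, and a real scheme would also have to cope with copying, destruction and threads.

```cpp
#include <cassert>
#include <map>

// Hypothetical type, for illustration only.
struct Square                   // no virtual functions, so nothing hidden
{
    double side;
};

typedef double (*AreaFn)(const void*);

inline double squareArea(const void* p)
{
    const Square* s = static_cast<const Square*>(p);
    return s->side * s->side;
}

// Side table: object address -> behaviour for its dynamic type.
inline std::map<const void*, AreaFn>& typeTable()
{
    static std::map<const void*, AreaFn> table;
    return table;
}

// Every object must be entered when it is created and removed before it dies.
inline void registerSquare(const Square& s) { typeTable()[&s] = &squareArea; }
inline void unregisterObject(const void* p)  { typeTable().erase(p); }

// The "virtual call": a table search on each call, where a vptr-based
// implementation does a couple of loads.
inline double area(const void* p)
{
    std::map<const void*, AreaFn>::const_iterator it = typeTable().find(p);
    assert(it != typeTable().end());
    return it->second(p);
}
```

The object keeps the size of its data, but the bits have only moved: each registered object now costs a table entry, and every call pays for the search.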

[...]

Well there's another example then of heavyweightness: sprinkle
in "public" and "private" in the wrong places and the compiler
may reorder data members. (I had a feeling there was more than
the vptr example).

Yes, because C compatibility is no longer involved. It is
fairly clear that C actually went too far in this regard, and
imposed an unnecessary constraint which had negative effects for
optimization. Since most people would prefer faster programs
with smaller data, C++ did what it could to loosen this
constraint.

Can you give an example of reasonable code where this makes a
difference?

That, and the fact that a class cannot have a size of 0, are
about the only restraints. C (and C++ for PODs) also have a
constraint that the first data element must be at the start
of the object; the compiler may not introduce padding before
the first element.

So you are saying that a non-POD does not have to have the
first data element at the start of the object.

Obviously. Where do you think that most compilers put the vptr?
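
A minimal sketch that measures where the first data member ends up (hypothetical names throughout). For the C-style struct the answer is required to be 0. For the polymorphic class the standard says nothing; implementations that place the vptr at the start of the object will report a non-zero offset.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types, for illustration only.
struct PodHeader
{
    int id;
    int length;
};

struct PolyHeader
{
    virtual ~PolyHeader() {}
    int id;
    int length;
};

// Distance in bytes from the start of the object to its `id` member,
// measured on a live object so that it also works for non-POD types.
template <typename T>
std::ptrdiff_t offsetOfId(const T& obj)
{
    return reinterpret_cast<const char*>(&obj.id)
         - reinterpret_cast<const char*>(&obj);
}

// The only part the language guarantees: no padding before the first
// member of a POD struct.
inline void checkPodFirstMember()
{
    PodHeader p = { 1, 2 };
    assert(offsetOfId(p) == 0);
}
```

Calling `offsetOfId` on a `PolyHeader` object shows what a given compiler does with the second case.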

Example number 3 of heavyweightness. (NOW we're getting
somewhere!). So "losing POD-ness" IS still "bad" and my
assumed implication of that and use of "POD-ness" seems to
have been correct.

OK. I'll give you example number 4: the size of a long
typically depends on compiler options (and the default varies),
so putting a long in a class makes it heavyweight. And is "bad"
according to your definitions.

I think you're being eminently silly.

If I'm not mistaken, the next version of the standard
extends this constraint to "standard-layout classes"; i.e.
to classes that have no virtual functions and no virtual
bases, no changes in access control, and a few other minor
restrictions (but which may have non-trivial constructors).
This new rule, however, does nothing but describe current
practice.
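
The category was adopted in C++11 under the name standard-layout, and `<type_traits>` lets a program ask the compiler which side of the line a class falls on. A minimal sketch, with hypothetical example types; it needs a C++11 or later compiler:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical types, for illustration only.
struct CStyle      { int a; double b; };

struct WithCtor                         // overloaded, non-trivial constructors
{
    WithCtor() : a(0) {}
    explicit WithCtor(int v) : a(v) {}
    int a;
};

struct WithString  { std::string name; int count; };

struct Poly        { virtual ~Poly() {} int a; };

struct MixedAccess                      // change of access between members
{
    int a;
private:
    int b;
};

// A C-style struct is both trivial and standard-layout.
static_assert(std::is_trivial<CStyle>::value
              && std::is_standard_layout<CStyle>::value, "CStyle");

// User-provided constructors cost triviality but not standard layout.
static_assert(!std::is_trivial<WithCtor>::value, "WithCtor is not trivial");
static_assert(std::is_standard_layout<WithCtor>::value,
              "WithCtor is standard-layout");

// A std::string member is enough to lose triviality.
static_assert(!std::is_trivial<WithString>::value, "WithString");

// Virtual functions or mixed access control cost standard layout.
static_assert(std::is_polymorphic<Poly>::value
              && !std::is_standard_layout<Poly>::value, "Poly");
static_assert(!std::is_standard_layout<MixedAccess>::value, "MixedAccess");
```

The five types land in four different combinations of properties, so the traits describe several independent axes, not a single lightweight/heavyweight switch.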

So in the future I will be able to have overloaded
constructors (I'm not sure what exactly a "trivial"
constructor is, but I assumed that an overloaded one is not
trivial) and still have lightweight classes, good. That threat
of a compiler not putting data at the front of non-PODs is a
real killer.

I have no idea. I've not bothered really studying this in the
standard, because I don't see any real use for it. (I suspect,
in fact, that it is being introduced more as a means of wording
other requirements more clearly than for any direct advantages
that it may have.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34