Re: The C++ Object Model: Good? Bad? Ugly?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Wed, 12 Nov 2008 03:18:36 -0800 (PST)

Message-ID:

<c918849c-4f4f-47b6-8d2c-dc63b0ed0749@n1g2000prb.googlegroups.com>

On Nov 12, 12:01 am, tonytech08 <tonytec...@gmail.com> wrote:

On Nov 9, 3:45 am, James Kanze <james.ka...@gmail.com> wrote:

[...]

Well, that's C++. And C. And Fortran, and just about every
other language I'm aware of. The only language I know which
specifies the exact format of any types is Java, and it only
does so for the built-in types.

So what's your point? Data layout is implementation defined,
period. That was the case in C, and C++ didn't introduce any
additional restrictions.

My point is, apparently, that when a language arrives with "a bit
more" definition at that level, I will use it!

Why? What does it buy you (except slower code and increased
memory use)?

But that's a bit extreme, for one can "get away with" much
stuff beyond "it's implementation defined so it can't be
done". If "platform" means pretty much that one is tied to a
single compiler in addition to the hardware and OS, so be it.
So much for such a lame definition of "portability".

C++ is designed so that you can do non-portable things; they're
sometimes necessary at the lowest levels (e.g. writing OS
kernel code). It makes it quite clear what these are, though,
and if you don't use them, your code should be reasonably
portable.

[...]

It's not just vptrs. It's loss of layout guarantee when you
introduce such things as non-trivial constructors. (I'm
repeating myself again).

You're repeating false assertions again, yes. You have no
layout guarantees, period. Even in a POD, this goes back to C.
There are no layout guarantees that could be given portably
that wouldn't lead to an excessive loss of performance.

[...]

That's probably what C/C++ "portability" means: implementation
of the language. I'd opt for more designed-in portability at
the programming level in exchange for something less at the
implementation level. Even if that means "C++ for handhelds"
or something similar.

But you've yet to show how defining things any further would
improve portability.

[...]

And a class with private data members is fundamentally
different from one with public data members. And a class
with user defined constructors is fundamentally different
from one without.

That's a problem, IMO.

Why? If you want classes to be different, the language provides
the means.

There are a number of different classifications possible

The only one I'm considering in this thread, though, is the
lightweight/heavyweight one.

Without defining it or showing its relevance to anything at
all.

How is it not relevant to want to know what data looks like in
memory?

How is it relevant? What can you do with this information?

C++ claims to be low level/close to the hardware and a
programmer can't even know what a struct looks like in memory?

A programmer does know for a specific implementation, if he
needs to. But there's no way that it could be the same for all
implementations, so it has to be implementation defined.

[...]

Not necessarily impossible, but it would make the cost of
resolving a virtual function call significantly higher. And
what does it buy you? You say it would be "nice", but you
don't explain why; I don't see any real advantage.

No advantage to knowing that a struct size will be the same
across compilers?

It's not physically possible for the struct size to be the same
across compilers, given that the size of a byte varies from one
platform to the next. Even on the same platform, I fail to see
any real advantage.
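
To make the point about sizes concrete, here is a minimal sketch, assuming a C++11 compiler. The Record struct and the bits_in helper are hypothetical names chosen for illustration, not anything from the thread. It checks only what the standard guarantees (minimums) and prints the rest, since the printed values are implementation defined.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdio>    // std::printf

// Hypothetical record, used only for illustration.
struct Record {
    char tag;
    long value;
};

// Number of bits occupied by an object of type T on this implementation.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// The standard guarantees only minimums, so these hold everywhere.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(bits_in<long>() >= 32, "long has at least 32 bits");
static_assert(sizeof(Record) >= sizeof(char) + sizeof(long),
              "the members must fit; padding may add more");

// Prints the values this particular implementation chose.
void print_sizes() {
    assert(bits_in<char>() == CHAR_BIT);
    std::printf("CHAR_BIT       = %d\n", CHAR_BIT);
    std::printf("sizeof(long)   = %zu\n", sizeof(long));
    std::printf("sizeof(Record) = %zu\n", sizeof(Record));
}
```

Calling print_sizes() from main on two different compilers, or on one compiler with different options, is a quick way to see whether the numbers agree.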

I am just trying to understand where the line of
demarcation between lightweight and heavyweight classes
is and how that can potentially change in the future and
hence break code.

There is no line of demarcation because there isn't really
such a distinction. It's whatever you want it to mean,
which puts it wherever you want.

It means that one must avoid pretty much all of the OO
features of the language if one wants to have a clue about
what their class objects look like in memory.

If you want something portable, guaranteed by the standard, you
have to avoid even int and double, since what they look like in
memory isn't defined by the standard, and varies in practice
between platforms. If you're willing to accept implementation
defined, then you can have as much of a clue as the implementor
is willing to give you---I know exactly how my class objects are
laid out with G++ on a 32-bit Sparc, for example. Even in
cases where virtual inheritance is involved. I've never found
any practical use for this information, however (but I might if
I were writing a debugger, or something similar).

[...]

C may have other things that you are referring to or
implying; I don't know. I was concerned about the C++ object
model features, most of which cause any semblance of
data layout knowledge in memory to get thrown out the window.

C++ doesn't allow anything that isn't allowed in C. Nothing
gets thrown out the window. (Regretfully, because that means we
pay a price for something that is of no real use.)

Why do you care about the layout of the data anyway?
There's nothing you can really do with it.

Why do you keep saying that? Of course one can do things with
data in memory.

Such as? If there's something you can do with it, tell me about
it.

[...]

Yes there is if you don't want the size of your struct to
be its size plus the size of a vptr. If maintaining that
size is what you want, then you can't have polymorphism.
Hence, restriction.

What on earth are you talking about? C++ doesn't guarantee
the size of anything.
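
As an illustration of the size question being argued here, a minimal sketch with hypothetical types (Plain, Polymorphic and Derived are not from the thread). It assumes a typical implementation that stores a hidden pointer in each polymorphic object. The standard does not require that mechanism and does not say how large the difference is.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t

struct Plain {                 // hypothetical: data only
    int x;
};

struct Polymorphic {           // hypothetical: same data plus virtual functions
    int x;
    virtual ~Polymorphic() {}
    virtual int get() const { return x; }
};

struct Derived : Polymorphic {
    int get() const { return x + 1; }   // overrides Polymorphic::get
};

// What the extra bytes buy: the call is resolved by the dynamic type.
int call_get(const Polymorphic& p) {
    return p.get();
}

// Extra bytes per object on this implementation (implementation defined;
// assumes the polymorphic type is not smaller than the plain one).
std::size_t dispatch_overhead() {
    assert(sizeof(Polymorphic) >= sizeof(Plain));
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

The value returned by dispatch_overhead() is whatever the compiler in use decides, which is the point: the added behavior needs memory somewhere, but nothing fixes how much.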

Not if you take the "across the whole universe" approach. I
can make enough simplifying assumptions, though, that will
allow me to write out, byte-wise, a struct and then read it
back in with confidence that I'll get back the same thing I
wrote out.

From within the same executable, maybe. But that's about it.
If you write data in an unknown format, you can't guarantee that
it will be readable when you upgrade the machine. Or the
compiler. (At least one compiler changed the format of a long
between versions, and the size of a long generally depends on
compiler options, so may change anytime you recompile the code.
And unless I'm badly mistaken---I've no actual experience with
the platform---Apple changed the format of everything but the
character types at one time.)

That doesn't apply to those few projects where the goal is to
be portable to everything from the abacus to the Cray 3001.

It doesn't apply to those projects which have to reread the data
after a hardware upgrade. Or a compiler upgrade. Or even if
the code is recompiled under different conditions.

(Nor does any other language.) If you need added behavior
which requires additional memory, then you need added
behavior which requires additional memory. That's not C++
talking; that's just physical reality.

That's only one paradigmatic view of programming. It's also a
very tedious one.

Reality can be tedious.

[...]

I'm not sure what you mean by "struct-like" or "some bizarre
representation in memory".

Containing only the data members _I_ specified rather than
additional compiler-introduced stuff.

That's not the case in C. Luckily, because if it were, my code
would run almost an order of magnitude slower.

The representation is whatever the implementation decides it
to be. Both in C and in C++. I've had the representation
of a long change when upgrading a C compiler.

But "long" seems obsolete to me for that very reason. Use
width-specified ints and test them at compile time.

Width-specified isn't sufficient. Width isn't the only aspect
of representation.
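
One aspect that a fixed width does not settle is byte order. A minimal sketch, assuming an implementation that provides std::uint32_t and has 8-bit bytes; the function names are hypothetical. It copies the object representation of a 32-bit value and reports which byte came first. The answer differs between, say, a Sparc and an x86.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy

static_assert(CHAR_BIT == 8 && sizeof(std::uint32_t) == 4,
              "this sketch assumes 8-bit bytes");

// Copies the object representation of a 32-bit value into out[0..3].
void object_bytes(std::uint32_t value, unsigned char out[4]) {
    std::memcpy(out, &value, sizeof value);
}

// True if the least significant byte is stored first on this implementation.
bool stores_low_byte_first() {
    unsigned char b[4];
    object_bytes(0x01020304u, b);
    return b[0] == 0x04;
}
```

Two programs that agree on the width of the type can still disagree on the result of stores_low_byte_first(), so a raw byte dump written by one need not be readable by the other.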

Or derivation from "interfaces" (?).

How is a C program going to deal with derivation? For
that matter, an interface supposes virtual functions and
dynamic typing; it's conceptually impossible to create a
dynamically typed object without executing some code.

Code generation/execution is not what I'm worried about.

There's not much point in defining something that can't be
implemented.

Who suggested that?

You said that you wanted to only get what you'd written in the
struct. That would make an implementation with reasonable
performance impossible on some architectures. Like a Sparc, or
an IBM mainframe. Or many, many others.

[...]

Or you don't. Even without derivation, you've got "compiler
baggage" attached to the data portion. Both in C and in
C++.

C++'s baggage is a "deal breaker" though because one doesn't
know what that baggage is.

Nor in C, nor in any other language.

It could be reordering of the data members or crap inserted in
front, behind or in between the data members.

Except for reordering the members (which is very limited
anyway), C pretty much offers the same liberties. Most
languages offer the liberty to completely reorder members,
because of the space and performance improvements which can
result.
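
The "crap inserted in between" is padding, and it is there in plain C-style structs as well. A minimal sketch, assuming C++11 and two hypothetical structs with the same members in a different order. The only things asserted at compile time are what the language guarantees for such structs: the first member is at offset 0 and later members are at increasing offsets. The amount of padding is left to the implementation.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

struct Mixed {       // hypothetical: small, large, small
    char a;
    double b;
    char c;
};

struct Grouped {     // same members, reordered by hand
    double b;
    char a;
    char c;
};

// Bytes in T that are not accounted for by the three members.
template <typename T>
constexpr std::size_t padding_bytes() {
    return sizeof(T) - (sizeof(char) + sizeof(double) + sizeof(char));
}

// Guaranteed: declaration order is preserved, nothing precedes the first member.
static_assert(offsetof(Mixed, a) == 0, "first member is at offset 0");
static_assert(offsetof(Mixed, a) < offsetof(Mixed, b), "a precedes b");
static_assert(offsetof(Mixed, b) < offsetof(Mixed, c), "b precedes c");

// Not guaranteed, but typical: grouping the small members wastes less space.
bool reordering_saves_space() {
    assert(padding_bytes<Mixed>() + 2 + sizeof(double) == sizeof(Mixed));
    return sizeof(Grouped) < sizeof(Mixed);
}
```

On many common platforms Mixed is noticeably larger than Grouped, which is the kind of saving a language can get when it is free to reorder members itself.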

[...]

Can you give an example of reasonable code where this makes
a difference?

Yes, but I'd rather not. It should be clear that I don't want
to view all classes like interfaces (pure ABCs). If data is
public, I want to manipulate it directly sometimes as bits and
bytes.

Why? That's what I don't see. If I need to manipulate bits and
bytes (say to implement a transmission protocol), then I
manipulate bits and bytes. And not structs; there's no
reasonable way structs can be used at this level, without
breaking them for all other uses.
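
A minimal sketch of what manipulating bits and bytes can look like for one field of a protocol, assuming the wire format is four octets, most significant first. That format and the function names are assumptions made for the example, not something specified in the thread. The shifts work on values, not on the object representation, so the result does not depend on how the host lays out an integer.

```cpp
#include <cassert>
#include <cstdint>   // std::uint32_t

// Writes value as four octets, most significant first.
void put_u32_be(std::uint32_t value, unsigned char* out) {
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Reads four octets, most significant first, back into a value.
std::uint32_t get_u32_be(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[2]) << 8)
         |  static_cast<std::uint32_t>(in[3]);
}

// Round trip through the wire format; returns true if the value survives.
bool round_trips(std::uint32_t value) {
    unsigned char buffer[4];
    put_u32_be(value, buffer);
    return get_u32_be(buffer) == value;
}
```

A struct with several fields would be written field by field with helpers like these, rather than by copying the struct's bytes, so the file or message stays readable after a compiler or hardware change.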

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34