Re: compilers, endianness and padding
On Wed, 22 May 2013 00:15:25 CST
Edward Diener <eldiener@tropicsoft.invalid> wrote:
The compiler knows no such thing. It only knows that 'void* p' is a
pointer.
Of course it does, as you well know. The information isn't
recorded in the pointer, but any allocated memory -- no matter how
allocated -- has size. That size is tracked by the
compiler/stack/heap. It's C++'s way of keeping everything from
using the same space.
The compiler/stack/heap are all different things.
That is part of the challenge. Pointers can be assigned values in a
variety of ways. If you make them "bigger" in the sense of adding an
extent attribute, you have to touch each member of the menagerie.
From your 'struct' above the compiler only knows that p is a pointer
to void
That's true when defined. Once assigned,
    string s;
    void *p = reinterpret_cast<void*>(&s);
it's a pointer to something. That information is discarded
today, but need not be. Indeed, it would be valuable to keep; only a
few days ago there was a discussion on this list about the danger of
casting to void* and casting the result to any type other than the
original. By retaining the cast-from information, the running system
could throw an exception if that were done.
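
A minimal sketch of what retaining the cast-from information might look like, done here in library code rather than by the compiler. The names checked_void_ptr and rejects_cast are hypothetical, invented for illustration; nothing like them exists in the standard, and a compiler-supported version would presumably not need the wrapper at all.

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical helper: a void* that remembers the type it was made from.
class checked_void_ptr {
public:
    template <typename T>
    explicit checked_void_ptr(T *p) : ptr_(p), type_(typeid(T)) {}

    // Cast back to T*; throws std::bad_cast unless T is the original type.
    template <typename T>
    T *as() const {
        if (type_ != std::type_index(typeid(T)))
            throw std::bad_cast();
        return static_cast<T *>(ptr_);
    }

private:
    void *ptr_;             // the untyped address, as with a plain void*
    std::type_index type_;  // the cast-from type, normally discarded
};

// Hypothetical test helper: true if casting p back to Wrong is rejected.
template <typename Wrong, typename T>
bool rejects_cast(T *p) {
    checked_void_ptr cp(p);
    try {
        cp.as<Wrong>();
    } catch (const std::bad_cast &) {
        return true;
    }
    return false;
}
```

The cost in this sketch is one extra word per pointer plus a comparison per cast back, which is the kind of overhead the rest of this thread argues about.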
Absent better information, void* points to a sequence of bytes.
That pointer can be to anywhere in memory and can point to
anything. It does not have to point to dynamic memory or be on the
stack. I tend to doubt that today the information ( length in bytes )
about that 'void * p' is kept anywhere while the program is running
by the run-time system.
I'm sure you're right. As I said, for a file-scope variable:
    static const char *name = "Galileo";
the compiler reserves space in the object code for the string
"Galileo", but the size of that reservation is discarded.
I know what you mean by "heap". I was only questioning the idea of
what the "heap" knows.
Ah. I've always thought it peculiar that
    char *p = malloc(10);
    free(p);
works but
    char *p = new char[10];
    delete p;
is undefined behavior (delete[] is required).
I know it's been justified time and again, but I don't see how it can
be seen as anything other than a step backwards. I wonder how many
cycles have been saved versus hours wasted.
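
For reference, a small sketch of the pairings that are well defined today. The function name demo_pairings is made up for this example; the point is only that malloc pairs with free, new[] pairs with delete[], and std::unique_ptr<char[]> (C++11) picks delete[] automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical demo: returns the length of the string copied into the
// malloc'ed buffer, after releasing every allocation with its proper match.
std::size_t demo_pairings() {
    // malloc/free: the allocator remembers the block size for us.
    char *a = static_cast<char *>(std::malloc(10));
    if (!a)
        return 0;
    std::strcpy(a, "Galileo");
    std::size_t n = std::strlen(a);
    std::free(a);

    // new[]/delete[]: the array form must be matched by the array form.
    char *b = new char[10];
    delete[] b;

    // unique_ptr<char[]> calls delete[] when it goes out of scope.
    std::unique_ptr<char[]> c(new char[10]);
    c[0] = '\0';
    return n;
}
```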
I am not in principle against a run-time system that can track what
you want it to track but I think you may be underestimating the
speed/size costs as well as the effort involved.
You may be right. I've been misunderestimated myself on occasion.
If there were overhead I would want an end-user to be able to opt out
of it. Not everyone will agree that the ability to automatically
serialize data should be paid for in terms of either slower code or
bigger code.
Acknowledged. OTOH, it's easy to overestimate the costs, because they
are so often currently borne by the programmer.
Let's separate two areas of concern: static metadata and pointer
extents.
Incorporating static metadata -- basically, a name-type-size tuple for
every structure member -- in the object code will make the object code
bigger. That's undeniable. If I were writing a compiler, I'm sure I'd
hear from users complaining about that, and be under pressure to offer
an option to remove it. (There will even be proprietary-software
concerns. Not every closed-source vendor will want his data structures
clearly disclosed.) I think it would be interesting to measure, though,
especially if we restrict ourselves to being able to iterate over the
members of a struct/class.
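
To make the name-type-size tuple concrete, here is a hand-written sketch of the table a compiler might emit. Everything in it is an assumption for illustration: member_info, Planet, planet_members and find_member are hypothetical names, and the type is recorded as a plain string only because that is the simplest thing to show. It uses sizeof on a member name, which needs C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical metadata record: one per structure member.
struct member_info {
    const char *name;    // member name as written in the source
    const char *type;    // type spelled as a string, for illustration only
    std::size_t offset;  // byte offset within the struct
    std::size_t size;    // size of the member in bytes
};

// Hypothetical example struct.
struct Planet {
    char name[16];
    double mass;
    int moons;
};

// What the compiler would generate; written by hand here.
static const member_info planet_members[] = {
    { "name",  "char[16]", offsetof(Planet, name),  sizeof(Planet::name)  },
    { "mass",  "double",   offsetof(Planet, mass),  sizeof(Planet::mass)  },
    { "moons", "int",      offsetof(Planet, moons), sizeof(Planet::moons) },
};

const std::size_t planet_member_count =
    sizeof(planet_members) / sizeof(planet_members[0]);

// Iterate over the members of the struct, looking one up by name.
const member_info *find_member(const char *name) {
    for (std::size_t i = 0; i < planet_member_count; ++i) {
        if (std::strcmp(planet_members[i].name, name) == 0)
            return &planet_members[i];
    }
    return NULL;
}
```

The table is static data, so it enlarges the object code but adds no work at run time unless something actually walks it.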
I don't see how extending the language to provide static metadata
imposes any runtime cost. I don't believe it's terribly difficult
given that every compiler already provides the information to
debuggers. (ISTM debuggers would then be easier to write.)
Just metadata and only metadata would be a boon to anyone doing C++
I/O, especially to library writers. If you want to see more people
using C++, that's surely one way to get there.
In the general case, serialization requires inheritance metadata, and
pointer extents, too. I'm personally not all that exercised about
inheritance, but neither does the inheritance graph strike me as
particularly difficult to represent. It would support some interesting
use cases. For instance, it would be possible to explain a Koenig
lookup without parsing the code.
To provide pointer extents requires runtime support. There is some
complexity for that reason among others. But it's not at all clear the
cost is nearly so great as it seems at first blush.
I believe the compiled program should track the extent of every
pointer. When people object about efficiency, I'm actually puzzled,
because I find that whenever I'm dealing with a pointer, any pointer,
I'm always tracking and testing against the extent:
A *
foo( A *a, size_t len ) {
    assert(a);
    for( A *p=a; p < a + len; ++p ) {
        if( bar(p) )
            return p;
    }
    return NULL;
}
Who hasn't written that 1000 times in one form or another? How does
one deal with pointers without tracking the bounds of what they point
to? Who iterates over a string today relying on the NUL terminator
without reference to the allocated extent of the buffer?
Once we accept that every pointer has an extent, and that to use that
pointer we track its extent, why not ask the compiler to do the work
for us?
It may at first seem an unbearable cost: that a pointer's extent must
be updated whenever the pointer is incremented, and that
    *p
becomes analogous to vector::at() instead of vector::operator[]. But
neither of those suppositions is accurate.
In the first place, it's not necessary to update the pointer's extent
or to check every dereference. In my A *a example above, the system
can compute the extent of p (in bytes) at will:
    extentof(p) == extentof(a) - (p - a) * sizeof(*a)
In the second place, that computation need not take place unless
demanded. If extentof is never invoked, the information to produce it
need not exist in the executable.
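
A rough library-level approximation of that idea, assuming a wrapper class in place of compiler support. extent_ptr and this extentof are hypothetical names for the sketch. The wrapper stores the end of the object once, so incrementing touches only the current position, dereferencing is unchecked, and the extent is computed only when asked for.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "pointer with extent": current position plus fixed end.
template <typename T>
class extent_ptr {
public:
    extent_ptr(T *base, std::size_t count)
        : cur_(base), end_(base + count) {}

    // Unchecked, like a raw pointer; no bounds test on dereference.
    T &operator*() const { return *cur_; }

    // Incrementing moves the position only; end_ is never updated.
    extent_ptr &operator++() { ++cur_; return *this; }

    T *get() const { return cur_; }

    // Bytes remaining from the current position to the end of the object,
    // computed on demand.
    std::size_t extent() const {
        return static_cast<std::size_t>(end_ - cur_) * sizeof(T);
    }

private:
    T *cur_;
    T *end_;
};

// Hypothetical free function mirroring the extentof() used in this post.
template <typename T>
std::size_t extentof(const extent_ptr<T> &p) {
    return p.extent();
}
```

Unlike the compiler-tracked version argued for here, this wrapper always carries the end pointer, so it cannot drop the information when extentof is never called.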
Most important, though: we already bear the cost. We're tracking
the length of allocated objects, passing lengths with pointers, testing
against boundaries. We have been since 1975. Would 2017 be too soon
to move that information into the language, where it would be more
convenient? Not to mention unerringly correct?
A *
foo( A a[] ) {
    A *p = a;
    for( ; p < a + extentof(a)/sizeof(a[0]); ++p ) {
        if( bar(p) )
            return p;
    }
    return NULL;
}
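
The nearest thing available in the language as it stands is, I think, a reference to an array, where the compiler deduces the element count. A sketch, with a stand-in definition of A and a hypothetical bar() since neither is defined in this post:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the A used in this post.
struct A { int value; };

// Hypothetical predicate standing in for bar(); here, "is negative".
bool bar(const A *p) { return p->value < 0; }

// N is deduced from the array type, so no length argument is passed.
template <std::size_t N>
A *foo(A (&a)[N]) {
    for (A *p = a; p < a + N; ++p) {
        if (bar(p))
            return p;
    }
    return NULL;
}
```

This works only while the array type is still visible at the call site. Once the array has decayed to a pointer, or the memory came from new or malloc, the extent is gone, which is exactly the gap extentof would fill.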
I just don't see the problem. Or, rather, I don't see the technical
problem. ;-)
--jkl
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]