Re: C++ Frequently Questioned Answers

From: "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail@erdani.org>
Newsgroups: comp.lang.c++.moderated
Date: Mon, 12 Nov 2007 14:45:49 CST
Message-ID: <47387A7A.3090708@erdani.org>

Eugene Gershnik wrote:

On Nov 12, 12:09 am, "Andrei Alexandrescu (See Website For Email)"
<SeeWebsiteForEm...@erdani.org> wrote:

Eugene Gershnik wrote:

But that does not dilute your argument, which I will distill
to: "Since one can devise an API based entirely on void*, how come
memory is more structured than anything else?"

I don't think this is my argument (maybe I am missing something very
fundamental to you that you didn't spell out). The point I was trying
to make was that the structure you refer to also exists for other
kinds of things, not only memory. Let me try to illustrate this using
your example below.

Of course it exists. It's just not validated by the compiler. The only
things we can make the compiler validate are in-memory representations
of said structure using various encapsulation devices, all of which
fundamentally rest on typed memory.

The answer is, void* describes untyped memory which is not that much
different from a file that can only be typed as a collection of bits. But
as soon as typing comes into play, memory does become "special". Consider:

struct Foo
{
    float x;
    Bar * p;
};

The integrity of the program using a Foo depends on (and is partially
proven during compilation by) the fact that when you have a Foo and you
fetch its first 4 bytes, they are going to look and feel like a float,
and when you fetch the next 4 bytes and dereference that as a pointer,
you'll get something that looks more or less like a Bar.

I can rewrite the above as follows. We have a HANDLE h of type Foo.
The fact that it is Foo (rather than Bar or Baz) is defined by the
fact that the operation GetNBytesAtOffset(h, 4, 0) returns a float
while the operation GetNBytesAtOffset(h, 8, 4) returns another HANDLE
of type Bar (similarly defined). Any other application of
GetNBytesAtOffset to h results in UB. The fundamental integrity of the
program depends on the programmer only applying allowed operations on
this handle.

You are describing an ad-hoc typing rule on top of untyped memory. This,
while valid, does not contradict my point.
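A minimal C++ sketch of the contrast, under stated assumptions: the
GetNBytesAtOffset signature below is a hypothetical rendering of the
operation described above, Bar's int member is invented for
illustration, and offsetof/sizeof stand in for the hard-coded offsets
so the sketch does not depend on a particular platform's layout. With
the typed view the compiler checks each access; with the handle view
only the caller's discipline does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Bar { int value; };   // hypothetical contents, for illustration only

struct Foo {
    float x;
    Bar*  p;
};

// Hypothetical untyped accessor: copies n bytes starting at `offset`
// from whatever object sits behind the handle. Nothing here knows
// (or checks) what type those bytes are supposed to have.
void GetNBytesAtOffset(const void* h, std::size_t n, std::size_t offset, void* out) {
    std::memcpy(out, static_cast<const unsigned char*>(h) + offset, n);
}

// Typed access: the compiler verifies that f.x is a float.
float typed_read(const Foo& f) { return f.x; }

// Untyped access: the caller must know the offset, size, and type.
float untyped_read(const void* h) {
    float x;
    GetNBytesAtOffset(h, sizeof x, offsetof(Foo, x), &x);
    return x;
}

// Same idea for the pointer member; a wrong offset or a handle that is
// not really a Foo would compile just as happily and misbehave at run time.
int untyped_bar_value(const void* h) {
    Bar* p;
    GetNBytesAtOffset(h, sizeof p, offsetof(Foo, p), &p);
    return p->value;
}
```

Both versions read the same bytes; the difference is only in who is
responsible for the claim that those bytes form a float or a Bar*.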

A particular language like C++ can make this definition implicit by
the syntax above and so enable static checks or proofs that only
proper operations are applied to HANDLEs of type Foo. A different
language like assembler would require the user to manually ensure
proper usage.

You essentially describe typing rules.

Now look at the definition of Win32 mutex. We have a HANDLE h of type
Mutex. The fact that it is a Mutex (rather than a File or Event) is
defined by the fact that the operation ReleaseMutex... etc. The
fundamental integrity of the program depends on the programmer only
applying allowed operations on this handle. A language such as E
(hypothetical successor to D) allows me to have a special syntax that
enforces proper operation on Mutexes. Less powerful languages like C++
require me to write a wrapper class for this.
No matter how hard I look I cannot see any non-trivial difference
between the two descriptions. If you see one, what is it?

The difference is that you are describing ad-hoc rules defined in terms
of allowed sequences of function calls on top of untyped memory. If
those rules were embodied by language-defined constraints, they
would define types.
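As a rough illustration of a call-sequence rule turned into a type,
here is a sketch of the kind of wrapper class mentioned above. It
assumes a C++11 std::mutex, which postdates this thread; the names
Mutex, Held, acquire and is_held are hypothetical. Release is reachable
only through the destructor of a token that acquire() hands out, so
"release without a prior acquire" simply cannot be written.

```cpp
#include <cassert>
#include <mutex>

class Mutex {
public:
    // Move-only proof that the mutex is currently held.
    class Held {
    public:
        Held(Held&& other) noexcept : m_(other.m_) { other.m_ = nullptr; }
        Held(const Held&) = delete;
        Held& operator=(const Held&) = delete;
        Held& operator=(Held&&) = delete;
        ~Held() {
            if (m_) {               // a moved-from token releases nothing
                m_->held_ = false;
                m_->raw_.unlock();
            }
        }
    private:
        friend class Mutex;
        explicit Held(Mutex* m) : m_(m) {}   // only Mutex can mint a token
        Mutex* m_;
    };

    // The only way to obtain a Held is to lock first.
    Held acquire() {
        raw_.lock();
        held_ = true;
        return Held(this);
    }

    // Demo-only observer; reading it from another thread would be a race.
    bool is_held() const { return held_; }

private:
    std::mutex raw_;
    bool held_ = false;
};
```

Note that the guarantee rests entirely on Held being a typed object in
memory whose private constructor and destructor the compiler enforces.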

There is work on using temporal logic to enforce rules such as "calling
release(m) is only allowed if acquire(m) has occurred in the past" which
might be related to the discussion, some of it at University of
Washington. I forgot the authors though, could anyone chime in?

that's the issue. The issue is that freeing dynamically-allocated memory
poses life-threatening risks to the program in ways that closing files
or sockets does not.

Well, first of all, accessing a closed file *can* pose a life-threatening
risk to the program (more on this below). Second, I am not
sure this is not going the way of arguing about feelings. What is a
life-threatening risk, and how do you rank the risks of invoking UB?

It's not about my feelings. It's about the program being in a soft error
state, where its type-ensured abstractions are unpredictably broken.

I have tried to argue on this newsgroup and in my talk of this year at
ACCU that garbage collection is fundamental to type safety, with limited
success, so I'm not getting my hopes up this time either :o).

I've seen your posts here on this topic before and I think I actually
understand your reasoning there. Problem is I don't think many people
would hold absolute 100% type safety as the goal worth pursuing at any
cost.

Of course not, and particularly not in a systems-level language (or at
least any language that wants to implement a memory allocator or a
garbage collector).

Every non-trivial program communicates with the
outside world and, these days, often deals with multiple threads.
These facilities are not optional in practice. In the end the user
doesn't care whether the program misbehaves because of access to
invalid memory location or because a mutex wasn't locked.

That is correct. I wouldn't mix mutexes in the discussion as threads are
an unwieldy beast of their own.

Now wait a second. Leaving mutexes aside, a *lock* (be it of a mutex,
monitor or something else) is the resource that I think refutes your
claim about the special danger of memory-related UB.

The lock lives in memory. Any invariant about the lock being properly
used (RAII etc.) fundamentally rests on typed memory. Take the typing of
the memory away, the lock is as effective as the attentive programmer
using it.

But what I'm saying is that detecting
e.g. writes through a closed file/socket handle is possible,

No, not really. At least not without the same cost as detecting
dangling pointers.

while
dereference of a dangling pointer is not.

File/socket handles/descriptors get reused just like pointers are. The
effect of accessing a closed file handle is exactly the same as that of
accessing a freed pointer. You might get (detectable) garbage or a
(non-detectable) different file. If you read commands from the wrong
file, the damage can be comparable to or greater than that from accessing
the wrong memory. (Based on my feelings about ranking damage.)

Now, unlike modern allocators, modern OSes do try to delay handle
reuse but this doesn't mean the problem is not real or does not
happen.
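The reuse scenario can be sketched with a toy handle table. This is a
hypothetical model, not any real OS API: HandleTable and its members
are invented, and the lowest-free-slot policy is an assumption borrowed
from how POSIX assigns file descriptors. A write through a closed
handle is detectable only until the slot is handed out again; after
that it lands silently in a different file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class HandleTable {
public:
    // Returns the lowest free slot, reusing closed handles first.
    int open(const std::string& name) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].in_use) {
                slots_[i] = Slot{name, "", true};
                return static_cast<int>(i);
            }
        }
        slots_.push_back(Slot{name, "", true});
        return static_cast<int>(slots_.size() - 1);
    }

    void close(int h) {
        if (valid(h)) slots_[h].in_use = false;
    }

    // Fails only if the slot is currently free; it cannot tell a stale
    // handle from a fresh one that happens to have the same number.
    bool write(int h, const std::string& data) {
        if (!valid(h)) return false;
        slots_[h].contents += data;
        return true;
    }

    const std::string& name_of(int h) const { return slots_.at(h).name; }
    const std::string& contents_of(int h) const { return slots_.at(h).contents; }

private:
    struct Slot { std::string name; std::string contents; bool in_use; };

    bool valid(int h) const {
        return h >= 0 && static_cast<std::size_t>(h) < slots_.size() && slots_[h].in_use;
    }

    std::vector<Slot> slots_;
};
```

For example, opening "log.txt", closing it, and then opening
"config.txt" yields the same handle number, so a late write through the
old log handle succeeds and ends up in config.txt. Catching that would
need extra per-handle state such as a generation counter, which is
comparable to the bookkeeping needed to catch dangling pointers.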

Clearly writing through a handle that was closed and reopened through
another name is going to work badly. The point is that there was no
compiler to guarantee you the proper typing of the file or of the
communication channel modeled by a socket. However, the compiler can
type memory objects for you and therefore allows you to encapsulate file
or socket manipulation in ways that impart to a file any structure you
wish. My point is that all of that encapsulation relies on typed memory.
Take typed memory away, you have untyped bits in memory plus untyped
bits in the file - it's just turtles, all the way down.

Andrei

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]