Re: How Type-Safe is C++?

From:

Lance Diduck <lancediduck@nyc.rr.com>

Newsgroups:

comp.lang.c++.moderated

Date:

Wed, 28 May 2008 14:57:37 CST

Message-ID:

<c44e81d1-fd10-4b2d-92f1-70c58035e74b@a70g2000hsh.googlegroups.com>

On May 27, 1:22 pm, Tjark Weber <tjark.we...@gmail.com> wrote:

Are there any attempts then to prove type safety of C++ (without
pointer casts)? The only work that I'm aware of is "An operational
semantics and type safety proof for multiple inheritance in C++" by
Wasserrab, Nipkow, Snelting, Tip (2006).

Tjark

There are a number of problems with this:
1. Since C++ is backward compatible with C (in the vast majority of
cases). Most parsers use recursive decent to deal with this. This
makes it very difficult to write any proofs about the text of a C++
program. Note that the parts that aren't C compatible (templates,
inheritance) are fairly amenable to proof.
2. large portions of the standard relies on "implementation defineds."
So even if a proof could be written, things like encoding of strings,
fundamental types and such would need to evaluated on a case by case
basis.
3. the word "type" itself is somewhat of an vague term. In C++ land,
it has several common uses:
a. "structural type" refers to actual layout of the bits themselves,
as in a POD type. Although in theory C uses named types for its type
system, many program designer will think in terms of structural types
only. This is often seen in messaging systems.
b. "strong type", where type means the collection of invariants
preserved.
c. "language type" are differing kinds of constructs used by C++,
(enums, class, pointer to members, unions, etc) that template
metaprogrammers love to mess around with
d. "user defined type" with 'language types' those things that
actually make it into the symbol table. Included in here are the types
generated by instances of templates.

Any provable C++ program would need is some restrictions on the
allowable constructs. With these restrictions, coupled with a formal
framework, then we can use the C++ compiler as a proof assistant.
For example, using Hoare logic:
The rules a programmer must follow are these:
1. All functions are total -i.e. will satisfy their post conditions
for all inputs
2. No function other than constructors evaluates preconditions
3. All state changes are either linearized or transacted

Now, with these rules it is practical to construct a program that will
either run correctly or not compile:
void doSomething(int channelA, int channelB){
assert((channelA+channelB)<25);
//more stuff
}
//typical program usage
int a=IOfromA;//spec says less than 10
int b=IOfromB;//spec says less than 15
doSomething(a,b);
//Intermittent BUG!!! How do we find it?

From unit tests, we know that doSomething works correctly as long as

the preconditions are met. Otherwise, we get UDB. So the problem is
likely that we screwed up the path that the inputs from channels A and
B takes.
The typical approach is to use the assertions to cause a core dump,
and try to trace the paths that the arguments took. This is a tedious
approach - however, by using the type system (and the Hoare logic
above) I can "trace" the paths at compile time. In effect, what I get
is a full coverage test - which is exactly what many mathematical
proofs are.

Since the "input from channels A and B" were both modeled by using an
int, this is bad news. The program is perfectly type safe as far as
the language is concerned - doSomething has a well defined signature,
and the properties of int are well known.
The problem is of course the program designer did not choose the types
properly. Lets make a small adjustment

struct ChannelA{
    int get()const{return val;}
    explicit ChannelA(int vala):val(vala){
        if(vala>14)throw 1;
    }
private:
    int val;
};
struct ChannelB{
    int get()const{return val;}
    explicit ChannelB(int vala):val(vala){
        if(vala>9)throw 1;
    }
private:
    int val;
};
void doSomething(ChannelA channelA, ChannelB channelB){
//more stuff
}

Now what I do is drop in this new version of doSomething, and let the
compilation break. I fix the immediate callers of doSomething, (by
replaing the ints with the UDT's) and keep this up until I get to an
IO boundary. By this time it will be apparent what the problem is.
I have built libraries using 1000's of UDTs this way. So in this way C+
+ is very typesafe, and amenable to proof. So it is *possible* to do
it, the thing with C++ is that you are not required. You would need
some sort of external verification (code reviews) to make sure the
rules you did set up were followed.

Lance

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]