Re: mixed-sign arithmetic and auto

From: Walter Bright <walter@digitalmars-nospamm.com>
Newsgroups: comp.lang.c++.moderated
Date: Sat, 12 Jan 2008 06:12:44 CST
Message-ID: <zIOdnW_qHabJjhXanZ2dnUVZ_uWlnZ2d@comcast.com>
James Dennett wrote:

Walter Bright wrote:

James Dennett wrote:

Walter Bright wrote:

C++ would better serve programmers by standardizing much of the
undefined behavior and catering to the needs of the 99.999% of C++
programmers out there, rather than some wacky, obsolete machine.


That's subjective: C and C++ have opted *not* to limit themselves to
mainstream architectures, and are probably more widespread than any
other languages partly as a result of that.


On the other hand, Java went the route of eliminating undefined and
implementation defined behavior, and has had astonishing success
(largely at C++'s expense).

Look at any non-trivial C++ app - it's larded up with #ifdef's for
various local variations. Aiming to reduce the need for these will make
C++ more portable, reliable and useful.

There are certainly advantages in simplicity to restricting
a language to supporting only more "normal" architectures,
and if I were designing a language I'd do as you did with D,
and assume 2's complement, word sizes being powers of 2, and
so on.


The D programming language does nail down many UBs and IDBs:

1) integer sizes are fixed
2) bytes are 8 bits
3) the source character set is Unicode
4) floating point is IEEE 754
5) chars are unsigned

I intend to take this further and define the order of evaluation of
expressions, too. D won't eliminate all UB and IDB, because things like
defining endianness are not practical, but it will go as far as it can.
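As a concrete illustration of why evaluation order matters, here is a
contrived C++ sketch (nothing platform-specific assumed); the order in
which the operands of '+' are evaluated is unspecified, so a conforming
compiler may print either "f g" or "g f":

    #include <cstdio>

    int f() { std::printf("f "); return 1; }
    int g() { std::printf("g "); return 2; }

    int main()
    {
        // The evaluation order of the two operands is unspecified in
        // C++; a conforming compiler may call f() first or g() first,
        // so this may print "f g = 3" or "g f = 3".
        int sum = f() + g();
        std::printf("= %d\n", sum);
        return 0;
    }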

I'm old enough to have programmed 36 bit PDP-10s and processors with 10
bit bytes, and to have dealt with DOS near/far/ss programming and
EBCDIC. But those machines are dead. Nobody is designing new nutburger
machines.

The advantages you cite aren't simplicity - they are:

1) portability
2) robustness
3) predictability
4) reliability
5) correctness

The larger a system one is working on, the more important these become.
Unless you can mechanically detect reliance on UB or IDB, you by
definition cannot have a reliable or portable program.

That doesn't mean that C or C++ made a "wrong" choice,
it just means that their design goals aren't the same.


Whenever doing an update to the standard, it is worthwhile revisiting
old design goals and assumptions to see if they still make sense. I
contend that supporting such other integer arithmetic no longer makes
sense, and am fairly convinced that it never did.

I view a C++ compiler as a tool that is either useful or not. That is
not quite the same as whether it is standard conforming or not -
standards conformance is only worthwhile insofar as it serves to make a
compiler useful.

If on Wacky CPU X, integer arithmetic is not 2's complement, and the
standard says it must be, does that mean one cannot implement C++ on
that platform? No, it means one can still implement a variant of C++ on
that platform, and that variant will be *no less useful* than what those
programmers get today under undefined integer behavior. For the other
99.999% of programmers, C++ becomes *more useful* because integer
arithmetic will be more portable and reliable.
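To make that concrete, here is a contrived sketch of the kind of code
that silently assumes 2's complement. The identity it asserts holds on
every mainstream CPU today, but because the standard also permits other
representations, a conforming implementation need not satisfy it:

    #include <cassert>

    int main()
    {
        int x = -5;
        // On a 2's complement machine, negating an integer is the same
        // as complementing its bits and adding one. The standard also
        // permits 1's complement and sign-magnitude representations,
        // where this identity does not hold.
        assert(-x == ~x + 1);
        return 0;
    }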

Every useful C++ compiler in the DOS world had extensions to C++
specifically for that platform; C++ compilers without those extensions
were useless toys, and support for DOS was what gave C++ critical mass
to succeed. Furthermore, some C++ features are impractical on DOS -
exception handling and templates. A useful C++ compiler for DOS will
disable those standard features.

The interesting thing about this is that while one might think this
pulls the rug out from under such machines, in reality it does not.
There's nothing wrong with a compiler vendor for Wacky Obsolete CPU
stating that "This compiler is C++ Standard Conforming except for the
following behaviors ... 1) integer overflow is different in this
manner ...."


Exceptions to conformance are major issues for those who write
widely ported code (which describes many of the organizations for
which I've worked).


The *exact same* issue exists if the standard says "UB" for integer
overflow. All the standard is doing here is dumping the problem off onto
the user; the problem does not go away. It does not aid the user that a
program which "launches nuclear missiles" upon integer overflow is still
standard conforming.
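For instance (a minimal sketch, nothing platform-specific assumed), the
following program is standard conforming no matter what it does when the
addition overflows, because signed overflow is UB:

    #include <climits>
    #include <cstdio>

    int main()
    {
        int x = INT_MAX;
        // Signed integer overflow is undefined behavior, so the
        // standard places no requirement whatsoever on what happens
        // here. An optimizer may even assume the overflow cannot occur
        // and fold the comparison below to "true".
        int y = x + 1;
        if (y > x)
            std::printf("no overflow?\n");
        else
            std::printf("wrapped around to %d\n", y);
        return 0;
    }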

Working around non-compliance is a very expensive game.


Working around UB and IDB is just as expensive, and arguably more so,
because:

1) reliance on UB or IDB is not mechanically detectable, making programs
*inherently* unreliable

2) one cannot rely on what the compiler does from machine to machine,
version to version, or even compiler switch to compiler switch

How much effort have you seen, time and again, going into dealing with
the implementation-defined size of an int? Everybody deals with it, 90%
of them get it wrong, and nobody solves it the same way as anybody else.
How is this not very expensive?
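Two of the usual home-grown workarounds look something like this (a
sketch only; every project spells them differently, which is exactly the
problem):

    #include <climits>
    #include <stdint.h>   // C99 exact-width types; widely shipped even
                          // where not yet part of standard C++

    // Workaround #1: refuse to compile unless int happens to be 32 bits.
    typedef int assert_int_is_32_bits[sizeof(int) * CHAR_BIT == 32 ? 1 : -1];

    // Workaround #2: avoid plain int and use an exact-width typedef.
    int32_t counter = 0;

    int main()
    {
        counter += 1;
        return 0;
    }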

The problems facing the programmer for such a WOCPU won't be any
different than if the Standard allowed such unusual behavior


That's false. It would require special knowledge of that system,
hence every such system, whereas currently knowledge of just one
document, the C++ standard, suffices. That's the strength of
having a standard.


As this thread has demonstrated, even C++ experts do not know this
corner of the language spec. Worse, workarounds for this issue are
difficult and rarely discussed. And disastrously, reliance on such
undefined behavior cannot be mechanically detected.

Conversely, defining the behavior means that one does not have to know
how other systems work. The less UB and IDB, the easier the porting
gets, reducing costs.

For another example, can you guarantee your C++ programs aren't
dependent on the signedness of 'char'? How is knowledge of the standard
going to help you with this? I can tell you from decades of experience
that you can understand every detail of the spec and very carefully
avoid depending on the sign of 'char', yet until you actually try your
code on a compiler where 'char' has the other signedness, you have no
idea whether it will work.
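Here's a trivial sketch of the kind of code that's at stake; it compiles
cleanly everywhere, and which branch it takes is decided by the
implementation, not by the programmer:

    #include <cstdio>

    int main()
    {
        // Whether plain char is signed is implementation-defined.
        // Typical x86 compilers make it signed; common ARM and PowerPC
        // ABIs make it unsigned. The branch taken below therefore
        // differs from platform to platform, with no diagnostic.
        char c = static_cast<char>(0xFF);
        if (c < 0)
            std::printf("plain char is signed here\n");
        else
            std::printf("plain char is unsigned here\n");
        return 0;
    }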

UB and IDB are not strengths of the standard. They are costly weaknesses.

and it would be better because at least (presumably) the documentation
would list the non-conforming behavior and the programmer can keep an
eye out for it.

For the other 99.999% of programmers, they have behavior they can rely
on. It's a win-win all around.


Your 99.999% is rather optimistic, giving only 10 per million who
need to care about such systems. Some languages are needed to cater
for *very* portable code. D isn't such a language, and is not
intended to be. C and C++ are. Swings and roundabouts.


My experience porting D code between platforms is that it ports more
easily than the equivalent C++ code. UB means a program operates in an
unpredictably
different way on different platforms. I agree that this makes the
language *spec* more portable, but I disagree that it makes language
source code more portable.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

