Re: mixed-sign arithmetic and auto

From: Walter Bright <walter@digitalmars-nospamm.com>
Newsgroups: comp.lang.c++.moderated
Date: Sun, 13 Jan 2008 05:52:05 CST
Message-ID: <5dSdnf-pHcH0MhTanZ2dnUVZ_i2dnZ2d@comcast.com>
Jerry Coffin wrote:

the Java standard (at least the last time I
looked at it) simply ignores the issue entirely or (more often) includes
some phrase like "IEEE floating point" that makes it sound like the
issue has been dealt with, but when you get down to it doesn't really
mean much at all.


IEEE 754 floating point arithmetic means a lot more than nothing, and
certainly far more than C++ floating point, but you're right that it
doesn't nail it down 100%.

Java certainly has a _much_ larger standard library, but this has little
to do with undefined or implementation defined behavior.


I agree with that.

1) integer sizes are fixed
2) bytes are 8 bits
3) the source character set is Unicode
4) floating point is IEEE 754
5) chars are unsigned

I intend to take this further and define the order of evaluation of
expressions, too. D won't eliminate all UB and IDB, because things like
defining endianness are not practical, but it will go as far as it can.
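
To illustrate the kind of thing that defining evaluation order pins
down (a minimal sketch; g and h are hypothetical):

    #include <cstdio>

    int g() { std::printf("g "); return 1; }
    int h() { std::printf("h "); return 2; }

    int main() {
        // C++ leaves the order of the two calls unspecified; this
        // may print "g h 3" or "h g 3" depending on the compiler.
        int x = g() + h();
        std::printf("%d\n", x);
    }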


These requirements still limit practical portability, even among modern,
widely-used architectures. Quite a few DSP and even some PDAs, cell
phones, etc., don't support any 8-bit type, just to give one example.


I disagree, because nothing would prevent a D *variant* from being
customized to unusual architectures. That would be no less useful than
simply dumping the problem off onto all users.

I suspect that this is an even better solution for those programming
such machines, because they won't be under the delusion that code which
has never been tested under such conditions was somehow written
"portably" under a mistaken notion of portability.

People are still designing and using machines that don't fit the
limitations above. In fact, I'd guess those limitations would cause
problems for the _majority_ of CPUs (though not for the CPUs in things
that are generally thought of as "computers").


A more relevant question is how many programmers are programming for
these oddballs, vs programming for mainstream computers?

(I get asked now and then to produce a custom compiler for some oddball
CPU, but I ask for all the development money up front because I know
there is no market for such a compiler.)

The larger a system one is working on, the more important these become.
Unless you can mechanically detect reliance on UB or IDB, you by
definition cannot have a reliable or portable program.


If you rephrased that as "reliable _and_ portable", you'd at least have
a point. Programs that depend on UB or IDB can certainly be reliable as
long as portability isn't required.


UB does not imply reliable or repeatable behavior, so any dependence on
UB is inherently unreliable _and_ unportable.

I can't say I agree. For most programmers most of the time, the only
interesting point is to ensure that the number of bits is sufficient
that overflow just doesn't happen. As long as that's the case, the
difference between one's complement and two's complement (for example)
is entirely irrelevant.


Most of the time, sure. Even 99% of the time. But when you've got a
million lines of code, suddenly even the obscure cases become probable.
And when you don't have a thorough test suite (who does?) how can you be
*sure* you don't have an issue there?

I'm very interested in building languages which can offer a high degree
of reliability. While D isn't a language that gets one there 100%, it
gets a lot closer than C++ does. I am a little surprised at the
resistance to improving C++ along these lines. It's not like pinning
down UB is going to break existing code - by definition, it won't.

Nearly the only time anybody really cares is when writing an extended
precision integer library. You'd accomplish far more by defining (for
one example) the result of the remainder operator when dealing with
negative numbers.


Did that, too, just forgot to mention it.
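
For the curious, here is the case D pins down, with values where C++98
implementations are allowed to disagree:

    #include <cstdio>

    int main() {
        int q = -7 / 2;  // C++98: -3 or -4, the implementation's choice
        int r = -7 % 2;  // correspondingly -1 or +1
        // D (like C99) defines truncation toward zero: q == -3, r == -1.
        std::printf("%d %d\n", q, r);
    }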

You dismiss quite a few architectures that are currently in _wide_ use
as "nutburger", but then you act as if _anybody_ cared about MS-DOS
anymore?


The D programming language explicitly does not support 16 bit platforms.
That should leave no doubt about my position on that <g>. C++ was
designed to support it, and so is fair game for criticizing its
shortcomings in doing so.

I would vote for C++ to explicitly ditch 16 bit support. No problem there!

The *exact same* issue exists if the standard says "UB" for integer
overflow. All the standard is doing here is dumping the problem off onto
the user; the problem does not go away. It does not aid the user to
allow a program that "launches nuclear missiles" upon integer overflow
to be standard-conforming.


The way you write, you'd think the average programmer spent a
substantial part of his time dealing with integer overflow. This just
isn't the case -- I can hardly remember the last time I wrote anything
where integer overflow was an issue at all.


I rarely worry about it either, but then again I've had several bugs due
to it. One in particular is in a storage allocator:

     nbytes = dimension * element_size;

Crud, that overflows. I do lots of hash computations, too, which rely on
wraparound overflow.
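
For the record, the guard that allocator needed looks something like
this (a minimal sketch, not the actual fix that went in):

    #include <cstddef>
    #include <limits>

    // True if a * b would exceed size_t's range; one division per
    // allocation is cheap insurance against silent wraparound.
    bool mul_would_overflow(std::size_t a, std::size_t b) {
        return b != 0 && a > std::numeric_limits<std::size_t>::max() / b;
    }

(The hash computations are a different story: there the wraparound is
intended, and unsigned arithmetic at least defines it.)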

Limiting portability to deal
with something that isn't a problem to start with is a poor trade off.


This may be the root of our different ideas: We have different
definitions of portability.

You (if you don't mind me putting words in your mouth) define it as the
likelihood that, if a program compiles on X, it will also compile on Y.
Whether it works or not depends on how good the programmer is.

I define it as the likelihood that, if a program compiles *and* works
on X, it will also compile *and* work on Y, regardless of how good the
programmer is.

1) reliance on UB or IDB is not mechanically detectable, making programs
*inherently* unreliable

You're overstating the situation. Certainly some reliance on some UB
and/or IDB is mechanically detectable.


Runtime integer overflow isn't.

2) one cannot rely on what the compiler does from machine to machine,
version to version, or even compiler switch to compiler switch


This is true only in a purely theoretical sense, and you know it.


In this thread, it was pointed out that new for g++ are optimizations
that change the behavior of integer overflow.

My own code has broken from one g++ version to the next from reliance on UB.
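
The pattern that broke looks like this (a hypothetical reconstruction,
not the actual code):

    // Classic pre-overflow test. Because signed overflow is UB,
    // newer g++ is entitled to assume x + 1 > x always holds and
    // compile this function down to "return 0".
    int will_wrap(int x) {
        return x + 1 < x;
    }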

Yes, a
few things change with compiler switches, but 1) only a few, and 2)
compiler switches don't just happen randomly or by accident.


A typical C++ compiler has a bewildering array of switches that change
its behavior.

Yes, when you're writing a library that's intended to be portable to
anything anywhere under any circumstances, these can be major issues.


They're major issues for people who need to develop reliable programs
such as, say, a flight control system, or banking software. Such
applications need more than reliance on "good" programmers and prayer.

If you're writing a game, who gives a darn if it fails now and then.

For most people writing end programs, the major issues are things like
keeping track of the makefiles to ensure that the correct compiler
switches get used on various machines -- and in most cases (at least
IME) these have little to do with UB or IDB and a great deal to do with
the fact that some compilers break perfectly well defined code under
certain circumstances (especially overeager optimization).


g++ 4.1 has 40 options that explicitly modify C++ language behavior.
That's 2^40 possible combinations. I suspect there are more, like the
aforementioned integer optimizations.

How much effort have you seen, time and again, going into dealing with
the implementation defined size of an int? Everybody deals with it, 90%
of them get it wrong, and nobody solves it the same way as anybody else.
How is this not very expensive?


I've seen a lot of effort put into it repeatedly, but I'd say over 99%
of the time, it's been entirely unnecessary from beginning to end.


In D, the effort to deal with it is 0 because the problem is defined out
of existence.
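
For contrast, the machinery being defined out of existence typically
looks like this (hypothetical, but representative, and no two projects
spell it the same way):

    #include <limits.h>

    #if UINT_MAX == 0xFFFFFFFFU
    typedef int           int32;
    typedef unsigned int  uint32;
    #elif ULONG_MAX == 0xFFFFFFFFUL
    typedef long          int32;
    typedef unsigned long uint32;
    #else
    #error "no 32 bit integer type found"
    #endif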

If C
and C++ made it even _more_ difficult to deal with, so people would
learn to keep it from being an issue at all, everybody would really be
better off most of the time.


The way to make things more difficult is to make them compile time
errors. Then they cannot be avoided or overlooked. Ideally, if a program
compiles, then its output should be defined by the language.
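
Even in today's C++ an assumption can be promoted to a compile time
error with the usual trick (C++0x's static_assert will make this
direct):

    #include <climits>

    // Compiles wherever the assumption holds; elsewhere it fails with
    // an array-of-negative-size error instead of misbehaving at runtime.
    typedef char int_is_32_bits[sizeof(int) * CHAR_BIT == 32 ? 1 : -1];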

In any case, C99 and C++ TR1 have both dealt with this for the rare
occasion that it really is an issue (and, unfortunately, made it still
easier to write size-dependent code when it's completely unnecessary).


It's rarely an issue now because:

1) C++ compilers have dropped 16 bit support (and 16 bit ints).
2) 32 bit C++ compilers all use 32 bit ints.
3) 64 bit C++ compilers also use 32 bit ints.

In other words, C++ has de facto standardized around 32 bit ints.

C++0x will undoubtedly add this to the base C++ language as well. IMO,
this is almost certain to hurt portability, but at least those of us who
are competent can ignore it the majority of the time when it's
counterproductive; languages like Java and D don't even allow that.


In D, you can use a variable sized int if you want to:

    typedef int myint;

and use myint everywhere instead of int. To change the size, change the
typedef. Nothing is taken away from you by fixing the size of int. It
just approaches it from the opposite direction:

C++: use int for variable sizes, typedef for fixed sizes
D: use int for fixed sizes, typedef for variable sizes
Java: doesn't have typedefs, oh well :-)
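
Spelled out with the C99 <stdint.h> types (TR1's <cstdint>), the C++
column of that table looks like this (typedef names hypothetical):

    #include <stdint.h>

    typedef int32_t myint_fixed;   /* fixed at 32 bits everywhere */
    typedef int     myint_native;  /* tracks the platform's int   */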

Conversely, defining the behavior means that one does not have to know
how other systems work. The less UB and IDB, the easier the porting
gets, reducing costs.


It gets easier, to a narrower range of targets. Outside that range of
targets, it becomes either drastically more difficult, or truly
impossible.


How does it become harder or impossible?

Contrary to your previous claims, targets you see fit to
ignore have not gone away, nor are they likely to do so anytime soon.


I think 16 bit DOS and 36 bit PDP-10's are dead and are not likely to
rise from their graves.

For another example, can you guarantee your C++ programs aren't
dependent on the signedness of 'char'? How is knowledge of the standard
going to help you with this? I can guarantee you from decades of
experience with this, that you can understand every detail of the spec
and very carefully not depend on the sign of 'char', yet until you
actually try out your code on a compiler with a different sign, you have
no idea if your code will work or not.
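
A concrete instance of the trap; this compiles cleanly everywhere and
simply behaves differently (hypothetical, but typical of
text-processing code):

    #include <cstdio>

    int main() {
        char c = '\xE9';  // say, Latin-1 e-acute read from a file
        // Where plain char is signed, c is -23; where unsigned, 233.
        // Same source, both conforming, different branch taken.
        if (c < 0)
            std::printf("char is signed here\n");
        else
            std::printf("char is unsigned here\n");
    }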

Oh come on. Typical compilers have had switches to control the
signedness of char for years.


Yes, and Digital Mars C++ does, too. I know of nobody who actually tests
their code using those switches. You and I can argue that "good"
programmers will, but we both know they won't.

Most switches of that sort are of limited utility anyway, because they
screw up the ABI with existing compiled libraries.

UB and IDB are not strengths of the standard. They are costly weaknesses.


They are not strengths or weaknesses -- they are simply boundaries.
Nothing more and nothing less. C and C++ are nearly unique in making a
serious attempt to specify the boundaries of what they do and don't
define, whereas most other language specs simply ignore those
boundaries.


I agree that C and C++ do an unusually good job at specifying the language.

Just for a few examples, try to find a reasonable way to support an
8-bit char in:

http://www.analog.com/UploadedFiles/Associated_Docs/352228244SHARC_getstart_online.pdf

or:

http://focus.ti.com/lit/ug/spru731/spru731.pdf

Note that these are not ancient "nutburger" architectures -- these are
both current and in _wide_ use. Just for an obvious example, the last
time I was in Costco, they had a brand new HD-DVD player that (on the
outside of the box!) bragged about using a SHARC processor.


Here's the C++ compiler for the SHARC:

http://www.analog.com/UploadedFiles/Associated_Docs/75285036450_SHARC_cc_man.pdf

The C++ compiler for the SHARC has many SHARC-specific extensions. It
isn't hard to imagine a D variant that would do the same. You'd have the
same difficulties porting C++ code to the SHARC C++ compiler as you
would porting standard D to SHARC-specific D.

As for the 32 bit SHARC characters, they would map onto the "dchar" 32
bit D character type. I imagine a D for the SHARC would issue a compile
error on encountering a "char". At least the programmer then has a clue
that he needs to use "dchar" instead, and perhaps double-check the code
using that variable.

As for SHARC "shorts" being 32 bits, that doesn't help you if your code
needs to address or manipulate 16 bit data. Just taking away the 16 bit
type doesn't magically make the code work, even if it is in C++ and
still compiles.

Again, just being able to compile the code doesn't mean it's portable.

Also, I wish to point out the difference between "wide use" meaning
there are a lot of CPUs in circulation and "wide use" meaning a lot of
programmers are writing code for it. Only one programmer might be
writing the SHARC code that is stamped out into millions of HD-DVD
units. HD-DVD players in Costco give no clue as to how many programmers
write SHARC code, other than that the number is greater than zero. On
the other hand, I've shipped several hundred thousand C++ compilers for
Windows.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers
