Re: mixed-sign arithmetic and auto

From: Walter Bright <walter@digitalmars-nospamm.com>
Newsgroups: comp.lang.c++.moderated
Date: Sun, 13 Jan 2008 05:48:19 CST
Message-ID: <H4WdnSkmU896FBTanZ2dnUVZ_uuqnZ2d@comcast.com>

James Dennett wrote:

DSPs are far from dead, and many don't have support for 8-bit
bytes in any reasonable fashion.


I don't know much of anything about DSPs. But I know that many
specialized CPU chips tend to have specialized languages that come with
them, and that's perfectly reasonable. I don't think anyone wants to
recompile Office for a DSP, anyway :-)

Fortunately in the real world it's
not hard for good programmers to have reliable and portable
programs.


I disagree. For example, I've never found a non-trivial C++ program that
would port successfully between 16 and 32 bits, even by expert
programmers (far better than just good ones), without doing some
adjustments and bug fixing.

Changing endianness often breaks C++ code, as do changes in struct
member padding. Varying int sizes and char signedness also break
programs. The reason is simple: it is really, really hard to look at a
piece of code and verify by inspection that it doesn't have these
issues, and it is impossible to test for portability problems without
actually porting the code.
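
To make that concrete, here is a minimal sketch (the names and values
are purely illustrative) of code that compiles cleanly everywhere yet
quietly changes its answers with char signedness, endianness, or
struct padding:

     #include <stdint.h>
     #include <string.h>
     #include <stdio.h>

     // Compiles without complaint everywhere; the answers depend on
     // the implementation.

     bool high_bit_set(char c)
     {
         // True for bytes >= 0x80 where plain char is unsigned; always
         // false where plain char is signed, since c promotes to a
         // negative int.
         return c >= 0x80;
     }

     struct Record
     {
         char     tag;     // padding inserted after this member varies
         uint32_t value;   // with the ABI
     };

     int main()
     {
         uint32_t x = 0x01020304;
         unsigned char first;
         memcpy(&first, &x, 1);

         printf("%d\n", high_bit_set('\xFF'));     // 0 or 1
         printf("%d\n", first);                    // 4 little-endian, 1 big-endian
         printf("%u\n", (unsigned)sizeof(Record)); // commonly 8, not 5; fwrite()ing
                                                   // it bakes the layout into the file
         return 0;
     }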

I have a large piece of code that works for DMC++, VC++, and an older
g++. Upgrading to the latest g++ breaks it. It still compiles, it just
produces wrong answers. I don't know yet what went wrong, but portable
C++ ain't.

It allows for implementations which diagnose all overflows;


Are there any such implementations?

it allows for optimizations based on assumptions of non-overflow,


Apparently the new g++ does that, though my question about what those
optimizations are has so far gone unanswered. My experience with
optimizations that change behavior is that customers call them
optimizer bugs, even when the fault lies with their own reliance on UB.
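
The commonly cited example -- and I can't yet say it's what broke my
code -- is the overflow check that the optimizer is entitled to
delete. A sketch:

     #include <limits.h>
     #include <stdio.h>

     // A once-common overflow test. Since signed overflow is UB, the
     // optimizer is entitled to assume x + 1 never wraps and fold the
     // comparison to false, silently deleting the check.
     bool will_overflow(int x)
     {
         return x + 1 < x;    // may become: return false;
     }

     int main()
     {
         printf("%d\n", will_overflow(INT_MAX)); // 1 unoptimized, often 0 with -O2
         return 0;
     }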

and diagnostics when such optimizations are made in ways that
could alter behaviour of code; it encourages implementations to
provide diagnostics for unsafe use.


I don't see any way to issue warnings on unsafe overflow use based on
static analysis of code.

Wrapping semantics may be well-defined, but are *NOT* always safe.


I'm not arguing that they are safe. I'm saying that well-defined
semantics make code that, once tested, can be reliably ported.
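
A sketch of what I mean, using unsigned arithmetic, which C++ already
defines to wrap modulo 2^N:

     #include <stdint.h>
     #include <assert.h>

     int main()
     {
         uint32_t x = 0xFFFFFFFFu;
         uint32_t y = x + 1u;

         assert(y == 0);  // guaranteed on every conforming implementation,
                          // so a test that catches the wrap once stays
                          // meaningful after a port

         // Well-defined is not the same as correct: if x were a buffer
         // size, the wrapped value is still the wrong answer -- but it
         // is the same wrong answer everywhere, which is what makes it
         // testable.
         return 0;
     }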

Safety depends on what
the specification/requirements call for. Pretending that
modular arithmetic is always the right solution is simplistic.


I'm not arguing that any specific semantics is always the right solution. I'm
arguing that undefined behavior is the wrong solution because it is, by
definition, not the "right" solution.

Allowing for diagnostics of overflow in many senses can serve
a broader community better than oversimplifying.


Does any C++ implementation diagnose integer overflow at runtime?

How much effort have you seen, time and again, going into dealing with
the implementation-defined size of an int?

Very little; it's a trivial thing. Good programmers use a type
which is guaranteed to have the properties they need, so they
won't use unadorned int for anything more than a -32767 to +32767
range


There aren't very many "good" C++ programmers, then <g>.

unless they know that their target implementations support
a larger range, they'll just use an int32_t-like type or a long,
or long long, as needed. Certainly I've seen mistakes made,


I was in the trenches when the big shift from 16 bit C++ to 32 bit C++
took place, and ints doubled in size. I can tell you for a fact that
various schemes for portably doing this were debated ad nauseam, and
that almost none of them actually worked when it became time to do the
real port. If you want an example, look no further than windows.h.

The converse was also true: C++ code developed for 32 bit machines
rarely ported to 16 bits without major effort; often a rewrite was
required.
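
The breakage was usually this mundane (the numbers here are made up,
but the shape is what I saw over and over):

     #include <stdint.h>
     #include <stdio.h>

     int main()
     {
         // Fine with 32 bit ints; with 16 bit ints the literal doesn't
         // even fit in an int, and the multiply overflows (UB) if it
         // gets that far.
         int samples = 40000;
         int doubled = samples * 2;

         // Spelling out the width makes the requirement explicit and
         // carries across the port (int32_t is typically a long on a
         // 16 bit target).
         int32_t samples32 = 40000;
         int32_t doubled32 = samples32 * 2;

         printf("%d %ld\n", doubled, (long)doubled32);
         return 0;
     }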

Even in D, people cannot seem to shake their C/C++ heritage in worrying
about the size of an int, and typedef it "just in case". I know I am
much happier with "int" than "int32_t". The latter just stinks. Sorry.

It's at least possible you aren't seeing actual problems with int sizes
these days because practically every C++ compiler sets them at 32 bits,
even for 64 bit CPUs. So you never know if your use of typedefs is
correct or not.

but I've seen mistakes made in languages with fixed-sized types too.


So have I. But the question is how prevalent are such mistakes, versus
mistakes from the int sizes changing?

Everybody deals with it, 90%
of them get it wrong, and nobody solves it the same way as anybody else.
How is this not very expensive?


Your perspective/experience do not match mine. Many places deal
with it, most of them get it right, and most of them solve it in
very similar ways, moreso since C99 et al standardized typedefs
for various integral types. The expense is insignificant in all
competently run projects I've seen.


I'll bet that in most of those competently run projects, the code has
never been ported to a compiler with different int sizes, so how good a
job they did has never been tested. I saw how well (i.e. badly) it
worked in the last big shift from 16 to 32 bit code.

Would you like to try porting one of the ones that get it right to 16
bits? I've got a beer that says it fails <g>.

And that brings us back to the fundamental problem with UB and IDB - how
do you *know* you did it right? It isn't testable.

Me, I'd rather define the problem out of existence, and have my good,
competent engineers working on something more worthy of their talents.

As this thread has demonstrated, even C++ experts do not know this
corner of the language spec.

I noticed one expert who was surprised by it (which did
surprise me). But then even you make incorrect claims
about basic aspects of C and C++ on occasion -- it doesn't
always mean that things are too complicated, just that
people are (all) fallible.


Yes, although I've read every detail of the specs and have implemented
them, I sometimes mis-recall bits of it. I bet if I sat down and quizzed
you on arcane details of the spec, I'd find a way to trip you up, too.
The point of all this is that dismissing problems with the language by
saying that "good" or "competent" programmers won't trip over them is
not good enough. Humans, no matter how good they are, screw up now and
then. I view the job of the language designer as being, at least in
part, to make the design resistant to human failure.

For example, airplane pilots use checklists for everything. Is it
because they aren't good pilots? Absolutely not. They are good pilots
because they *use* the checklist, even though it seems silly. Even the
best pilots have made monumental mistakes, like forgetting to put gas
in the tanks. Even though their very lives are at stake, they still
make stupid mistakes.

I read an article recently about attempts to introduce checklists into
hospital procedures. The doctors are resisting because they feel
checklists are insulting and demeaning to their exalted expertise. The
reality is that hospitals that use checklists reduce mistakes by
something like 30% (I forget the exact figure).

The best programmers are not gods. They make stupid mistakes, too. I
make them, you make them, Bjarne makes them. The "checklist" is the
compiler. The more the language can be designed so that mistakes get
caught by the compiler or in test, rather than being UB, the more
reliable software we can make.

I can assure you from decades of experience that, while this
problem can exist, I've never suffered portability issues because
of it,


Do you use compilers that differ in char signedness? In the Windows world,
all the compilers, over time, gravitated towards using the same
signedness for char (signed) not because of happenstance, but because it
made real code more portable. I'm not in the least surprised that g++ on
x86 Linux also has chars signed. It's pretty easy to never actually
encounter a compiler with a different char sign, and hence have
undiscovered bugs.
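
Here is the sort of bug that sits latent until the port; you can
simulate the port today with gcc's -fsigned-char and -funsigned-char
switches:

     #include <stdio.h>

     int classify(char c)
     {
         // Bytes 0x80..0xFF arrive here as negative values on a
         // signed-char implementation, and as 128..255 on an
         // unsigned-char one.
         return c < 0 ? -1 : c;
     }

     int main()
     {
         printf("%d\n", classify('\xE9'));  // -1 on one implementation,
                                            // 233 on the other
         return 0;
     }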

because compilers have warned in any marginal situation,


Warnings are a good sign that there's something wrong with the language
design. BTW, I just tried this:

     int test(char c) { return c; }

with:

     g++ -c foo.cpp -Wall

and it compiled without error or warning. (gcc-4.1)

and it's rare to use unadorned "char" for anything where
signedness matters.


That's because we try to avoid that like the plague.

(Exception: <ctype.h>-related matters.)


String literals are char-based (the sign gets you in hot water when
you're doing UTF-8). The standard library (especially the C one) is
replete with char*. If you avoid using char, you wind up using a lot of
casts. It's not practical to avoid char.
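
For example (the function names are just illustrative), the obvious
ASCII test on plain char is wrong on signed-char implementations the
moment UTF-8 bytes start flowing through it:

     #include <stdio.h>

     // UTF-8 lead and continuation bytes are 0x80..0xFF. With a signed
     // plain char they compare as negative, so the "obvious" test is
     // wrong there.
     bool is_ascii_wrong(char c) { return c < 0x80; }  // true for every byte
                                                       // when char is signed
     bool is_ascii(char c)       { return (unsigned char)c < 0x80; }

     int main()
     {
         char lead = '\xC3';   // first byte of U+00E9 ('e acute') in UTF-8
         printf("%d %d\n", is_ascii_wrong(lead), is_ascii(lead));
         // "1 0" with signed char, "0 0" with unsigned char
         return 0;
     }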

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
