Re: 64 bit C++ and OS defined types

From: "Alf P. Steinbach" <alfps@start.no>
Newsgroups: comp.lang.c++
Date: Sun, 05 Apr 2009 03:35:23 +0200
Message-ID: <gr920j$iq9$1@news.motzarella.org>
* Goran:

2. in practice, an underflow with unsigned on raw arrays and some
(past?) implementations of STL leads to an earlier crash than going in
with e.g. -1.

This is pretty unclear, but unsigned opens the door for more bugs, so this
argument about probability of detecting those bugs is pretty lame. :)


Why? It's a classic application of "fail fast" at work: going into an
array with -x __happens__. E.g. bad decrement somewhere gives you -1,
or, bad difference gives (typically small!) -x. Now, that typically
ends in reading/writing bad memory, which, with small negatives, is
detected quickly only if you're lucky. If, however, that decrement/
subtraction is done unsigned, you typically explode immediately,
because there's a very big chance that memory close to 0xFFFF... ain't
yours.


Uhm, I didn't comment on that because it wasn't necessary given that the
argument was based on detecting bugs caused by signed/unsigned problems.

But consider a signed index that is negative, corresponding to a large unsigned
value,

   a[i]

If (1) the C++ implementation is based on unchecked two's complement (which is
the usual case), then the address computation yields the same as with an unsigned
index. So, no advantage for unsigned.

If the C++ implementation isn't based on unchecked two's complement, then either
(2) you get the same as with unsigned index (no advantage for unsigned), or (3)
you get a trap on the /arithmetic/.

So in all three possible cases unsigned lacks any advantage over signed.

Thus, not from data -- for I can't recall any experience with code that supplies
a negative index (or the corresponding huge unsigned one) -- but from pure logic,
which is a stronger argument, I question your statement that unsigned "leads to
an earlier crash". The logic seems to dictate that that simply cannot be true,
unless the compiler is perverse. So I'd need to see some pretty strong evidence
to accept that it isn't totally wishful thinking.
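
To illustrate case (1), a minimal sketch (my own illustration, not a claim about
any particular compiler, and assuming, as is typical, that size_t is as wide as a
pointer): the address computation comes out the same whether the bad index is the
signed value -1 or the huge unsigned value it converts to.

     #include <cstddef>
     #include <cstdint>
     #include <iostream>

     int main()
     {
         int              buffer[10] = {};
         std::uintptr_t   base = reinterpret_cast<std::uintptr_t>( buffer + 5 );

         int              si = -1;
         std::size_t      ui = static_cast<std::size_t>( si );   // 0xFFFF...F, same bit pattern

         // Both offset computations wrap modulo 2^n, so the resulting addresses
         // coincide -- no earlier crash for the unsigned version.
         std::uintptr_t   viaSigned   = base + std::uintptr_t( si )*sizeof( int );
         std::uintptr_t   viaUnsigned = base + std::uintptr_t( ui )*sizeof( int );

         std::cout << ( viaSigned == viaUnsigned? "same address" : "different addresses" ) << "\n";
     }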

The problems with unsigned types are well known.

Your compiler, if it's any good, will warn you about signed/unsigned
comparisons. Those warnings are serious. Where you have such a type mismatch
(which results from using unsigned) you often have a bug.


True, but why are signed and unsigned mixed in the first place? I say,
because of the poor design! IOW, in a poor design, it's bad. So how
about clearing that up first?


Yes, that's one thing that signed sizes can help with (the main other thing
being cleaning up redundant and unnaturally structured code, like removing casts).

However, as remarked else-thread, since the standard library unfortunately uses
unsigned, "can help" isn't necessarily the same as "will help".

If applied mindlessly it may exacerbate the problem instead of fixing it. But then,
so it is with all things. It needs to be done with understanding. :-)

Your compiler cannot, however, warn you about arithmetic problems.


True, but they exist for signed types, too. Only additional problem
with unsigned is that subtraction is more tricky (must know that a>b
before doing a-b).


Yes, that's a major problem, because the 0 limit is well within the range of
values that occur most often.

As opposed to the limits of signed, which are normally way outside that range.

Thus, the 0 limit of unsigned is one that's often encountered (problematic), while
the limits of signed are not so often encountered (much less problematic).

But then, I question the frequency at which e.g.
sizes are subtracted. And even then (get this!), it's fine. Result is
__signed__ and it all works. (Hey, look! Basic math at work: subtract
two natural numbers and you don't get a natural number!)


ITYM, "Result is __unsigned__". And yes that works as long as keeping within
unsigned. The problem is that most everything else is signed, so keeping within
unsigned is in practice a real problem, and that's where the nub is.
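
A minimal sketch of the kind of thing I mean (my own example): the subtraction
itself keeps to unsigned and is well defined, but the result is the wrapped
modulo-2^n value, not the negative difference one usually means, and any further
use of it as an ordinary difference goes wrong.

     #include <cstddef>
     #include <iostream>
     #include <vector>

     int main()
     {
         std::vector<int> a( 3 ), b( 7 );

         std::size_t const diff = a.size() - b.size();   // wraps: a huge value, not -4
         std::cout << diff << "\n";

         if( a.size() - b.size() < 4 )                   // unsigned compare: huge < 4 is false
         {
             std::cout << "never reached\n";
         }
     }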

Well, it
works unless you actually work on an array of bytes, but that example
is contrived and irrelevant, I mighty agree with you there.


Ah. :-)

I also question the relevance of signed for subtraction of indices,
because going into an array with a-b where a<b is just as much of a
bug as with unsigned. So with signed, there has to be a check (if (a-b>=0)),
with unsigned, there has to be a check (if (a>b)). So I see no
gain with signed, only different forms.


It's not so much about that particular bug. I haven't ever encountered it,
unless I did in my student days. It's much more about loops and stuff.

But regarding that bug, if for the sake of argument it's assumed to be a real
problem, then see above: it seems signed has the advantage also there... ;-)

There's a host of bug vectors in [arithmetic], including the main example of loop
counting down (incorrectly expressed).


Hmmm... But I see only one vector: can't decrement before checking for
0.


Well, above you talked about using unsigned-only arithmetic and how that works
out nicely when keeping to unsigned. And yes it does work out well using only
unsigned arithmetic. But now you're talking about /checking/ for 0, which
implies that somehow the result will be mixed with signed -- which it often
will be -- and that defeats the earlier argument.

The loop example (well known, with well-known solutions too, except that I seem
to recall that Andrew Koenig had a very elegant one that baffled me at the time,
like how could I not have thought of that, and now I can't remember it!):

     for( size_t i = v.size()-1; i >= 0; --i )

This is the natural expression of the loop, so any fix -- which is easy --
adds work, both in writing it and in grokking it later for maintenance.
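
(For the record: since i >= 0 is always true for a size_t, that loop never stops
of its own accord. A couple of the well-known fixes, sketched in my own words and
wrapped in a throwaway function just for illustration -- and no, neither is the
elegant Koenig one I can't recall:)

     #include <cstddef>
     #include <vector>

     void clearBackwards( std::vector<int>& v )
     {
         // Fix 1: shift the test and the index by one.
         for( std::size_t i = v.size(); i > 0; --i ) { v[i-1] = 0; }

         // Fix 2: the "i-- > 0" idiom, decrementing in the condition.
         for( std::size_t i = v.size(); i-- > 0; )   { v[i] = 0; }
     }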

Another arithmetic example (I'm sorry my example generator is sort of out of
commission, so this is not a main example, just one that I remember):

     for( size_t i = 0; i < v.size()*step; i += step )

Uh huh, if 'step' is signed and negative then it's promoted to unsigned in the
arithmetic expression, and then for non-zero v.size() the loop iterates at least
once.

Again, solutions are well known.
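
One such solution, sketched (just my rendering of the obvious fix, with a
made-up function around it): keep the whole computation signed, where the cast
of v.size() is precisely the extra work being talked about.

     #include <cstddef>
     #include <vector>

     int countIterations( std::vector<int> const& v, std::ptrdiff_t step )
     {
         int n = 0;
         // With a signed bound a negative step simply gives zero iterations,
         // instead of the conversion-to-unsigned surprise.
         for( std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>( v.size() )*step; i += step )
         {
             ++n;
         }
         return n;
     }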

But they have to be applied (and just as importantly, it has to be recognized in
each and every case that a solution needs to be applied), which is more work,
both originally and for maintenance, and makes for less correct software.

And so on.

So the two dangers above can take many forms, but honestly, how
difficult is it for someone to grasp the concept? I say, not very.


Judging from experience and discussions here, it /is/ difficult for many to
grasp the concepts of unsigned modulo 2^n arithmetic.

But that's not the primary problem.

The primary problem is the ease of introducing pitfalls and the added work. But
could one perhaps rely on humans catching mistakes and doing everything right?
Well, think about how often you catch an error by /compiling/.

You claim that these potential bugs are important. I claim that they
are not, because I see very little subtraction of indices in code I
work with, and very little backwards-going loops. That may be
different for you, but I'll still wager that these are overall in low
percentiles.


I'm sorry, but the notion that all mixing of signed and unsigned happens in
indexing and count-down loops is simply wrong. Above is one counter-example.
Happily, modern compilers warn about some other examples, such as signed/unsigned
comparisons, but e.g. Pete Becker has argued earlier in this group that trying
to achieve warning-free compilation is futile in the context of developing
portable code and should not be a concern, and I gather many think that.
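
For instance, a minimal sketch (my own, hypothetical function) of the kind of
comparison those warnings catch; with e.g. g++ -Wall the loop condition draws a
signed/unsigned comparison warning:

     #include <vector>

     bool contains( std::vector<int> const& v, int x )
     {
         for( int i = 0; i < v.size(); ++i )   // int compared with the vector's unsigned size_type
         {
             if( v[i] == x ) { return true; }
         }
         return false;
     }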

You also conveniently chose to overlook (or worse yet, call it hand-
waving) the true nature of a count and an index (they are natural
numbers). I can't see how designing closer to reality can be
pointless.


The correspondence you point out, but misunderstand, is /worse/ than pointless
in C++ (although not in some other languages).

In C++ the correspondence is

    endpoints of basic value range [correspond to] endpoints of restricted range

The value range on the left is one of modulo 2^n arithmetic. Its endpoints are
not barriers, they are not values that shouldn't be exceeded. On the contrary,
in your arguments above you make use of the fact that exceeding those values is
well defined in C++, a feature to be exploited, "don't care" arithmetic (of
course with the catch that this implies no mixing with signed values).

The value range on the right is, on the other hand, one whose endpoints
constitute barriers.

Exceeding those barriers is an error.

So the correspondence, such as it is, is one of comparing, to the right, the
/numerical value/ of a barrier (exceeding which is an error) to, on the left,
the /numerical value/ of a wrap-around point (exceeding which is a feature to be
exploited), and disregarding the nature of the points so compared.

One can't have it both ways, both error and feature to be exploited. So it's not
identity, it's not "closer". It's just a coincidence of numerical values, and
when you confuse the kinds of ranges they stem from you introduce bugs.
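
A two-line sketch of the distinction (my illustration): the arithmetic happily
wraps in both cases, which is a feature for the left-hand kind of range and an
error for the right-hand kind.

     #include <cstddef>

     int main()
     {
         unsigned     u = 0;
         --u;                    // well defined: u is now UINT_MAX, modulo 2^n at work

         std::size_t  nItems = 0;
         --nItems;               // equally well defined arithmetic, but as a count it's now nonsense
     }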

And so I have to tell you what somebody already told you here: you
seem to adhere to "anything that makes your point weaker is 'grossly
irrelevant'; anything that supports your point is, however, relevant."


I'm sorry but that's just innuendo.

Cheers & hth.,

- Alf

--
Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
No ads, and there is some C++ stuff! :-) Just going there is good. Linking
to it is even better! Thanks in advance!
