Re: What is the output of this program?

From:

"kanze" <kanze@gabi-soft.fr>

Newsgroups:

comp.lang.c++.moderated

Date:

11 Jul 2006 16:43:19 -0400

Message-ID:

<1152610711.696522.28820@35g2000cwc.googlegroups.com>

Alf P. Steinbach wrote:

* James Kanze:

Alf P. Steinbach wrote:

* dnguyenk@yahoo.com:

//...
string ToUpper(const string& s1)
{

Class 'string' is undefined. For the sake of
discussion, assuming it is 'std::string'. This could
have been avoided by not using a 'using'.

string s2;

Uninformative name. 'result' would have been informative.

for (int i = 0; i < s1.size(); ++i)

'int' doesn't necessarily have large enough range.

But realistically, on a 32 bit machine, that won't really be a
problem. (Today. As memory becomes bigger, of course, it could
become reasonable to have strings longer than 231 characters.)

No, realistically, int is only guaranteed to be 16-bit, and
the OP did not state any particular target system: AFAIK most
development is for embedded systems (whether it's that way for
C++, I don't know).

It depends on what you understand by "realistically". There are
certainly 16 bit embedded systems around; I doubt very much,
however, that they are hosted implementations, or that people
play around with std::string on them. More to the point, I
explicitly said 32 bit machines---many people don't need to be
portable to 16 bit machines today, and if that's the case,
worrying about the fact that the standard allows int to be a 16
bit value isn't relevant.

Note that I don't disagree with your overall point. i should
have type size_t, if only to ensure that you aren't mixing
signed and unsigned.

if (isalpha(s1[i]))

'isalpha' (assuming this is 'std::isalpha') may produce
an unexpected result if 'char' is signed.

It depends on what headers were included. Generally
speaking, std::isalpha() doesn't have this problem.

No, that's incorrect.

According to the standard, std::isalpha takes two arguments, and
works with char, signed or not.

On the other hand, it takes two arguments, so it is
obviously not the one he is using.

No, there are two overloads of std::isalpha.

There's a lot more than that.

"It" has one overload that takes one argument, available from
<cctype>, and one that takes two, available from <locale>.
The one with two arguments is not the one called above with
one argument.

But the error isn't that he is passing a signed char to a
function where it is illegal; the error is that he forgot an
argument. Written correctly, std::isalpha doesn't have this
problem. The old, C compatible ::isalpha does, as does the
compatibility hack std::isalpha in <cctype>. But there's no
reason to use this in modern C++, given that there is a "pure C"
alternative.

If we assume ::isalpha, from the C header <ctype.h>, then it's
not an "unexpected result", it's undefined behavior.

No, the statement may produce an unexpected result, as I wrote, and it
is (for actual negative argument values except EOF) undefined behavior.
You are correct that it isn't an unexpected result. However, nobody
has maintained that it is, only that it may produce one, which it may.

Alf, I know you know the difference between undefined behavior,
and unexpected results. In practice, at least on the machines I
have access to (Sun CC under Solaris, g++ under Linux), it
produces the expected result for all values except '?'. (And
this is guaranteed, as an extention; one characteristic of
undefined behavior is that the implementation can define it.)
What it produces on any given machine is irrelevant, however,
with regards to the language C++, and an error checking
implementation could very well do bounds checking and abort when
passed a negative value.

Or were you simply being a little ironic (or using litote)---a
core dump is probably not the "expected result". If so, I
misunderstood you, and I agree.

(There's also a potential thread safety issue, but in
practice, it's not a problem.)

No, the C++ standard does not address thread safety. Or
threads, at all. With any particular C++ implementation,
thread library and calling context there may be a thread
safety issue or not.

That's what I meant by "potential". Under Posix, there is
definitly a thread safety issue---but only if another thread
calls setlocale at the same time you call isalpha. Typically,
of course, setlocale will be called once on program start-up,
before threads are started, and never otherwise, so typically,
this isn't a problem. But it's something that one probably
should keep in mind.

(And of course, we know that the C++ language doesn't address
threading yet. It almost certainly will, and regardless of
whether the language addresses the problem or not, most C++
programmers do have to take it into account.)

This means Unpredictable Behavior.

According to the standard, undefined behavior.

Yes.

But I agree that the probability of anything other than
just a random result is pretty low.

No, I haven't stated that.

Argument should be casted to unsigned.

unsigned char. Casting it to unsigned isn't correct
either, as it can result in values like 0x7FFFFF62.

Yes. I think one who understands enough to do the cast will
cast to the correct unsigned type. Nevertheless, it should
perhaps have been stated.

Given that the original poster didn't know enough that it had to
be cast to begin with...

Actually, I'd say that this is one place where a lot of people
have problems. The rules concerning promotion and conversion,
especially between signed and unsigned types, don't seem to be
especially well understood. (There's also a certain ambiguity
with "unsigned": according to the standard, it is a synonym for
"unsigned int", even if in common usage, we often use it as a
generic term for any unsigned integral type.)

[snip]

return 0;

Unnecessary; this is the default.

Still, main is a special case; it's much clearer to be
explicit.

Not as I see it, no, because what that statement is explicit
about is to indicate something that isn't true. The program
does not ensure a 'main' return value that reflects success or
failure, but the statement implies that it does, that there's
a possibility of some other exit code. There's nothing clear
about a statement that has no effect.

OK. At least if I understand you correctly: if you don't care
about the return value, or the only possible return value is
success, just fall off the end; if the return value might have
meaning (as it normally will in any program which doesn't run
indefinitly), then you explicitly say return 0 (or return
EXIT_SUCCESS) rather than just falling off the end.

Personally, I'd prefer "void main()" in such cases. But as the
standard forbids it... I guess I can accept this distinction.

}

No catching of possible exception.

With all the undefined behavior, he couldn't catch them reliably
anyway:-).

No, it's not a good idea to never indicate which way you're
turning (when driving a car) just because you're already in
violation of the traffic codes, e.g. by driving on the wrong
side of the road. A bad habit like never using the indicator,
or whatever it's called in English, will follow you when you
learn about driving on the right side.

Seriously, I'm curious as to what you mean by this. For a
lot of simple programs, the default behavior of
terminating in case of an exception is quite acceptable.
And the only way to catch all exceptions is a
"catch(...)"; what do you do in the catch block?

For me, that depends on the program. For a novice, log to
std::cerr or std::clog, and exit with code EXIT_FAILURE.

The problem I have with it is that if I trap the exception, the
stack will be unwound, and I am guaranteed to have no clue as to
where it came from. If I don't trap it, there's a good chance
that the implementation won't unwind the stack, and I have a
core dump, which I can look at with a debugger, and see where
the exception came from. For a novice, this seems preferable.

(When deploying code, there are arguments both ways---the error
message you get when a program core dumps isn't necessarily
meaningful to a user. Although when deploying code in an
environment where the users aren't necessarily technically
aware, I will usually wrap it in a shell script which detects
the core dump, saves the core in a predetermined location, and
displays or logs some sort of user understandable error.)

A mistake made me stare at the screen for a while ...

Impossible to guess /which/ mistake that was; many to
choose from here.

Yes and no. Since he's also shown the set of test data,
we know that it doesn't actually contain values where the
signedness of char can cause problems. (The standard
guarantees that all characters in the basic execution set
have positive encodings when stored in a char.)

No, given all the errors we don't know that the test data
exhaustively tests all cases for the intended usage.

The code he posted was complete. We could clearly see that the
test data couldn't detect that he had forgotten an argument to
isalpha/tolower, and thus had undefined behavior. (For that
matter, test data can never show the absence of undefined
behavior.)

[snip]

2. Don't use raw indexing [] until you gain far more experience, use
'at' instead
(this catches bugs like forgetting to set the size of the result).

Use a good implementation of the STL which will core dump
with an error message in case of error.

Also, prefer iterators to []. It's more idiomatic, and in
this case, when creating a string (or any other
collection), the use of push_back() is also so idiomatic
that it should be second nature.

push_back can be a tad inefficient on strings, because strings are
usually short, and for short sequences the reallocations, assuming
scaling of buffer size, are more frequent and thus more costly.

True, and when dealing with longer strings, I almost always use
std::vector<char>, rather than std::strings. But for a
novice... The standard idiom is push_back. I'd say use it
unless there was some very concrete reason not to.

3. Do catch exceptions in 'main', report them on 'std::cerr' and in
that case return EXIT_FAILURE from 'main'
(this helps report e.g. an exception from 'at').

At least on the systems I work on, NOT catching an
unexpected exception is preferable. It also results in a
"failure" (seen from the OS side), but also gives a core
dump which allows for post mortem analysis; catching the
exception prevents this.

Depends much on the system and the programmer. Of course Just
In Time symbolic debugging is preferable to a 1960's core
dump... ;-)

I'm not sure. It supposes that the programmer is the user,
which isn't always the case. The nice thing about core dump's
is that you can recover them by email, and analyse them off
line.

However, for the novice the exception text will in most cases
be enough to get a good grip on the problem, much preferable
to fighting it out with a core dump,

I'm not sure I understand what you mean by "fighting it out with
a core dump". A novice is going to see core dumps often enough
anyway; he really should learn how to deal with them. And just
where is the difference between having the debugger pop up
immediately, and having to invoke it with the name of the core
file. (Once you're in the debugger, there is no difference.
You're doing a post mortem on a program which is no longer
running.)

and for the experienced and professional logs are Really Nice
To Have, and core dumps can be easily arranged when
reproducing the error.

For deployed code, a nice log with the information concerning
what you were doing before the core dump occured is often more
useful than the core dump itself---nine times out of ten, I can
find the error uniquely from the logs, without even looking at
the core dump. But I think we agree that it will be a while
before the original programmer is generating that kind of log.
(On the other hand, had he compiled with g++ and the correct
debugging options, he already be dealing with core dumps:-).)

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]