Re: What is the output of this program?

From:

"kanze" <kanze@gabi-soft.fr>

Newsgroups:

comp.lang.c++.moderated

Date:

11 Jul 2006 16:46:13 -0400

Message-ID:

<1152612540.728841.265800@75g2000cwc.googlegroups.com>

jbannon wrote:

James Kanze wrote:

jbannon wrote:

dnguyenk@yahoo.com wrote:

//...
string ToUpper(const string& s1)
{
    string s2;
    for (int i = 0; i < s1.size(); ++i)
        if (isalpha(s1[i])) s2[i] = toupper(s1[i]);
    return s2;
}

int main()
{
      string v1[5] = {"This", "is", "a", "small", "quizz"};
      string v2[5];
      for (int i = 0; i < 5; ++i) v2[i] = ToUpper(v1[i]);

      //..Print out v2. What is the output?

     return 0;
}
A mistake made me stare at the screen for a while ...
Could you give some hints to avoid similar mistakes systematically?
Thanks.

<SNIP>

There are at least two problems in the code:

Actually, there's only one major problem: the code is not
idiomatic C++.

Agreed, it's not idiomatic in the modern style which we
beginners are struggling with. That's why we were having the
problem.

Right. But most of the problems would disappear if it were made
idiomatic. The most obvious "correction" to his program, as
written, would be to initialize s2 with a copy of s1. But IMHO,
the correction is at a level too low.

It's important to understand the value of idiomatic use. You
don't have to like the idioms---there's a lot in STL that I
don't particularly care for. But they are what the experts are
doing. They work. And that's the key. Once you've acquired
the habit of using push_back to build strings, for example, the
type of error here simply cannot occur. You don't have to think
about it; it just works. Had I designed the STL, it would be
different, and the idioms would be different. But I didn't, the
STL is designed to be used in a certain way, using it that way
works, and trying to force it into a different model should only
be undertaken in extreme cases.

1. The statement if (isalpha(s1[i])) s2[i] =
toupper(s1[i]); does not append any non-alphabetic
characters to s2. This was the easy bit.

The statement in question doesn't append anything to s2,
ever. That should be an easy bit as well. For the above
bit of code to make sense, he'd have to have initialized s2
with s1. While the code could then be made to work, I don't
think it's really idiomatic C++. If you had a toupper
functional object, the idiomatic solution would be simply:

     std::transform( s1.begin(), s1.end(),
                     std::back_inserter( s2 ),
                     toupper() ) ;

Regretfully, the standard doesn't define such an object. (One
could easily argue that this is a design flaw in <locale>.)
IMHO, if you consider this functionality usable, then it is
frequent enough to receive a name, and to get its own functional
object. (That's a big if, however, since there is no way to
make it work with e.g. UTF-8.)

I know the last bit too but we're not at the stage of making
it work with UTF-8 yet.

Yes. The last bit was just a reminder that the standard library
solutions aren't complete (in this case, at least).

Were that the problem I would have suggested using a class
that actually supported a UTF-8 encoding (something like
Glibmm::ustring) but I assumed (perhaps wrongly) that we were
working purely within the realms of the "C" locale. However,
it doesn't solve the problem as stated in the OP if we use a
function object that the standard doesn't define.

Learning to write such an object could be an enlightening
experience. (Although in this specific case, there are lifetime
of object issues which make it non-trivial. The object can, in
fact, be implemented far more efficiently if you have access to
the internals of std::facet. Which is why it should be, IMHO,
a standard object.)

2. This one was a bit harder and it took me a while to see
it after a bit of testing (I am a beginner).

There's no harm in being a beginner, and not knowing
something, but there's one thing that is bothering me: what
books are you using to learn C++, if they don't teach the
basics of using a container. Normally, I would expect a
beginner to know how to use an iterator, and even
back_inserter, before he ever encountered [].

Not all of us have the benefit of being formally taught the
language and some of us, admittedly, still have C hangovers! I
have only 2 books where use of the STL is actually promoted
for beginners and these are Accelerated C++ and You Can
Program in C++. which I'm working my way through at the
moment. All the rest of my books present the language in the
usual bottom-up manner describing the lower-level features
first so that we would see [] before we encountered the
containers. Also, many of courses I've seen present the
language in the same way.

Which is what was bothering me. I certainly don't critize you,
or anyone else, for learning something in the way it was taught
to him. I am bothered, however, by the fact that there are so
many books and courses which still present C++ as it was
presented in 1990, or earlier.

String s2 is initialised to the empty string in the definition
string s2; by std::string's default contructor, thus the
expression s2[i] is invalid for any i > 0.

For any i. In the expression s[i], i, converted to a size_t (an
unsigned integral type) must be strictly less than s.size().

Aye, that's true, but s2[i] is still undefined and that was my
point. Perhaps I should have put it better.

The only thing I was really criticizing was the i > 0. It's
also undefined behavior if i <= 0. But perhaps I wasn't clear
enough. i is converted to an unsigned integral type---the
results are thus >= 0 (by definition). And the results must be
strictly less than s.size(), which is 0. An unsigned type can
never have a value which is strictly less than 0.

If you do as I did and replace operator[] with the
range-checked version s2.at(i) you will get an index out of
range exception being thrown. What was needed to fix it is a
line like

s2 += islower(s1[i]) ? toupper(s1[i]) : s1[i];

Given that all of the standard versions of toupper are defined
to return the argument if islower is false, the test is
unnecessary. Given that the one argument version has undefined
behavior when called with a char argument, however, you're still
in trouble.

I don't understand this at all. I know that the test is
unnecessary (as I said later) but you're going to have to be
patient with me here. Why does calling toupper with a char
argument result in undefined behaviour? I cannot find it in my
copy of the standard and I've never seen this written
anywhere.

The real problem is that you are calling the wrong toupper.
There is only one toupper which can be called with a single
argument, and it takes as parameter an int, which must be in the
range 0-UCHAR_MAX or equal to EOF. If you call the function
with a char, that char is converted to an int in the range
CHAR_MIN-CHAR_MAX. And since CHAR_MIN is often negative, you
have undefined behavior.

In practice, all of the implementations I know make this work
for all of the characters in ISO 8859-1, except '?' (which, when
converted to an int, has the same numeric value as EOF). But
it's still undefined behavior. (I generally ensure that my test
suites for any character oriented code contain a '?', precisely
to detect such errors. They're very frequent---and not just in
code written by beginners.)

which appends each converted character to s2. The trouble
with this of course is that there are, at least
potentially, O(n) allocations + 1 copy (the return
statement). I think we can get around this by calling
s2.reserve(s1.size()) so that we have only 1 allocation.

You could. Or you could simple suppose that the
implementation wasn't completely stupid, so you don't get an
allocation for every character, and that the number of
allocations you do get won't have an important effect on
your program. If it later turns out that you were wrong,
it's no hassle to add the reserve later. Until then, it's
one unnecessary line which has to be maintained.

Agreed but we can't make assumptions about what any
implementation will actually do when faced with this type of
code.

You not only can, you must. Realistically, you make thousands
of such decisions all the time; otherwise, you have to write in
assembler. (For example, the application I'm currently working
on will fail to meet its timing constraints if a + b were to
take five minutes, instead of the 20 nanoseconds it actually
takes. I count on the compiler doing things more or less
correctly, and I count on the implementation not being
completely idiotic.)

Sometimes, of course, it can happen that the assumption was
false. Profiling will reveal this fact, and in that case, we
start thinking about a work-around. Until then, however, it's
best to suppose that the implementation isn't perverse, and that
it does things more or less intelligently.

There will usually be some efficiency savings when using a
decent implementation of std::string but we don't know what
these will be since they're effectively hidden.

I'm still wondering why main() uses arrays rather than vectors.

Because main() was around long before std::vector was invented.

This reply I don't understand. I was merely questioning why
the main () program used an array of strings as opposed to a
vector.

Sorry. I though you were asking why main() (in the standard)
took a char*[], instead of an std::vector<std::string>, as
parameter.

--
James Kanze GABI Software
Conseils en informatique orient?e objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place S?mard, 78210 St.-Cyr-l'?cole, France, +33 (0)1 30 23 00 34

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]