Re: Can I get some help please "warning: multi-character character constant" and more

From:
"kanze" <kanze@gabi-soft.fr>
Newsgroups:
comp.lang.c++.moderated
Date:
6 May 2006 10:12:05 -0400
Message-ID:
<1146817373.411558.326940@j33g2000cwa.googlegroups.com>
Allan W wrote:

   [Just a couple of nits. Your explanation is actually quite
   good.]

Here's a more detailed explanation:

There are several different types of "literals" in C++.
     3 -- is an "integer-literal" with the value 3. You
                   can use it in expressions such as
                      age = 3;
     3.0 -- is a "floating-literal" with the value 3. You
                   can use it in expressions such as
                      result = 3.0;
     "Three" -- is a "string-literal" with five characters plus
                   the terminating null character. You can use it
                   in expressions such as
                      std::cout << "Three" << std::endl;
                    (or, since you've used "using namespace std;" you
                    can just write)
                      cout << "Three" << endl;
     'X' -- is a "character-literal" which is the letter X.
                    In some contexts you can use it as if it were an
                   integer with the same value as the character
                   code for an X (this is 88 in ASCII, other values
                   on non-ASCII systems).


The type is also a key difference. The literals 3.0 and 3 have
the same "value" (for the usual, everyday meaning of value), but
have different types.

It's important to realize that different types use different
interpretations of the underlying bits to represent the value.
Arguably, 3, 3.0 and '3' represent the same value; their size
and bit patterns are, however, different. In the case of C++,
the issue is further clouded by the fact that C++ has no real
character type: '3' is still an integral type, but not the
same integral type as 3, and also not the same value -- as you
say, its value is the value of the character code (which is
still a number, and not a character).

Of course, how different bit patterns are interpreted is a
question of convention. Sometimes, the convention is practically
imposed: the C++ standard requires integral values to be
represented in a base 2 notation. Other times, hardware offers
direct support -- most modern platforms have hardware floating
point support, for example. In the case of characters, the
issue is a bit more complex: the conventions are established by
the software in the windowing drivers or in the printer
hardware; on at least some systems, the conventions can vary
according to the user environment, and it isn't rare for the
system to tell the program that one convention is in effect, but
to use a different one for display in the windowing driver, and
yet a third in the printer. (For the original poster: don't
worry about this yet! You can get a lot of work done,
especially in an English speaking environment, without it ever
being a problem. On the other hand, in a multilingual,
networked environment, it can drive you nuts, because you have
no control over so many of the factors.)

On some computers, the data type that holds single characters is
actually able to hold more than one character at the same time,
but this is not portable. On those systems, you could write
     'AB'
and your character-literal would contain both the letters A and B
(in that order). But this is still different from a string.


It's worse than that. Historically, C promoted everything in an
expression to an int. And character literals had type int,
which typically could hold more than one character. C++ broke
with C here, because you really do want character literals to
overload differently than ints: "cout << ' '" should output
" ", and not "32". But it only did a minimal break:
multi-character literals were still supported, with exactly the
same semantics as in C (which is to say: implementation defined
semantics). Thus, '3' has a type char, and an integral value of
51 (0x33), but '32' has a type int, and an integral value of
13106 (0x3332) on my machines -- more importantly, overload
resolution prefers char for '3', but int for '32', so that "cout
<< '3'" outputs "3", but "cout << '32'" outputs "13106". For the
original poster: this brings us back to the conventions
concerning the representation, above. The convention for <<
char is to treat the set of bits (the integral value) as a
character code, and output the corresponding character; the
convention for << int is to treat the set of bits as a signed
integer, and output the value of that integer. (This convention
is defined by the C++ standard in the case of a << operator
where the left hand operand is an ostream.)

Of course, if the execution character set includes multibyte
characters, the issue becomes even more clouded. Supposing
UTF-8 as the execution (and source) character set, something
like 'é' is a multibyte character constant: type int, and << 'é'
would output something like "50089". (I say would, because none
of my compilers support UTF-8 as a character set.) On the other
hand, if the source character set is UTF-8, and the execution
character set is ISO 8859-1, then '\xC3\xA9' is a single byte
character constant, and << '\xC3\xA9' should output "é".

--
James Kanze GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
