Re: wcout, wprintf() only print English

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sat, 23 Feb 2008 15:21:02 -0800 (PST)
Message-ID: <a7141cd2-22c4-45cc-8c68-6070727c1d24@c33g2000hsd.googlegroups.com>

On Feb 23, 5:07 pm, Ioannis Vranos <ivra...@nospam.no.spamfreemail.gr>
wrote:

Jeff Schwab wrote:

[...]

However, my system still shows question marks for this. For
whatever it's worth, here's the (probably incorrect) way
that appears to work on my system:

#include <iostream>
#include <locale>

int main() {
    std::cout.imbue(std::locale(""));
    std::cout << "Δοκιμαστικό μήνυμα\n";

}

"Strangely", the same happens on my Linux box with "gcc
version 4.1.2 20070626".

cout prints Greek without the L notation to the string
literal.

The same with wcout prints an empty line.

I don't think the problem is so much wcout as the wide
character literal. The compiler is obliged to interpret the
contents of the literal in some way, and I would guess that it's
not doing this in a way consistent with the input you've given it.

What does the compiler documentation say about how it processes
characters outside of the basic character set? What happens if
you replace your characters with their UCN, e.g.:

std::wcout << L"\u0394\u03BF..." ;

?
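
For what it's worth, here is a minimal sketch of the UCN variant. The
function name greekTestMessage is made up for the example, and the code
points are just the characters of the test string from the earlier
postings. Since the source then contains only basic characters, how the
compiler decodes the source file no longer matters:

```cpp
#include <string>

// Hypothetical helper: the Greek test message from the thread, spelled
// entirely with universal character names (UCNs), so the source file
// itself contains nothing outside the basic character set.
std::wstring greekTestMessage()
{
    return L"\u0394\u03BF\u03BA\u03B9\u03BC\u03B1\u03C3\u03C4\u03B9\u03BA\u03CC"
           L" \u03BC\u03AE\u03BD\u03C5\u03BC\u03B1";
}
```

Whether std::wcout << greekTestMessage() then displays correctly still
depends on the locale in effect at run time; that is a separate
question from how the literal is compiled.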

The same with wcout and L notation prints question marks.

This made me think to use plain cout, and it also works:

#include <iostream>

int main()
{
    std::cout << "Δοκιμαστικό μήνυμα\n";

}

also prints the Greek message.

Seeing this I am assuming char is implemented as unsigned char
and this is working because Greek is provided in the extended
ASCII character set (values 128-255) supported by my system (I
have set the regional settings under GNOME, etc.). However, why
does this also work for you?

Most likely, the compiler is just generating code which copies
the characters' bit patterns, without ever looking at their
numeric values. So the signedness of char is irrelevant
(here---in other places, it can cause problems).
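
A small sketch of what I mean, assuming 8-bit bytes. The helper names
copyThroughChar and byteValue are invented for the illustration:

```cpp
#include <string>

// Hypothetical helper: copy a string one plain char at a time.
std::string copyThroughChar(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i != in.size(); ++i) {
        char c = in[i];   // numerically negative for bytes >= 0x80 if char is signed
        out += c;         // but the bit pattern is copied unchanged
    }
    return out;
}

// The value of the byte as an int in the range 0...UCHAR_MAX,
// whatever the signedness of plain char.
int byteValue(char c)
{
    return static_cast<unsigned char>(c);
}
```

On a machine where plain char is signed, the byte 0xCE is negative when
read as a char, yet the copy compares equal to the original, and
byteValue still gives 206 for it.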

The code

#include <iostream>
#include <limits>

int main()
{
    using namespace std;
    cout << static_cast<int>(numeric_limits<char>::max()) << endl;
}

produces in my system:

[john@localhost src]$ ./foobar-cpp
127

In other words, plain char is signed. (It usually is, for some
reason.)

[john@localhost src]$

so I am wrong, char is implemented as signed char, and no
extended ASCII takes place.

There's no such thing as "extended ASCII" :-). Still, I
regularly used ISO 8859-15 in plain chars, on machines where
plain char is signed. If I look at the numeric value of the char,
it's wrong, but the bits are right, and they get copied through
correctly.

I just have to be careful when I use functions which expect an
int in the range [0...UCHAR_MAX]. (Those in the <cctype>
header, for example.)
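
The usual precaution, as a sketch (the wrapper names isAlphaSafe and
toUpperSafe are hypothetical, not anything standard): convert to
unsigned char before calling the <cctype> function, so that the
argument is never negative.

```cpp
#include <cctype>

// Hypothetical wrappers. The <cctype> functions take an int that must be
// EOF or in the range 0...UCHAR_MAX; passing a negative plain char
// directly is undefined behavior, hence the cast to unsigned char.
bool isAlphaSafe(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

char toUpperSafe(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

What these return for characters above 127 depends on the C locale in
effect; the cast only guarantees that the call itself is well defined.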

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34