Re: wcout, wprintf() only print English

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Sat, 23 Feb 2008 15:41:36 -0800 (PST)

Message-ID:

<ab43351a-b5cc-4094-949d-c59d9b75ab82@d5g2000hsc.googlegroups.com>

On Feb 23, 10:19 pm, Ioannis Vranos
<ivra...@nospam.no.spamfreemail.gr> wrote:

Alf P. Steinbach wrote:

* Jeff Schwab:

[...]

As has been remarked else-thread, by Rolf Magnus, one issue,
relevant for literal strings, is the compiler's translation
(or lack of translation) of the source code text's character
set to the execution character set.

There isn't such issue here, cout prints Greek literal
correctly and wcout not.

That's just because cout and narrow string literals are passing
your bytes through literally. Neither is doing anything with
them.

Also cin and string read and store Greek text correctly while
wcin and wstring look like they do not work for Greek text
input.

Using which locale? For input in what encoding?

And as has also been remarked else-thread, by Boris, one
issue, relevant for i/o, is that the wide character streams
convert to and from narrow characters. wcout converts to
narrow characters, and wcin converts from narrow characters.
They're not wide character streams, they're wide character
converters.

I am not sure I understand this.

Isn't L"some text" a wide character string literal?

According to the language. But the characters between the "..."
are still encoded in some narrow character encoding, which the
compiler has to translate into some wide character encoding.

Which narrow character encoding, and which wide character
encoding, is anybody's guess. The standard says that it's
"implementation defined", which means that the implementation
has to document its choices. Good luck finding such
documentation (for just about any compiler).

Don't wcout, wcin and wstring provide operator<< and
operator>> overloads for wide characters and wide character
strings?

Yes, but all I/O is actually byte oriented. So they do code
translations on the fly, according to the embedded locale.
(The last time I checked, in g++, you could embed any locale
installed on the system, and it would still act as if it were in
locale "C". But that was a very, very long time ago.)

Assuming no issue with translation from source code
character set to execution character set, if you use only
the narrow character streams you avoid most translation.

What do you mean by "narrow character" streams? char streams
right?

Yes.

He should have added that to be sure there's no code
translation, you have to embed the "C" locale.

There's still translation of newlines and possibly other
characters (e.g. Ctrl Z in Windows). Thus, using UTF-8
source code and UTF-8 execution environment character set,
and (mostly) non-translating narrow character streams,
everything should work swimmingly.

Another reason to avoid the wide character streams is that
they're not supported by the MingW Windows port of g++.

This is irrelevant. MINGW's problems are MINGW problems, I am
using GCC under Linux (Scientific Linux 5.1 which is
essentially Red Hat Enterprise Linux 5.1 source code
recompiled, like CentOS - give them a try).

Also I have MS Visual C++ 2008 Express installed.

Under Linux ! :-)

At least, not in the version I have.

And as I understand it UTF-8 is the usual in the *nix world.

For an interactive Windows program, you can set the
console's narrow character stream translation (to/from UCS2,
which is what a console window uses internally) temporarily
to UTF-8 via Windows' console API functions.

Disclaimer: I've never tried this for greek text + UTF-8
encoding, because I've not had to deal with that particular
issue.

Can you pinpoint where our code is wrong? Essentially the following:
#include <iostream>
#include <string>

int main()
{
        using namespace std;

        wcout<< "Give wide character input: ";
        wstring ws;
        wcin>> ws;
        wcout<< "You gave: "<< ws << endl;
}

It produces:

[john@localhost src]$ ./foobar-cpp
Give wide character input: Δοκιμαστικό

You gave:
[john@localhost src]$

To start with, you didn't embed a locale which supports
characters outside of the basic character set.

while the code:

#include <iostream>
#include <string>

int main()
{
        using namespace std;
        cout<< "Give wide character input: ";

        string s;
        cin>> s;
        cout<< "You gave: "<< s << endl;
}

produces:

[john@localhost src]$ ./foobar-cpp
Give wide character input: Δοκιμαστικό

You gave: Δοκιμαστικό

[john@localhost src]$

Formally, the code has undefined behavior:-). Practically,
you're just shuffling bytes, so it "seems" to work.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34