Re: Need help with printing Unicode! (C++ on CentOS)

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sat, 29 Aug 2009 04:42:19 -0700 (PDT)
Message-ID: <7f578a7a-5a38-452b-8a67-426103052994@g6g2000vbr.googlegroups.com>

On Aug 28, 6:51 pm, Zerex71 <mfeher1...@gmail.com> wrote:

I'm sure this has been addressed before but I've hunted all
over the web and no one seems to provide a comprehensive
answer. I just want to do one thing: Under CentOS, in a
simple C++ program, I'd like to be able to print Unicode
characters to a console output.

I've never heard of CentOS, so I can't address any
system-specific problems here (and they would be off topic).

For example, I'd like to print the musical flat, natural, and
sharp signs.

Here's what I've done so far:
1. Using Eclipse, created a small C++ console project.
2. Declared three characters, each of type wchar_t, and assigned them
their Unicode values (0x266d, 0x266e, 0x266f).
3. Attempted to print them out using wprintf().
4. Set my output console to a font which can represent the characters
(glyphs?) - Lucida Console

What locale are you using? And what encoding does the font use?
You need to ensure that the encoding in the locale is the same
as the one used by the renderer for the font.
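
To see which encoding a program actually ends up with, something along
these lines can help. It is only a sketch: codeset_of is a name made up
for this example, and nl_langinfo is POSIX rather than standard C++, so
this assumes a Unix-like system.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>   // POSIX only: nl_langinfo, CODESET

// Hypothetical helper: switches the global C locale to locale_name
// ("" means "whatever the environment variables ask for") and returns
// the name of its character encoding, or an empty string if the
// locale cannot be selected.
std::string codeset_of(char const* locale_name)
{
    if (std::setlocale(LC_ALL, locale_name) == nullptr) {
        return std::string();
    }
    return std::string(nl_langinfo(CODESET));
}
```

Printing codeset_of("") at program start shows the encoding the C
library will convert wide characters to. That name has to match what
the terminal (and thus the font renderer) expects.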

A few observations:
1. I can go to a Unicode code page website and copy the
characters displayed and paste them into my source file which
is in the same font (that was my first trick which ultimately
blew me out of the water because Eclipse was bitching about
not being able to save the files due to encoding...tried changing
it...then it promptly deleted all my lines and left me with a
bunch of NUL).

First, a source file isn't in a "font". A source file is a
sequence of text characters, in a certain encoding. A font
defines how specific characters will be rendered.

Secondly, in order to be displayable everywhere, I think that
the Unicode code pages use images, and not characters, for the
characters in the code pages. This allows displaying characters
which aren't in any font installed on the machine. There's no
way copy/pasting an image to your source file can possibly work.

2. Mixing cout and wprintf results in the wprintf statements being
totally ignored.

You've raised an interesting point. According to the C standard
(relevant to wprintf), you can't mix wide and narrow output on
the same stream (in this case, stdout). C++ has a similar
restriction---if you've output to cout, use of wcout becomes
illegal, and vice versa. And since stdout and cout/wcout are
supposed to use the same stream, and are synchronized with one
another (by default), I'm pretty sure that the intent is not to
allow this either. In general, all of your IO to a given source
or sink should be of the same type; if you want to output
wchar_t somewhere, all output should be as wchar_t.
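
The orientation of a C stream can be inspected with fwide; called with
a mode of 0, it only queries and does not change anything. A minimal
sketch (the name orientation is made up for the example):

```cpp
#include <cstdio>
#include <cwchar>

// Reports the orientation of a C stream without changing it:
//   > 0  wide oriented (first I/O was a wide function such as wprintf)
//   < 0  byte oriented (first I/O was a narrow function such as printf)
//   == 0 not yet decided (no I/O has been done on the stream)
int orientation(std::FILE* stream)
{
    return std::fwide(stream, 0);
}
```

Once the first output operation has fixed the orientation of stdout,
all later output on it should be of the same kind. Checking
orientation(stdout) before a wprintf call is one way to find out
whether some earlier narrow output has already made the stream byte
oriented.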

3. Using only wprintf results in "Sign: ?" displayed in the
console output, even though it can display the glyphs
correctly when I pasted them (1.)

Probably a question of locale. In the "C" locale, most
implementations only allow characters in the range 0...127 when
converting wchar_t to char.
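
This can be checked directly with wcrtomb, which does the same
wchar_t to char conversion that wide output relies on. A sketch,
assuming nothing beyond the standard library; convertible_in is a
hypothetical name, and the function changes the global C locale as a
side effect:

```cpp
#include <climits>
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: returns true if the wide character wc can be
// converted to a multibyte sequence in the named locale. Returns
// false if the locale is unavailable or the conversion fails.
bool convertible_in(char const* locale_name, wchar_t wc)
{
    if (std::setlocale(LC_ALL, locale_name) == nullptr) {
        return false;
    }
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    char buffer[MB_LEN_MAX];
    // wcrtomb returns (size_t)-1 when wc has no representation in
    // the current locale's encoding.
    return std::wcrtomb(buffer, wc, &state) != static_cast<std::size_t>(-1);
}
```

On the implementations I know of, convertible_in("C", 0x266D) is
false, while the same call with the name of a UTF-8 locale should
succeed, provided that locale is installed.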

For wprintf, you'll have to set the global locale. For
std::wcout, you'll have to imbue the desired locale (since the
object was constructed using the global locale before you could
modify the global locale).
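
Roughly, the two cases look like this. It is a sketch, not tested on
CentOS; the function names are invented for the example, and "" as
locale name means "take the locale from the environment".

```cpp
#include <clocale>
#include <cwchar>
#include <iostream>
#include <locale>
#include <stdexcept>

// C stdio route: wprintf converts using the global C locale, so that
// has to be set first. Returns false if the locale is unavailable or
// the output fails (e.g. a character has no encoding in the locale).
bool wprintf_in_locale(char const* locale_name, wchar_t const* text)
{
    if (std::setlocale(LC_ALL, locale_name) == nullptr) {
        return false;
    }
    return std::wprintf(L"%ls", text) >= 0;
}

// iostream route: set the global C++ locale, then imbue wcout
// explicitly, since wcout was constructed before the global locale
// was changed.
bool wcout_in_locale(char const* locale_name, wchar_t const* text)
{
    try {
        std::locale::global(std::locale(locale_name));
    } catch (std::runtime_error const&) {
        return false;               // no such locale on this system
    }
    std::wcout.imbue(std::locale());
    std::wcout << text << std::flush;
    return std::wcout.good();
}
```

For the signs in question, a call such as
wprintf_in_locale("", L"Sign: \u266D\n") would be the thing to try,
with LANG or LC_ALL set to a UTF-8 locale in the environment. Both
functions write wide characters, so they do not run into the
wide/narrow mixing problem.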

4. Calling setlocale() as directed by an example has no effect
on my program.

What did you use as an argument to setlocale()? (But this is
very OS dependent. I know how it works under Unix, but not for
other systems.)

5. Using fwide() to determine if my setup is legit works
because I don't hit the exit condition that I wrote for that
test.

So, I don't know what else to try to get this to work.
There's a lot of stuff about Unicode on Windows out there but
I'm not doing Windows, and figured the Linux community might
have an answer.

Linux is pretty simple. Just use a UTF-8 locale and a UTF-8
encoded font, and everything works pretty well. For that
matter, under Unix, if all you're concerned with is a few
special characters, I'd just manually encode them as strings in
UTF-8, and output them as char. Most (if not all) of the
locales simply pass all char straight through, without worrying
whether they're legal or not. So instead of a wchar_t with
0x266D, you'd use:
char const flat[] = "\xE2\x99\xAD" ;
and output that directly. (At least, that's what I think should
happen. I don't get any output for the above, but it works with
other Unicode characters, so I suspect that the problem is
simply that my fonts don't contain the characters you give. All
of the Miscellaneous Symbols (codes 2600 to 26FF) display as a
simple blank on my Linux machine.)
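
For all three signs, the byte sequences can be worked out by hand or
with a small encoder. The sketch below does both; utf8_encode and
print_signs_utf8 are names made up for the example, and the encoder
does not validate its input (surrogates and values above 0x10FFFF are
not rejected).

```cpp
#include <cstdio>
#include <string>

// The three signs, manually encoded as UTF-8.
char const flat[]    = "\xE2\x99\xAD";   // U+266D MUSIC FLAT SIGN
char const natural[] = "\xE2\x99\xAE";   // U+266E MUSIC NATURAL SIGN
char const sharp[]   = "\xE2\x99\xAF";   // U+266F MUSIC SHARP SIGN

// Hypothetical helper: encodes one Unicode code point as UTF-8.
std::string utf8_encode(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {                      // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Writes the signs as plain narrow output; no wchar_t, no locale
// setup. Whether they display depends on the terminal and its fonts.
void print_signs_utf8()
{
    std::fputs("Signs: ", stdout);
    std::fputs(flat, stdout);
    std::fputs(natural, stdout);
    std::fputs(sharp, stdout);
    std::fputs("\n", stdout);
}
```

Since this is narrow output, it can be mixed freely with other printf
or cout output on the same stream.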

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34