Re: locale name strings on windows xp

From: himanshu.garg@gmail.com
Newsgroups: comp.lang.c++
Date: Wed, 2 Apr 2008 04:09:53 -0700 (PDT)
Message-ID: <db4356bf-fdbe-4173-acac-bf70514696cc@s37g2000prg.googlegroups.com>

On Apr 2, 2:58 pm, James Kanze <james.ka...@gmail.com> wrote:

On Apr 2, 4:15 am, himanshu.g...@gmail.com wrote:

I have a std C++ program that uses char/string everywhere and
works well with single byte characters.

For what definition of "works well"?

This is more relevant than you might think. I'll bet it doesn't
handle all possible accented characters correctly. Or Japanese
or Chinese characters. You probably expect this, however, if
you move to Unicode. In other words, you're adding
functionality. And that functionality will almost certainly
require additional code.

Yes, it doesn't handle the characters you mentioned. It works only
when every character is a single byte.

The program depends on what a char is so to make it work for
utf-8 I assume I just have to do the following :-
replace char by wchar_t

UTF-8 is stored in a char, not a wchar_t. On many systems,
wchar_t can be used for UTF-16 or UTF-32. Note, however, that
UTF-16 is also a multiunit encoding, and if you're really
dealing in characters, you have to deal with multiple code
points for a single character in UTF-32 as well.
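
To make the difference between bytes, code points and characters concrete, here is a minimal sketch (not part of the original posts) that counts the code points in a UTF-8 std::string. The name utf8_code_points is made up for illustration, and the input is assumed to be well-formed UTF-8; nothing is validated.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 encoded string.
// Every code point starts with exactly one byte that is NOT a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx),
// so counting the non-continuation bytes counts the code points.
// Assumption: the input is well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the four Arabic letters "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85", size() reports 8 bytes while utf8_code_points reports 4. For "e\xCC\x81" (an e followed by a combining acute accent) it reports 2 code points, although a reader sees one character, which is the multiple-code-points case mentioned above.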

set the locale to unicode/utf-8 using
locale::global(loc("<localename>"));

This may or may not be necessary, depending on what you're
doing. It is almost certainly not sufficient.

I wrote the following on GNU/Linux and for a utf-8 Arabic file it
outputs nothing :-
#include<locale>
#include<iostream>
#include<string>
int main()
{
    std::locale::global(std::locale("en_US.UTF-8"));
    wchar_t c;
    std::wcin >> c;
    std::wcout << c;
}

The following program on my system outputs :-
C
2
#include <iostream>
#include <locale>
int main()
{
    std::locale loc;  // default-constructed: a copy of the current global locale
    std::cout << loc.name() << std::endl;
    std::cout << sizeof(wchar_t) << std::endl;
}

The first result is required by the standard. On start up, the
global locale is set to "C".

Is there a way I can find out the name of available locales
for use with the locale constructor?

Not portably. Under Unix and Unix look alikes, I've found that
looking at the contents of a directory called "/usr/lib/locale"
or "/usr/share/locale" will often help (but these directories
may also contain additional files), and Unix has a formal naming
convention as well. I have no idea what the situation is under
Windows. (Language names, e.g. "french" or "german", seem to
work, but I don't know how you'd specify an encoding.)

The only portable way I know of finding out exactly what locales
work is by an exhaustive search. Fairly easy to write, but
don't expect the program to finish anytime soon (say, anytime in
the next couple of centuries).
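
A more practical variant than a truly exhaustive search is to probe a short list of candidate names. The sketch below is an illustration, not something from the original posts: the function names locale_available and available_locales are invented, and it relies only on the fact that the std::locale constructor throws std::runtime_error when it does not accept a name. Which names to try (for example "en_US.UTF-8" on Unix, or "french" on Windows) is left to the caller.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Returns true if this implementation's std::locale accepts the name.
// The constructor throws std::runtime_error for names it rejects.
bool locale_available(const std::string& name)
{
    try {
        std::locale probe(name.c_str());
        (void)probe;  // only the success of the construction matters
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Filters a caller-supplied list of candidate names down to the ones
// that can actually be constructed on this system.
std::vector<std::string> available_locales(
    const std::vector<std::string>& candidates)
{
    std::vector<std::string> found;
    for (const std::string& name : candidates) {
        if (locale_available(name)) {
            found.push_back(name);
        }
    }
    return found;
}
```

The name "C" is always accepted; anything else depends on the platform and on which locales are installed.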

Will my approach work?

Not without some additional work.

If I read a utf-8 file will wchar_t store the character code
for the corresponding characters?

It might, if you use the appropriate locale. If it does,
however, you've probably still got some additional work before
you can say that your program "works well".
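
If depending on an installed locale is not an option, the conversion can also be done by hand. What follows is only a sketch under stated assumptions, not the approach discussed in this thread: decode_utf8 is a made-up name, the result is a std::u32string of code points rather than wchar_t, and the decoder is not validating. Truncated or malformed sequences become U+FFFD, while overlong forms and surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-32 code points without using any locale.
// Assumption: mostly well-formed input. Malformed or truncated sequences
// yield U+FFFD; overlong encodings and surrogates are NOT detected.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(in[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

With this, the two bytes "\xD8\xB3" decode to the single code point U+0633 (Arabic letter seen). Grouping code points into user-perceived characters, collation and case mapping are all still left to do, which is the additional work referred to above.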

Thanks for your reply. My understanding of the problem has hopefully
improved.

Thank You,
Himanshu

--
James Kanze (GABI Software) email: james.ka...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34