Re: iconv trouble
On May 29, 11:19 pm, Daniel Luis dos Santos <daniel.d...@gmail.com>
wrote:
On 2009-05-29 19:59:50 +0100, Daniel Luis dos Santos
<daniel.d...@gmail.com> said:
[...]
But now I am confused. Isn't UTF-8 locale independent ?
The encoding used in char (and possibly in wchar_t as well) is
determined by the locale. At least for functions which depend
on the locale---I would expect, however, that a function which
takes names of encodings as arguments (although "WCHAR_T" is not
the name of any encoding I'm familiar with) would use those
names, and not the current global locale, to determine the
encoding.
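
To make that expectation concrete, here is a rough sketch of how I would expect a name-driven conversion to look with POSIX iconv. The helper utf8_to_wide is hypothetical, and the sketch assumes that the platform's iconv recognizes the names "UTF-8" and "WCHAR_T" (glibc and GNU libiconv do) and declares the input buffer parameter as char**. Note that setlocale is never called: the encodings come from the names alone.

```cpp
#include <iconv.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: convert UTF-8 bytes to wchar_t using iconv.
// Returns false if the conversion is unsupported or the input is invalid.
bool utf8_to_wide(const std::string& in, std::wstring& out)
{
    // Argument order is (tocode, fromcode).
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1) {
        return false;                       // names not known to this iconv
    }

    std::vector<char> src(in.begin(), in.end());  // iconv wants non-const char*
    std::vector<wchar_t> dst(in.size() + 1);      // never more units than input bytes

    char* inptr = src.data();
    std::size_t inleft = src.size();
    char* outptr = reinterpret_cast<char*>(dst.data());
    std::size_t outleft = dst.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        return false;                       // invalid or incomplete input
    }
    assert(inleft == 0);                    // success means all input was consumed

    std::size_t produced = dst.size() - outleft / sizeof(wchar_t);
    out.assign(dst.data(), produced);
    return true;
}
```

What the resulting wchar_t values mean is still implementation defined; on glibc they are Unicode code points.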
I was supposing that UTF-8 contained every possible character
and that a conversion existed between it and wchar_t.
UTF-8 is a Unicode Transformation Format. As such, it has
encodings for all Unicode characters. Certainly not every
possible character. (I could invent a new character tomorrow,
for example.) UTF-8 encodes Unicode in octets.
wchar_t is a type, which has nothing to do with encoding. What
it actually corresponds to, and which encoding the system
libraries use by default with it, is implementation defined.
Typical implementations make it a 16 or 32 bit type, using
UTF-16, UTF-32 or EUC. There is an exact translation between
UTF-8 and UTF-16 or UTF-32, since all are encoding formats for
Unicode. I don't know off hand about EUC.
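
To illustrate why the translation between UTF-8 and UTF-32 is exact, here is a minimal sketch of both directions written by hand, with no locale involved. The function names utf8_to_utf32 and utf32_to_utf8 are made up for the example, and it assumes a C++11 compiler (for char32_t and std::u32string). It is an illustration of the bit layout, not a replacement for a tested library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-32 code points.
// Throws std::runtime_error on malformed input.
std::u32string utf8_to_utf32(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const char32_t min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);    // append six payload bits
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < min_for_len[len] || cp > 0x10FFFF
                || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point in UTF-8 input");

        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encode UTF-32 code points as UTF-8 bytes.
// Precondition: every element is a valid Unicode scalar value.
std::string utf32_to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Any valid UTF-8 string passed through utf8_to_utf32 and back through utf32_to_utf8 comes out byte for byte identical, which is what "exact translation" means here.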
What if in my program I want to decode characters from different
locales than the one on my machine ? From what I've learned
from the glibc docs, the call to setlocale sets the locale
machine-wide, so that is not an option as it would mess up
other programs, right ?
If you're working in C, or interfacing with a C library,
setlocale can be called with a null pointer to determine the
current global locale, which you can then restore. (Of course,
in a multithreaded environment, you'll have to ensure thread
safety when doing this.) In the case of C++, the standard idiom
is to pass a locale to the function, with possibly a default
argument of the global locale.
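
As a sketch of the save-and-restore approach for the C locale, something like the following guard could be used. The class name ScopedCLocale is hypothetical. Note that the string returned by setlocale has to be copied before the next call, since a later call may overwrite it, and that setlocale changes the locale of the calling process only, not of other programs. The guard does nothing about thread safety; that remains the caller's problem.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical RAII guard: switch the global C locale for one category
// and restore the previous setting when the guard goes out of scope.
class ScopedCLocale
{
public:
    ScopedCLocale(int category, const char* name)
        : category_(category), switched_(false)
    {
        assert(name != nullptr);
        // A null pointer queries the current locale without changing it.
        const char* current = std::setlocale(category, nullptr);
        // Copy the result: a later setlocale call may overwrite the buffer.
        saved_ = current ? current : "C";
        // setlocale returns a null pointer if the request cannot be honored.
        switched_ = std::setlocale(category, name) != nullptr;
    }

    ~ScopedCLocale()
    {
        std::setlocale(category_, saved_.c_str());
    }

    ScopedCLocale(const ScopedCLocale&) = delete;
    ScopedCLocale& operator=(const ScopedCLocale&) = delete;

    // True if the requested locale was actually installed.
    bool switched() const { return switched_; }

private:
    int category_;
    std::string saved_;
    bool switched_;
};
```

Typical use would be to declare a guard at the top of the block that calls the locale-dependent C functions, and to check switched() before relying on the new locale, since the named locale may not be installed.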
How do you deal with this when a single program must handle
multiple locales ?
In C++, you can maintain several locales, and pass them around.
In C, you have to read the current locale, and restore it when
finished.
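
A small sketch of the C++ approach: the function takes the locale as a parameter, defaulting to a copy of the current global locale, and never touches global state. The names format_number, CommaDecimal and comma_locale are invented for the example. The second locale is built from a custom numpunct facet so that the sketch does not depend on which named locales happen to be installed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Format a number according to the locale passed in.
// std::locale() is a copy of the current global locale.
std::string format_number(double value, const std::locale& loc = std::locale())
{
    std::ostringstream out;
    out.imbue(loc);          // affects this stream only
    out << value;
    assert(!out.fail());
    return out.str();
}

// Hypothetical facet: use a comma as the decimal point.
struct CommaDecimal : std::numpunct<char>
{
    char do_decimal_point() const override { return ','; }
};

// A second locale: the classic "C" locale with the facet above swapped in.
// The locale takes ownership of the facet and deletes it when done.
std::locale comma_locale()
{
    return std::locale(std::locale::classic(), new CommaDecimal);
}
```

With this, format_number(1.5, std::locale::classic()) and format_number(1.5, comma_locale()) can be called side by side in the same program, each with its own conventions.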
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34