Re: How should I handle the multibyte char set string in C++?
On Apr 29, 4:40 pm, Dancefire <Dancef...@gmail.com> wrote:
I'm writing a program using wstring (wchar_t) as the internal string
type. The problem arises when I convert multibyte character set
strings in different encodings to wstring (which is Unicode:
UCS-2LE (BMP) on Win32, and UCS-4 on Linux?).
I have 2 ways to do the job:
1) use std::locale: set std::locale::global() and use mbstowcs() and
wcstombs() to do the conversion.
Why not std::codecvt? A facet which you can obtain from a
locale.
2) use platform-dependent functions to do the job, such as libiconv on
Linux, or MultiByteToWideChar() and WideCharToMultiByte() on Win32.
At first glance, solution 1) looks like the obvious choice, since it
is the C++ way of doing things. In detail, the codecvt facet just
wraps libiconv on Linux, or MultiByteToWideChar() and
WideCharToMultiByte() on Win32 (depending on the STL implementation),
to do the real work (if my understanding is correct).
However, I have 2 problems.
First, I have to set the global locale before I do the conversion.
Why? You can get a facet from any locale. That's the one
advantage C++ locales have over the C stuff.
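
To make that concrete, here is a minimal sketch of a conversion that
takes its facet from a locale object passed in by the caller and never
touches the global locale. The name widen_with is hypothetical (it is
not from the thread), and the code assumes that one wchar_t per input
byte is always enough room for the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a multibyte string to wstring using the
// codecvt facet of the given locale. The global locale is not modified.
std::wstring widen_with(const std::locale& loc, const std::string& in)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& cvt = std::use_facet<Facet>(loc);

    if (in.empty())
        return std::wstring();

    // Assumption: the result never needs more wchar_t's than input bytes.
    std::wstring out(in.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from = in.data();
    const char* from_next = from;
    wchar_t* to = &out[0];
    wchar_t* to_next = to;

    std::codecvt_base::result r =
        cvt.in(state, from, from + in.size(), from_next,
               to, to + out.size(), to_next);

    if (r == std::codecvt_base::noconv) {
        // The facet reports that no conversion is needed: copy bytes over.
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = static_cast<wchar_t>(static_cast<unsigned char>(in[i]));
        return out;
    }
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("multibyte to wide conversion failed");

    assert(to_next >= to);
    out.resize(static_cast<std::size_t>(to_next - to));
    return out;
}
```

A call such as widen_with(std::locale::classic(), "hello") should work
anywhere. For a real encoding you would pass a named locale instead,
and whether a given name exists depends on the system.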
[...]
The second problem: it looks like the system-dependent conversion
functions support many more encodings than std::locale() does in each
STL implementation.
That's a problem with the C++ library implementation. A quality
implementation will support all of the code sets that are
installed on the system.
For example, libiconv supports the UCS-2LE encoding, but g++'s
locale() doesn't support it. MultiByteToWideChar() supports
UTF-8 conversion, but the std::locale() in MSVC (8.0)'s STL doesn't
support ".65001" for code page 65001, which is UTF-8.
Finding what locales are available and work can be a bit of a
game:-). And how they are named, if you're not under Unix.
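
One way to play that game is simply to try names and see which ones
the library accepts: the std::locale constructor throws
std::runtime_error for a name it does not recognize. A rough sketch
follows; the function name available_locales is hypothetical, and the
candidate names are whatever you care to guess for your platform.

```cpp
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the candidate names for which a
// std::locale can actually be constructed on this system.
std::vector<std::string>
available_locales(const std::vector<std::string>& candidates)
{
    std::vector<std::string> found;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        try {
            std::locale loc(candidates[i].c_str());
            found.push_back(candidates[i]);
        } catch (const std::runtime_error&) {
            // Name not supported by this implementation: skip it.
        }
    }
    return found;
}
```

This only tells you that the name is accepted, not that the codecvt
facet of the resulting locale does anything useful, so it is still
worth converting a known test string afterwards.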
The locale strings not being the same on different platforms might be
a third problem, but I can easily work around it with #ifdef/#endif.
So, back to the original question: how should I handle MBCS strings in
C++?
The official answer is std::codecvt. In practice, I roll my
own:-).
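
As an illustration of what rolling your own can look like (this is a
sketch written for this post, not my production code), here is a
hand-written UTF-8 to wstring decoder that does not involve any locale
at all. The name utf8_to_wstring is hypothetical. It assumes wchar_t
holds either UTF-16 code units (2 bytes, as on Win32) or whole code
points (4 bytes, as on Linux).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 by hand, rejecting malformed input.
std::wstring utf8_to_wstring(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of N bytes.
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };

    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (len > in.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates, and values above U+10FFFF.
        assert(len >= 1 && len <= 4);
        if (cp < min_cp[len] || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
        i += len;
    }
    return out;
}
```

The obvious cost is that you need one such function per encoding,
which is fine for UTF-8 and the ISO 8859 family but quickly becomes
tedious for the larger Asian code sets.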
--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34