Re: How should I handle the multibyte char set string in C++?

From: "P.J. Plauger" <pjp@dinkumware.com>
Newsgroups: comp.lang.c++
Date: Tue, 1 May 2007 05:46:35 -0400
Message-ID: <duidnZICh4jhkarbnZ2dnUVZ_qWvnZ2d@giganews.com>

"Dancefire" <Dancefire@gmail.com> wrote in message
news:1178003903.581160.249720@y80g2000hsf.googlegroups.com...

.....

[...]

However, I still cannot handle "UCS-2"/"UTF-16" on Linux or
"UTF-8"/"UTF-16" on Windows with std::locale. Do you know how I can do
this?

In the Apache C++ Standard Library you can do it using
a codecvt_byname facet constructed with the name "UTF-8@UCS"
as an argument, although it's not mentioned on the documentation
page: http://incubator.apache.org/stdcxx/doc/stdlibref/codecvt-byname.html
Let me look into adding it.
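
A minimal sketch of the suggestion above, assuming the library accepts
the name "UTF-8@UCS" (per this thread, the Apache C++ Standard Library
does; other implementations will most likely reject it by throwing
std::runtime_error). The typedef and the helper make_utf8_ucs_locale
are hypothetical names used only for illustration.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>

// Hypothetical shorthand for the standard by-name conversion facet.
typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> ByNameCodecvt;

// Tries to build a locale whose codecvt facet is named "UTF-8@UCS".
// Returns true and stores the locale in 'out' on success; returns false
// and leaves 'out' untouched if the library does not know the name.
bool make_utf8_ucs_locale(std::locale& out) {
    try {
        // The locale takes ownership of the facet and destroys it later.
        out = std::locale(std::locale::classic(),
                          new ByNameCodecvt("UTF-8@UCS"));
        return true;
    } catch (const std::runtime_error&) {
        return false;  // name not supported by this implementation
    }
}
```

On success, the resulting locale can be passed to imbue() on a
std::wofstream or std::wifstream before the file is opened.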

Thank you, I now know how to handle this in the Apache C++ Standard
Library. I will try that.
Do you know how I can do this with g++'s standard library? I mean
conversion between wchar_t*, which contains a UCS-4 string, and char*,
which contains a UCS-2 or UTF-16 string.

The problem arose while I was working on a project that must be
portable between Windows and Linux. I am trying to write a Unicode
string to a file.

When I choose UTF-8 for writing, I run into two problems:

1) VC80's standard library doesn't support a UTF-8 locale (the Win32
API supports it, but using the Win32 API would make some of the code
non-portable).
2) All of the strings consist of CJK characters, so UTF-8 needs at
least 3 bytes per character, 50% more storage than the 2 bytes UCS-2
would need. I'm sure all the characters are in the BMP of ISO 10646,
so I'd rather store them in the file as 16-bit units.
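
The size argument in point 2 can be checked with a short sketch. The
function names utf8_length, utf8_size and ucs2_size are made up for
this example, and it uses char32_t (C++11) only to hold code points.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one code point (U+0000..U+10FFFF) in UTF-8.
std::size_t utf8_length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;  // rest of the BMP, including CJK
    return 4;
}

// Total UTF-8 size in bytes of a string of code points.
std::size_t utf8_size(const std::u32string& s) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        total += utf8_length(s[i]);
    return total;
}

// UCS-2 size in bytes; only meaningful when every code point is in the BMP.
std::size_t ucs2_size(const std::u32string& s) {
    return s.size() * 2;
}
```

For a two-character CJK string such as U+4E2D U+6587 this gives 6
bytes in UTF-8 against 4 bytes in UCS-2, which is the 50% difference
mentioned above.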

However, if I choose UCS-2LE, which is what VC stores in wchar_t, I
have a problem reading the file on Linux: g++'s standard library
doesn't seem to support a UCS-2LE locale, and wchar_t on Linux is
UCS-4 rather than UCS-2, so I cannot read the content directly. (Same
kind of story: libiconv supports UCS-2LE, but using it would make that
part of the code non-portable and make my code depend on libiconv.)
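
Since all the characters are stated to be in the BMP, one
dependency-free option is to do the UCS-2LE byte packing by hand
instead of through a locale. This is only a sketch under that
assumption; to_ucs2le and from_ucs2le are hypothetical helpers, not
part of any library discussed in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encodes BMP-only wide text as UCS-2LE bytes (low byte first).
// Works whether wchar_t is 16 bits (VC) or 32 bits (g++ on Linux).
std::string to_ucs2le(const std::wstring& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // Reject anything outside the BMP, and surrogate code units.
        if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::range_error("code point not representable in UCS-2");
        out.push_back(static_cast<char>(cp & 0xFF));
        out.push_back(static_cast<char>((cp >> 8) & 0xFF));
    }
    return out;
}

// Decodes UCS-2LE bytes back into wide text.
std::wstring from_ucs2le(const std::string& in) {
    if (in.size() % 2 != 0)
        throw std::range_error("UCS-2 data must have an even byte count");
    std::wstring out;
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i < in.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(in[i]);
        unsigned hi = static_cast<unsigned char>(in[i + 1]);
        out.push_back(static_cast<wchar_t>(lo | (hi << 8)));
    }
    return out;
}
```

The resulting bytes would be written and read through a narrow stream
opened with std::ios::binary, so that Windows does not translate
newline bytes inside the data.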

So, what should I do in this case?

Everything you need is included in our Compleat Libraries, for both
VC++ and gcc. But they cost $.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com