.....
[...]
However, I still cannot handle "UCS-2"/"UTF16" in Linux or
"UTF8"/"UTF16" in Windows by std::locale. Do you know how can I do
this?
In the Apache C++ Standard Library you can do it using
a codecvt_byname facet constructed with the name "UTF-8@UCS"
as an argument, although it's not mentioned on the documentation
page:http://incubator.apache.org/stdcxx/doc/stdlibref/codecvt-byname.html
Let me look into adding it.
Thank you, I know how to handle this in Apache C++ Standard Library
now. I will try that.
Do you know the how can I use g++'s STL do this? I mean, conversion
between wchar_t*, which contain UCS-4 string, and char*, which contain
UCS-2 or UTF16 string.
The problem is raised when I try to do a project can be portable
between Windows and Linux. I try to write the unicode string to a
file.
When I choose UTF8 to write, I get 2 problems,
1) VC80's STL doesn't support UTF8's locale, (althought Win32 api
support it, but use win32 api will make some of the code non-portable)
2) All of the string is CJK characters, so UTF8 will cost at least 3
bytes to store, enlarge 50% for storage which is unnecessary if I
store just use UCS-2. And I'm sure all the characters is in BMP of
ISO-10646. So I'd better just use 16bit to store it in the file.
However, If I choose UCS2LE, just like what stored in wchar_t in VC, I
got problem of reading the file at Linux, which g++'s STL looks like
doesn't support UCS-2LE locale, and wchar_t in Linux is UCS4 rather
than UCS2, so I cannot directly read the content. (same kind of story,
since libiconv support UCS-2LE, but if I use libiconv it will make the
part of the code non-portable and I have to let mycode depends on
libiconv).
So, What should I do in this case?
VC++ and gcc. But they cost $.
P.J. Plauger
Dinkumware, Ltd.