Re: converting from windows wchar_t to linux wchar_t
My question is: is there a simple way to convert a 2-byte wchar_t (the Windows version) to a 4-byte wchar_t (the Linux version)?
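wchar_t is a particularly useless type: because it is implementation-defined, portable code has no assurance of which character encoding it uses or is capable of holding. (In practice it is 2 bytes holding UTF-16 on Windows and 4 bytes holding UTF-32 with glibc on Linux, which is exactly the mismatch you are hitting.)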
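A quick way to check this is to ask each compiler what it chose. The sketch below uses a hypothetical helper, report_wchar_t, that prints sizeof(wchar_t) and whether the C99 macro __STDC_ISO_10646__ (defined when wchar_t values are Unicode code points) is present. What it prints depends entirely on your compiler and C library.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reports what this compiler/C library chose for
   wchar_t. Neither the size nor the encoding is fixed by the C standard. */
size_t report_wchar_t(void)
{
    printf("sizeof(wchar_t) = %zu bytes\n", sizeof(wchar_t));
#ifdef __STDC_ISO_10646__
    /* Defined when wchar_t values are ISO 10646 (Unicode) code points. */
    printf("__STDC_ISO_10646__ = %ld\n", (long)__STDC_ISO_10646__);
#else
    printf("__STDC_ISO_10646__ not defined: wchar_t encoding is unspecified\n");
#endif
    return sizeof(wchar_t);
}
```

Run on both platforms, this typically shows 2 bytes on Windows and 4 bytes on Linux.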
The next point is that Unicode code units are unsigned, so use an unsigned short for your UCS-2 / UTF-16 representation. http://en.wikipedia.org/wiki/UTF-16 has loads more information.
Finally, converting plain UCS-2 to UTF-32 is easy: just widen the data with a direct character-by-character copy:
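#include <stdint.h>

typedef unsigned short ucs2char;  /* one UCS-2 code unit */
typedef uint32_t utf32char;       /* one UTF-32 unit; unsigned long is 8 bytes on 64-bit Linux */

/* Widens a zero-terminated UCS-2 string into dest, terminator included.
   dest must have room for as many units as src. */
void convert_ucs2_2_utf32(ucs2char const* src, utf32char* dest)
{
    do {
        *dest++ = *src;
    } while (*src++);
}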
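Because convert_ucs2_2_utf32 leaves sizing dest to the caller, here is a sketch of a hypothetical wrapper, ucs2_to_utf32_alloc, that measures the input and allocates the output itself. It repeats the typedefs so it compiles on its own, and it assumes the input really is UCS-2 (no surrogates), so each input unit becomes exactly one output unit.

```c
#include <stdint.h>
#include <stdlib.h>

typedef unsigned short ucs2char;
typedef uint32_t utf32char;

/* Hypothetical helper: allocates room for one utf32char per UCS-2 unit
   plus the terminator, then zero-extends each unit into it.
   Returns NULL if malloc fails; the caller must free() the result. */
utf32char* ucs2_to_utf32_alloc(ucs2char const* src)
{
    size_t n = 0;
    while (src[n])                   /* count units before the terminator */
        n++;

    utf32char* dest = malloc((n + 1) * sizeof *dest);
    if (dest == NULL)
        return NULL;

    for (size_t i = 0; i <= n; i++)  /* i <= n also copies the zero */
        dest[i] = src[i];
    return dest;
}
```

Since every UCS-2 value is already a valid code point, widening is a plain zero-extension: 0x00E9 becomes 0x000000E9.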
If you want to convert characters outside the Basic Multilingual Plane (BMP) properly, you need to be aware of surrogate pairs. (The BMP covers almost every character used by modern European and Asian languages, but not all of them: some rarer CJK ideographs, historic scripts and emoji lie above U+FFFF.) Unicode code points in the range U+D800-U+DFFF are never assigned to characters; UTF-16 uses this range to encode a pair of code units, a high surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF), each of which carries 10 bits of (code point - 0x10000).
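As a worked example, U+1F600 minus 0x10000 is 0xF600; its top 10 bits (0x3D) go into the high surrogate 0xD800 + 0x3D = 0xD83D and its bottom 10 bits (0x200) into the low surrogate 0xDC00 + 0x200 = 0xDE00. The sketch below reverses that for one pair, using a hypothetical helper name, combine_surrogates, and assuming the caller has already checked that both units are in the right ranges.

```c
#include <stdint.h>

/* Hypothetical helper: combines a high surrogate (0xD800-0xDBFF) and a
   low surrogate (0xDC00-0xDFFF) into one code point. Each carries 10 bits
   of (code point - 0x10000). Ranges are NOT checked here. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    uint32_t hi_bits = (uint32_t)(high - 0xD800);  /* top 10 bits */
    uint32_t lo_bits = (uint32_t)(low - 0xDC00);   /* bottom 10 bits */
    return 0x10000 + ((hi_bits << 10) | lo_bits);
}
```

For instance, combine_surrogates(0xD83D, 0xDE00) gives 0x1F600, and the extreme pair (0xDBFF, 0xDFFF) gives 0x10FFFF, the highest Unicode code point.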
So something like this will do the translation of UTF-16 to UTF-32 (it reuses utf32char from above):
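typedef unsigned short utf16char;  /* one UTF-16 code unit */

/* Converts a zero-terminated UTF-16 string to UTF-32, terminator included.
   A high surrogate followed by a low surrogate becomes one code point;
   any other unit, including an unpaired surrogate, is copied unchanged. */
void convert_utf16_to_utf32(utf16char const* src, utf32char* dest)
{
    do {
        if ((src[0] & 0xFC00) == 0xD800 && (src[1] & 0xFC00) == 0xDC00) {
            *dest++ = 0x10000 + (((utf32char)(src[0] & 0x3FF) << 10) | (src[1] & 0x3FF));
            src++;  /* step onto the low surrogate; the loop condition steps past it */
        } else
            *dest++ = *src;
    } while (*src++);
}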
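The converter above also leaves sizing dest to the caller. One way to do that is to count the output units first. The sketch below is a hypothetical helper, utf16_to_utf32_length, that applies the same pairing rule (a high surrogate immediately followed by a low surrogate forms one code point, anything else is one unit). It repeats the utf16char typedef so it compiles on its own.

```c
#include <stddef.h>

typedef unsigned short utf16char;

/* Hypothetical helper: number of UTF-32 units needed to hold the
   conversion of a zero-terminated UTF-16 string, terminator included.
   A valid high+low surrogate pair counts once; any other unit
   (including an unpaired surrogate) counts as one. */
size_t utf16_to_utf32_length(utf16char const* src)
{
    size_t n = 0;
    for (;;) {
        /* src[1] is only read when src[0] is a (non-zero) high surrogate */
        if ((src[0] & 0xFC00) == 0xD800 && (src[1] & 0xFC00) == 0xDC00)
            src += 2;            /* surrogate pair -> one code point */
        else if (*src++ == 0)
            return n + 1;        /* count the terminator and stop */
        n++;
    }
}
```

For the string "A", U+1F600 (encoded as 0xD83D 0xDE00), U+4E2D, this returns 4: three code points plus the terminator, even though the UTF-16 input is five units long.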