Re: Character set
* Amit Kumar:
Hi Alf, Ron and Andy,
Thanks a lot for your valuable inputs.
[Ron]: Even US Windows is natively a 16-bit UNICODE machine.
[Andy]: If you are writing for Windows only I would advise you to use wchar_t
throughout.
The 'W' variants of the Windows APIs take UTF-16 encoded null
terminated strings and the 'A' variants require platform encoded null
terminated strings (and not UTF-8 encoded strings, AFAIK).
The question arises: Can I really use wchar_t to store a UTF-16
encoded character
In Windows, yes, provided you're limiting yourself to the Basic Multilingual
Plane, the "BMP", which essentially is the original 16-bit Unicode. A character
outside the BMP takes two wchar_t values (a surrogate pair) in UTF-16.
In Windows a wchar_t is 16 bits.
This is due to historical reasons (Microsoft was among the founders of the
Unicode Consortium, IIRC).
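
To make the BMP limitation concrete, here is a minimal sketch of UTF-16 encoding for a single code point. The function name to_utf16 is made up for this illustration, and char16_t is used instead of wchar_t so that the code behaves the same on every platform:

```cpp
#include <cassert>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// to_utf16 is a hypothetical helper, not a standard or Windows function.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
std::vector<char16_t> to_utf16(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        // BMP: the code point fits in a single 16-bit unit.
        return { static_cast<char16_t>(cp) };
    }
    // Outside the BMP: split the 20 remaining bits into a surrogate pair.
    const char32_t v = cp - 0x10000;
    return {
        static_cast<char16_t>(0xD800 + (v >> 10)),    // high surrogate
        static_cast<char16_t>(0xDC00 + (v & 0x3FF))   // low surrogate
    };
}
```

U+20AC (the euro sign) comes out as one unit, while U+1F600 comes out as the two units 0xD83D 0xDE00. That second case is what a single 16-bit wchar_t cannot hold.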
and std::wstring to store a UTF-16 encoded string?
Yes, and without the above mentioned limitation.
Stroustrup: "The size of wchar_t is implementation defined and large
enough to hold the largest character set supported by the
implementation's locale."
Since it is not guaranteed that wchar_t is 16 bits,
In practice wchar_t is 16 bits or larger on any platform, and in Windows it's
exactly 16 bits.
I cannot simply
store a UTF-16 string in std::wstring and call .c_str() to obtain a
UTF16* for a Windows utf-16 based API.
Happily that's incorrect. :-)
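
A minimal sketch of the pattern, written so that it compiles anywhere. The function wide_api_length is a made-up stand-in for a 'W' API that takes a zero-terminated wide string; in real Windows code the parameter type would be LPCWSTR, which is const wchar_t*:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a 'W' API: it only needs a pointer to a
// zero-terminated sequence of wchar_t, which is what LPCWSTR is on Windows.
std::size_t wide_api_length(const wchar_t* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != L'\0') { ++n; }
    return n;
}

// Hypothetical caller: passes a std::wstring straight to the API.
std::size_t call_with_wstring(const std::wstring& text)
{
    // c_str() yields a zero-terminated const wchar_t*; no copy or
    // conversion is needed before handing it to the API.
    return wide_api_length(text.c_str());
}
```

On Windows, where wchar_t is 16 bits, the units in the std::wstring are UTF-16 units, so the same call shape works with the real 'W' functions.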
However, note that Windows uses three different wide string representations:
ordinary zero-terminated strings, string buffers with separate length, and so
called B-strings (Basic language strings), where you have a pointer to the first
wchar_t following a string length field, which is 32 bits and holds a byte
count. The B-strings are created by SysAllocString & friends.
Microsoft's C++ compiler, Visual C++, supports B-strings and other Windows
specific types (including an intrusive smart pointer for COM objects) via some
run-time library types.
An even more frustrating and annoying thing is that I cannot even store a
UTF-8 string in std::string.
Happily that's also incorrect.
Why? Because std::string is
std::basic_string<char>, and char is not guaranteed to be 8 bits
And happily :-), that's also incorrect. 'char' is indeed guaranteed to be at
least 8 bits. See the FAQ for that and other guarantees.
(though it is practically always 8 bits, as pointed out by Ron).
*Hark*. As far as I can see Ron did not make any such mistake.
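
For illustration, a small sketch of both points: a compile-time check of the 8-bit guarantee, and a std::string holding UTF-8 bytes. The helper name append_utf8 is made up for this example, and it assumes its argument is a valid code point:

```cpp
#include <cassert>
#include <climits>
#include <string>

// CHAR_BIT is the number of bits in a char; the standard requires at least 8.
static_assert(CHAR_BIT >= 8, "char must have at least 8 bits");

// Append the UTF-8 encoding of one code point to a std::string.
// append_utf8 is a hypothetical helper, not a standard library function.
// Assumes cp is a valid Unicode scalar value.
void append_utf8(std::string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: outside the BMP
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Note that size() on such a string counts bytes, not characters: after appending U+20AC the string holds the three bytes E2 82 AC.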
Cheers & hth.,
- Alf
--
Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
No ads, and there is some C++ stuff! :-) Just going there is good. Linking
to it is even better! Thanks in advance!