Re: Want Input boxes to accept unicode strings on Standard Window
David Ching wrote:
Ah, UTF-8. I know you discussed this at length several months ago here, but
to be honest, this is my understanding of it: it is an 8-bit encoding
scheme no different than Ansi (that's how it fits in 8 bits). Since it is
8 bits, it cannot specify everything an LPWSTR can. Yet it somehow is
supposed to be better than Ansi, not reliant on any codepage. But if it's
only 8 bits, how is that?
And UTF-8 begs the question about UTF-16. Is UTF-16 the same as what
Windows Notepad (in the Save As dialog) calls "Unicode"? Or is Windows'
concept of Unicode and LPWSTR different from UTF-16?
David:
Both UTF-8 and UTF-16 are complete encodings of Unicode. Per code point,
UTF-8 uses up to four 8-bit code units, and UTF-16 uses up to two 16-bit
code units.
When "Windows Unicode" first started out, all code points could be
represented by one 16-bit code unit, but no longer. Modern Windows
Unicode *is* UTF-16. The Windows ANSI code pages are (I think) all
single-byte or DBCS, so UTF-8, which needs up to four bytes per character,
cannot be used as an ANSI code page (at any rate, it is not the ANSI
code page for any language).
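
To make the "up to four" and "up to two" figures concrete, here is a minimal sketch of both encodings in portable C++11. The function names encode_utf8 and encode_utf16 are mine, purely for illustration, and the code assumes it is handed a valid code point (U+0000 to U+10FFFF, not a surrogate); it does no validation.

```cpp
#include <string>

// Encode one code point as UTF-8: 1 to 4 bytes (8-bit code units).
// Assumes cp is a valid Unicode scalar value; no error checking.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 7 bits: plain ASCII, one byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 11 bits: two bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 16 bits: three bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 21 bits: four bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16: 1 or 2 16-bit code units.
// Code points above U+FFFF become a surrogate pair.
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        char32_t v = cp - 0x10000;         // 20 bits remain
        out += static_cast<char16_t>(0xD800 | (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

The lead byte of a UTF-8 sequence says how many bytes follow, which is how an 8-bit encoding can still cover every code point without depending on a code page.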
Some say, and I agree, that now that UTF-16 has surrogate pairs, it
holds no advantage over UTF-8. Many Linux systems use UTF-8 as their
native encoding, but this will never happen in Windows.
This does not mean that a Windows program cannot use UTF-8 internally.
In fact the whole back end of my application uses UTF-8. XML
serialization is just one of the things this back end does.
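
A UTF-8 back end then has to convert at the boundary where strings are handed to the wide-character Windows API. On Windows that is normally done with MultiByteToWideChar and CP_UTF8; the portable sketch below only illustrates the idea and is not the code from my application. The name utf8_to_utf16 is hypothetical, and the function assumes well-formed UTF-8 (it does not reject overlong or otherwise invalid sequences).

```cpp
#include <cstddef>
#include <string>

// Convert UTF-8 to UTF-16. Assumes well-formed input; a truncated
// trailing sequence simply ends the conversion.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte gives the sequence length and the top bits.
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        if (i + len > in.size()) break;    // truncated sequence
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                           // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 | (cp >> 10));
            out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
        }
    }
    return out;
}
```

On Windows with Visual C++, wchar_t is 16 bits, so the result can be copied into a std::wstring before being passed to an API that takes an LPCWSTR.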
--
David Wilkinson
Visual C++ MVP