Re: Understanding UNICODE
On Nov 22, 2:41 pm, Paavo Helde <myfirstn...@osa.pri.ee> wrote:
James Kanze <james.ka...@gmail.com> wrote in news:2a1ef89e-14dd-4c4d-bb68-
2f5f8dae9...@b15g2000yqd.googlegroups.com:
On Nov 21, 1:38 am, Paavo Helde <myfirstn...@osa.pri.ee> wrote:
As you mentioned Windows, it would be handy to know that
Windows does not support UTF-8 locales.
Really? I've used them under Windows, with no problems.
It seems I used the term "locale" too sloppily. What I meant is that
Windows has built-in support for 8-bit based (potentially multibyte)
encodings, selected by what it calls the "active codepage" (ACP). All
SDK functions having string parameters come in two variants, narrow and
wide, and narrow strings are automatically converted to wide strings
(UTF-16) by the Windows SDK. It would be logical to set this 8-bit
codepage setting to UTF-8 and let Windows do all the translation
transparently. However, when I did an experiment and set the ACP and
OEMCP values in the registry to 65001 (UTF-8), Windows did not boot
any more. That's what I called "not supported".
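
To make the narrow-to-wide step concrete, here is a minimal, portable sketch of the conversion one would want Windows to perform if the narrow encoding could be UTF-8. The helper name utf8_to_utf16 is hypothetical and is not a Windows API; on Windows itself, MultiByteToWideChar with CP_UTF8 does this job. The sketch replaces malformed input with U+FFFD and, to stay short, does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of any SDK): decode UTF-8 bytes and
// re-encode them as UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    const char16_t replacement = u'\uFFFD';
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(replacement); ++i; continue; }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }
            unsigned char next = static_cast<unsigned char>(in[i]);
            if ((next & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (next & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(replacement);
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A program that keeps UTF-8 internally could pass the result of such a conversion to the wide variant of an SDK function, instead of relying on the ACP.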
I didn't fiddle with the registry, but I can confirm a related problem:
after executing chcp 65001 in a command window, the output from a
program which writes the bytes C3 A9 74 C3 A9 0D 0A ("été" in UTF-8,
followed by CR LF) disappears completely; not even the 't' appears.
There's definitely something wrong there. (This was under Windows XP
Professional. Perhaps under some later version...)
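
For anyone who wants to repeat the experiment, a test program along these lines would do. This is a reconstruction, not the original program, and the names sample_text and emit_sample are made up for the sketch. It writes "été" plus a newline as raw UTF-8 bytes; on Windows, a text-mode stream expands the '\n' to 0D 0A, which gives the seven bytes listed above.

```cpp
#include <iostream>
#include <string>

// UTF-8 bytes for "été" followed by a newline: C3 A9 74 C3 A9 0A.
// The hex escapes keep the bytes independent of the source file encoding.
std::string sample_text() {
    return std::string("\xC3\xA9") + "t" + "\xC3\xA9" + "\n";
}

// Write the sample to a narrow stream (std::cout by default). No locale
// or conversion is involved; the bytes go out exactly as stored.
void emit_sample(std::ostream& out = std::cout) {
    out << sample_text();
    out.flush();
}
```

Calling emit_sample() from main in a console set to code page 65001 reproduces the test; redirecting the output to a file and dumping it in hex shows what the program actually wrote, independently of what the console displays.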
It appears you are thinking more of file content encoding. This is
another issue, not directly related to the OS, and C++ locales can
sometimes be helpful there.
I was considering principally file content encoding; that is really the
only thing C++ locales address (with regard to encoding). I've had no
problem reading and writing UTF-8 files under Windows, once the correct
locale was installed. I've also had no problem using UTF-8 internally,
but reading and writing files with some different encoding, again using
the appropriate locales. What locales don't (and can't) address, of
course, is what the system does with the bytes if they reach someplace
where the system interprets them (e.g. the display buffer of a console
window).
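
As an illustration of the file side, here is one possible way to read and write a UTF-8 file through an imbued locale. It is only a sketch and not necessarily the setup described above: it assumes the standard std::codecvt_utf8 facet, which is deprecated since C++17 and removed in C++26, and the function names are made up. Note also that with a 16-bit wchar_t, as on Windows, codecvt_utf8<wchar_t> only covers the Basic Multilingual Plane.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Build a locale whose codecvt facet translates between UTF-8 in the
// file and wchar_t in memory. The locale takes ownership of the facet.
std::locale utf8_locale() {
    return std::locale(std::locale::classic(),
                       new std::codecvt_utf8<wchar_t>);
}

// Write one line of wide text to `path` as UTF-8.
void write_utf8_line(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    out.imbue(utf8_locale());  // imbue before opening the file
    out.open(path, std::ios::binary);
    out << text << L'\n';
}

// Read the first line of a UTF-8 file back as wide text.
std::wstring read_utf8_line(const std::string& path) {
    std::wifstream in;
    in.imbue(utf8_locale());
    in.open(path, std::ios::binary);
    std::wstring line;
    std::getline(in, line);
    return line;
}
```

The same pattern covers a different file encoding by swapping in another codecvt facet; none of it changes how a console interprets the bytes it is given.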
--
James Kanze