Re: STL, UTF8, and CodeCvt
Clark Cox wrote:
> On 2007-03-05 07:54:21 -0800, "Eugene Gershnik" <gershnik@hotmail.com>
> said:
>
>> True. Another great feature is that UTF-8 is backward compatible with
>> ASCII as far as search operations are concerned. That is, strchr() or
>> manual iteration will work as long as you search for something within
>> the ASCII range.
>
> Not entirely true. If I search for the character 'e' in the string
> "acuté", it is equally possible that the character will be found as it
> is that it won't. When encoding the above string in UTF-8, there are
> two possibilities (due to decomposition):

Just a small clarification: that's a consequence of Unicode, not
specifically UTF-8. The same thing occurs with any encoding of Unicode
characters. There are two different ways of writing that final letter.
It can be written with a single code point 0x00E9 (LATIN SMALL LETTER E
WITH ACUTE), and it can be written as two code points, 0x0065 (LATIN
SMALL LETTER E) followed by 0x0301 (COMBINING ACUTE ACCENT).
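
To make the point concrete, here is a minimal sketch, assuming a C++11
compiler. It spells the word both ways, precomposed with U+00E9 and
decomposed as 'e' (U+0065) followed by U+0301, first as UTF-8 bytes and
then as UTF-16 code units. The names contains_e, contains_e16 and the
four string constants are made up for illustration; they do not come
from any library.

```cpp
#include <cassert>
#include <string>

// UTF-8, precomposed: U+00E9 is encoded as the two bytes C3 A9.
const std::string precomposed8 = "acut\xC3\xA9";
// UTF-8, decomposed: 'e' (65) followed by U+0301, encoded as CC 81.
const std::string decomposed8 = "acute\xCC\x81";

// The same two spellings as UTF-16 code units.
const std::u16string precomposed16 = u"acut\u00E9";
const std::u16string decomposed16 = u"acute\u0301";

// Hypothetical helpers: a plain code-unit search for ASCII 'e'.
bool contains_e(const std::string& s) {
    return s.find('e') != std::string::npos;
}

bool contains_e16(const std::u16string& s) {
    return s.find(u'e') != std::u16string::npos;
}

// Runs the four searches and checks the outcome of each.
void check_examples() {
    assert(!contains_e(precomposed8));    // no 65 byte anywhere
    assert(contains_e(decomposed8));      // 'e' is present as its own byte
    assert(!contains_e16(precomposed16)); // same outcome in UTF-16
    assert(contains_e16(decomposed16));
}
```

The search itself behaves the same in both encodings: it finds 'e' only
in the decomposed spelling. So the difference comes from which code
points were chosen, not from UTF-8. Normalizing both the text and the
search key to one form before searching is the usual way around this,
but that is outside what the sketch shows.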
--
-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]