Re: strings in C++
David Webber wrote:
> "Giovanni Dicanio" <giovanni.dicanio@invalid.com> wrote in message
> news:%2379$ux4uIHA.5472@TK2MSFTNGP06.phx.gbl...
>> ...
>> I tend to save text out of application boundaries using Unicode UTF-8
>> (char's), ...
>
> Why? [I am not criticising - just being curious!]
Reasons for me:
- Editable with any editor as long as the text contains only ASCII, and
still mostly editable if non-ASCII bytes occur and the editor assumes an
8-bit codepage.
- Default encoding for XML, thus a de-facto standard for information
interchange.
- Using UTF-16 (MS Windows' internal representation) is not much easier,
because even there you need occasional surrogate pairs consisting of two
16-bit chars. Further, you need to convert anyway on other platforms.
- Endianness detection is a non-issue, since UTF-8 is defined as a sequence
of bytes.
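
To illustrate the surrogate-pair point, here is a minimal sketch that encodes
a single code point both ways. It assumes a C++11 compiler (for char32_t and
std::u16string). The helper names to_utf16 and to_utf8 are made up for this
example and are not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::u16string to_utf16(char32_t cp)
{
    // Only valid scalar values: at most U+10FFFF, and not a surrogate itself.
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));           // one 16-bit unit
    } else {
        cp -= 0x10000;                                      // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string to_utf8(char32_t cp)
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    std::string out;
    if (cp < 0x80) {                                        // plain ASCII
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {                                // two bytes
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {                              // three bytes
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                                                // four bytes
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For U+1D11E, to_utf16 returns the two code units 0xD834 0xDD1E, and to_utf8
returns the four bytes F0 9D 84 9E. So neither encoding is fixed-width, but
only the UTF-16 units have a byte order to worry about once written to disk.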
Someone asked how UTF-8 is detected. In general, as with UTF-16, there isn't
any way to do it reliably. However, writing a BOM typically works
(explained on the Wikipedia page, btw), just like with UTF-16, where it
additionally signals the endianness.
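
A rough sketch of such a BOM check, assuming the start of the file has
already been read into a std::string. The names Bom and detect_bom are
invented for this example. A missing BOM proves nothing about the encoding,
so treat the result as a hint only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: look for a byte order mark at the start of the data.
// Limitation: FF FE 00 00 is also the UTF-32LE BOM, which this sketch
// reports as UTF-16LE.
Bom detect_bom(const std::string& data)
{
    // Compare as unsigned bytes, since plain char may be signed.
    auto byte = [&data](std::size_t i) {
        return static_cast<unsigned char>(data[i]);
    };

    if (data.size() >= 3 &&
        byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return Bom::Utf8;       // U+FEFF encoded as UTF-8
    if (data.size() >= 2 && byte(0) == 0xFF && byte(1) == 0xFE)
        return Bom::Utf16LE;    // U+FEFF, least significant byte first
    if (data.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF)
        return Bom::Utf16BE;    // U+FEFF, most significant byte first
    return Bom::None;           // no BOM: encoding remains unknown
}
```

The UTF-8 BOM is always the same three bytes, which is why it only marks the
encoding, whereas the two UTF-16 variants differ exactly in their byte order.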
Uli
--
C++ FAQ: http://parashift.com/c++-faq-lite
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932