Re: strings in C++
"David Webber" <firstname.lastname@example.org> ha scritto nel messaggio
"Giovanni Dicanio" <email@example.com> wrote in message
I tend to save text outside application boundaries using Unicode UTF-8
Why? [I am not criticising - just being curious!]
I do that because Unicode UTF-8 seems to me very useful (and a kind of
"de facto" standard) for multiplatform exchange of textual data. For
example, I think the default Unicode encoding for XML is UTF-8, and UTF-8 is
widely used on the Internet in general.
Moreover, I like UTF-8 because there is no waste of memory for "normal"
ASCII characters (with UTF-16, instead, every ASCII character carries a null
byte to pad it to 16 bits).
Another aspect I like about UTF-8 is that it hasn't got the problem of
endianness, i.e. UTF-8 is "just UTF-8" on every platform: Windows, Mac,
and so on.
Instead, Unicode UTF-16 comes in two flavours, UTF-16 LE and UTF-16 BE,
and you have to check the BOM (if present...) to understand which
endianness the file you are reading uses. In fact, I think it is neither
safe nor robust to assume that UTF-16 is always UTF-16 LE (the Windows
default); there is also UTF-16 BE, which I think is used on Macs.
If I save (and load) the file (or textual data in general) using UTF-8, I
don't have this additional problem of platform endianness.
I use UTF-16 (with Windows endianness) inside Windows applications because it
is the default Unicode format supported by the Windows APIs (the <DoSomething>W
functions).
And I think that C# and the .NET Framework use the same approach by default:
they save textual data using UTF-8, and convert to UTF-16 (the .NET String
class) when the text is used inside the application.
In fact, I read there:

"StreamWriter defaults to using an instance of UTF8Encoding unless specified
otherwise."