Re: strings in C++
"David Webber" <firstname.lastname@example.org> ha scritto nel messaggio
"Giovanni Dicanio" <email@example.com> wrote in message
I tend to save text outside application boundaries using Unicode UTF-8
Why? [I am not criticising - just being curious!]
I do that because Unicode UTF-8 seems to me very useful (and a kind of
"de facto" standard) for multiplatform exchange of textual data. For
example, I think the default Unicode encoding for XML is UTF-8, and UTF-8 is
widely used on the Internet in general.
Moreover, I like UTF-8 because there is no waste of memory for "normal"
ASCII characters (with UTF-16, instead, every ASCII character carries a null
byte to pad it to 16 bits).
Another aspect I like about UTF-8 is that it hasn't got the problem of
endianness, i.e. UTF-8 is "just UTF-8" on every platform: Windows, Mac,
and so on.
Instead, Unicode UTF-16 comes in two flavours, UTF-16 LE and UTF-16 BE,
and you have to check the BOM (if present...) to understand which
endianness the file you are reading uses. In fact, I think it is neither
safe nor robust to assume that UTF-16 is always UTF-16 LE (the Windows
default); there is also UTF-16 BE, which I think is used on Macs.
If I save (and load) the file (or textual data in general) using UTF-8, I
don't have this additional problem of platform endianness.
I use UTF-16 (with Windows endianness) inside Windows applications because it
is the default Unicode format supported by the Windows APIs (the <DoSomething>W
functions).
And I think that C# and the .NET Framework use the same approach by default:
they save textual data using UTF-8, and convert to UTF-16 (the .NET String
class) when the text is used inside the application.
In fact, I read there:

"StreamWriter defaults to using an instance of UTF8Encoding unless specified
otherwise."