I agree. A BOM should be required and is even specified by Microsoft.
Hi Tom,
"Tom Serface" <tom.nospam@camaswood.com> wrote in message
news:A416EB5B-6AC7-4370-A6EE-ADEF45CC74AB@microsoft.com...
In addition to what others have written, if you are using CStdioFile you
should use WriteString and ReadString. How did you look at the file?
I still don't trust CStdioFile to write text to files...
I tried this simple MFC code snippet using VS2008, in Unicode mode:
<code>
// Open (or create) test.txt for writing in text mode.
CStdioFile outFile;
if ( ! outFile.Open( L"test.txt",
        CFile::modeCreate | CFile::modeWrite | CFile::typeText ) )
{
    AfxMessageBox( L"Error opening file" );
    return;
}
// In text mode, "\n" is written to disk as "\r\n".
outFile.WriteString( L"Ciao\n" );
outFile.WriteString( L"Poiché" );
</code>
Then I opened the file with Cygnus Free Edition in binary mode, and I
found that the file bytes are (hex): 43 69 61 ... E9.
There are 12 bytes in total, i.e. one byte per character (counting the
"\r\n" that text mode writes for "\n"). That means that the text was not
written in Unicode UTF-16, because UTF-16 uses at least 2 bytes for each
character. Moreover, there is no BOM (which should be required for UTF-16,
e.g. to identify if it is using UTF-16 LE or BE).
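For comparison, here is a minimal sketch of what the bytes of a UTF-16 LE
file with a BOM would look like. ToUtf16LeWithBom is a hypothetical helper
of mine, written in plain standard C++; it is not part of MFC and it is
not what CStdioFile does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units as little-endian
// bytes, preceded by the UTF-16 LE byte order mark (FF FE).
std::vector<std::uint8_t> ToUtf16LeWithBom(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes = { 0xFF, 0xFE };  // BOM for UTF-16 LE
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte first
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));    // then high byte
    }
    // Sanity check: 2 bytes of BOM plus 2 bytes per code unit.
    assert(bytes.size() == 2 + 2 * text.size());
    return bytes;
}
```

Calling ToUtf16LeWithBom(u"Ciao\r\nPoich\u00E9") gives 26 bytes (2 for the
BOM plus 2 for each of the 12 characters), starting with FF FE 43 00, so a
12-byte file without a BOM cannot be UTF-16.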
But this text is not Unicode UTF-8, either. In fact, the Italian 'é' of
"poiché" is written as one single byte E9 in the file, but 'é' is not
encoded as byte E9 in UTF-8 (it is the two bytes C3 A9).
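To check the UTF-8 claim, here is a minimal sketch of the standard UTF-8
encoding rule for a single code point. EncodeUtf8 is a hypothetical helper
of mine in plain standard C++, not an MFC or Win32 function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string EncodeUtf8(std::uint32_t cp)
{
    // Valid code points are at most U+10FFFF and exclude surrogates.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

EncodeUtf8(0xE9) (U+00E9 is 'é') returns the two bytes C3 A9, while ASCII
letters like 'C' stay as one byte. A lone E9 byte is what a Western
European single-byte code page would write for 'é'.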
So, I think that CStdioFile used some form of local code-page to write
text data to file, and using local code-pages is IMHO very bad. In fact,
if I give this file, written on my computer with an Italian/West-Europe
code-page, to someone who has a different default code-page (like Chinese,
Japanese, etc.), I believe that the content of the file will be displayed
differently (i.e. they will not read "poiché", but something with a
different character in place of "é").
I think that Unicode is the way to go for international text (CStdioFile
may be good for pure-ASCII, i.e. only English characters), and to me it
seems that CStdioFile ignores Unicode.
The text should be written in some Unicode form; I prefer UTF-8, but
UTF-16 could be fine, too. And if UTF-16 is used, CStdioFile should write
a BOM, to specify if it is using UTF-16LE or UTF-16BE (in fact, one of the
advantages of UTF-8 is that no BOM is required to specify the "endianness"
BE/LE - there are neither UTF-8 LE nor BE, there is just UTF-8 :).
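As an illustration of how a reader can use the BOM, here is a minimal
sketch that looks at the first bytes of a buffer. The Bom enum and the
DetectBom function are hypothetical names of mine (plain standard C++, not
an MFC API), and the sketch deliberately ignores UTF-32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for the sketch below.
enum class Bom { None, Utf8, Utf16Le, Utf16Be };

// Hypothetical helper: inspect the first bytes of a buffer for a BOM.
// Note: UTF-32 BOMs are not handled here (FF FE 00 00 would be
// reported as UTF-16 LE).
Bom DetectBom(const std::uint8_t* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Bom::Utf8;      // optional UTF-8 signature
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Bom::Utf16Le;   // low byte first
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Bom::Utf16Be;   // high byte first
    return Bom::None;          // no BOM: the encoding must be known otherwise
}
```

For the 12-byte file above, which starts with 43 69 61, this returns
Bom::None, so a reader has no hint about the encoding and has to guess.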
These are reasons why I don't use CStdioFile.
Maybe a better replacement would be CodeProject::CStdioFileEx
http://www.codeproject.com/KB/files/stdiofileex.aspx
or your Tom::CStdioFileEx...
The class I wrote is more restricted in scope (i.e. it writes only in
UTF-8), but I think that it does its (simple) job well :)
However, Mihai is the "king" of internationalization, so it is better to
wait for him to have the definitive word about CStdioFile.
G