Re: Text File problem - VC++ MFC Studio 2008 MFC app
"Giovanni Dicanio" wrote:
<news:#6s9Zh#EJHA.4104@TK2MSFTNGP04.phx.gbl> Thu, 11 Sep
2008 08:26:08 GMT
Hi Tom,
"Tom Serface" <tom.nospam@camaswood.com> wrote in message
news:A416EB5B-6AC7-4370-A6EE-ADEF45CC74AB@microsoft.com...
In addition to what others have written, if you are using CStdioFile
you should use WriteString and ReadString. How did you look at the
file?
I still don't trust CStdioFile to write text to files...
I tried this simple MFC code snippet using VS2008, in Unicode mode:
<code>
CStdioFile outFile;
if ( ! outFile.Open( L"test.txt",
        CFile::modeCreate | CFile::modeWrite | CFile::typeText ) )
{
    AfxMessageBox( L"Error opening file" );
    return;
}
outFile.WriteString( L"Ciao\n" );
outFile.WriteString( L"Poiché" );
</code>
Then I opened the file with Cygnus Free Edition in binary mode, and I
found that file bytes are (hex): 43 69 61 ... E9.
There are 12 bytes in total: 6 for "Ciao" plus the CR LF that text
mode writes for '\n', and 6 for "Poiché". That means that the text was
not written in Unicode UTF-16, because UTF-16 uses at least 2 bytes for
each character. Moreover, there is no BOM (which should be required for
UTF-16, e.g. to identify if it is using UTF-16 LE or BE).
But this text is not Unicode UTF-8, either. In fact, the Italian 'é'
of "poiché" is written as one single byte E9 in the file, but 'é' is
not encoded as byte E9 in UTF-8 (it would be the two bytes C3 A9).
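To double-check the UTF-8 claim, here is a minimal sketch in portable C++ (no MFC involved). EncodeUtf8 is a hypothetical helper name made up for this illustration; it is not part of MFC or the CRT, and it only shows how a single code point maps to UTF-8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from MFC): encode one Unicode code point as UTF-8.
std::string EncodeUtf8(std::uint32_t cp)
{
    // Reject values outside the Unicode range and UTF-16 surrogates.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling EncodeUtf8(0xE9) (U+00E9 is 'é') gives the two bytes C3 A9, so a lone E9 byte in the file cannot be UTF-8.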
------
Hi
According to the CStdioFile source in my VC6 installation:
void CStdioFile::WriteString(LPCTSTR lpsz)
{
    ASSERT(lpsz != NULL);
    ASSERT(m_pStream != NULL);
    if (_fputts(lpsz, m_pStream) == _TEOF)
    .........
WriteString uses _fputts (which maps to fputws in a Unicode build), and, quoting MSDN:
"Each of these functions copies string to the output stream at the
current position. fputws copies the wide-character argument string to
stream as a multibyte-character string or a wide-character string
according to whether stream is opened in text mode or binary mode,
respectively."
So you are writing multibyte text.
--------
So, I think that CStdioFile uses some form of local code-page to write
text data to file, and using local code-pages is IMHO very bad. In
fact, if I give this file, written on my computer with an
Italian/West-Europe code-page, to someone who has a different default
code-page (like Chinese, Japanese, etc.), I believe that the content of
the file will be displayed differently (i.e. they will not read "poiché",
but something different in place of the "é").
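As a rough illustration of the code-page problem, the sketch below hard-codes what the single byte E9 stands for under three single-byte Windows code pages. The enum and function names are made up for this example; only the three mappings themselves come from the published code-page tables. It does not call any Windows API.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
enum class CodePage { Windows1252, Windows1251, Windows1253 };

// Return the Unicode code point that byte 0xE9 decodes to
// under the given single-byte code page.
char32_t DecodeByteE9(CodePage cp)
{
    switch (cp) {
    case CodePage::Windows1252: return 0x00E9; // Western European: 'é'
    case CodePage::Windows1251: return 0x0439; // Cyrillic: 'й'
    case CodePage::Windows1253: return 0x03B9; // Greek: 'ι'
    }
    return 0xFFFD; // replacement character for anything unexpected
}
```

So the same file shows "poiché" on a Western European system, but a different last letter on a Cyrillic or Greek one. On double-byte code pages such as Japanese Shift-JIS, E9 is a lead byte, so it does not even stand for a character on its own.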
I think that Unicode is the way to go for international text
(CStdioFile may be good for pure-ASCII, i.e. only English characters),
and to me it seems that CStdioFile ignores Unicode.
The text should be written in some Unicode form; I prefer UTF-8, but
UTF-16 could be fine, too. And if UTF-16 is used, CStdioFile should
write a BOM, to specify if it is using UTF-16LE or UTF-16BE (in fact,
one of the advantages of UTF-8 is that no BOM is required to specify
the "endianness" BE/LE - there is neither a UTF-8 LE nor a UTF-8 BE,
there is just UTF-8 :)
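For the UTF-16 option, here is a minimal sketch of what "writing a BOM" means at the byte level, in portable C++ rather than MFC. ToUtf16LeWithBom is a hypothetical helper name; it only builds the byte sequence, and writing those bytes to a file opened in binary mode is left to the caller.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// preceded by the byte order mark U+FEFF (bytes FF FE in little-endian).
std::vector<std::uint8_t> ToUtf16LeWithBom(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(2 + text.size() * 2);

    // BOM first, so a reader can tell LE from BE.
    bytes.push_back(0xFF);
    bytes.push_back(0xFE);

    // Each 16-bit code unit: low byte first, then high byte.
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));
    }
    return bytes;
}
```

For u"Poich\u00E9" this produces 14 bytes: the 2-byte BOM plus 2 bytes for each of the 6 characters, ending in E9 00. A UTF-16BE writer would emit FE FF and swap the byte order of every unit.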
These are reasons why I don't use CStdioFile.
Maybe a better replacement would be CodeProject::CStdioFileEx
http://www.codeproject.com/KB/files/stdiofileex.aspx
or your Tom::CStdioFileEx...
The class I wrote is more restricted in scope (i.e. it writes only in
UTF-8), but I think that it does its (simple) job well :)
However, Mihai is the "king" in internationalization, so it is better
to wait for his definitive word about CStdioFile.
G