Re: unicode
On Tue, 26 Jun 2007 08:04:44 -0700, jraul <jraulinth@yahoo.com> wrote:
Why does the following program create an empty text file (0 bytes)?
Someone in the C++ group mentioned code pages, but I thought Unicode
was made to replace code pages, and I thought Windows XP used Unicode.
Hi,
I prefer to write Unicode data to files using the UTF-8 encoding, for
several reasons (I described some of them in previous posts), e.g.
UTF-8 has no endianness problems.
Your code did not work in my test either.
But with the UTF-8 approach everything works: I can open the document
in both Windows Notepad and Word (note that no BOM, i.e. byte order
mark, is required for UTF-8) and see the three funny symbols :) the
pirate skull (skull and crossbones, U+2620), the Communist hammer and
sickle (U+262D), and the yin-yang symbol (U+262F). Is this correct?
Here's my code:
<CODE>
#include <fstream>
#include <string>
// CW2U (a typedef of CW2UEX<>) is defined in the listing below

int main()
{
    // Unicode string (UTF-16)
    std::wstring s = L"\u2620\u262D\u262F\n";

    // Convert from UTF-16 to UTF-8
    CW2U utf8String( s.c_str() );

    // Store UTF-8 string in std::string
    std::string outputString( static_cast<const char *>( utf8String ) );

    // Write Unicode UTF-8 string to file
    std::ofstream fout_utf8("c:\\data_utf8.txt");
    if ( fout_utf8 )
    {
        fout_utf8 << outputString << std::endl;
        fout_utf8.close();
    }
    return 0;
}
</CODE>
'CW2U' is a helper template class I developed (modeled on the ATL
string conversion helpers); here is its source code:
<CODE>
#include <windows.h>
#include <atlbase.h>   // AtlThrow
#include <stdlib.h>    // malloc, free

//----------------------------------------------------------------------------
// Class: CW2UEX
// Descr: Convert from Unicode UTF-16 (WideChars) to Unicode UTF-8
//----------------------------------------------------------------------------
template< int t_nBufferLength = 128 >
class CW2UEX
{
public:
    CW2UEX( LPCWSTR psz ) throw(...) :
        m_psz( m_szBuffer )
    {
        Init( psz );
    }

    ~CW2UEX() throw()
    {
        if( m_psz != m_szBuffer )
        {
            free( m_psz );
        }
    }

    operator LPSTR() const throw()
    {
        return( m_psz );
    }

private:
    void Init( LPCWSTR psz ) throw(...)
    {
        if (psz == NULL)
        {
            m_psz = NULL;
            return;
        }

        // Length in UTF-16 code units, including the terminating NUL
        int nLengthW = lstrlenW( psz )+1;

        // One UTF-16 code unit needs at most 3 UTF-8 bytes (a surrogate
        // pair, i.e. 2 code units, needs 4), so 4 bytes per unit is safe
        int nLengthUtf8 = nLengthW * 4;

        if( nLengthUtf8 > t_nBufferLength )
        {
            m_psz = static_cast< LPSTR >( malloc( nLengthUtf8 *
                sizeof( char ) ) );
            if (m_psz == NULL)
            {
                AtlThrow( E_OUTOFMEMORY );
            }
        }

        if (::WideCharToMultiByte( CP_UTF8, 0, psz, nLengthW,
                                   m_psz, nLengthUtf8, NULL, NULL ) == 0)
        {
            // The destructor does not run when a constructor throws,
            // so release the heap buffer here to avoid a leak
            DWORD dwError = ::GetLastError();
            if( m_psz != m_szBuffer )
            {
                free( m_psz );
                m_psz = m_szBuffer;
            }
            AtlThrow( HRESULT_FROM_WIN32( dwError ) );
        }
    }

public:
    LPSTR m_psz;
    char m_szBuffer[t_nBufferLength];

private:
    // Not copyable: declared but not defined
    CW2UEX( const CW2UEX& ) throw();
    CW2UEX& operator=( const CW2UEX& ) throw();
};

typedef CW2UEX<> CW2U;
</CODE>
MrAsm