Re: How to read Unicode (Big-Endian) text file(s) in Non-MFC
Giovanni Dicanio wrote:
"meme" <meme@myself.com> wrote in message
news:eJlvmRicIHA.4844@TK2MSFTNGP04.phx.gbl...
I'm trying to read Unicode text files. So far I'm able to do the
following, but I'm lost with the "Big-Endian" thingies...
According to the MSDN documentation for fopen, it can handle
Unicode UTF-16 LE, but not BE.
http://msdn2.microsoft.com/en-us/library/yeby3zcb.aspx
So, I think you should just read the raw WORDs (16 bits, two bytes)
from the file, and swap the byte order in your code.
1. For each WORD in the file
2. read that WORD
3. swap low-byte and high-byte, transforming the WORD from BE to LE
4. store this LE word (Unicode UTF-16LE wchar_t) in memory
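A minimal sketch of those steps in portable C, under a few assumptions. Instead of reading a raw WORD and then swapping it, it builds each 16-bit unit from its two bytes, high byte first; on a little-endian machine such as x86 this gives the same result as the read-and-swap loop, and it also stays correct on a big-endian host. The helper names utf16be_to_host and read_utf16be_file are hypothetical, and the code assumes the file may begin with a byte-order mark (bytes FE FF, as Notepad writes for "Unicode big endian"), which it drops.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts nbytes of UTF-16BE data in src into
   host-order 16-bit code units in dst. Returns the number of units
   written; a trailing odd byte, if any, is ignored. */
size_t utf16be_to_host(const unsigned char *src, size_t nbytes, uint16_t *dst)
{
    size_t n = nbytes / 2;
    for (size_t i = 0; i < n; ++i) {
        /* Big-endian: the high byte comes first in the file. */
        dst[i] = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
    }
    return n;
}

/* Hypothetical helper: reads a whole UTF-16BE file. Returns a malloc'ed
   array of code units (caller frees) and stores its length in *count,
   or returns NULL on error. A leading BOM (U+FEFF) is dropped. */
uint16_t *read_utf16be_file(const char *path, size_t *count)
{
    FILE *f = fopen(path, "rb");   /* binary mode: no CR/LF translation */
    if (f == NULL)
        return NULL;

    size_t cap = 4096, len = 0, got;
    unsigned char *bytes = malloc(cap);
    while (bytes != NULL && (got = fread(bytes + len, 1, cap - len, f)) > 0) {
        len += got;
        if (len == cap) {           /* buffer full: grow it */
            unsigned char *bigger = realloc(bytes, cap * 2);
            if (bigger == NULL) {
                free(bytes);
                bytes = NULL;
                break;
            }
            bytes = bigger;
            cap *= 2;
        }
    }
    fclose(f);
    if (bytes == NULL)
        return NULL;

    /* +1 so that an empty file never asks malloc for 0 bytes. */
    uint16_t *units = malloc((len / 2 + 1) * sizeof *units);
    if (units == NULL) {
        free(bytes);
        return NULL;
    }
    size_t n = utf16be_to_host(bytes, len, units);
    free(bytes);

    /* Drop a leading byte-order mark, if present. */
    if (n > 0 && units[0] == 0xFEFF) {
        memmove(units, units + 1, (n - 1) * sizeof *units);
        --n;
    }
    *count = n;
    return units;
}
```

On Windows, wchar_t is also a 16-bit UTF-16LE unit, so the returned array can be copied into a wchar_t buffer (adding your own terminating L'\0') and passed to the Unicode Win32 APIs.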
To swap two bytes in a word, you may use the following code:
Why roll your own when there's _swab (prototype in stdlib.h)?
"If n is even, the _swab function copies n bytes from src, swaps each pair
of adjacent bytes, and stores the result at dest. If n is odd, _swab copies
and swaps the first n-1 bytes of src. _swab is typically used to prepare
binary data for transfer to a machine that uses a different byte order."
<code>
#include <windows.h>  // WORD, MAKEWORD, HIBYTE, LOBYTE

// Converts a word from Big-Endian to Little-Endian (or vice-versa)
inline WORD SwapWordEndianness(WORD w)
{
    // Swap low and high bytes: the old high byte becomes the new low byte
    return MAKEWORD( HIBYTE(w), LOBYTE(w) );
}

WORD bigEndianWord = ...;  // a WORD read from the file
WORD littleEndianWord = SwapWordEndianness(bigEndianWord);
</code>
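If <windows.h> is not available, the same swap can be written with shifts on a fixed-width type. This is a sketch; the name swap_u16 is hypothetical. (MSVC also provides _byteswap_ushort in <stdlib.h> for the same job.)

```c
#include <stdint.h>

/* Hypothetical helper: swaps the two bytes of a 16-bit value,
   e.g. 0x1234 becomes 0x3412. Same result as
   MAKEWORD(HIBYTE(w), LOBYTE(w)) without needing <windows.h>. */
static inline uint16_t swap_u16(uint16_t w)
{
    return (uint16_t)((w >> 8) | (w << 8));
}
```

Applying it twice gives back the original value, which is why one function serves for both BE to LE and LE to BE.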
HTH,
Giovanni