Re: How to read Unicode(Big-Endian) text file(s) in Non-MFC

From:

Ulrich Eckhardt <eckhardt@satorlaser.com>

Newsgroups:

microsoft.public.vc.language

Date:

Wed, 20 Feb 2008 10:13:27 +0100

Message-ID:

<oi7t85-odo.ln1@satorlaser.homedns.org>

meme wrote:

WORD GetBigWord(FILE *FilePtr)
{
    register WORD word;

    word = (WORD) (fgetc(FilePtr) & 0xff);
    word = ((WORD) (fgetc(FilePtr) & 0xff)) | (word << 0x08);

    return(word);
}

Sorry, but I can't help commenting on this code:
1. assert(FilePtr);
2. Forget about 'register', the compiler does a much better job allocating
registers to temporaries.
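3. This completely fails when the file reaches EOF: fgetc() then returns
EOF, and masking that with 0xff turns it into an ordinary-looking 0xff byte.
4. I would read two bytes from the stream (checking for errors, of course)
and then combine those two bytes into an integer.
5. 'return' is a statement, not a function, so no brackets are needed.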
6. Initialise variables rather than declaring them and then assigning to
them.
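
To make points 1 to 6 concrete, here is a rough sketch of how such a function
could look. The name ReadBigEndianWord and the bool/out-parameter interface
are my own choices for illustration, and std::uint16_t stands in for WORD so
that the snippet does not depend on the Windows headers:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical replacement for GetBigWord: reads one big-endian 16-bit value.
// Returns false if two bytes could not be read (EOF or I/O error); 'out' is
// left untouched in that case.
bool ReadBigEndianWord(std::FILE* file, std::uint16_t& out)
{
    assert(file);

    // Read both bytes in one call and check that both actually arrived.
    unsigned char bytes[2];
    if (std::fread(bytes, 1, sizeof bytes, file) != sizeof bytes)
        return false;

    // First byte is the high byte, second byte is the low byte.
    out = static_cast<std::uint16_t>((bytes[0] << 8) | bytes[1]);
    return true;
}
```

Note that combining the bytes with shifts already gives a value in the
machine's own byte order, so as far as I can see no SwapWordEndiannes-style
step should be needed afterwards.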

wchar_t *data = new wchar_t[flen + 1];

Don't do this. In C++, use

std::vector<wchar_t> data(flen+1);

The reason is that you can't forget to manually invoke delete. Getting the
manual resource management right gets pretty difficult with multiple return
paths and exceptions.
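
A small sketch of what that buys you. FillBuffer is a made-up function, and
the byte-by-byte copy is only a stand-in for the real decoding; the point is
that none of the exits needs a delete[]:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical function with several exits. 'flen' is assumed to be the
// number of characters to make room for, as in the quoted code.
bool FillBuffer(std::FILE* file, std::size_t flen, std::vector<wchar_t>& out)
{
    assert(file);

    // All elements start out as L'\0', so the buffer is already terminated.
    std::vector<wchar_t> data(flen + 1);

    for (std::size_t i = 0; i != flen; ++i) {
        int const c = std::fgetc(file);
        if (c == EOF)
            return false;              // early exit: 'data' frees itself
        data[i] = static_cast<wchar_t>(c);
    }

    out.swap(data);                    // hand the buffer over to the caller
    return true;                       // normal exit: still nothing to delete
}
```

The same holds if something in between throws: the vector's destructor runs
during stack unwinding.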

while(!feof(file))
{
  bigEndianWord = GetBigWord(file);
  littleEndianWord = SwapWordEndiannes(bigEndianWord);

  data[i] = (wchar_t)littleEndianWord;
  i++;
}

This is broken by design. Always, when reading something, first perform the
read operation and then, before using the data, verify that reading
actually succeeded! feof() only reports true after a read has already run
into the end of the file, so this loop processes one bogus value at the end.
If the size of the file is odd, you will happily read a single byte, mix in
EOF and interpret the result as the last character of your text.
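
Roughly, a loop that reads first and checks before using the data could look
like the following. This is only a sketch; ReadAllUnits is an invented name,
and I let the vector grow instead of relying on a precomputed length:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Sketch: decode big-endian 16-bit units until the stream ends.
// Returns false if the stream ends in the middle of a unit (odd size) or if
// a read error occurred; units decoded up to that point stay in 'text'.
bool ReadAllUnits(std::FILE* file, std::vector<wchar_t>& text)
{
    assert(file);

    for (;;) {
        int const hi = std::fgetc(file);
        if (hi == EOF)
            break;                     // end of file, or an error (see below)

        int const lo = std::fgetc(file);
        if (lo == EOF)
            return false;              // dangling single byte or read error

        // Only now, with both reads verified, is the value used.
        text.push_back(static_cast<wchar_t>((hi << 8) | lo));
    }

    // Distinguish a clean end of file from an I/O error.
    return std::ferror(file) == 0;
}
```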

Further:
- Reading large amounts of data in small steps is inefficient.
- In C++, never use C-style casts.
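
Both points together, as a sketch: read the stream in large chunks with
fread() and convert in memory afterwards, using static_cast for the
conversion. The function name and the 4096-byte chunk size are arbitrary
choices of mine, not anything from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Sketch: read the rest of the stream in big chunks, then decode it as
// big-endian 16-bit units. A leading byte order mark, if present, is not
// treated specially and ends up as the first element.
bool ReadBigEndianText(std::FILE* file, std::vector<wchar_t>& text)
{
    assert(file);

    std::vector<unsigned char> raw;
    unsigned char chunk[4096];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, file)) != 0)
        raw.insert(raw.end(), chunk, chunk + got);

    // Check for errors before touching the data; an odd byte count cannot
    // be a whole number of 16-bit units. 'text' is left alone on failure.
    if (std::ferror(file) || raw.size() % 2 != 0)
        return false;

    text.clear();
    text.reserve(raw.size() / 2);
    for (std::size_t i = 0; i != raw.size(); i += 2)
        text.push_back(static_cast<wchar_t>((raw[i] << 8) | raw[i + 1]));
    return true;
}
```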

Uli

--
C++ FAQ: http://parashift.com/c++-faq-lite

Sator Laser GmbH
Geschäftsführer: Michael Wöhrmann, Amtsgericht Hamburg HR B62 932