Re: Reading unicode text files

From:

"Alf P. Steinbach" <alfps@start.no>

Newsgroups:

comp.lang.c++

Date:

Tue, 22 May 2007 09:46:31 +0200

Message-ID:

<5bflesF2qdd9oU1@mid.individual.net>

* Wx:

I'm trying to read a textfile written by the NTBackup utility on
Windows 2003 SBS. The problem is that when I print the output, it
looks like this:

S t a t o : b a c k u p
O p e r a z i o n e : b a c k u p
D e s t i n a z i o n e b a c k u p a t t i v o : F i l e
N o m e s u p p o r t o : " l u m e v e . b k f c r e a t o i
l 2 1 / 0 5 / 2 0 0 7 a l l e 2 3 . 0 0 "

As you can see, there is a space before every character. I know that
Unicode characters use two bytes, so... could the problem be related to
a different charset?

Yes. The "spaces" are, at least before they end up in your program,
zero bytes.
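
One way to confirm this is to dump the first few bytes of the file in hex. The following is only a minimal sketch: hex_dump is a hypothetical helper name, not something from your code, and it assumes the raw bytes have already been read into a std::string.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render raw bytes as space-separated hex pairs,
// so that zero bytes show up as "00" instead of looking like spaces.
std::string hex_dump(const std::string& bytes)
{
    std::string out;
    char buf[4];  // two hex digits plus the terminating '\0'
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned int value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

If the file really is little-endian UTF-16, the text "St" would dump as 53 00 74 00, possibly preceded by the byte order mark FF FE.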

If I try to read a new textfile, there is no problem.

This is the relevant portion of the code:

try {
ifstream infile(strLogFile.c_str());

Well, it doesn't help to use a wide character stream either, because
those simply convert to/from external narrow character data.

What you can do is open the file in binary mode.

Then read the contents as binary data and treat as a sequence of wchar_t
values (e.g., you can just store them in a std::wstring).
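
A minimal sketch of that approach follows. It assumes the file is little-endian UTF-16, optionally starting with the byte order mark FF FE (an assumption about what NTBackup writes, not something verified here). The names decode_utf16le and read_utf16le_file are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: combine each pair of bytes (low byte first) into
// one wchar_t. A leading FF FE byte order mark is skipped; a trailing
// odd byte, if any, is ignored.
std::wstring decode_utf16le(const std::vector<unsigned char>& bytes)
{
    std::size_t i = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        i = 2;

    std::wstring result;
    result.reserve((bytes.size() - i) / 2);
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned int unit =
            bytes[i] | (static_cast<unsigned int>(bytes[i + 1]) << 8);
        result.push_back(static_cast<wchar_t>(unit));
    }
    return result;
}

// Hypothetical helper: open the file in binary mode, read all of it,
// and decode the bytes as above.
std::wstring read_utf16le_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    const std::vector<unsigned char> bytes(
        (std::istreambuf_iterator<char>(in)),
        std::istreambuf_iterator<char>());
    return decode_utf16le(bytes);
}
```

In your code this would be called as read_utf16le_file(strLogFile). Where wchar_t is 16 bits, as on Windows, the result holds the UTF-16 code units unchanged. Where wchar_t is wider, characters outside the Basic Multilingual Plane stay as two separate surrogate values, since the sketch does not combine them.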

Essentially this means implementing the machinery that the standard
library provides for narrow character streams. Or you can buy an
existing implementation, or look for one on the net (though I doubt
you'll find one there). I think Dinkumware offers such an implementation.

Note that handling wchar_t in Windows leads you into compiler-specific
territory, since e.g. MinGW g++ 3.4.4 doesn't support wide character
streams.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?