Re: File-Reading Best Practices?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sat, 3 Apr 2010 13:53:41 -0700 (PDT)
Message-ID: <680d9141-aa3b-4199-804d-829145551373@z4g2000yqa.googlegroups.com>

On Apr 3, 10:32 am, Andreas Wenzke <andreas.wen...@gmx.de> wrote:

> I want to parse an XML file manually (but my question would be
> the same for any other file format):
> What are best-practice guidelines for doing that?
>
> I currently use a char buffer in conjunction with
> istream::read and then walk through the buffer step by step.
> However, problems will arise when tags span across the buffer,
> i.e. when the buffer contains "<h" at the end and the next
> characters to be read from the stream are "tml>". I'm
> considering using memmove, but I just think there has to be a
> better option.
>
> As this is for a university project, I'm not allowed to use
> the STL (std::string and so on).

The most obvious solution is to ensure that the buffer never
ends in the middle of a token, say by using getline to read
it. This has the additional advantage of making it trivial to
output the line number in error messages. In the case of real
XML, it's probably not a good idea, since the W3C specification
requires recognizing several different line ending conventions
(although it wouldn't be that difficult to write a custom getline
which recognized them all), but I doubt that that's relevant for a
school project (at least at a level where you aren't allowed to
use the STL).
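
A minimal sketch of the getline approach, assuming that no tag
spans a line break and that no line exceeds a fixed limit. The
names first_bad_line, kMaxLine and example_lines are made up for
illustration, and std::istringstream appears only to feed the
demo from memory; the reading code itself uses a plain char
buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>   // only for the in-memory demo at the bottom

// Hypothetical limit; a longer line is reported as an error here.
const std::size_t kMaxLine = 1024;

// Reads the input line by line and returns the 1-based number of the
// first line that contains a '<' with no '>' after it on the same line.
// Returns 0 if every line is balanced.
int first_bad_line(std::istream& in)
{
    char line[kMaxLine];
    int lineNo = 0;
    while (in.getline(line, sizeof line)) {
        ++lineNo;
        const char* p = line;
        while ((p = std::strchr(p, '<')) != nullptr) {
            const char* end = std::strchr(p, '>');
            if (end == nullptr) {
                return lineNo;      // tag not closed on this line
            }
            p = end + 1;            // continue scanning after the tag
        }
    }
    // The loop stops either at end of input or because getline failed
    // (line longer than the buffer, or a read error).
    if (!in.eof()) {
        return lineNo + 1;
    }
    return 0;
}

// Demo on in-memory input.
void example_lines()
{
    std::istringstream in("<a>\n<b\n<c>\n");
    assert(first_bad_line(in) == 2);   // line 2 has an unclosed '<'
}
```

Since the line counter is maintained by the read loop itself,
any error message can carry it without extra bookkeeping.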

Another solution is to read character by character, using a
state machine to determine where the token ends, and put each
character into your final buffer. In this way, you never have
more than one token in the buffer, and filebuf takes care of the
actual IO buffering.
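
A minimal sketch of the character-by-character approach, under
the simplifying assumption that a token is either a tag
("<" through ">") or a run of text up to the next "<". The names
TokenKind, next_token and example_tokens are made up for
illustration; real XML needs more states (comments, CDATA,
quoted attribute values). std::istringstream is used only to
feed the demo; the token itself lands in a plain char buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>   // only for the in-memory demo at the bottom

enum TokenKind { EndOfInput, TextToken, TagToken, UnterminatedTag };

// Reads exactly one token into buf (always NUL-terminated).
// Tokens longer than cap - 1 characters are truncated.
TokenKind next_token(std::istream& in, char* buf, std::size_t cap)
{
    assert(cap > 0);
    enum State { Start, InText, InTag };
    State state = Start;
    std::size_t len = 0;
    TokenKind kind = EndOfInput;
    const int eof = std::istream::traits_type::eof();

    for (int ch = in.peek(); ch != eof; ch = in.peek()) {
        if (state == Start) {
            // The first character decides which kind of token this is.
            state = (ch == '<') ? InTag : InText;
            kind = (state == InTag) ? UnterminatedTag : TextToken;
        } else if (state == InText && ch == '<') {
            break;                  // leave '<' in the stream for the next call
        }
        in.get();                   // consume the character we peeked at
        if (len + 1 < cap) {
            buf[len++] = static_cast<char>(ch);
        }
        if (state == InTag && ch == '>') {
            kind = TagToken;        // tag is complete
            break;
        }
    }
    buf[len] = '\0';
    return kind;
}

// Demo on in-memory input.
void example_tokens()
{
    std::istringstream in("<html>hi</html>");
    char tok[64];
    assert(next_token(in, tok, sizeof tok) == TagToken);
    assert(std::strcmp(tok, "<html>") == 0);
    assert(next_token(in, tok, sizeof tok) == TextToken);
    assert(std::strcmp(tok, "hi") == 0);
    assert(next_token(in, tok, sizeof tok) == TagToken);
    assert(std::strcmp(tok, "</html>") == 0);
    assert(next_token(in, tok, sizeof tok) == EndOfInput);
}
```

The caller never sees where the stream's internal buffer ends,
so the "<h" / "tml>" split from the original question cannot
occur.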

--
James Kanze