Re: binary file parsing

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Tue, 12 May 2009 01:17:08 -0700 (PDT)

Message-ID:

<af4340cd-92ad-457b-88a1-08e68e3b46f2@r36g2000vbr.googlegroups.com>

On May 11, 11:33 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:

On Mon, 4 May 2009 01:35:40 -0700 (PDT), James Kanze
<james.ka...@gmail.com> wrote:

...

In practice, you need to do two things: define the format you
will be reading, and decide your portability requirements.

Yes. I hate maintaining code where the external data format is
defined by "whatever the first implementation ended up
generating, on the first machine it happened to run on."

Especially when they've been around for a while. Some of those
older machines had some really weird formats. (Byte order of a
long 2301, for example.)
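
To illustrate why the format, and not the machine, should fix the byte order, here is a minimal sketch that assembles a 32-bit value from individual bytes, so the host's own layout never matters. The function names are hypothetical, and the reading of "2301" is an assumption: each digit is taken as the significance of the byte at that position in the file, with 0 the least significant.

```cpp
#include <cstdint>

// Format stores the most significant byte first ("3210" in the
// notation assumed above).
std::uint32_t readBigEndian32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) <<  8)
         |  static_cast<std::uint32_t>(p[3]);
}

// Format stores the bytes in "2301" order (assumed meaning: the
// first byte in the file has significance 2, the second 3, the
// third 0, the fourth 1).
std::uint32_t readOrder2301(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 16)
         | (static_cast<std::uint32_t>(p[1]) << 24)
         |  static_cast<std::uint32_t>(p[2])
         | (static_cast<std::uint32_t>(p[3]) <<  8);
}
```

Because the shifts are arithmetic on values, not accesses to memory, the same code gives the same result whatever the byte order of the machine running it.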

If you need to handle float, and need to be portable to just
about everything, the code is far from trivial. If the
format you're reading used IEEE floats, however (often the
case), and you don't have to worry about machines which use
other floating point formats (mainly---perhaps
only---mainframes), then it is a lot easier.

It seems to me that XDR would be useful here -- it defines
storage of floating-point types.

It defines it to be IEEE. If your portability needs are
limited to machines with IEEE (PCs and most mid-sized Unix systems,
but not mainframes), then using IEEE in the external format is
particularly easy: just copy the bytes into an unsigned integer
of the same size, and output it as usual. If you need to
support other machines, however, you'll need to extract the
bits for each field, as they're defined by the format (which
defines them by reference to IEEE), and reassemble them using
things like ldexp.
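
A rough sketch of both cases, for a 32-bit IEEE float stored most significant byte first (as XDR does). The function names are hypothetical. The first function assumes the host float is itself IEEE binary32 and refuses to compile otherwise; the second only looks at the bit fields, so in principle it does not depend on the host representation, although the infinity and NaN branch assumes the host can represent those values.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Easy case: the host float is IEEE binary32. Copy the bytes into
// an unsigned integer of the same size and output that integer
// byte by byte, most significant byte first.
void putFloatIeeeHost(float value, unsigned char* out)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits");
    static_assert(std::numeric_limits<float>::is_iec559,
                  "host float must be IEEE");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    out[0] = static_cast<unsigned char>((bits >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((bits >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((bits >>  8) & 0xFF);
    out[3] = static_cast<unsigned char>( bits        & 0xFF);
}

// General case: extract sign, exponent and mantissa as the format
// defines them, and reassemble the value with ldexp.
float getFloatPortable(const unsigned char* in)
{
    std::uint32_t bits = (static_cast<std::uint32_t>(in[0]) << 24)
                       | (static_cast<std::uint32_t>(in[1]) << 16)
                       | (static_cast<std::uint32_t>(in[2]) <<  8)
                       |  static_cast<std::uint32_t>(in[3]);
    bool negative = (bits >> 31) != 0;
    int exponent = static_cast<int>((bits >> 23) & 0xFF);
    std::uint32_t mantissa = bits & 0x007FFFFF;

    float result;
    if (exponent == 0xFF) {
        // Infinity or NaN; a host without them needs its own policy.
        result = mantissa == 0
            ? std::numeric_limits<float>::infinity()
            : std::numeric_limits<float>::quiet_NaN();
    } else if (exponent == 0) {
        // Zero or denormal: mantissa * 2^-149, no implicit bit.
        result = std::ldexp(static_cast<float>(mantissa), -149);
    } else {
        // Normal: restore the implicit leading bit, then scale by
        // 2^(exponent - 127 - 23).
        result = std::ldexp(
            static_cast<float>(mantissa | 0x00800000), exponent - 150);
    }
    return negative ? -result : result;
}
```

Writing on a non-IEEE host would go the other way: frexp to split the value, then pack the fields. That direction also has to decide what to do with values the format cannot represent, which is where most of the non-trivial code ends up.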

Or better, don't insist on an unreadable data format; use
plain text instead.

That's always to be preferred. If only because it makes
debugging several orders of magnitude easier.

Note that if you want a portable format, even if it is a text
format, you'll still have to open the file as a binary file.
And output whatever the format requires for line endings. For
that matter, if you really want to be portable, you'll also have
to consider the possibility that your implementation of C++ uses
a different encoding internally than that required by the
format. Alternatively, you define the format as pure text, in
the native format for text, and require translation when moving
between systems. Historically, this is the traditional
solution, but it doesn't work very well if the machines are
physically connected or if you're sharing disks.
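
As a small sketch of the first point: the file is opened in binary mode so the library does no line-ending translation, and the terminator is written as explicit byte values. The helper name is hypothetical, and CR LF is only an assumed example of what a format might require. The sketch does not translate the text itself, so it also assumes the host's encoding already matches the format's.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Writes each line followed by the bytes 0x0D 0x0A, whatever the
// host's native line-ending convention is.
bool writeLines(const char* path, const std::vector<std::string>& lines)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) {
        return false;
    }
    for (const std::string& line : lines) {
        out.write(line.data(),
                  static_cast<std::streamsize>(line.size()));
        out.write("\x0D\x0A", 2);   // terminator fixed by the format
    }
    out.flush();
    return static_cast<bool>(out);
}
```

Opened in text mode, the same code could produce CR CR LF on systems whose library expands the newline character, or some other native convention elsewhere; binary mode is what makes the output identical everywhere.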

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34