Re: converting char to float (reading binary data from file)

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Wed, 28 May 2008 13:06:32 -0700 (PDT)

Message-ID:

<9c6f0614-8823-4300-8abd-0aae09977e75@a70g2000hsh.googlegroups.com>

On May 28, 12:11 pm, gpderetta <gpdere...@gmail.com> wrote:

On May 28, 10:30 am, James Kanze <james.ka...@gmail.com> wrote:

On May 27, 12:07 pm, gpderetta <gpdere...@gmail.com> wrote:

On May 27, 10:04 am, James Kanze <james.ka...@gmail.com> wrote:

On May 26, 8:15 pm, c...@mailvault.com wrote:

On May 22, 1:58 am, James Kanze <james.ka...@gmail.com> wrote:

[...]

In Boost 1.35 they've added an optimization to take advantage of
contiguous collections of primitive data types. Here is a copy
of a file that is involved:

Note however:
[...]

// archives stored as native binary - this should be the fastest way
// to archive the state of a group of obects. It makes no attempt to
// convert to any canonical form.
// IN GENERAL, ARCHIVES CREATED WITH THIS CLASS WILL NOT BE READABLE
// ON PLATFORM APART FROM THE ONE THEY ARE CREATE ON

Where "same platform" here means compiled on the same hardware,
using the same version of the same compiler, and the same
compiler options. If you ever recompile your executable with a
more recent version of the compiler, or with different options,
you may no longer be able to read the data.
In sum, it's an acceptable solution for temporary files within a
single run of the executable, but not for much else.
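To make "convert to a canonical form" concrete, here is a minimal C++ sketch of writing and reading a 32-bit unsigned value in a fixed big-endian byte order, so that the file no longer depends on the byte order of the machine that wrote it. The names write_u32 and read_u32 are hypothetical, not part of Boost. It assumes 8-bit bytes and covers only one type; a real format needs a similar rule for every type it stores.

```cpp
#include <cassert>   // assert, for checking round trips
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // in-memory streams, handy for testing
#include <string>

// Write v as four bytes, most significant first (big-endian),
// whatever the byte order of the host.
void write_u32(std::ostream& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.put(static_cast<char>((v >> shift) & 0xFFu));
    }
}

// Read four bytes produced by write_u32 and rebuild the value with
// shifts, so the result never depends on the host's integer layout.
// Returns false if the input ends early.
bool read_u32(std::istream& in, std::uint32_t& v) {
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        std::istream::int_type c = in.get();
        if (c == std::istream::traits_type::eof()) {
            return false;
        }
        result = (result << 8) | static_cast<std::uint32_t>(c & 0xFF);
    }
    v = result;
    return true;
}
```

Writing 0x01020304 yields the bytes 01 02 03 04 on a little-endian PC and on a big-endian Sparc alike. For real files, the streams should be opened with std::ios::binary.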

Modulo what is guaranteed by the compiler/platform ABI, I guess.

Supposing you can trust them to be stable:-). In actual
practice, I've seen plenty of size changes, and I've seen long
and the floating point types change their representation, just
between different versions of the compiler. Not to mention
changes in padding, which, at least in some cases, depends on
compiler options. (For that matter, on most of the machines I
use, the size of a long depends on compiler options. And it's the
sort of option that someone is likely to change in the makefile,
because e.g. they suddenly have to deal with big files.)
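A sketch of the usual way around the size and padding problem, assuming C++11 <cstdint>: keep the fields in fixed-width types and serialize them one at a time, so that the external size is fixed by the format rather than by sizeof or by the compiler's padding. Record, put_be, get_be, encode and decode are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical record, for illustration only. Its in-memory size and
// padding are up to the compiler; the encoded form below is not.
struct Record {
    std::uint8_t  kind;
    std::int32_t  count;   // exactly 32 bits, unlike long
    std::uint16_t flags;
};

// Append the low `bytes` bytes of v to buf, most significant first.
void put_be(std::string& buf, std::uint32_t v, int bytes) {
    for (int i = bytes - 1; i >= 0; --i) {
        buf.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
    }
}

// Read `bytes` bytes starting at pos as a big-endian unsigned value.
std::uint32_t get_be(const std::string& buf, std::size_t pos, int bytes) {
    std::uint32_t v = 0;
    for (int i = 0; i < bytes; ++i) {
        v = (v << 8) | static_cast<unsigned char>(buf[pos + i]);
    }
    return v;
}

// Field-by-field encoding: always 1 + 4 + 2 = 7 bytes.
std::string encode(const Record& r) {
    std::string buf;
    put_be(buf, r.kind, 1);
    put_be(buf, static_cast<std::uint32_t>(r.count), 4);  // two's complement pattern
    put_be(buf, r.flags, 2);
    return buf;
}

Record decode(const std::string& buf) {
    assert(buf.size() >= 7);
    Record r;
    r.kind = static_cast<std::uint8_t>(get_be(buf, 0, 1));
    std::uint32_t c = get_be(buf, 1, 4);
    // Map the unsigned pattern back to a signed value without relying
    // on implementation-defined conversions.
    r.count = c <= 0x7FFFFFFFu ? static_cast<std::int32_t>(c)
                               : -static_cast<std::int32_t>(~c) - 1;
    r.flags = static_cast<std::uint16_t>(get_be(buf, 5, 2));
    return r;
}
```

Record{7, -2, 0xABCD} encodes to the 7 bytes 07 FF FF FF FE AB CD, even though sizeof(Record) is typically larger because of padding, and recompiling with different options doesn't change that.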

The size of long or that of off_t?

No matter. The point is that they have to compile with
different options, and suddenly long has changed its size.

In particular, the Boost.Serialization binary format is
primarily used by Boost.MPI (which obviously is a wrapper
around MPI) for inter process communication. I think that
the idea is that the MPI layer will take care of
marshaling between peers and thus resolve any
representation difference. I think that in practice most
(but not all) MPI implementations just assume that peers
use the same layout format (i.e. same CPU/compiler/OS) and
just copy bytes back and forth over the network. In a sense the
distributed program is logically a single run of the same
program, even if in practice it consists of different processes
running on different machines, so your observation is
still valid.

If the programs are not running on different machines,
what's the point of marshalling? Just put the objects in
shared memory. Marshalling is only necessary if the data is
to be used in a different place or time (networking or
persistence). And a different place or time means a
different machine (sooner or later, in the case of time).

Well, MPI programs run on large clusters of, usually,
homogeneous machines, connected via LAN.

That's original. I don't think I've ever seen a cluster of
machines where every system in the cluster was identical. At
the very least, you'll have different versions of Sparc, or PC.
Some of which are 32 bit, and others 64. The cluster may start
out homogeneous, but one of the machines breaks down, and is
replaced with a newer model...

The real question, however, doesn't concern just the machines.
If all of the machines are running a single executable, loaded
from the same shared disk, it will probably work. If not, then
sooner or later, some of the machines will have different
compiles of the program, which may or may not be binary
compatible. In practice, the old rule always holds: identical
copies aren't. (Remember, binary compatibility can be lost just
by changing options, or using a newer version of the compiler.)
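When a native format is used anyway, one partial safeguard is to write a small fingerprint of the build's properties at the start of the file and refuse to read a file whose fingerprint differs. This is a hypothetical sketch (native_fingerprint and compatible are made-up names, not a Boost facility). It only catches what it records, type sizes and byte order; a change in padding or in floating-point representation between compiler versions can still slip through, which is exactly the risk described above.

```cpp
#include <cassert>   // assert, for checking the fingerprint
#include <climits>
#include <cstdint>
#include <cstring>
#include <string>

// Record the properties a raw memory dump depends on.
std::string native_fingerprint() {
    std::string fp;
    fp.push_back(static_cast<char>(CHAR_BIT));
    fp.push_back(static_cast<char>(sizeof(short)));
    fp.push_back(static_cast<char>(sizeof(int)));
    fp.push_back(static_cast<char>(sizeof(long)));   // may change with compiler options
    fp.push_back(static_cast<char>(sizeof(double)));
    fp.push_back(static_cast<char>(sizeof(void*)));
    // Byte order: store the first in-memory byte of a known pattern
    // (0x04 on little-endian hosts, 0x01 on big-endian ones).
    std::uint32_t probe = 0x01020304u;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    fp.push_back(static_cast<char>(first));
    return fp;
}

// A reader built differently should reject the file rather than misread it.
bool compatible(const std::string& stored) {
    return stored == native_fingerprint();
}
```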

The same program will spawn multiple copies of itself on every
machine in the cluster, and every copy communicates via
message passing. So you have one logical program which is
partitioned on multiple machines. I guess that most MPI
implementations do not bother (in fact I do not even know if
it is required by the standard) to convert messages to a
machine-agnostic format before sending them to another peer.
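For floating point (the subject of this thread), a common machine-agnostic wire format is the IEEE 754 bit pattern sent in a fixed byte order. The sketch below assumes that the host's double is IEEE 754 binary64 (checked with a static_assert) and that doubles and 64-bit integers share the same byte order, which holds on mainstream hardware but is not guaranteed by the language. encode_double and decode_double are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// Refuse to compile where double is not IEEE 754 binary64.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 double");

// Encode a double as its 64-bit IEEE pattern, most significant byte first.
std::string encode_double(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // well-defined way to get the bits
    std::string out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((bits >> shift) & 0xFFu));
    }
    return out;
}

// Rebuild the bit pattern with shifts, then copy it back into a double.
double decode_double(const std::string& in) {
    assert(in.size() >= 8);
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits = (bits << 8) | static_cast<unsigned char>(in[i]);
    }
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

On both a little-endian PC and a big-endian Sparc, encode_double(1.0) yields 3F F0 00 00 00 00 00 00, and decode_double turns it back into 1.0.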

Well, I don't know much about that context. In my work, we have
a heterogeneous network, with PCs under Windows as clients, and
either PCs under Linux or Sparcs under Solaris as servers (and
high level clients). And that more or less corresponds to what
I've seen elsewhere as well.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34