Re: portable handling of binary data

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Wed, 17 Dec 2008 05:24:58 -0800 (PST)
Message-ID: <e548bae7-33b3-4afb-9212-478b3aae5eae@a37g2000pre.googlegroups.com>

On Dec 17, 11:06 am, SG <s.gesem...@gmail.com> wrote:

On 17 Dec., 00:00, James Kanze <james.ka...@gmail.com> wrote:

Conversions to unsigned integral types are fully defined.
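As a small illustration of that rule (a sketch; the helper names `to_uchar` and `to_u16` are hypothetical, not from the thread): conversion to an unsigned type is defined on the value, which is reduced modulo 2^N, so the result is the same whatever representation the machine uses for negative numbers.

```cpp
#include <climits>   // UCHAR_MAX
#include <cstdint>

// Hypothetical helpers. Conversion to an unsigned type is defined in terms
// of the value: the result is the source value reduced modulo 2^N, where N
// is the number of bits in the destination type. It does not depend on how
// the source value is represented in memory.
inline unsigned char to_uchar(int v) {
    return static_cast<unsigned char>(v);   // e.g. -1 -> UCHAR_MAX
}

inline std::uint16_t to_u16(long v) {
    return static_cast<std::uint16_t>(v);   // e.g. -2 -> 0xFFFE
}
```

So `to_uchar(-1)` is `UCHAR_MAX` even on a machine where -1 is not stored as all one bits.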

Right. The remaining issue would be how bits are interpreted
for the value of signed char. That's why you recommended raw
access as sequence of unsigned chars.

Yes.

In practice, if the machine is 2's complement, you should be
able to type pun (i.e. reinterpret_cast on a pointer) plain
chars and unsigned chars without problems. But why bother.
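A minimal sketch of that kind of type punning, assuming 8-bit bytes (the helper `byte_at` is hypothetical). Reading a `char` buffer through `const unsigned char*` is permitted by the aliasing rules; on a 2's complement machine each element simply comes back as its stored bit pattern, 0 to 255.

```cpp
#include <climits>
#include <cstddef>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical helper: returns the i-th byte of a char buffer as an unsigned
// value. Any object may be read through an unsigned char lvalue, so viewing
// the buffer via reinterpret_cast is well defined; on a 2's complement
// machine the result is the stored bit pattern.
inline unsigned byte_at(const char* buf, std::size_t i) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buf);
    return p[i];
}
```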

Not really. First, you do the input as unsigned:

uint16_t result = source.get() ;
result |= source.get() << 8 ;
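The same idea fleshed out into a self-contained function, as a sketch rather than the poster's actual code: `read_u16le` is a hypothetical name, bytes are assumed to be 8 bits, and it adds the end-of-file check that the two-line version leaves out.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, for trying it out

// Hypothetical helper built on istream::get(): reads a 16-bit little-endian
// value, low byte first. get() returns each byte as a non-negative int, or
// traits_type::eof() when no byte is available. Assumes 8-bit bytes.
// Returns false on a short read and leaves `result` unchanged.
inline bool read_u16le(std::istream& source, std::uint16_t& result) {
    const int eof = std::istream::traits_type::eof();
    int lo = source.get();
    int hi = source.get();
    if (lo == eof || hi == eof) {
        return false;
    }
    result = static_cast<std::uint16_t>(lo | (hi << 8));
    return true;
}
```

Because `get()` only ever hands back non-negative `int` values for real bytes, no signed `char` is involved at any point.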

I intend to use istream::read, which requires a pointer to
char.

All "raw IO" in C++ is defined in terms of char. But I don't
really see any advantage of read() over using istream::get(), as
above, and I see several (very minor) disadvantages.

I checked the current C++ specification draft again and it
seems that I'm allowed to cast a pointer to void* and then to
unsigned char* to access the raw data.

You can just use reinterpret_cast. You're type punning, and
that's what reinterpret_cast was designed for.
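For comparison, both casts side by side (the helper names are hypothetical). Since C++11 the standard specifies a `reinterpret_cast` between these object pointer types as `static_cast<cv T*>(static_cast<cv void*>(p))`, so the detour through `void*` buys nothing except more typing.

```cpp
// Hypothetical helpers: two spellings of the same conversion.
inline const unsigned char* via_reinterpret(const char* p) {
    return reinterpret_cast<const unsigned char*>(p);
}

// Explicit round trip through void*, as described in the question.
inline const unsigned char* via_void(const char* p) {
    return static_cast<const unsigned char*>(static_cast<const void*>(p));
}
```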

So I expect the following to work, provided CHAR_BIT == 8:

  #include <cstdint>
  #include <fstream>

  /// extract unsigned 16 bit int (little endian format)
  inline std::uint_fast16_t get_u16le(const void* pv) {
    const unsigned char* pc = static_cast<const unsigned char*>(pv);
    return pc[0] | (pc[1] << 8);  // low byte first, high byte in bits 8..15
  }

  int main() {
    char buff[123];
    std::ifstream ifs("somefile.dat", std::ifstream::binary | std::ifstream::in);
    ifs.read(buff, 123);
    std::uint_fast16_t foo = get_u16le(buff);
  }

Yes. Provided your protocol requires 123 bytes to be available.
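One way to make sure the 123 bytes really arrived before decoding them, sketched with a hypothetical `read_exact` helper: `istream::read()` sets failbit on a short read, and `gcount()` reports how many characters were actually extracted.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, for trying it out

// Hypothetical helper: reads exactly n bytes into buff and reports whether
// all of them arrived. On a short read, istream::read() sets failbit (and
// eofbit); gcount() tells how many characters were actually extracted.
inline bool read_exact(std::istream& in, char* buff, std::size_t n) {
    in.read(buff, static_cast<std::streamsize>(n));
    return static_cast<std::size_t>(in.gcount()) == n;
}
```

Only when `read_exact(ifs, buff, 123)` returns true is it safe to hand `buff` to the decoding function.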

Unless I knew I had to support a machine where it didn't work,
I'd just assign the results to an int16_t and be done with it.
[...] Most people, I suspect, count on the conversion of
the uint16_t to int16_t to do the right thing, although
formally, it's implementation defined (and may result in a
signal).
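If the conversion itself is the worry, the value can be computed with arithmetic instead. This sketch (hypothetical `to_int16`; it assumes `int16_t` exists and that the data was written in 2's complement) only ever converts values that `int16_t` can represent, so nothing implementation-defined is involved. C++20 later made the direct conversion well defined (modulo 2^16), but earlier standards leave it implementation-defined.

```cpp
#include <cstdint>

// Hypothetical helper: maps the 16-bit pattern u to the 2's complement value
// it encodes, without the implementation-defined uint16_t -> int16_t
// conversion. Every intermediate value fits in a long, and the final cast is
// always applied to a value in the range of int16_t.
inline std::int16_t to_int16(std::uint16_t u) {
    long v = u;              // 0 .. 65535
    if (v >= 0x8000) {
        v -= 0x10000;        // -32768 .. -1
    }
    return static_cast<std::int16_t>(v);
}
```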

Is there an elegant way to query this implementation-defined
behaviour at compile time, so that I can make the compiler
reject the code if it won't work as intended?

Not really, if int16_t is present. About all I can suggest is a
small program which actually performs the conversion and outputs
the result, compiled and executed by a script which generates a
#define of something with the appropriate value; the script can
be invoked automatically from your makefile. In practice, however,
I probably wouldn't bother. The unit tests will fail in an obvious
way if the compiler doesn't do what you expect, at which point you
can add whatever you need to your configuration file.
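A sketch of such a probe, assuming a build script that compiles and runs a tiny program whose `main()` calls the hypothetical `write_int16_probe(std::cout)` and redirects the output into a generated configuration header. The macro name `INT16_CONVERSION_WRAPS` is likewise made up for illustration.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>   // std::ostringstream, for trying it out

// Hypothetical configuration probe: performs the uint16_t -> int16_t
// conversion with this compiler and writes a #define recording whether it
// wrapped around the way 2's complement arithmetic would.
inline void write_int16_probe(std::ostream& out) {
    const std::uint16_t all_ones = 0xFFFF;
    const std::uint16_t sign_bit = 0x8000;
    const bool wraps = static_cast<std::int16_t>(all_ones) == -1
                    && static_cast<std::int16_t>(sign_bit) == -32768;
    out << "#define INT16_CONVERSION_WRAPS " << (wraps ? 1 : 0) << '\n';
}
```

The production code can then `#error` out, or fall back to an arithmetic conversion, when the macro is 0.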

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34