Re: transforming array of unsigned chars or BYTEs to an int using pointers

From: "kanze" <kanze@gabi-soft.fr>
Newsgroups: comp.lang.c++.moderated
Date: 13 Jul 2006 17:10:05 -0400
Message-ID: <1152782479.213254.95020@h48g2000cwc.googlegroups.com>

Frederick Gotham wrote:

posted:

I am trying to write some glue code to parse 4 raw bytes
into an integer value.

Caveat number 1: Watch out for padding within integer types.

In theory, yes. In practice, this is probably the least likely
problem you will encounter. The most likely problems are byte
order and size, both of which vary a lot over everyday machines.
The second most likely is the actual representation: at least one
machine still being sold uses 36 bit 1's complement integers,
whereas the only machine I knew of which had padding went out of
production some years ago. (It also used signed magnitude and
48 bit ints, which meant that you had to solve those problems as
well.)

In practice, you really don't care about the internal format of
an int, as long as it is large enough to contain all of the
values you're interested in. The format specification of the
input data should indicate how each byte (or even each bit, in
extreme cases) affects the value; your program manipulates the
value (not the bit image) of the internal int to produce the
correct final value.
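
As a minimal sketch of what manipulating the value (rather than the
bit image) might look like: the function below assumes, purely for
illustration, that the format specification says "32 bit unsigned,
most significant byte first, 8 significant bits per byte". The name
decodeBigEndian32 and that format are assumptions, not something the
original poster specified.

```cpp
#include <cstdint>

// Hypothetical helper. Assumed external format: an unsigned 32 bit
// value stored in four bytes, most significant byte first.
// Only arithmetic on values is used, so the result does not depend
// on the byte order, alignment rules or padding of the host's ints.
std::uint_least32_t decodeBigEndian32(unsigned char const* p)
{
    std::uint_least32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // Shift the bytes read so far up by 8 bits and append the
        // next one; the mask drops any bits above 8 if CHAR_BIT > 8.
        result = (result << 8) | (p[i] & 0xFFu);
    }
    return result;
}
```

With the bytes { 1, 2, 3, 4 } this should yield 0x01020304 on any
conforming implementation, whatever the host's own byte order.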

If I were to do what you are doing, I'd probably write the
code something like:

#include <iostream>
#include <limits>
#include <climits>

#define SomeKindOfCompileTimeAssert(expr) typedef char Compass[(expr)?4:-4]

unsigned Amalg(unsigned char const * const p)
{
    /* Firstly, ensure no padding: */
    SomeKindOfCompileTimeAssert(
        CHAR_BIT * sizeof(unsigned)
        == std::numeric_limits<unsigned>::digits
    );
    return reinterpret_cast<unsigned const&>(*p);

I'm not sure I understand this line; it looks like a recipe for
a core dump on my machine. According to the standard, this is
the equivalent of:

return *reinterpret_cast< unsigned const* >( p ) ;

(which seems the clearer and more natural way of writing it to
me). Dereferencing the result of the reinterpret_cast, however,
is undefined behavior, and in fact, doesn't work on most of the
architectures I've used (including the Sun Sparc on which I work
today): with the exception of Intel, all of the architectures
I've used have alignment restrictions.
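
If the goal really is to reinterpret the bytes in the host's own
representation, one way to sidestep the alignment problem is to
copy them into a properly aligned object first. This is only a
sketch (loadNativeUnsigned is a hypothetical name), and the result
still depends entirely on the host's byte order and the size of
unsigned:

```cpp
#include <cstring>

// Hypothetical helper. Copies sizeof(unsigned) bytes into an
// unsigned object, so no misaligned access is ever performed.
// The value obtained is whatever the host's representation makes
// of those bytes; no external format is taken into account.
unsigned loadNativeUnsigned(unsigned char const* p)
{
    unsigned result;
    std::memcpy(&result, p, sizeof result);
    return result;
}
```

The caller must guarantee that at least sizeof(unsigned) bytes are
readable at p.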

In addition, it seems to totally ignore any specification as to
the format of the four bytes. Admittedly, the original poster
didn't provide any such format specification (and a simple
"return 0" would also have met all of the requirements he
specified), but it's hard to imagine any useful application of
this where there wasn't some format specification.

}

int main()
{
    unsigned char array[sizeof(unsigned)] = { 1, 2, 3, 4 };
        /* Assuming 4 bytes per int */
    unsigned i = Amalg(array);
    std::cout << i;
}

If Endianness is an issue, then things get a little muddier
(I've actually got half-written code somewhere on my harddisk
for doing this).

If you don't know the format, you can't convert, that's for
sure. But endianness is only part of the format.
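
To illustrate that byte order is only one element of a format, here
is a sketch that also has to know the width and the representation
of negative values. The format assumed (16 bit 2's complement, least
significant byte first) and the name decodeLittleEndianInt16 are
invented for the example:

```cpp
// Hypothetical helper. Assumed external format: a signed 16 bit
// value in 2's complement, stored least significant byte first.
// The result is a long, which is guaranteed able to hold every
// value in [-32768, 32767] whatever the host's representation.
long decodeLittleEndianInt16(unsigned char const* p)
{
    // Assemble the 16 bit pattern as an unsigned value.
    unsigned long const raw =
        (p[0] & 0xFFul) | ((p[1] & 0xFFul) << 8);

    // In 2's complement, patterns with the top bit set represent
    // raw - 65536; compute that without relying on host wraparound.
    if (raw >= 0x8000ul) {
        return static_cast<long>(raw - 0x8000ul) - 0x8000L;
    }
    return static_cast<long>(raw);
}
```

The same two bytes would mean something different under a signed
magnitude or 1's complement specification, which is why the format
has to be known before any conversion is written.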

--
James Kanze GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
