Re: Split a numeric value into bytes (char)

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Fri, 23 Jan 2009 02:23:17 -0800 (PST)

Message-ID:

<7c8de143-2deb-4aeb-ad7e-dab5caecc89d@i18g2000prf.googlegroups.com>

On Jan 22, 4:24 pm, Michael DOUBEZ <michael.dou...@free.fr> wrote:

Jean-Marc Bourguet wrote:

James Kanze <james.ka...@gmail.com> writes:

C++ even extended it to encompass accessing it as an array
of char (which more or less means that plain char must be
unsigned if the machine is not 2's complement). But even
something like:

    union
    {
        char c1 ;
        char c2 ;
    } u ;
    u.c1 = 'a' ;
    putchar( u.c2 ) ;

is undefined behavior---the standard allows the
implementation to somehow maintain the information as to
what the last stored value was (except in some very special
cases), and core dump if you access via any other member.

(That was, at least, the consensus in the C committee, back
in the late 1980's. At least among some of the committee
members.)

The funny thing is that

struct s1 { char c; };
struct s2 { char c; };

union {
s1 m1;
s2 m2;
} u;

u.m1.c = 'a';
putchar(u.m2.c);

is conformant...

Yes. Supposedly, in some circles, something along these lines
was used to simulate polymorphism. The "initial sequence" was
defined by a macro, and included a type tag, and the union was a
polymorphic object---whose type could even be changed
dynamically. (The solution I've usually seen was:

    struct h { int tag ; /*...*/ } ;
    struct s1 { struct h head ; /*...*/ } ;
    struct s2 { struct h head ; /*...*/ } ;

then pass h* around, casting them to s1* or s2* as needed.)
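
A minimal sketch of that header-and-cast idiom, in C++. All of the
names (Tag, Header, Circle, Square, area) are hypothetical and chosen
only for illustration. It relies on the rule that a pointer to a
standard-layout (POD) struct can be converted with reinterpret_cast to
a pointer to its first member, and back again.

```cpp
#include <cassert>

// Hypothetical type tags; real code would have one per "derived" struct.
enum Tag { circle_tag, square_tag };

// The common head, playing the role of struct h above.
struct Header { int tag ; } ;

// Each "derived" struct starts with the header, like s1 and s2 above.
struct Circle { Header head ; double radius ; } ;
struct Square { Header head ; double side ; } ;

// Takes the header pointer that is passed around, and casts back to the
// full struct once the tag says which one it really is.
double area( Header const* h )
{
    assert( h != 0 ) ;
    switch ( h->tag ) {
    case circle_tag: {
        Circle const* c = reinterpret_cast< Circle const* >( h ) ;
        return 3.0 * c->radius * c->radius ;   // crude pi keeps the arithmetic exact
    }
    case square_tag: {
        Square const* s = reinterpret_cast< Square const* >( h ) ;
        return s->side * s->side ;
    }
    }
    return 0.0 ;                               // unknown tag
}
```

The cast is only valid if the object really is a Circle or a Square,
so everything depends on the tag being set correctly when the object
is created.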

And the example of §9.5/2:
[Example:
void f()
{
union { int a; char* p; };
a = 1;
// ...
p = "Jennifer";
// ...
}

Here a and p are used like ordinary (nonmember) variables, but
since they are union members they have the same address.]

The address is the same because of §9.5/1 "[...]Each data
member is allocated as if it were the sole member of a
struct.[...]"

So there is a guarantee that the layout is the same for a
union of 2 chars. The question is whether or not the compiler
synchronized the data at the data's address such that it is
available through the second member.

We could believe it is so because of the guarantee that
"[...]If a POD-union contains several POD-structs that share a
common initial sequence, and if an object of this POD-union
type contains one of the POD-structs, it is permitted to
inspect the common initial sequence of any of POD-struct
members[...]". Therefore there should be synchronization.

Unless the compiler can determine that the union doesn't
contain any such POD-struct with a relevant initial sequence, and
so inhibits the synchronization of the memory.

But since you can't put incomplete types in a union, it has this
information.

The issue is more than a little complicated, because on one
hand, we want cleanly written code to work, without particular
precautions, and on the other, we want to allow a maximum of
optimizing. In the end: if you're using a union to hold
different types at different times (as guaranteed by the
standard), it will in practice work if the union is visible to
the compiler wherever the data are accessed. And for type
punning, you'll really have to check what you're doing for each
compiler (and maybe pass special flags or turn off some
optimizations).
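
To make the distinction concrete, here is a rough sketch of the two
cases. The names (Value, set, get_int, get_double, to_bytes,
from_bytes) are hypothetical. The first half uses a union only to hold
different types at different times, with a tag recording which member
was stored last. The second half is an assumption on my part rather
than something discussed above: it looks at the bytes of a value with
std::memcpy instead of punning through a union, which avoids the
question of what the compiler does with the inactive member.

```cpp
#include <cassert>
#include <cstring>

// A union holding different types at different times. The tag says
// which member was stored last; readers only touch that member.
struct Value
{
    enum Kind { is_int, is_double } kind ;
    union { int i ; double d ; } u ;
} ;

void set( Value& v, int i )    { v.kind = Value::is_int ;    v.u.i = i ; }
void set( Value& v, double d ) { v.kind = Value::is_double ; v.u.d = d ; }

int get_int( Value const& v )
{
    assert( v.kind == Value::is_int ) ;      // never read the other member
    return v.u.i ;
}

double get_double( Value const& v )
{
    assert( v.kind == Value::is_double ) ;
    return v.u.d ;
}

// Copies the object representation of value into an array of unsigned
// char. The byte order is whatever the platform uses.
void to_bytes( unsigned int value,
               unsigned char (&out)[ sizeof( unsigned int ) ] )
{
    std::memcpy( out, &value, sizeof value ) ;
}

// The reverse: rebuilds the value from bytes produced by to_bytes on
// the same platform.
unsigned int from_bytes( unsigned char const (&in)[ sizeof( unsigned int ) ] )
{
    unsigned int value ;
    std::memcpy( &value, in, sizeof value ) ;
    return value ;
}
```

If the bytes have to be in a fixed order (for a file or a network
format, say), shifting and masking the value is probably the better
choice, since memcpy just exposes the native layout.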

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34