Re: Struct members -> Array elements
Greg Herlihy wrote:
?9.2/18 states: "There might therefore be unnamed padding within a
POD-struct object, but not at its beginning, as necessary to achieve
appropriate alignment."
The "as necessary" part certainly seems to require that any padding
separating members in a POD-struct has to be in fact necessary for the
purpose of attaining appropriate alignment of the struct's members.
Necessary?
I have counterexamples. x86 CPUs have no alignment requirements (I know
that's somewhat special).
But several popular compilers align 32-bit integers on 4-byte
boundaries, doubles on 4- or 8-byte boundaries, etc.
It is not necessary, but it is faster.
For characters, aligning on 4-byte boundaries is faster too!
But for arrays it isn't worth it, because it would be a horrible speed
pessimization.
That's why characters allocated on the stack are often aligned on
4-byte boundaries (which avoids reading and writing the same memory
word when manipulating two different characters).
This can be a good reason to do so for structures, as Francis
Glassborow said:
Francis Glassborow wrote:
No that is not true. Let me give you an example:
struct X {
    char a, b, c;
};
If you set your compiler switches to optimise for space it will usually
just add a byte of padding at the end. If you set it for speed, on some
platforms it might add three bytes of padding between each of a, b and
c (optimum alignment on 32-bit word systems). Nonetheless, for an array
of char it has to take the no-padding route.
On x86 CPUs, when doing things such as x.a=x.b+x.c, the speed improvement
may be noticeable.
But we also know that such padding can never be needed to separate two
adjacent objects of the same type in memory. It makes no difference how
those two objects came to be adjacent - whether they are elements in
the same array, or members of the same struct, or adjacent simply by
sheer chance - because the alignment requirements for a type are both
constant and universal.
Constant and universal?
But it depends on #pragmas for some compilers!
For instance, with Borland C++ 5.0 (32-bit x86):
#pragma pack(2)
struct X {
    int d;
};
#pragma pack(4)
struct Y {
    char a;
    X b;
    int c;
};
The b field has offset 2 in the Y structure (so the alignment of int
doesn't seem to be 4 here).
The c field has offset 8 (and thus there are 2 bytes of padding between
b and c).
The final structure is:
[a][ ][d][d][d][d][ ][ ][c][c][c][c]
Where [a] is a byte of Y::a, [ ] is a padding byte, [d] is a byte of
X::d, and [c] is a byte of Y::c.
That objects of the same type need no padding to separate them when
residing in an array means that no two objects of the same type
anywhere can ever require padding between them.
X::d and Y::c are almost a counterexample.
Tom1s wrote:
When I first read that, I mistakenly thought you meant that the compiler
may change it to the following behind your back:
struct X {
int a, b, c;
};
telling you that they are chars, when in actual fact they are ints...
so I was going to demonstrate how that could be a problem:
No, like this:
struct X {
    char a; char padding0[3];
    char b; char padding1[3];
    char c; char padding2[3];
};
(Out of curiosity: A char has no alignment requirements... how would
aligning it on a 4-byte boundary make it any faster?)
Read the IA-32 architecture manual.
32-bit x86 CPUs are not really able to read a single byte at a time.
They fetch from the cache at least 4 bytes (aligned on a 4-byte
boundary), and extract the single byte from that 4-byte word.
Even when data is not aligned, you can read a 4-byte word, but it
internally requires reading two 4-byte words and combining the values,
which costs CPU cycles (and is NOT necessary).
But even for 1-byte data, it can be faster to have two bytes in two
different memory words, because it reduces read/write dependencies.
Thus, something like x.a=x.b+x.c will be faster (on a Pentium CPU, for
instance) with padding bytes than without.
So are there actual implementations out there which may put padding
between members of the same type within a POD struct?
I am not aware of any (but I know only a few compilers).
But I know that many compilers (including Borland C++) put padding
bytes between automatic variables on the stack.
For instance:
void f() {
    char a, b, c, d; // assume these local variables are not put in registers
}
Then a, b, c and d will not be contiguous... Each one will be stored in
its own 4-byte word.
Note also that the calling conventions (__cdecl and __stdcall) on x86
CPUs do require alignment on 4-byte boundaries for each argument.
Thus
int f(char a, char b, char c);
will have padding bytes... and will use 12 bytes of stack!
This can be a reason why a compiler may want to do the same thing in
structs (and even document it).
That way, the compiler may allow (in C code) calling a function with a
"wrong" prototype, replacing several parameters by a structure having
the same layout.
With IA-32 there are two types of alignment:
The "required" alignment, which is always 1.
The "speed optimal" alignment, which depends on the data type.
With SSE it's even more complex: arrays of 32-bit floats may be
"faster" when aligned on 128-bit boundaries.
Thus, it may be perfectly sensible for a compiler to put some padding
in this structure:
struct X {
    int x, y;
    float z;
    float arr[4];
};
between X::z and X::arr[0], in order to align arr on 128 bits.
It may be sensible for a compiler to use one alignment or the other,
depending on #pragmas given by the user...
It might even be possible to let the user specify individually, for
each field, whether that field must be "fast" or "compact".
In that case:
struct X {
    char a;
    __compact int b; // this field will not be accessed often
    __fast int c;    // this field will be accessed very often, so the
                     // programmer wants fast accesses
};
would yield a structure where offsetof(X,b)==1 but offsetof(X,c)==8.
And thus there would be some padding between two consecutive integers.
Seungbeom Kim wrote:
Suppose you have a struct with members named x, y, and z, and you could
want to refer to them sometimes by names, and sometimes by indices. The
former because it's more natural and closer to the problem domain, and
the latter because it's better suited for across-the-board operations
(and can even benefit from standard algorithms such as std::for_each,
std::transform, etc.). Without any guarantee from the standard, though,
you are forced to write something like:
There are many alternatives:
First, use an array, and provide accessors:
struct point {
    int coord[3];
    int& x() { return coord[0]; }
    int& y() { return coord[1]; }
    int& z() { return coord[2]; }
    const int& x() const { return coord[0]; }
    const int& y() const { return coord[1]; }
    const int& z() const { return coord[2]; }
};
Second, use an enumeration to give names to the coordinates:
struct point {
    enum Coordinate {x, y, z};
    int coord[3];
};
then p.coord[x] will identify the first coordinate.
This enumeration can be used everywhere a "name of a coordinate" is
expected.
And, for convenience, it is even possible to overload
point::operator[](Coordinate).
Third, if you want to keep the padding (for speed reasons) that the
compiler may put between fields, you don't want an array.
Then you can still use pointers-to-member or the offsetof macro, and
put all the pointers-to-member in an array with static storage
duration.
kanze wrote:
And who defines "as necessary"? According to what criteria? On
an IA-32, padding is not "necessary" in:
struct S { char c; double d; };
Every compiler I know inserts some padding between c and d,
however, at least by default.
Which doesn't mean that every compiler does. For instance, Borland C++
puts no padding by default.
I *perfectly agree* with the argument, though.
I just said that to make clear that implementations are not all the
same.
John Nagle wrote:
It's certainly common in networking code to assume the obvious
placement of structure elements. Is that assumption supported
by the standard, or not?
On x86 CPUs there are at least two *different obvious* placements:
The no-padding placement (the easiest to use when you want a specific
layout).
The padding-for-speed placement.
Since it may be very useful in networking to have a very specific
layout, many popular compilers have a #pragma pack(1) or another way to
specify that structures have no padding.
There would be several big problems if the C++ committee adopted this
proposal:
1) It would be necessary to convince WG14 (the C committee) to make the
same modification to the C standard.
Otherwise this incompatibility would be a big problem when porting C++
code to C, or simply for C++-to-C compilers.
2) It would introduce an incompatibility between C++ and *all* existing
C and C++ compilers.
Thus, for example, a C++0x implementation could not be written as a
portable C++-to-C translator.
Whereas, nowadays, it is possible to write a C++-to-C compiler which
works with any C89 compiler.
wade@stoner.com wrote:
An ability to interface with other languages at a higher level.
It is the exact opposite.
Such a binary interface specification is not a matter for the C++
standard.
Layout specifications are a good thing, but they depend on the
platform...
And often, when a compiler vendor wants to port his C++ compiler to a
platform, he wants to respect, as far as possible, the layout
specifications of that platform.
That is *exactly why* the C++ committee must not specify it...
Otherwise, it would forbid some implementations from respecting the
structure layout of some weird platform.
The base idea of C (and, in some measure, C++) is that it abstracts
away all the differences and details of platforms.
That's why C and C++ are extremely portable languages, at the cost of
"unspecified things".
Alternatively, the C++ committee could specify the memory layout of
everything on *all existing platforms*.
But that would be stupid, because there is a huge (and growing) number
of platforms, and I don't see why every platform would accept that the
committee imposes on it a memory layout that it doesn't like... because,
for example, it currently has another memory layout that it does like.
Nothing says that C++ implementations are not allowed to specify the
memory layout of structures.
And many, many C++ implementations do specify such layouts.
Of course, if you want to write a very portable application (and that
is not always a concern), you can't use such platform-specific things.
wade@stoner.com wrote:
It's not the case that all-the-world is C++ (or even C++ with one little
island of C). When you want to interface to another language, you need
to have some way of describing your data layouts, or accepting the
other guy's data layout. Unfortunately, the most complex layout the
standard gives us is for array of small integers called unsigned char
(once you look up CHAR_BIT, and then make assumptions about bit-order
and byte order). In practice we tend to make additional assumptions
(it is a pretty good bet that the four-byte C++ float on your platform
is pretty much the same as the four-byte Fortran REAL on the same
platform, ...).
I currently use implementations which pack POD elements tightly, and
have found that to be a useful feature to exploit. I'm not yet
convinced that the feature is so useful that it should be standardized.
However, it does seem to be existing practice.
It IS often standardized.
http://www.google.com/search?hl=en&lr=&q=C+ABI+%22system+V%22&btnG=Search
And you can assume things about the memory layout on EACH specific
platform.
But the point is that there are many platforms.
And on each platform there is a more or less standard, or at least a
reference, specification of the binary layout.
This reference specification depends on the platform.
There are platforms where you can assume that there is no padding
between chars... But there are perhaps platforms where you can assume
that there IS padding between chars... and if there were no padding,
the compiler would not be "compliant" with the ABI of that platform.
However, the idea is that C++ programmers don't need to know all the
ABIs... because C++ is a high-level language.
They only need to know the C++ standard, and can, more or less, assume
that all languages on a specific platform interact gracefully.
C++ implementers have two papers to read:
The C++ standard.
And the reference ABI of the target platform.
Language interactions are platform-specific, and thus are a matter for
compiler implementers and platform-specific committees.
wade@stoner.com wrote:
2) Those literal integers are a maintenance issue. Use an enum
instead, and you've got a different set of maintenance issues.
But your code has terrible maintenance issues...
Inverting z and y would completely change the structure...
Seriously, I would not want to have to maintain such a terrible thing.
Seungbeom Kim wrote:
Again, I'm referring to the fact. And I assumed that many library
implementations used two named members for the real and the imaginary
parts but that still had the proposed layout. At least, the GNU
Standard C++ Library v3 does that. If there is an implementation that
uses two named members and thus fails to have the proposed layout, I
will be interested to hear about it.
Compiler implementers do know what the layout of their structures is.
And it is very probable that many implementations use a memory layout
where it works.
For instance, GNU libstdc++ respects a specific ABI.
Seungbeom Kim wrote:
Yes, compatibility is important; not only in complex but probably also
in other data structures.
You fail to understand that different platforms use different layouts.
It is impossible to give a memory layout which would work (efficiently)
on all platforms.
Each platform must specify its own memory layout.
For instance, on IA-32, the calling convention is not a big deal when
communicating between Fortran and C++, because there are only a few
well-defined calling conventions: mainly __cdecl and __stdcall.
Do you think that the C++ standard should specify these calling
conventions?
At least I hope that the C++ standard will never specify that in:
void f(char c1, char c2);
we have (&c2)==((&c1)+1).
Because it would invalidate the binary compatibility of C++0x compilers
with C++98/C99/Fortran/Ada/and_others compilers on such a platform.
4zumanaga@gmail.com wrote:
Outside of unions, you obviously don't want to be able to access these
classes through each other, as it would hurt optimising a lot, as many
more things would alias with each other.
I have thought about that statement, and now I think that the compiler
can't really do no-alias optimizations (unless it really puts padding
between elements).
struct X {
    double x, y, z;
};
int main() {
    X k;
    double *p = &k.x; // ok
    if (p+1 == &k.y) { /* ok: a pointer one past the end of an object
                          is valid (as described in 6.5.6p7 and p8) */
        /* in this code, p[1] is an alias of k.y */
        /* moreover, p+2 is valid here (one past the end of y) */
        if (p+2 == &k.z) {
            /* now p[2] is an alias of k.z */
        }
    }
}
So, I don't think that compilers are able to do no-alias optimizations
here.
However, from a standards point of view, it is forbidden to do that
except if:
1) The implementation effectively documents this layout.
2) OR the code explicitly tests it with operator==.
Otherwise, a hypothetical implementation may do a no-alias optimization
(even if it is practically impossible).
Tom1s wrote:
But would it not have to use 16/32 Bit int's? If so, I demonstrated in
another post how this would cause problems.
No, it can just use an IA-32 byte move instruction such as "mov al,
byte ptr [esi]".
Pentium and higher CPUs are not internally able to work on single
bytes... they move a whole word.
Thus, having each byte in a separate word reduces the number of
read/write dependencies.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]