Re: Memory pools and alignment
On Oct 12, 10:37 pm, "Francesco S. Carta" <entul...@gmail.com> wrote:
> after having been taught that nifty new thing about
> overriding "operator new" and "operator delete", I've
> started messing with them and with custom allocations.
> Now I have some kind of memory pool which allocates a big
> chunk of memory via "new char[pool_size]" and some functions
> which return sub-chunks of it via void*.
> For the moment I prefer to skip the details - I will post
> the implementation later in order to straighten out
> something more, with your help - if I feel brave enough to
> actually post it.
> So then, I'm testing it in this way:
> -------
> char* pchar =
>     static_cast<char*>(Arena::allocate(sizeof(char) * 3));
> short* pshort =
>     static_cast<short*>(Arena::allocate(sizeof(short) * 3));
> int* pint =
>     static_cast<int*>(Arena::allocate(sizeof(int) * 3));
> for (int i = 0; i < 3; ++i) {
>     pchar[i] = i + 1;
>     pshort[i] = i + 1;
>     pint[i] = i + 1;
> }
> -------
> A straight hex-dump of the pool gives me this:
> -------
> 010203010002000300010000000200000003000000adbeefdeadbeef...
> -------
> Where the "deadbeef" pattern highlights the unused part of
> the pool.
> Well, that made me wonder. I'd have expected to hit some
> memory alignment problem; instead I am getting a nicely
> packed and (so far) working pool... how's that?
All issues concerning alignment are implementation-defined
(or are they simply unspecified?); all the language
specification says is that there may be some alignment
restrictions. Depending on the hardware/OS, different
situations occur (a concrete illustration follows the list):
-- There are no alignment restrictions whatsoever. This is
the case, for example, on 8-bit processors (are there any of
those left?), like the 8080 or the Z80. It is also the case
on some embedded processors, which make char 32 bits (or
whatever), so that sizeof(int) == 1.
-- The hardware handles alignment problems transparently.
Typically, in this case, misaligned data may result in the
code running slightly slower, but even this isn't certain;
if the misaligned data still fits within a single cache line
or bus transfer, it's likely not even to cause a slowdown,
and of course, the read or write pipeline might absorb any
slowdown as well, depending on what else you're doing in the
code. The most notable processor in this case is the Intel
x86 architecture. Otherwise, I think it's pretty rare. (The
old NSC 32xxx chips supported it, and I think some versions
of the Motorola 68000. Neither is in use today, however.)
-- The hardware traps on a misaligned access, but the OS
catches the trap and simulates the access. Misaligned data
then results in a very significant slowdown. I mention this
as a possibility; I don't know offhand of any system where it
is the case.
-- The hardware traps on a misaligned access, and the trap is
mapped to a signal, or terminates the process, or something
along those lines. This is by far the most common situation:
Sparc, PowerPC, HP's PA-RISC and probably the Itanium, IBM
mainframes and the old RS/6000, and in the distant past, the
PDP-11 and the Interdata 8/32. (Come to think of it, I seem
to recall reading somewhere that the Vax supported misaligned
accesses, like the x86. The last DEC machine I actually
worked on, however, was a PDP-11, so I'm not sure.)
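To make the consequences concrete, here is a small
illustration of my own (not code from either post): it forms
a deliberately misaligned int* from a char buffer, which is
exactly what a packed pool hands out. Dereferencing it is
undefined behavior; on an x86 it normally just works, on a
Sparc it normally dies with a bus error. The memcpy variant
is the well-defined way to do the same access.
-------
#include <cstring>
#include <iostream>

int main()
{
    char buffer[16] = {};

    // buffer + 1 is almost certainly not aligned for int.
    int* misaligned = reinterpret_cast<int*>(buffer + 1);

    // Undefined behavior: usually fine on x86, usually a
    // bus error (SIGBUS) on a Sparc.
    *misaligned = 42;
    std::cout << *misaligned << '\n';

    // The well-defined alternative: copy the bytes into a
    // correctly aligned object.
    int value;
    std::memcpy(&value, buffer + 1, sizeof value);
    std::cout << value << '\n';
}
-------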
Since in other postings you've indicated that you're a hobby
programmer, not a professional, it seems very, very likely
that you're running on an Intel based machine (or on an AMD
compatible chip), so you won't immediately see any effect of
misaligned data; in small examples like your test code, it's
likely that you won't even see any difference in speed.
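If you want to see the alignment requirements your platform
actually imposes, something along these lines will print them
(my sketch, not from the original post; alignof is C++11, and
older compilers spell it __alignof__ or similar as an
extension):
-------
#include <iostream>

int main()
{
    // Print the required alignment of a few fundamental
    // types; a pool has to respect these when handing out
    // sub-chunks.
    std::cout << "char:      " << alignof(char)      << '\n';
    std::cout << "short:     " << alignof(short)     << '\n';
    std::cout << "int:       " << alignof(int)       << '\n';
    std::cout << "long long: " << alignof(long long) << '\n';
    std::cout << "double:    " << alignof(double)    << '\n';
}
-------
On a typical x86-64 build, that prints 1, 2, 4, 8 and 8.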
> My implementation aligns types to their size boundaries
> when laying out classes, for example:
> -------
> struct foo {
>     bool b;
>     int i;
>     char c;
>     long long ll;
>     foo() : b(1), i(2), c(3), ll(4) {}
> };
> -------
> Dumping the memory of a foo instance I get the following:
> -------
> 01adbeef0200000003adbeefdeadbeef0400000000000000
> -------
> Where the "deadbeef" pattern highlights the padding.
> Why these two different behaviors?
The implementation is allowed to pad pretty much however it
wants. I think it's even "unspecified", rather than
"implementation defined", so the implementation is not even
required to document it (but I could be wrong about that). Most
Intel implementations do pad for alignment (with some variance
concerning long double: 10, 12 or 16 bytes total, depending on
who wrote the compiler, with the additional bytes often being
part of the type, rather than padding).
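If you want to see exactly where your compiler put the
padding in foo, offsetof will tell you (my addition, not from
the original posts; strictly, C++98 forbids offsetof on a
class with a constructor, though C++11 allows it for
standard-layout classes like this one, and in practice every
compiler I know of does the expected thing):
-------
#include <cstddef>
#include <iostream>

struct foo {
    bool b;
    int i;
    char c;
    long long ll;
    foo() : b(1), i(2), c(3), ll(4) {}
};

int main()
{
    std::cout << "sizeof(foo): " << sizeof(foo) << '\n';
    std::cout << "b  at " << offsetof(foo, b)  << '\n';
    std::cout << "i  at " << offsetof(foo, i)  << '\n';
    std::cout << "c  at " << offsetof(foo, c)  << '\n';
    std::cout << "ll at " << offsetof(foo, ll) << '\n';
}
-------
On your implementation, judging from the dump, that would
report offsets 0, 4, 8 and 16, with sizeof(foo) == 24, which
matches the padding visible in your hex-dump.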
> Am I simply lucky getting my pool so nicely packed?
> Honestly, I somehow hoped for problems to show up... but
> they haven't yet.
The problem is that if your implementation "requires"
alignment, but you neglect it, the results are undefined
behavior, so you may or may not see an effect. Your code will
crash on a Sparc (bus error) and on most other modern
non-Intel general purpose computers, but will run without
problems on an Intel.
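For what it's worth, here is roughly how a pool like yours
can respect alignment and still stay reasonably packed: round
the current offset up to the required alignment before
carving out each sub-chunk. This is only a sketch of one
possible Arena; I'm guessing at your internals, and the extra
alignment parameter is my invention, not your current
interface.
-------
#include <cstddef>

namespace Arena {

const std::size_t pool_size = 1024;

// new char[] returns memory suitably aligned for any
// fundamental type, so it's enough to align the offsets
// measured from the start of the pool.
char* const pool = new char[pool_size];
std::size_t next_free = 0;

// Round n up to the next multiple of alignment
// (alignment must be a power of two).
std::size_t align_up(std::size_t n, std::size_t alignment)
{
    return (n + alignment - 1) & ~(alignment - 1);
}

void* allocate(std::size_t bytes, std::size_t alignment)
{
    std::size_t start = align_up(next_free, alignment);
    if (start + bytes > pool_size)
        return 0;               // pool exhausted
    next_free = start + bytes;
    return pool + start;
}

}
-------
Your test calls would then look something like
static_cast<int*>(Arena::allocate(3 * sizeof(int), alignof(int)));
without C++11, sizeof(int) is a workable stand-in for the
alignment of int on most implementations.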
--
James Kanze