Re: Calling destructor on fundamental types and other stuff about placement new

From: "Francesco S. Carta" <entuland@gmail.com>
Newsgroups: comp.lang.c++
Date: Sun, 29 Aug 2010 22:07:38 +0200
Message-ID: <4c7abe09$0$30897$5fc30a8@news.tiscali.it>

Gennaro Prota <gennaro.prota@yahoo.com>, on 29/08/2010 21:01:19, wrote:

On 27/08/2010 11.26, Francesco S. Carta wrote:

Gennaro Prota <gennaro.prota@yahoo.com>, on 27/08/2010 07:42:53, wrote:

On 27/08/2010 1.43, Francesco S. Carta wrote:

Hi there,
in order to get a better grip on the stuff about "operator new", "new
operator", "placement new" and so forth I went back to the relevant
sections of TC++PL and of the FAQ, then I've implemented a simple Vector
template trying to follow the implementation of std::vector that I found
in my implementation (ehm... OK, you know what I mean despite that
repetition).

[...]

At some point I tested it with a fundamental type:

Vector<int> vec(10);

...and the template instantiation went fine, the program compiled and
ran as expected.

But then I thought: wait, I am calling the equivalent of "ptr->~int()"
within my destroy(), but if I write it directly, say something like this:

int* pi = new int;
pi->~int();

...then the compiler - as I expected - rejects it.

My conclusion is that the template mechanism is able to understand that
I'm going to do something useless with fundamental types and simply
ignores the "ptr->~value_type()" line when instantiating the template
with "value_type == some fundamental type".

Well, no, not the template mechanism. It's more like Johannes
said :-)

Note the note (!) in [class.dtor]/15, too (who said the C++
standard doesn't have a rationale; it's just "inline" :-))

I see now, I shall force myself to refer to the standard before asking
this kind of question here :-(
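
A minimal sketch of the point above, assuming only what the thread says: a call of the form "ptr->~T()" is a pseudo-destructor call when T names a scalar type, so it compiles both inside a template and through a typedef; only the literal spelling "pi->~int()" is rejected. The helper names roundtrip_through_template and roundtrip_through_typedef are hypothetical and are not part of the Vector under discussion.

```cpp
#include <cassert>
#include <new>

// Generic destroy(), in the spirit of the Vector's private helper: for a
// class type this runs the destructor; for a scalar type such as int it
// is a pseudo-destructor call, which is well-formed.
template <typename T>
void destroy(T* ptr) {
    ptr->~T();
}

// Hypothetical helper: builds an int in raw storage, reads it, then
// destroys it through the template.
int roundtrip_through_template() {
    void* raw = ::operator new(sizeof(int));
    int* pi = new (raw) int(42);
    int seen = *pi;
    destroy(pi);              // instantiates ptr->~T() with T = int
    ::operator delete(raw);
    assert(seen == 42);
    return seen;
}

// Hypothetical helper: the same call outside a template, with the type
// named through a typedef instead of the keyword int.
int roundtrip_through_typedef() {
    typedef int Int;
    void* raw = ::operator new(sizeof(int));
    int* pi = new (raw) int(7);
    int seen = *pi;
    pi->~Int();               // accepted; pi->~int() would not be
    ::operator delete(raw);
    assert(seen == 7);
    return seen;
}
```

The value is read before the pseudo-destructor call on purpose, so the sketch does not depend on what the object's state is afterwards.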

Any comment or further insight on the above?

Sooner or later I'll post here the complete implementation of my Vector
to get some advice (and some corrections, very likely), but in the mean
time I'm most concerned with the basic storage management internals that
I'm pasting here below, so please have a look and point out any wrong or
silly thing I could be doing:

I had a quick look. The abundance of reinterpret_casts is what
jumped most to my eye. (Well, duh... they have been invented to
stand out. I meant that you don't need them.)

       value_type* allocate(size_t n) {
           return
               reinterpret_cast<value_type*> (
                   new char[sizeof(value_type) * n]
               );
       }

I think the best option for this function, if you want to have
it, is to just return a void *, which is in general the way to
warn/inform the reader that he is dealing with (yet) "untyped"
memory:

I forgot to make it explicit that those functions are private members of
my Vector template

I understood that. And that this is a throw-away experiment. It
was more of a general point. Or rather, two general points. One
point is that there's a reader of the private parts too. The
reader may be you or another person, now or (for things that you
don't throw away) five years from now, but regardless of who the
person is, the more you can do to communicate the better. The
other point is that the more you let the compiler help with the
type system the better. A T * that doesn't actually point to a T
(because you haven't constructed it yet) is usable in limited
ways but the compiler won't help you with limiting those ways.

- I also forgot to make them static, but I'll fix it,
although it shouldn't change much. The purpose of this exercise is to
create a Vector as self-contained as possible.

By centralizing the conversion from void* to value_type* I can simply write:

start = allocate(n);

...in the implementation of other methods such as reserve(), where
"start" is a private value_type* data member of Vector, pointing to the
beginning of the allocated space.

Then, at least, do the conversion with static_cast. You are
using reinterpret_cast because you have a char *, but if you
call an operator new function you can use static_cast.
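
As a rough illustration of that advice (a sketch, not the poster's actual Vector code): because the operator new function returns void*, the centralized conversion to value_type* needs nothing stronger than static_cast. The free function templates and the fill_and_sum helper below are hypothetical stand-ins for the private members discussed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Returns raw storage for n objects of type T. No T is constructed yet;
// the T* return type only centralizes the conversion.
template <typename T>
T* allocate(std::size_t n) {
    // ::operator new yields void*, so static_cast is sufficient here.
    return static_cast<T*>(::operator new(n * sizeof(T)));
}

// Releases storage obtained from allocate(). No destructors run.
template <typename T>
void deallocate(T* start) {
    ::operator delete(start);   // T* converts implicitly to void*
}

// Hypothetical usage: construct n ints in place, add them up, release.
int fill_and_sum(std::size_t n) {
    int* start = allocate<int>(n);
    assert(start != 0);         // operator new throws rather than return null
    for (std::size_t i = 0; i < n; ++i)
        new (start + i) int(static_cast<int>(i));   // placement new
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += start[i];
    deallocate(start);          // int needs no explicit destruction
    return sum;
}
```

The sketch leaves out an overflow check on n * sizeof(T), which a real container would want.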

OK, I think I'm beginning to understand what you're telling me, but I'll
post my further questions as a separate discussion - this one is heavily
messed up due to my poor understanding of the matter.

    void *
    allocate( size_t n )
    {
        return new char[ n * sizeof... ] ; // or even just
                                           // operator new( n * sizeof...)
                                           // which I would prefer.
    }

And I'd use unsigned char, although in C++ char probably works
too (I have never dug into whether it really does. IIRC char may
have trap representations in C. And --this is the part I've not
dug into-- it perhaps cannot have them in C++. Since unsigned
char is the de facto standard to signal "raw bytes", I just use
unsigned char and go on. In fact, the last time I tried to dig,
I seemed to find several oddities in the standard, so why
risk it.)

I'm not so sure I really need to switch to unsigned char, the standard
makes explicit examples with "raw" char for this technique, I think it's
required to work in all cases.

You probably don't "need" to switch, as I said.

Sorry, I badly expressed myself: as I put it, it seemed that you were
telling me I "needed" to switch, which in fact you weren't.

Back to your exercise, I'd recommend mentally separating the
places where you can assume that there is an object constructed
in the buffer from the ones where you just have raw memory. And
code the details from there, with some private functions being
in fact the "transitions" between these two kinds of states.

I'll be back on this in a while.

       void deallocate(void* ptr) {
           delete[] reinterpret_cast<char*>(ptr);
       }

I'd call the operator delete[] function. No need to
reinterpret_cast.

Uh... actually, before adding that cast, I wrote:

delete[] ptr;

...and the compiler warned about "deleting void* is undefined"... it did
not cross my mind that I could explicitly call operator delete[] as you
suggest (it didn't just because I didn't know that!).

Now my code reads:

operator delete[] (ptr);

...in that line.

OK. I just forgot that I had given a "non-matching"
recommendation for the allocate() function though.

To match:

   void *
   allocate( size_t n )
   {
       return operator new[]( n * sizeof... ) ;
   }

I wonder why the compiler doesn't resolve "delete[]
ptr" to "operator delete[] (ptr)" and get rid of the warning... I need
more study to understand the issue, any further insight will be more
than welcome.

Oh, I hadn't sensed that you tried the delete expression on a
void pointer.

When the compiler sees your

delete[] ptr ;

it doesn't know what you really wanted to delete, and warns. So,
if you wanted to destroy objects (calling destructors) you'll
fix the type of ptr; if you just wanted to release storage
you'll call an operator delete function, directly.

It would really be a bad thing if the compiler did the sort of
dubious transformations that you suggest.
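
The distinction can be made visible with a small sketch. The Tracked type and the two function names are hypothetical and only serve the illustration: calling the operator delete function releases storage and nothing else, while a delete expression on a correctly typed pointer first runs the destructor and then releases the storage.

```cpp
#include <cassert>
#include <new>

// Hypothetical type that counts how many times its destructor runs.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Calling the operator delete function directly: storage only.
int destructors_run_by_operator_delete() {
    Tracked::destroyed = 0;
    void* raw = ::operator new(sizeof(Tracked));
    new (raw) Tracked;          // an object now lives in the storage
    ::operator delete(raw);     // storage released, ~Tracked() not called
    assert(Tracked::destroyed == 0);
    return Tracked::destroyed;
}

// A delete expression on a typed pointer: destroy, then release.
int destructors_run_by_delete_expression() {
    Tracked::destroyed = 0;
    Tracked* p = new Tracked;
    delete p;                   // calls ~Tracked(), then operator delete
    assert(Tracked::destroyed == 1);
    return Tracked::destroyed;
}
```

With a void* the compiler cannot know which of the two was intended, which is why it warns instead of silently picking one.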

Sorry, once more, my poor understanding of the matter made all the
discussion cloudy, I didn't catch the difference between calling a plain
"delete" and calling "operator delete". I know, the FAQ explains it, but
one can read the explanation again and again and still fail to catch the
point.

       value_type* create(void* ptr) {
           return reinterpret_cast<value_type*>(new(ptr) value_type());
       }

This is a placement new *expression*: you already get a
value_type * (so this is another reinterpret_cast that goes
away). And do you need to return anything from the function?

       value_type* create(void* ptr, const value_type& t) {
           return reinterpret_cast<value_type*>(new(ptr) value_type(t));
       }

Likewise.

Eh, of course I don't need to cast them... silly me... but thanks for
pointing it out.

About the return value, I used it for a check at the calling place,
something like this:

assert(ptr == create(ptr));

...because at some point I thought that the new expression could
actually mangle the address to align it properly... now I'm almost sure
that such a thing could never happen.

You mean a difference between the address returned by the
operator new function and the address yielded by the new
expression?

If I got your question right, the answer is negative.

What I meant to say is that I was afraid that the following assert could
fail:

value_type* place = start;
value_type* result = new(place) value_type;
assert(place == result);

But reading again the requirements for placement new I see that (among
other things) I'm expected to pass a pointer properly aligned for
value_type - by inference, that should mean that placement new will not
mangle the address and the assert should never fail.
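
A sketch of that check, under the assumption that the storage comes from the operator new function (and is therefore suitably aligned for ordinary types): for the non-array form, the placement new expression yields the very address it was given. The function name placement_keeps_address is hypothetical.

```cpp
#include <cassert>
#include <new>

// Constructs a T at a properly aligned address and reports whether the
// new expression handed back that same address.
template <typename T>
bool placement_keeps_address() {
    void* place = ::operator new(sizeof(T));    // suitably aligned raw storage
    T* result = new (place) T();                // non-array placement new
    bool same = (static_cast<void*>(result) == place);
    assert(same);                               // the check from the post
    result->~T();
    ::operator delete(place);
    return same;
}
```

This covers only the single-object form; the array forms are a separate matter.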

In general there may be a difference for the array forms. Note
that if you do

   operator new( n * sizeof( T ) )

you are not using an array form.

When you use array new through a new expression:

   p = new T[ n ] ;

the compiler will often require something more than n * sizeof(
T ) and the additional space will be used for runtime
bookkeeping.

This difference may vary from one allocation to another (and be
zero for some of them --or all of them).

For arrays of char or unsigned char the difference is
constrained by the requirement in [expr.new]/10 (C++03), so that
to place your T's, you can use a new expression

   new unsigned char[ ... ]

as an alternative to the obvious

   operator new[](...)

(This must be either a case where they didn't want to introduce
a subtle difference, or where they have noticed that a lot of
code used the former. I'll call my favorite C++ historian, here.
James? :-))

The fact that requesting additional space is allowed only for
the array forms is spelled out in the already cited
[expr.new]/10:

   A new-expression passes the amount of space requested to the
   allocation function as the first argument of type std::size_t.
   That argument shall be no less than the size of the object
   being created; it may be greater than the size of the object
   being created only if the object is an array.

This is confirmed by two notes: the one in bullet 14 and note
211 in clause 18.
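
One way to observe this, sketched with a hypothetical Probe class whose class-specific allocation functions record the size they are asked for: the single-object form requests exactly sizeof(Probe), while the array form may request more. How much more is up to the implementation, so the sketch deliberately reports the difference without promising a value.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class that records the size passed to its allocation
// functions by new expressions.
struct Probe {
    static std::size_t last_request;    // bytes most recently asked for
    int value;

    Probe() : value(0) {}
    ~Probe() {}   // user-provided, so delete[] has destructors to run

    static void* operator new(std::size_t bytes) {
        last_request = bytes;
        return ::operator new(bytes);
    }
    static void operator delete(void* p) { ::operator delete(p); }

    static void* operator new[](std::size_t bytes) {
        last_request = bytes;
        return ::operator new[](bytes);
    }
    static void operator delete[](void* p) { ::operator delete[](p); }
};
std::size_t Probe::last_request = 0;

// Non-array form: the request equals the object size.
std::size_t single_request() {
    Probe* p = new Probe;
    delete p;
    assert(Probe::last_request == sizeof(Probe));
    return Probe::last_request;
}

// Array form: returns the extra bytes requested beyond n * sizeof(Probe).
// The result is implementation-specific and may be zero.
std::size_t array_overhead(std::size_t n) {
    Probe* p = new Probe[n];
    delete[] p;
    assert(Probe::last_request >= n * sizeof(Probe));
    return Probe::last_request - n * sizeof(Probe);
}
```

Calling array_overhead with a few different element counts on a given compiler shows what, if anything, that implementation adds for bookkeeping.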

       void destroy(value_type* ptr) {
           ptr->~value_type();
       }

For your first experiments it is probably easier to use a
non-heap array, with one object. Just something like a

    unsigned char m_buffer[ sizeof( T ) ]

member in your class template (which, given that it contains at
most one object, you'd probably no longer call "vector" :-)).
That would get rid of allocate() and deallocate() and perhaps
even show more clearly that you are not doing any "placement
delete" (see [lib.new.delete.placement]). Then, for create() and
destroy() I think I'd just have:

    void construct_object( T const& ) ;
    void destroy_object() ;

And for their implementation, off the top of my head:

    void *
    address() // NOTE: depending on what you do, you might need a
               // const version of this, too
    {
        return m_buffer ;
    }

    template< typename T>
    void
    ...::construct_object( T const& t ) // PRE: first call, or first call
    {                                   //      after destroy_object
        new ( address() ) T( t ) ;
    }

    template< typename T>               // PRE: an object exists (its lifetime
    void                                //      has not ended)
    ...::destroy_object()
    {
        T * p( static_cast< T *>( address() ) ) ;
        p->T::~T() ;
    }

Note the separation I was talking about: before entering
construct_object you have to assume that there's no object in
the buffer (call it twice in a row and you have a problem :-)).
At its return though, you can assume that *there is* an object
(leaving the function with an exception doesn't count as a
"return", of course). destroy_object() does the opposite
transition: it must be entered when there is an object (and so
you can static_cast) and at its return you have raw memory.
Note, too, that if address() returned a char *, you couldn't
static_cast it to T * directly.
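
Putting the pieces together, here is one possible self-contained sketch of such a single-object holder. Several things in it are assumptions that go beyond the post: the class name Slot, the has_object() and get() members, the use of C++11 alignas and "= delete" (the thread predates C++11), and keeping the T* yielded by the placement new expression instead of static_casting address() each time. The two state transitions are the ones described above.

```cpp
#include <cassert>
#include <new>

// Hypothetical single-object holder: the buffer is either raw memory
// (m_object is null) or holds exactly one live T (m_object points to it).
template <typename T>
class Slot {
public:
    Slot() : m_object(nullptr) {}
    ~Slot() { if (m_object) destroy_object(); }

    Slot(Slot const&) = delete;             // copying is out of scope here
    Slot& operator=(Slot const&) = delete;

    // PRE: the buffer is raw memory.  POST: it holds a T.
    void construct_object(T const& t) {
        assert(m_object == nullptr);
        m_object = new (address()) T(t);    // placement new yields a T*
    }

    // PRE: the buffer holds a T.  POST: it is raw memory again.
    void destroy_object() {
        assert(m_object != nullptr);
        m_object->~T();
        m_object = nullptr;
    }

    bool has_object() const { return m_object != nullptr; }

    T& get() {
        assert(m_object != nullptr);
        return *m_object;
    }

private:
    void* address() { return m_buffer; }    // "untyped" view of the storage

    alignas(T) unsigned char m_buffer[sizeof(T)];   // C++11: aligned for T
    T* m_object;
};
```

If construct_object exits with an exception, m_object stays null, which matches the remark that leaving the function that way does not count as a return.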

The Vector implementation is already advanced enough to need dynamic
management of the storage, as I already have working reserve(),
push_back() and clear(), now I'm working on insert() and erase(), but I
think I will stop when I'll reach begin() and end() without implementing
reverse_iterator - more about this (and about its rationale) in a
further post, where I'll show my complete implementation

Well, consider that it may be difficult to get good commenting
(or even any commenting) if the amount of code is high.

for the public
delight && dissection - assuming short-circuit behavior at that "logical
and" ;-)

Dissection of who? :-P

Well, you are kidding but I'm starting to feel that somebody could
actually think about doing me "serious bodily harm" - I'm having a hard
time wrapping my head around this stuff. As I said above, I'll post a
completely new thread and I'll forget about this one :-)

Thanks a lot for your notes Gennaro.

No problem. I'm over the amount of time that I can devote to
Usenet. Otherwise I'd have commented more.

Don't worry, I think my next thread will be more concise and sharp,
and hopefully I'll dissipate all my doubts about it in a very short time :-)

Thanks a lot, once more.

--
FSC - http://userscripts.org/scripts/show/59948
http://fscode.altervista.org - http://sardinias.com