Re: Trivial initialization after non-trivial destruction

From: Joshua Maurice <joshuamaurice@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: Thu, 10 May 2012 16:23:06 -0700 (PDT)
Message-ID: <62f340b7-0408-4ac8-aa7d-560444b6df89@n5g2000pbg.googlegroups.com>

On May 10, 2:30 pm, Daniel Krügler <daniel.krueg...@googlemail.com>
wrote:

On 10.05.2012 20:48, Nikolay Ivchenkov wrote:

Consider the following example:

    struct X
    {
        ~X() {}
    };

    template<class T>
        void destroy(T&x)
            { x.~T(); }

    int main()
    {
        X *p = (X *)operator new(sizeof(X));
        destroy(*p);
        destroy(*p); // well-defined or undefined?
        operator delete(p);
    }

According to C++11 - 3.8/1, non-trivial destruction ends the life-time
of an object. Can we assume that a new object of the same type exists
at the same location immediately after such non-trivial destruction
has completed, if its initialization is trivial?

You saw my other posting just today/yesterday, didn't you?
:)

I find the current wording hard to interpret, but if we consider
the current state of

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1116

as representing the committee's intention, I would say that without
intervening copy of another X object into the storage pointed to by
pointer p, the life-time of the object has not started again, therefore
the second destruction would invoke undefined behaviour. But this also
would mean that the first destruction was invalid, because no object
representation of any X object had ever been copied into the originally
allocated memory.

I have questions.

Consider:

Example 1:
    /* C code */
    #include <stdlib.h>
    typedef struct Foo { int x; int y; } Foo;
    int main(void)
    {
        Foo * f = malloc(sizeof(Foo));
        f -> x = 1;
        return f -> x;
    }

Example 2:
    // Example 1 converted to C++
    #include <stdlib.h>
    typedef struct Foo { int x; int y; } Foo;
    int main(void)
    {
        Foo * f = (Foo*)malloc(sizeof(Foo));
        f -> x = 1;
        return f -> x;
    }

Example 3:
    // C++ code
    #include <stdlib.h>
    class Foo { public: Foo():x(0),y(0){} int x; int y; };
    int main(void)
    {
        Foo * f = (Foo*)malloc(sizeof(Foo));
        f -> x = 1;
        return f -> x;
    }

Example 1 better be UB-free. It's common C code which is well
understood to be UB-free.

Example 2 better be UB-free. It's similarly well understood that this
common C idiom will work when translated as shown into C++.

Example 3 better have UB. We want this to have UB to permit
optimizations by the compiler. There is no constructor call to
Foo::Foo on that memory, hence no Foo object exists in that memory,
hence UB when you "access" that memory through a Foo lvalue.
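
As a point of comparison, here is a minimal sketch of how example 3
could be written so that a Foo object does exist in the malloc'ed
storage: the constructor is run with placement new before any member is
touched. The function name construct_then_use is made up for
illustration; it is not part of the original examples.

```cpp
#include <cstdlib>
#include <new>

class Foo { public: Foo() : x(0), y(0) {} int x; int y; };

// Hypothetical helper: example 3 with the missing constructor call added.
int construct_then_use()
{
    void* raw = std::malloc(sizeof(Foo));
    if (!raw) return -1;        // allocation failed, nothing to demonstrate

    Foo* f = new (raw) Foo;     // Foo::Foo runs here, so a Foo now lives in raw
    f->x = 1;                   // access through a Foo lvalue to a live object
    int result = f->x;

    f->~Foo();                  // end the lifetime exactly once
    std::free(raw);
    return result;
}
```

The only difference from example 3 is the placement new expression, and
that constructor call is exactly what example 3 lacks.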

Let's go back to your original quote and see how it breaks down:

without
intervening copy of another X object into the storage pointed to by
pointer p, the life-time of the object has not started again,

The relevant code from example 2 and example 3 is:
        Foo * f = (Foo*)malloc(sizeof(Foo));
        f -> x = 1;
I argue this is (or ought to be) definitionally equivalent to:
        Foo * f = (Foo*)malloc(sizeof(Foo));
        int * tmp = &((*f).x);
        *tmp = 1;

By "your" reasoning, the above code either:
- has an access to a region of memory through a Foo lvalue without
either a previous ctor call or a "copy of another [Foo] object into the
storage", and thus UB, or
- does not have an access to a region of memory through a Foo lvalue,
and thus the only accesses are through int lvalues (specifically the
first access is a "copy of another [int] object into the storage"),
and thus UB-free.

Whether Foo has trivial initialization doesn't come into the analysis
under your rules. Thus, either both 2 and 3 are UB-free, or both 2 and
3 have UB. Neither is acceptable. Thus your rules must be broken.
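
The property that ought to separate examples 2 and 3 can at least be
observed at compile time. The sketch below assumes that "has a trivial
default constructor" is a fair stand-in for what 3.8/1 calls trivial
initialization. Foo2, Foo3 and has_trivial_default_init are hypothetical
names, used only so that both versions of Foo fit in one file.

```cpp
#include <type_traits>

// Foo as written in example 2: no user-provided constructor.
struct Foo2 { int x; int y; };

// Foo as written in example 3: user-provided default constructor.
class Foo3 { public: Foo3() : x(0), y(0) {} int x; int y; };

// Hypothetical helper: true when T's default construction is trivial.
template <class T>
constexpr bool has_trivial_default_init()
{
    return std::is_trivially_default_constructible<T>::value;
}

static_assert(has_trivial_default_init<Foo2>(),
              "example 2's Foo needs no constructor call");
static_assert(!has_trivial_default_init<Foo3>(),
              "example 3's Foo needs a constructor call");
```

Any rule that never consults this property cannot tell the two examples
apart, which is the complaint above.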

I have a couple of ideas on the intent of the rules, and what they
should be.

For starters, I argue that:
        f -> x = 1;
is definitionally equivalent to:
        int * tmp = &((*f).x);
        *tmp = 1;

Next I think we need to recognize that all writes and reads are done
through:
    - primitive lvalues,
    - virtual function calls,
    - ctors and dtors,
    - magic standard library calls, such as std::memset,
    (I forget if there's anything else)
Specifically,
        int * tmp = &((*f).x);
        *tmp = 1;
The expression "&((*f).x);" is neither a read nor a write of the
memory pointed-to by f. It is a read of the pointer value f, an
addition of a compile-time offset, and a write into the stack variable
tmp. No read nor write has happened to *f. In fact, this is a very
important guarantee that we need while writing threading code.
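
To make the "pointer value plus compile-time offset" reading concrete,
here is a small sketch. It uses a Foo object that definitely exists (an
automatic variable), so nothing in it depends on the lifetime question.
The function name member_pointer_is_base_plus_offset is hypothetical.

```cpp
#include <cstddef>

struct Foo { int x; int y; };

// Hypothetical helper: checks that &((*f).y) is the address of *f plus a
// constant offset, and that the later store is the only write to the object.
bool member_pointer_is_base_plus_offset()
{
    Foo obj = {0, 0};
    Foo* f = &obj;

    int* tmp = &((*f).y);   // forms an address; obj itself is not modified
    *tmp = 1;               // the write, done through an int lvalue

    char* base = reinterpret_cast<char*>(f);
    char* member = reinterpret_cast<char*>(tmp);

    return member == base + offsetof(Foo, y)   // offset known at compile time
        && obj.y == 1                          // the store landed in obj.y
        && obj.x == 0;                         // the neighbouring member is untouched
}
```

This only shows what the address computation does on an existing
object. It says nothing about whether forming that address is allowed
when no Foo object has been created.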

The question is whether we want to call something an "access" which is
neither a read nor a write. I'm partial to saying
        int * tmp = &((*f).x);
is an access through a Foo lvalue, and building up the rules from
there. More specifically, I think the rules need to involve escape
analysis on the pointers - the source of the pointer values - in
determining whether the pointer value is valid for use. IIRC, the C
committee is leaning in this direction.

As another way to look at the problem, consider the C code:
    #include <stdlib.h>
    typedef struct T1 { int x; int y; } T1;
    typedef struct T2 { int x; int y; } T2;
    int main(void)
    {
        void* p = malloc(sizeof(T1) + sizeof(T2));
        T1 * t1 = p;
        t1->x = 1;
        t1->y = 2;
        T2 * t2 = p;
        return t2->y;
    }
(Un)fortunately, there's a nasty little obscure rule in both C and C++
(the "common initial sequence" rule) that allows you to read the common
leading members of two (POD) structs, or something like that. Thus, that
example is actually UB-free AFAIK.
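
For reference, both standards word that rule for structs that are
members of the same union. Here is a sketch of the form it clearly
covers. The union name Both and the function read_y_through_t2 are
hypothetical, and whether the rule also stretches to the malloc version
above is a separate question that this sketch does not answer.

```cpp
struct T1 { int x; int y; };
struct T2 { int x; int y; };

// Hypothetical union holding both layout-identical structs.
union Both { T1 t1; T2 t2; };

// Writes through t1, then inspects the common initial sequence through t2.
int read_y_through_t2()
{
    Both u = { {1, 2} };   // aggregate initialization makes t1 the active member
    return u.t2.y;         // y is part of the common initial sequence of T1 and T2
}
```

Here the union type tells the compiler up front that T1 and T2 lvalues
may refer to the same storage, which is information the malloc version
does not carry.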

This plays havoc with the analysis. Specifically, consider a slightly
different program:
    /* C code */
    #include <stdlib.h>
    typedef struct T1 { int x; int y; } T1;
    typedef struct T2 { int x; int y; } T2;
    int main(void)
    {
        void* p = malloc(sizeof(T1) + sizeof(T2));
        T1 * t1 = p;
        T2 * t2 = p;
        t1->x = 1;
        t2->y = 2;
        return t1->x;
    }
Is that UB? I don't even know. It gets harder as we go on.

This is definitely a C problem which was inherited by C++. With non-C
types, specifically the ones with non-trivial constructors and
non-trivial destructors, I think the rules are far clearer.
