Re: Move semantics and moved/empty objects

From: David Abrahams <dave@boostpro.com>
Newsgroups: comp.lang.c++.moderated
Date: Wed, 30 Jul 2008 21:56:14 CST
Message-ID: <87zlnz82c6.fsf@mcbain.luannocracy.com>

on Wed Jul 30 2008, Mathias Gaunard <loufoque-AT-gmail.com> wrote:

On 30 juil, 07:40, David Abrahams <d...@boostpro.com> wrote:

- Allowing empty objects to be "replaced" with non-empty ones. That is
to say allowing operator= to function with the left operand being an
empty object. That means an indirection in operator=.

</restoring context>

I don't know what you mean by indirection.

A conditional branch, a call through a function pointer, whatever that
allows making the two cases distinct.

which two cases? I'm completely lost as to how indirection is related
to the rest of what you're saying.

- Allowing empty objects to be assigned to other objects and to be
constructed from. That means a double indirection in operator=, one in
the copy constructor, and emptiness can propagate. That doesn't sound
very desirable, it's as if you cannot rely on the invariant at all.

I don't know what you mean by any of that bullet. Could you please
explain?

Let's take the example of a resource. I will first suppose that the
object is never empty.
The resource is identified by a pointer, with several constructs:
open (throws if it fails), close (never throws), and copy (throws if
it fails).
close and copy only work with valid, open resources.

Sure; a nice strong invariant, and one that's incompatible with move
semantics unless you can figure out how to synthesize an open resource
without throwing.

This example is not unlike the recent N2698 paper, except that I have
added copy support too, and fixed an error in operator= (the paper
called delete when it should have called close).
http://open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2698.html

struct resource
{
     resource(args...) : h(open(args...)) {}

     resource(const resource& other) : h(copy(other.h)) {}

     resource(resource&& other) : h(other.h)
     {
         other.h = 0; // "moved-from" state
     }

     resource& operator=(const resource& other)
     {
         Handle* new_handle = copy(other.h);
         close(h);
         h = new_handle;

         return *this;
     }

     // caution: self-assignment isn't safe
     resource& operator=(resource&& other)
     {
         close(h);
         h = other.h;
         other.h = 0; // "moved-from" state

         return *this;
     }

     ~resource()
     {
         close(h);
     }

private:
     Handle* h;
};

Now, let's make the destructor callable from moved-from objects (first
bullet).
     ~resource()
     {
         if(h)
             close(h);
     }
This adds one indirection.

I see an if. I don't quite see that as an indirection. It added a
test-and-branch. Typically the cost, with respect to that of
allocating/deallocating the underlying resource, is negligible. So
what's your concern?
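
To make the cost being debated concrete, here is a minimal sketch of the destructor-only variant. The names Handle, open_handle, close_handle, and live_handles are hypothetical stand-ins for the thread's open/close primitives, not part of N2698 or any library, and copying is simply disabled to keep the example short.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for the thread's Handle, open, and close.
struct Handle { int id; };

inline int& live_handles() { static int n = 0; return n; }

inline Handle* open_handle(int id) { ++live_handles(); return new Handle{id}; }

inline void close_handle(Handle* h)
{
    assert(h != nullptr);   // close only works on valid, open resources
    --live_handles();
    delete h;
}

class resource
{
public:
    explicit resource(int id) : h(open_handle(id)) {}

    resource(resource&& other) noexcept : h(other.h)
    {
        other.h = nullptr;  // "moved-from" state
    }

    resource(const resource&) = delete;
    resource& operator=(const resource&) = delete;

    ~resource()
    {
        if (h)              // the single test-and-branch under discussion
            close_handle(h);
    }

private:
    Handle* h;
};

// Moves one resource into another and reports the number of open handles
// while both objects are still in scope.
inline int live_after_move()
{
    resource a(1);
    resource b(std::move(a));   // a is now moved-from, b owns the handle
    return live_handles();
}
```

Calling live_after_move() should report one open handle, and none should remain once both objects are destroyed; the only extra work on the moved-from object is the null test.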

Now, let's consider making operator= callable from moved-from objects,
since it is required by std::swap. (second bullet)
     resource& operator=(const resource& other)
     {
         Handle* new_handle = copy(other.h);
         if(h)
             close(h);
         h = new_handle;

         return *this;
     }

resource& operator=(resource other)
{
swap(*this,other);
return *this;
}

     resource& operator=(resource&& other)
     {
         if(h)
             close(h);
         h = other.h;
         other.h = 0; // "moved-from" state

         return *this;
     }
This adds one indirection too.

Still not seeing an indirection.
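
For comparison, the by-value assignment operator suggested above can be sketched as a complete class. This is only an illustration under assumptions: Handle, open_handle, copy_handle, and close_handle are hypothetical helpers, and the copy constructor keeps the precondition that its source is not moved-from. With this shape the destructor is the only member that has to test for the empty state.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for the thread's Handle, open, copy, and close.
struct Handle { int value; };

inline Handle* open_handle(int v) { return new Handle{v}; }

inline Handle* copy_handle(const Handle* h)
{
    assert(h != nullptr);   // precondition: source is not moved-from
    return new Handle{h->value};
}

inline void close_handle(Handle* h) { delete h; }

class resource
{
public:
    explicit resource(int v) : h(open_handle(v)) {}
    resource(const resource& other) : h(copy_handle(other.h)) {}
    resource(resource&& other) noexcept : h(other.h) { other.h = nullptr; }

    friend void swap(resource& a, resource& b) noexcept { std::swap(a.h, b.h); }

    // One operator for both copy and move assignment: the argument is
    // copy- or move-constructed by the caller, then swapped in.
    resource& operator=(resource other) noexcept
    {
        swap(*this, other);
        return *this;
    }

    ~resource() { if (h) close_handle(h); }

    int value() const { assert(h != nullptr); return h->value; }

private:
    Handle* h;
};

// Performs the three moves of a move-based swap by hand.
inline bool swap_through_moved_from()
{
    resource a(1), b(2);
    resource tmp(std::move(a));  // a is moved-from
    a = std::move(b);            // assignment into a moved-from object
    b = std::move(tmp);
    return a.value() == 2 && b.value() == 1;
}
```

The trade-off is that every assignment pays for a swap and the destruction of the parameter, which is where the old handle (if any) gets closed.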

Let's allow the moved-from objects to be assigned and constructed
from. (third bullet)
     resource(const resource& other) : h(other.h ? copy(other.h) : 0) {}

     resource& operator=(const resource& other)
     {
         Handle* new_handle = other.h ? copy(other.h) : 0;
         if(h)
             close(h);
         h = new_handle;

         return *this;
     }
This adds one additional indirection in operator=(lvalue) and one in
the copy constructor.

OK, leaving aside for the moment my disagreement with your use of the
term "indirection," what's your point?

I certainly do not want to allow moved-from objects to be assigned and
constructed from, which would require all modifications up to the
third bullet, not because of its inefficiency, but because it can
propagate the "moved-from" state, which I consider highly undesirable.
Such a state should never be accessed to begin with in my opinion!

It's got to be accessed by the destructor at minimum. Once you allow it
into the class invariant, it's in.
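
The propagation concern raised above can be illustrated with a small sketch. It assumes a hypothetical Handle type and a copy constructor written in the third-bullet style, so that copying a moved-from object is allowed and yields another empty object.

```cpp
#include <cassert>
#include <utility>

struct Handle { int value; };   // hypothetical payload

class resource
{
public:
    explicit resource(int v) : h(new Handle{v}) {}

    // Third-bullet style: tolerate an empty source and copy its emptiness.
    resource(const resource& other)
        : h(other.h ? new Handle(*other.h) : nullptr) {}

    resource(resource&& other) noexcept : h(other.h) { other.h = nullptr; }

    resource& operator=(const resource&) = delete;

    ~resource() { delete h; }   // deleting a null pointer is a no-op

    bool empty() const { return h == nullptr; }
    int value() const { assert(h != nullptr); return h->value; }

private:
    Handle* h;
};

// Copies a moved-from object and checks that the copy is empty as well.
inline bool emptiness_propagates()
{
    resource a(1);
    resource b(std::move(a));   // a is now empty
    resource c(a);              // c is a copy of an empty object
    return a.empty() && c.empty() && b.value() == 1;
}
```

Here c never held a resource at all, which is the sense in which the empty state spreads once copying from it is permitted.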

I would quite like the Standard Library to make promises to not
perform certain operations on moved-from objects, that is all.

Which operations? Copy and assign?

Ideally I would personally prefer to restrict to destructor-only. It
adds a little performance penalty which can be avoided altogether. I'm
in favor of only using destruct/reconstruct on empty types to
perform assignment. Yes, it isn't exception-safe, but restoring an
empty state in case of failure should be nothrow.

I don't understand what you're driving at here either. Could you
please spell it out for me?

I believe moved-from objects should not allow anything but
destruction, and thus there should be no need to support that.

Support what?

Simply because moved-from objects are supposed to be rvalues,

No, they're not. If moving were restricted to rvalues it would prevent
many important optimizations such as the ones we're making in
std::vector.

and rvalues cannot be assigned to, for example.

Moved-from objects are objects which were cast to rvalues,

That's a simplified way of looking at it, but in reality std::move
doesn't do any casting.

and they should be treated as such.

Meaning, "they should never be touched again." When you operate on (in
this case, move from) an rvalue, that normally means it will be
destroyed before any other code gets to touch it. Again, that would
prohibit important optimizations.

There is nothing that can be done on rvalues once they've been moved
except destructing them (which is done automatically), so lvalues cast
to rvalues should, in my opinion, maintain this and thus performing any
other operation should not be done by any part of the standard
library.

However, std::swap does it.

template <class T>
void swap(T& a, T& b)
{
     T tmp(std::move(a));
     a = std::move(b);
     b = std::move(tmp);
}

So you'd prefer

     template <class T>
     void swap(T& a, T& b)
     {
          T tmp(std::move(a));
          a.~T();
          new (&a) T(std::move(b));
          b.~T();
          new (&b) T(std::move(tmp));
     }

??

The problematic line is a = std::move(b).
We lied to T::T(T&&) saying that 'a' was an rvalue

No, we said "you can move from it."

where it really
wasn't. We tricked it into believing that. T::T(T&&) could then
theoretically choose to put 'a' in a state it cannot be assigned-to,
since rvalues cannot.
But then, the code would break.
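
To see exactly which operations the three-move swap performs on moved-from objects, here is a small instrumented sketch. The tracked class and swap_by_moves are hypothetical names made up for this illustration; swap_by_moves simply mirrors the swap quoted in this thread and is not claimed to match any particular standard library implementation.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that remembers whether it has been moved from.
class tracked
{
public:
    explicit tracked(int v) : value(v) {}

    tracked(tracked&& other) noexcept : value(other.value)
    {
        other.moved_from = true;
    }

    tracked& operator=(tracked&& other) noexcept
    {
        if (moved_from)
            ++assignments_to_moved_from();  // left operand was moved-from
        value = other.value;
        moved_from = false;
        other.moved_from = true;
        return *this;
    }

    int get() const
    {
        assert(!moved_from);    // reading a moved-from object is flagged
        return value;
    }

    static int& assignments_to_moved_from() { static int n = 0; return n; }

private:
    int value;
    bool moved_from = false;
};

// Mirrors the swap quoted in the thread.
template <class T>
void swap_by_moves(T& a, T& b)
{
    T tmp(std::move(a));
    a = std::move(b);
    b = std::move(tmp);
}

// Returns how many move assignments targeted a moved-from object.
inline int count_for_one_swap()
{
    tracked::assignments_to_moved_from() = 0;
    tracked a(1), b(2);
    swap_by_moves(a, b);
    assert(a.get() == 2 && b.get() == 1);
    return tracked::assignments_to_moved_from();
}
```

Both assignments in the swap have a moved-from left operand, so a type used with this swap has to support at least move assignment into that state, in addition to destruction.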

I kind of accuse that swap implementation of being evil.

The alternative solution I propose is to destruct 'a' and construct-
move 'b' in it instead of performing the assignment.

That's awful, though. If you're going to force everyone to manually
destroy moved-from values before re-using them, you may as well
implement "destructive move" and have the compiler do it. But then, we
don't know how to write safe code with destructive move semantics. If
you can solve that problem, we can talk about it.

That would of course
require the addition of another construction primitive.

Nor here.

The aforementioned alternative introduces exception-safety issues.
I proposed to solve them by introducing a nothrow construction
primitive that constructs an object directly in moved-from state.

Why would you want to *add* a new way to achieve the state you are
trying to avoid propagating?

That's however a fairly bad solution, since I can hardly see how it
could integrate with non move-aware types.

I would like the standard committee to clearly define guidelines as
to what types ought to do when faced with move semantics, what
operations should still stay valid, especially types that aim to
provide a never-empty invariant.

But that's not the committee's job, so don't hold your breath.

The standard library is expected to heavily make use of movability, be
it for containers or algorithms. I would like to have good guarantees
about what they'll do.

Sure, I think you do.

The MoveConstructible and MoveAssignable concepts don't help: they
require the arguments to be (real) rvalues,

Where did you get that idea?

and there is no problem in that case. std::swap is only supposed to
require MoveConstructible and MoveAssignable, but since it lied using
casting,

It didn't. There's no casting and no lying.

Of course, you can accuse any code of lying if you make up your own
definitions of things like rvalue reference, but it doesn't hold water.

it actually passed an lvalue when an rvalue was expected,
which is the real source of the problem.

I haven't seen a problem yet.

*I'm afraid there is really a flaw in the standard here*. So either
the concepts need fixing, or std::swap does.

The fact that you don't like it doesn't make it flawed.

Two solutions:
- Require bullet 2 for MoveAssignable types
- Really treat lvalues cast to rvalues as rvalues (which seems more
logical to me), that means not accessing them more than once. Think of
alternative techniques to implement std::swap in terms of moves.

Basically, if you want to provide a never-empty invariant, you have to
figure out how to do it so that moved-from objects are not empty, for
whatever your definition of "empty" is.

Why?

Because it's *a class invariant*. That means it's always true outside
of mutating operations between construction and destruction. That's
just by definition. http://en.wikipedia.org/wiki/Class_invariant

Real rvalues can only be accessed once.

std::vector<int> const& x = std::vector<int>(10);

now access the rvalue through x as many times as you like.
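
As a self-contained version of the one-liner above, the following sketch binds a temporary vector to a const reference and reads it several times. The function name read_bound_temporary is made up for the example; the behavior relies only on the language rule that binding a temporary to a const reference extends its lifetime to that of the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binds a temporary to a const reference and reads it on three passes.
inline std::size_t read_bound_temporary()
{
    // The temporary's lifetime is extended to the scope of x.
    std::vector<int> const& x = std::vector<int>(10);

    std::size_t total = 0;
    for (int pass = 0; pass < 3; ++pass)
    {
        assert(x.size() == 10);     // still alive and intact on each pass
        total += x.size();
    }
    return total;
}
```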

I can put them in whatever
state I like after that, as far as it is destructible it shouldn't
matter.

Not really; that's why rvalue reference parameters are treated as
lvalues inside functions.
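
A short sketch of that rule, using hypothetical overloads named category: inside a function, a named rvalue reference parameter selects the lvalue overload unless it is explicitly passed through std::move again.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair that reports which one was selected.
inline const char* category(std::string&)  { return "lvalue"; }
inline const char* category(std::string&&) { return "rvalue"; }

// s has a name, so the expression s is an lvalue here.
inline const char* category_inside(std::string&& s)
{
    return category(s);
}

// Only an explicit std::move turns it back into an rvalue expression.
inline const char* category_after_move(std::string&& s)
{
    return category(std::move(s));
}

inline void check_categories()
{
    assert(std::string(category_inside(std::string("x"))) == "lvalue");
    assert(std::string(category_after_move(std::string("x"))) == "rvalue");
}
```

This is what lets a function inspect or use its rvalue reference parameter several times before deciding to move from it.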

It's true though that with rvalue references, I can treat rvalues as
lvalues and access them multiple times.

You don't need rvalue references to do that as shown above.

Maybe that is also a defect in itself however.

maybe-you'd-prefer-a-pure-functional-language-ly y'rs,

---
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
