#include <utility>
#include <iostream>

struct trace
{
    trace() { std::cout << "default ctor\n"; }
    trace( trace const& ) { std::cout << "copy ctor\n"; }

    trace& operator=( trace const& ) { std::cout << "copy assign\n"; return *this; }

How is that implemented? Perhaps
        trace &operator=(const trace &t) {
            std::cout << "assign" << std::endl;
            trace temp(t); // one copy ctor
            swap(temp); // void trace::swap(trace &); not shown
            return *this;
        }

Not if you want it to be efficient with rvalues. Instead:

        trace& operator=(trace rhs) { swap(*this,rhs); return *this; }

    ~trace() { std::cout << "dtor\n"; }
    friend void swap( trace& x, trace& y ) { std::cout << "swap\n"; }
};
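
To make the difference between the two assignment operators concrete, here is a minimal sketch. The class counted_trace, its counters, and the two helper functions are hypothetical stand-ins I made up for illustration; they are not part of the code above. The idea is just to count copy-constructor calls when the by-value operator= is given an lvalue versus a temporary.

```cpp
// Hypothetical stand-in for trace: it counts calls instead of printing.
struct counted_trace
{
    static int copies;  // copy-constructor calls
    static int swaps;   // swap calls

    counted_trace() {}
    counted_trace( counted_trace const& ) { ++copies; }

    // Parameter taken by value: any copy is made by the caller when it
    // initializes rhs, not inside the operator body.
    counted_trace& operator=( counted_trace rhs )
    {
        swap( *this, rhs );  // found by argument-dependent lookup
        return *this;
    }

    // No data members here, so swap only records that it was called.
    friend void swap( counted_trace&, counted_trace& ) { ++swaps; }
};

int counted_trace::copies = 0;
int counted_trace::swaps = 0;

// Assigning from an lvalue: rhs has to be copy-constructed from b.
int copies_for_lvalue_assign()
{
    counted_trace a, b;
    counted_trace::copies = 0;
    a = b;
    return counted_trace::copies;
}

// Assigning from a temporary: rhs can be constructed in place.
int copies_for_rvalue_assign()
{
    counted_trace a;
    counted_trace::copies = 0;
    a = counted_trace();
    return counted_trace::copies;
}
```

The lvalue case costs one copy, the same as the copy-then-swap version. The temporary case normally costs none, since the compiler may build rhs directly from the temporary (C++17 requires this). The const-reference version would always pay for the copy inside the function body.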



After I run this code with the changes I described, and a few other
minor ones, I get this output:

default ctor
=== by reference ===
default ctor
copy ctor
trace::swap(trace &t)
=== by value ===
default ctor
copy ctor
copy ctor
trace::swap(trace &t)
copy ctor
trace::swap(trace &t)
=== done ===

What conclusions can we draw from that?

Your compiler seems to be doing some nice optimizations for the value

The output from your by-value version is:

=== by value ===
default ctor
=== done ===

I'm a little curious as to what's happening to the instance of trace()
that you're passing to set_by_value.

It's an allowed optimization called "copy elision." Most modern
compilers do it (probably yours, even). The compiler is allowed to
eliminate the copy made when an rvalue is passed as a by-value argument,
and the copy made when a function returns by value. Both optimizations
work by constructing the argument or return value directly in storage
allocated in the caller's stack area, so there is nothing left to copy.
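
Here is a small sketch of the two elision cases just described. The names probe, take_by_value and make_probe are hypothetical, chosen only for this illustration. The copy constructor bumps a counter so you can check whether a copy actually happened.

```cpp
// Hypothetical type whose copy constructor counts how often it runs.
struct probe
{
    static int copies;
    probe() {}
    probe( probe const& ) { ++copies; }
};

int probe::copies = 0;

// By-value parameter: the caller is responsible for initializing it.
void take_by_value( probe ) {}

// Returns a temporary by value.
probe make_probe() { return probe(); }

// Case 1: an rvalue passed as a by-value argument.
int copies_passing_temporary()
{
    probe::copies = 0;
    take_by_value( probe() );
    return probe::copies;
}

// Case 2: a value returned from a function and used to initialize a local.
int copies_returning_by_value()
{
    probe::copies = 0;
    probe p = make_probe();
    (void)p;  // silence unused-variable warnings
    return probe::copies;
}
```

With elision in effect both functions return 0. Before C++17 the elision was permitted but not required, so a compiler with it switched off (for example GCC with -fno-elide-constructors in C++14 mode) could report copies here.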

I wonder what would happen if the code were a little different, maybe:

     m("=== by value ===");
     const trace t;
     h.set_by_value( t );
     // do something else with t here

Now the argument is an lvalue; the compiler is forced to copy it.
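
A minimal sketch of that lvalue case, for anyone who wants to check it. The types item and holder below are hypothetical; in particular I'm assuming set_by_value takes its parameter by value and swaps it into a member, which the snippet above doesn't show.

```cpp
// Hypothetical element type that counts copy-constructor calls.
struct item
{
    static int copies;
    item() {}
    item( item const& ) { ++copies; }
    friend void swap( item&, item& ) {}  // nothing to exchange in this sketch
};

int item::copies = 0;

// Hypothetical holder; assumes set_by_value takes its argument by value.
struct holder
{
    item stored;
    void set_by_value( item x ) { swap( stored, x ); }
};

// Passing a named (and here const) object: the parameter x must be a
// copy, because t is still usable by the caller afterwards.
int copies_passing_lvalue()
{
    holder h;
    const item t;
    item::copies = 0;
    h.set_by_value( t );
    return item::copies;
}
```

The function returns 1: exactly one copy to initialize the parameter, and none after that because the body swaps rather than assigns.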

