Re: Legacy APIs which output C-style strings: Opportunity to use move semantics?

From:

null hypothesis <null.hypotheses@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Mon, 9 Aug 2010 01:35:48 -0700 (PDT)

Message-ID:

<beac016b-672b-42d7-b827-d8e47d894d55@n19g2000prf.googlegroups.com>

On Aug 8, 6:58 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:

null hypothesis wrote:

On Aug 8, 3:57 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
[...]

a) Memory governed by a string is handled via an allocator. If you move
from a char*, the information about how the memory for the char* was
allocated (and has to be deallocated) is lost.
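What point a) means in practice: a std::string's storage must come from its own allocator, so a char* handed out by a legacy API cannot be adopted. It can only be copied, and the original then released with whatever function the API requires. Below is a minimal sketch. It assumes a hypothetical legacy function legacy_make_greeting that returns a malloc'ed buffer. Both the name and the malloc/free contract are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical legacy API (assumption): returns a malloc'ed, NUL-terminated
// buffer that the caller must release with free().
char* legacy_make_greeting() {
    const char text[] = "hello";
    char* p = static_cast<char*>(std::malloc(sizeof text));
    if (p != nullptr) std::memcpy(p, text, sizeof text);
    return p;
}

// Releases with the deallocator the legacy API demands.
struct free_deleter {
    void operator()(char* p) const noexcept { std::free(p); }
};

// std::string cannot take over p: copy the characters, then free p.
std::string take_legacy_string(char* p) {
    std::unique_ptr<char, free_deleter> guard(p);  // frees p even if the copy throws
    return p != nullptr ? std::string(p) : std::string();
}
```

Usage: `std::string s = take_legacy_string(legacy_make_greeting());` costs one allocation and one copy more than a true adoption would.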

Assume we move a char* to a mystr S allocated with allocator A. Is it too
difficult for the compiler to:
*) free the original contents of S by calling A.deallocate(), and
*) know full well that it is moving a char* with some (probably magic)
allocator, and mark it as such?

No, that might not be too difficult. But it does not solve the problem.

How should the destructor of the string go about releasing the memory? After
all, even if the compiler passes the information that the memory was
allocated via weird_alloc_method_from_library_X to the string, how could it
successfully guess the required deallocation function?

Ah! Brilliant point! I hadn't thought about this. Instead of supplying you
with a half-baked solution, let me rephrase to understand the problem
better: does that mean that move semantics is inherently unsuitable for
any type that does not provide us with a clear notion of the underlying
allocator?
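One way to look at this question (not an answer given in the thread): move semantics work for a type if each object records how its memory must be released. std::unique_ptr with a custom deleter does exactly that. The sketch below stores a function-pointer deleter chosen at creation time, so a move transfers both the pointer and the knowledge of how to free it. The helpers release_with_free, release_with_delete, from_malloc and from_new are hypothetical names. Such a handle is not a basic_string and does not fit the standard Allocator model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <utility>

// Deallocation functions a hypothetical library might require.
void release_with_free(char* p) { std::free(p); }
void release_with_delete(char* p) { delete[] p; }

// The handle carries *which* function releases the buffer, so the destructor
// never has to guess. Moving transfers both the pointer and the deleter.
using owned_c_string = std::unique_ptr<char, void (*)(char*)>;

owned_c_string from_malloc(const char* text) {
    std::size_t n = std::strlen(text) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p != nullptr) std::memcpy(p, text, n);
    return owned_c_string(p, &release_with_free);
}

owned_c_string from_new(const char* text) {
    std::size_t n = std::strlen(text) + 1;
    char* p = new char[n];
    std::memcpy(p, text, n);
    return owned_c_string(p, &release_with_delete);
}
```

After `owned_c_string b = std::move(a);`, `a` holds a null pointer. The buffer in `b` is eventually released with the function recorded at creation.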

b) String implementations have to manage size information (e.g., because
strings are allowed to contain 0-characters). It is not ruled out that
the size information is put into the same contiguous memory as the
character sequence, which then has to be sizeof(size_type) longer.
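Here is a concrete version of point b). It uses a hypothetical length-prefixed layout, not a description of any particular standard library. A single allocation holds the length, then the characters, then a NUL, and the handle points at the characters. The function names make_prefixed, prefixed_length and free_prefixed are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Layout of one allocation: [size_t length][characters...][NUL].
// The returned pointer addresses the characters, not the block start.
char* make_prefixed(const char* text, std::size_t len) {
    void* block = std::malloc(sizeof(std::size_t) + len + 1);
    if (block == nullptr) return nullptr;
    std::memcpy(block, &len, sizeof len);  // header stored before the data
    char* chars = static_cast<char*>(block) + sizeof(std::size_t);
    std::memcpy(chars, text, len);  // may contain embedded '\0'
    chars[len] = '\0';
    return chars;
}

// Reads the header just *before* the characters. Only valid for pointers
// from make_prefixed; a char* obtained elsewhere has no such header.
std::size_t prefixed_length(const char* chars) {
    std::size_t len;
    std::memcpy(&len, chars - sizeof(std::size_t), sizeof len);
    return len;
}

void free_prefixed(char* chars) {
    if (chars != nullptr) std::free(chars - sizeof(std::size_t));
}
```

Calling prefixed_length on an arbitrary char* would read memory that was never allocated for it. That is exactly why such a representation cannot adopt an outside buffer.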

And the reverse is equally true: the implementation can choose to keep
this as a separate member of the basic_string_impl struct. Then all we
need to do is swap the data member of this struct and set both the length
and the capacity to the length of the string.
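This separate-member layout can be sketched under two assumptions. First, a hypothetical string_impl keeps size and capacity as members. Second, the incoming buffer was allocated with new[], so it matches the scheme the representation already uses. The buffer is adopted by a pointer hand-over, and size and capacity are both set to its strlen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical representation (assumption; the standard mandates no layout).
struct string_impl {
    char* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

// Adopts a NUL-terminated buffer allocated with new[] (assumed to match the
// representation's own deallocation scheme) and nulls the caller's pointer.
void adopt(string_impl& s, char*& buffer) {
    char* old = s.data;
    s.data = buffer;
    buffer = nullptr;  // the caller no longer owns the buffer
    s.size = s.data != nullptr ? std::strlen(s.data) : 0;
    s.capacity = s.size;  // length == capacity
    if (old != s.data) delete[] old;  // release previous contents
}
```

No character is copied. That is the appeal, and it only works because both sides agree on new[]/delete[].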

True, but mandating move constructors in the standard would essentially
force this implementation.

I was/am under the impression that the forthcoming standard *mandates*
move semantics (in the sense that it would like more people to use move
semantics where possible)?

I can see why the committee decided not to go
that way.

Okay, now I have absolutely no idea what you mean by this! Can you kindly
elaborate?

In moving from char* to string, it might be impossible to obtain this
additional piece of memory in the right place.

When moving from char* to strings, why would I even consider anything
beyond the first null terminator?

The problem is not the space beyond the first null terminator but the space
_before_ the character sequence. That is a place where the string
implementation (in the memory it manages via the allocator) may store the
size information. With a char* provided from the outside, that space might
not be available.

I think I already replied to this. But I get your point: allowing move
semantics would necessarily limit the implementers' choice of design. Am I
correct?

Moving the other way, you run into problems when
it comes to deallocating the char*.

Yes, absolutely. I should have stated this, but I did not intend that
basic_strings could be moved to a char*. Such semantics would be as
limited as c_str() is.

So, I am looking at something like:

/*
** The function replaces the string controlled by *this
** with a string of length strlen(str) whose elements
** are a copy of the string controlled by str. Leaves str
** in a valid but unspecified state.
*/
basic_string<charT,traits,Allocator>&
assign(charT *str);

Or, more generally:

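#include <cstddef>
#include <cstring>
#include <utility>

struct mystr {
    std::size_t len;
    char *b;
    mystr() : len(0), b(0) {}
    mystr(mystr const& s)
        : len(s.len),
          b(new char[len + 1]) {
        // copy the characters plus the NUL; s.b is null if s is empty
        if (s.b) std::memcpy(b, s.b, len + 1);
        else b[0] = '\0';
    }
    mystr(mystr&& s)
        : len(s.len), b(0)  // take over s's length as well as its buffer
    {
        std::swap(b, s.b);
        s.len = 0;
    }
    // Takes ownership of a non-null s, which must come from new[];
    // s is taken by reference so the caller's pointer is nulled.
    mystr(char *&s)
        : len(std::strlen(s)), b(0) {
        std::swap(b, s);
    }
    // Assumes b was allocated with new[], whatever its origin.
    ~mystr() { delete[] b; }
    /**
    Others omitted for brevity
    */
};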

Note that this implementation does not take care of the allocator issue;
it implicitly assumes that the char* member and the free char* are to be
deallocated the same way.

Not that this did not occur to me, but I was trying to explain what I was
trying to devise: a one-way char*-to-string move for the string library.
I intentionally left the allocator out for simplicity.

Even without the allocator, the rub comes with the destructor.

BTW: Why doesn't basic_string have a ctor analogous to
vector(size_type n)?

It does:

basic_string(size_type n,
charT c,
const Allocator& a = Allocator());

The difference is only that you may not omit the charT parameter c.
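For comparison, the sketch below shows both standard ways to get an n-element, zero-filled character buffer. The wrapper names make_vector_buffer and make_string_buffer are just for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// vector<char>(n) value-initializes every element to '\0'.
std::vector<char> make_vector_buffer(std::size_t n) { return std::vector<char>(n); }

// basic_string has no (n) constructor; the fill character must be spelled out.
std::string make_string_buffer(std::size_t n) { return std::string(n, '\0'); }
```

In both cases every element is written once before the caller ever touches the buffer.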

Hm. I am aware of this. In essence, none of the STL containers, nor the
string library (I state them separately since the latter is not considered
part of the STL by some), allow us to allocate without initialization. I
guess that's good, but it essentially forces an extra write. So, with
Meyers's solution I actually end up writing to the vector's memory twice
(once during creation and once during the actual writing via the call to a
legacy API). Suddenly, it appears that creating a char/wchar_t buffer and
copying it out to a basic_string is more efficient than what we learn from
Effective STL! Thoughts?
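Here is a rough sketch of the two approaches being compared. legacy_fill stands in for a hypothetical legacy API that writes up to cap characters and returns the count. It is an assumption, not a real function. via_vector follows the Effective STL pattern of letting the API write into a vector<char> and then building a string from it. via_raw_buffer uses default-initialized (unzeroed) storage instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical legacy API (assumption): writes at most cap chars, returns count.
std::size_t legacy_fill(char* buf, std::size_t cap) {
    const char text[] = "data";
    std::size_t n = sizeof text - 1;
    if (n > cap) n = cap;
    if (n == 0) return 0;  // never touch buf when nothing is written
    std::memcpy(buf, text, n);
    return n;
}

// Vector approach: zero-fill (write 1), API writes (write 2), copy to string.
std::string via_vector(std::size_t max_chars) {
    std::vector<char> v(max_chars);
    std::size_t n = legacy_fill(v.data(), v.size());
    return std::string(v.begin(), v.begin() + n);
}

// Raw buffer: no zero-fill, API writes, copy to string.
std::string via_raw_buffer(std::size_t max_chars) {
    std::unique_ptr<char[]> buf(new char[max_chars]);  // elements not zeroed
    std::size_t n = legacy_fill(buf.get(), max_chars);
    return std::string(buf.get(), n);
}
```

Both versions still copy the characters into the string at the end. The only saving in via_raw_buffer is the zero-fill pass. Since C++11, std::string storage is guaranteed to be contiguous. One could therefore resize a string and let the API write into &s[0] directly. That avoids the final copy but still zero-fills.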