Re: Efficient use of C++ Strings: Request for Comments

From:

Dizzy <dizzy@roedu.net>

Newsgroups:

comp.lang.c++.moderated

Date:

Fri, 2 Feb 2007 17:01:06 CST

Message-ID:

<45c1f20a$0$49205$14726298@news.sunsite.dk>

Scott McKellar wrote:

Dizzy wrote (amid considerable snippage):

Also consider what happens with the destruction order of statics: if one
needs a (static) string from another static object's destructor, things
can go very bad if that static string has already been destroyed.

I'm afraid this argument baffles me. You're worried that a static string
might be destroyed before I'm done with it? But an automatic string
wouldn't be?

Yes. Imagine a class with a non-trivial destructor that needs to do some
std::string work. An automatic string is, of course, created when execution
reaches its declaration and cleaned up on exit from its scope. A local
static string is created on the first call of the function but destroyed in
reverse order of construction relative to all the other statics created in
the program. That order does not guarantee that, when the destructor runs
for an object that is itself static, the local static string has not
already been destroyed (I do discover bugs caused by this destruction order
of statics now and then).

Example of code (static_crash.cpp):
#include <string>

class MyClass
{
public:
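        ~MyClass() throw() {
                // local static: constructed on the first call to ~MyClass
                static std::string tmp;
                tmp += "exit message whatever";
        }
};

MyClass static_obj;     // global static: constructed before main()

int main()
{
        MyClass auto_obj;   // destroyed first, so its destructor creates tmp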
}

This program is completely broken (undefined behavior). On my platform it
doesn't actually crash, but valgrind clearly shows lots of invalid memory
accesses like this one:
==13486== Invalid read of size 8
==13486== at 0x4BBF9A8: std::string::append(char const*, unsigned long)
(in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.1/libstdc++.so.6.0.8)
==13486== by 0x4009BE: MyClass::~MyClass()
(in /home/dizzy/work/test/static_crash)
==13486== by 0x4008FF: __tcf_1 (in /home/dizzy/work/test/static_crash)
==13486== by 0x4FB6A64: exit (in /lib64/libc-2.4.so)
==13486== by 0x4FA213A: (below main) (in /lib64/libc-2.4.so)
==13486== Address 0x51B0030 is 0 bytes inside a block of size 46 free'd
==13486== at 0x4A1F111: operator delete(void*)
(in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==13486== by 0x4BBFD59: std::string::~string()
(in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.1.1/libstdc++.so.6.0.8)
==13486== by 0x40089D: __tcf_0 (in /home/dizzy/work/test/static_crash)
==13486== by 0x4FB6A64: exit (in /lib64/libc-2.4.so)
==13486== by 0x4FA213A: (below main) (in /lib64/libc-2.4.so)

What valgrind shows here is that we are calling append on a std::string
whose internal memory buffer was already freed when the static std::string
variable was destroyed.

I instantiate MyClass both as a static and in main. The static instance is
created first (but NOT the static std::string in its destructor, because
that is created on the first destructor call, and none has happened yet);
then the instance in main is created. When main exits, the destructor of the
automatic MyClass runs and, being the first call to ~MyClass, creates the
local static std::string. Notice that the local static std::string is
created AFTER the global static MyClass object, which means they are
destroyed in the reverse order. So after main returns, the runtime starts
destroying static objects in reverse creation order: first the local static
std::string, then the global static MyClass instance, whose destructor tries
to use the local std::string that has already been destroyed. Kaboom!

There are various workarounds for this. One, for example, is to move the
local std::string to namespace scope (and define it in the translation unit
before its static users, to force a proper order of destruction), but I
don't think having global statics is acceptable from a design point of view
(imagine many such globals that one has to know about and reuse in function
bodies).
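A minimal sketch of two ways around the problem, assuming the same MyClass as above. The first follows the argument of this post: an automatic std::string lives only for the duration of the call, so it cannot be destroyed before the destructor that uses it. The second is the namespace-scope workaround just described, using a hypothetical global named exit_log defined before its static user in the same translation unit; it works, but it is exactly the kind of global the post argues against.

```cpp
#include <string>

// Hypothetical global, for illustration only. Within one translation unit,
// namespace-scope objects are constructed in order of definition and
// destroyed in reverse order, so exit_log outlives every MyClass object
// defined below it in this file.
std::string exit_log;

class MyClass
{
public:
    ~MyClass() {
        // Fix 1: an automatic string is created and destroyed within this
        // call, so it can never be destroyed before the code that uses it.
        std::string tmp;
        tmp += "exit message whatever";

        // Fix 2: append to the global defined earlier in this file; safe
        // because exit_log is destroyed after static_obj.
        exit_log += tmp;
    }
};

MyClass static_obj; // defined after exit_log, so destroyed before it
```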

My suggestion about providing overloaded functions is highly dependent on
the context. In a given case it may be useless, or there may be a better
approach. In other cases the overloading may be useful even if only as
a stopgap or transitional measure, when you don't have time to rewrite as
much as you'd like.

Agreed, then; I didn't notice the tip was about an existing code base that
already uses const char* a lot.

However, you still have to construct an object in order to return one. The
purpose of Tip 4 is to avoid the construction, and the later
destruction.

But my RVO test (and disassembling afterwards) showed that the object was
created directly on the caller's stack and no copy ever happened. What do
you mean by removing the creation? You still need a string in the caller
(that's why it calls your function to get a string), so you still need to
construct it at least once. Or maybe you have some specific code example in
mind.
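A minimal sketch of how one might observe this without a disassembler, assuming C++17. Counted and make_greeting are hypothetical names: the type counts its own copies and moves, and the function returns a prvalue, which C++17 guarantees is constructed directly in the caller's object (before C++17 this was the optional RVO the post measured).

```cpp
#include <string>
#include <utility>

// Hypothetical type that counts copies and moves, used only to observe
// whether returning by value creates extra objects.
struct Counted
{
    static int copies;
    static int moves;
    std::string value;

    explicit Counted(std::string v) : value(std::move(v)) {}
    Counted(const Counted& other) : value(other.value) { ++copies; }
    Counted(Counted&& other) noexcept : value(std::move(other.value)) { ++moves; }
};

int Counted::copies = 0;
int Counted::moves = 0;

// Returns a prvalue: with C++17 guaranteed copy elision the result is built
// in the caller's storage, so neither the copy nor the move constructor runs.
Counted make_greeting(const std::string& name)
{
    return Counted("hello, " + name);
}
```

Usage: `Counted c = make_greeting("world");` leaves both `Counted::copies` and `Counted::moves` at zero. The string is still constructed once, in the caller's object, which is the point made above.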

- depends on the implementation (a reference-counted implementation such as
gcc's libstdc++ just increments a counter)

Yes, in some implementations copying is very efficient. In others
(including some reference-counted implementations if they are designed to
be thread-safe), it isn't. I don't like the idea of relying on the internal
details of a particular implementation, except as a last resort.

Which is why one should implement one's own version (or use an existing one
that does what one wants; how does STLport implement std::string?).

(Slightly off-topic) The fun part is that I remember reading an article on
the Internet comparing reference-counted and plain-copy std::string
implementations, and it showed (with benchmarks) that a reference-counted
implementation can often be slower than a copying one in multithreaded
environments. Copying itself turns out to be pretty cheap nowadays compared
to other operations such as locking.

That's a good counterexample, and I shall add it to my pages, if you
don't mind.

Please do so. That example is not only theoretical: I tend to initialize
using only the initialization list for various reasons (design, exception
behaviour, avoiding the dreaded delayed-initialization idiom, etc.), and
that in turn seems to force me to have functions that return their result
by value. Luckily, with std::string I don't think that's a real issue (it
can be reference counted, it can use rvalue references in the future, etc.),
but with other, more complex types it could be more troublesome (although
rvalue references and move semantics should eliminate the copy for any
complex type).
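A minimal sketch of the initialization-list style described above, with hypothetical names Widget and make_label. Because label_ is a const member, it can only be set in the member-initializer list, so the value has to come from a function that returns it by value; nothing is default-constructed and assigned later.

```cpp
#include <string>

// Hypothetical helper that computes a member's value and returns it by value,
// so it can be called directly from a member-initializer list.
std::string make_label(const std::string& prefix, int id)
{
    return prefix + "#" + std::to_string(id);
}

class Widget
{
public:
    Widget(const std::string& prefix, int id)
        : label_(make_label(prefix, id)) // initialized once, no default-then-assign
    {}

    const std::string& label() const { return label_; }

private:
    const std::string label_; // const member: must be set in the initializer list
};
```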

This is why I sometimes wish the iostream interface returned by value on
input instead of requiring the object to be created up front (for some
objects, a default constructor just doesn't make sense).
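A minimal sketch of such a by-value input function, assuming C++17's std::optional (not available when this was written). Point, read_point, and point_from_string are hypothetical names: the components are read first, and the Point is constructed only once the input is known to be valid, so Point needs no default constructor.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// A type for which a default constructor makes no sense.
class Point
{
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_;
    int y_;
};

// Hypothetical by-value extractor: reads the parts, then constructs the
// object only if both reads succeeded; returns std::nullopt otherwise.
std::optional<Point> read_point(std::istream& in)
{
    int x = 0;
    int y = 0;
    if (in >> x >> y)
        return Point(x, y);
    return std::nullopt;
}

// Convenience wrapper for parsing from a string.
std::optional<Point> point_from_string(const std::string& text)
{
    std::istringstream in(text);
    return read_point(in);
}
```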

7. Preallocate space for large strings.

Not very sure what you mean by this. You probably have some exact
sample code in mind, if you care to show it.

The relevant web page starts off with just such a sample. I guess it's
not as clear as I thought it was.

The idea is that if you gradually grow a string by successive additions,
it may go through several cycles of allocating, copying, and deleting ever
larger buffers. If you reserve enough space at the outset, you can avoid
some of that churn.
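A minimal sketch of the idea, using a hypothetical join_repeated function. Because the final length is computable in advance, a single reserve() call lets the loop append into an existing buffer instead of repeatedly growing it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical example: builds `count` comma-separated copies of `item`.
std::string join_repeated(const std::string& item, std::size_t count)
{
    std::string result;
    if (count == 0)
        return result;

    // Final length: count items plus (count - 1) separators.
    result.reserve(count * item.size() + (count - 1));
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0)
            result += ',';
        result += item;
    }
    return result;
}
```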

Aha, OK then. I find it equivalent to the std::vector tip of reserving
capacity up front if one knows what it will be.

A principal motivation for the occasional judicious use of C-style
character arrays is that sometimes you know how big a buffer needs to be.
In that case, the dynamic memory management offered by strings provides
flexibility that you don't need, at a price that you may not wish to pay.

Good point (I am a strong adherent of C++'s "don't pay for what you don't
need" principle). But in that case one only has to apply the previous tip :)
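A minimal sketch contrasting the two options in this exchange, with hypothetical helpers format_id and format_id_string. The first uses a fixed-size stack buffer (24 chars covers the 3-char "id=" prefix, up to 20 characters for a 64-bit int including its sign, and the terminating '\0'), so it never allocates. The second applies the previous tip: a std::string with capacity reserved up front, so the string itself allocates at most once.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Fixed-size buffer: no dynamic allocation at all.
std::array<char, 24> format_id(int id)
{
    std::array<char, 24> buf{};
    // snprintf never writes past buf.size() and always null-terminates.
    std::snprintf(buf.data(), buf.size(), "id=%d", id);
    return buf;
}

// Reserved std::string: the result grows into preallocated capacity.
std::string format_id_string(int id)
{
    std::string s;
    s.reserve(24);
    s += "id=";
    s += std::to_string(id);
    return s;
}
```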

--
Dizzy
http://dizzy.roedu.net

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]