Re: std::string bad design????

From: "James Kanze" <james.kanze@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: 8 Jan 2007 23:50:39 -0500
Message-ID: <1168299759.605956.267430@s34g2000cwa.googlegroups.com>

Le Chaud Lapin wrote:

James Kanze wrote:

The problem is that you don't know what the classes do. You use
an std::map, for example; how do you know that it doesn't use
static memory (e.g. in its allocators) in the implementation.
Unless std::map is thread-safe, you cannot use it in a
multithreaded application.

There is a fundamental difference in expectations here. I do not
expect any state, not even a simple int, to be thread-safe unless I
make it thread-safe;

int x; // not thread-safe

I'm not sure I understand. This means that you don't do
preemptive multi-threading. (Which is the only kind you have
under Windows or Unix.)

Note that _any_ data structure is "thread-unsafe". That is to be
expected. If you have a global variable, and N threads operate against
it, and that variable is not protected, there will be contention.
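
A minimal sketch of what "protected" means for such a variable, assuming a C++11 compiler with std::thread and std::mutex (neither existed in the standard when this thread was written). The names g_counter, add_n and run_counter_demo are made up for the illustration:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: a global counter and the mutex that protects it.
long g_counter = 0;
std::mutex g_counter_mutex;

// Adds `n` increments, taking the lock for every single update.
void add_n(int n)
{
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock(g_counter_mutex);
        ++g_counter;
    }
}

// Runs `threads` workers against the counter and returns its final value.
long run_counter_demo(int threads, int per_thread)
{
    {
        std::lock_guard<std::mutex> lock(g_counter_mutex);
        g_counter = 0;
    }
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(add_n, per_thread);
    for (auto& th : pool)
        th.join();
    std::lock_guard<std::mutex> lock(g_counter_mutex);
    return g_counter;
}
```

Without the lock_guard in add_n, the increments from different threads would race and the final value would be unpredictable.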

I'm not talking about global variables here. Without some
thread-safety guarantee, you cannot declare local instances of a
variable in different threads.

This was really the case with std::basic_string in g++ pre-3.0.
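
To make the point concrete, here is a sketch of the usage that has to be guaranteed: each thread works only on its own local std::map and std::string objects, with no locking. It assumes C++11 threads, and fill_local_map and run_local_instances_demo are hypothetical names. With a library that silently shares mutable state between instances, even this would not be safe:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Each worker builds and uses its own local map and strings. The only thing
// it shares with the caller is the result slot reserved for it alone.
void fill_local_map(int id, std::size_t* result)
{
    std::map<int, std::string> local;
    for (int i = 0; i < 1000; ++i)
        local[i] = std::string("thread ") + std::to_string(id);
    *result = local.size();
}

// Starts `threads` workers and returns the sum of the sizes they report.
std::size_t run_local_instances_demo(int threads)
{
    std::vector<std::size_t> sizes(static_cast<std::size_t>(threads));
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(fill_local_map, t, &sizes[t]);
    for (auto& th : pool)
        th.join();
    std::size_t total = 0;
    for (std::size_t s : sizes)
        total += s;
    return total;
}
```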

If you write map<> so that it uses global variables,

I don't write map<>. I use it. My compiler implementor wrote
it (or at least, paid someone to write it).

and you try to use
map<>'s in a multi-threaded application, you already know that there
will be problems.

Not with any of the implementations of the STL that I know. The
VC++ implementation is thread safe, as is the g++ one, and the
Rogue Wave one provided with Sun CC. All of the original STL
has been thread safe since I've been using it, and long before.
(G++ had problems with string because it wasn't part of the
original STL, and they did a quick, albeit remarkably clean
implementation of it themselves. And they didn't make it
thread safe, since the code generated by the compiler wasn't
usable in a multithreaded environment to begin with.)

I have my own equivalent of map<>. In fact, I have
several of them, and none of them use global variables.

So where do they get their memory from? All of the
implementations of malloc/operator new that I know use variables
with static lifetime.
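
As an illustration only (this is not how any particular malloc is written), a toy bump allocator shows where the hidden shared state sits. The toy_heap namespace and everything in it are hypothetical, and the sketch assumes C++11:

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bump allocator: all state below has static lifetime and is
// shared by every thread, even though callers never name a global variable.
namespace toy_heap {
    alignas(std::max_align_t) unsigned char arena[1 << 16];
    std::size_t next_free = 0;   // shared and mutable
    std::mutex guard;            // protects next_free

    // Returns `size` bytes from the arena, or nullptr when it is exhausted.
    void* allocate(std::size_t size)
    {
        std::lock_guard<std::mutex> lock(guard);
        if (size > sizeof arena - next_free)
            return nullptr;
        void* p = arena + next_free;
        next_free += size;
        return p;
    }
}

// Every thread allocates `count` blocks of 16 bytes. Returns the number of
// bytes handed out so far.
std::size_t run_toy_heap_demo(int threads, int count)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([count] {
            for (int i = 0; i < count; ++i)
                toy_heap::allocate(16);
        });
    for (auto& th : pool)
        th.join();
    std::lock_guard<std::mutex> lock(toy_heap::guard);
    return toy_heap::next_free;
}
```

A container built on top of such an allocator has no global variables of its own, yet it is only usable from several threads because allocate() takes the lock.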

Whether it was
necessary to use a global variable in the implementation of map<> is
another discussion, but this is about common sense. I do not think
anyone who has been writing multi-threaded applications, knowing what a
global variable is, will have any expectations otherwise.

If you use map<>, then you have to know what it guarantees.

You might want to check
http://www.sgi.com/tech/stl/thread_safety.html; I'm pretty sure
that this corresponds to the thinking of the committee with
regards to how thread safety will be defined in the library.
(Except some special cases, like maybe std::cerr, and almost
certainly std::exit().)

I will take a look, but there is no need.

Right. You know it all, and everyone else is an idiot.

That page happens to be originally written by Hans Boehm and
Matt Austern. Two of the best experts I know.

I can write a class Foo,
right now, make it so that it uses a global variable, run multiple
threads against it, and watch my program crash.

Right. Now can you get it through your head that the compiler
can also do this, and that the standard library can also do
this. And that in fact, compilers and implementations of the
standard library have done it. And that you need guarantees
from the language and from the library specification concerning
what they guarantee to do, and what they guarantee not to do, and
that without those guarantees, you cannot write thread safe
code.

I will not be
surprised in the least. I can also write a function Bar, make it so
that it uses a static local (global) variable, run multiple-threads
against it, and watch my program crash. I will not be surprised in the
least.

Note that the implementation of std::basic_string in g++ does
not give this guarantee, although I think that the developers
consider this an error. (The combination of circumstances
necessary to encounter a problem is extremely unlikely to occur
in actual practice. But at least some of the developers of g++
feel like I do: unless the probability is 0, it's an error.)

I definitely agree that engineering should be deliberate and
predictable. I cannot imagine that the people who designed the
standard library did not have multi-threading in the back of their
minds while making the library.

Imagine differently, then. STL was originally developed using
the old Borland C++ compiler, under MS-DOS. I'm 100% sure that
the author didn't take threading issues into account then.

Even today, you often find disagreement concerning what
"thread-safe" means in a library, although a consensus is
gradually growing to adopt the Posix definition.

Note that in g++ pre 3.0, every single constructor of
std::basic_string modified global state. As did the
implementation of throw. Presumably, if you created the
necessary mutex, and never created a std::basic_string (nor
called a function which might do so---and do you know for sure which
functions in the standard library create temporary strings)
without first acquiring the mutex, and wrapped every throw/catch
with a mutex, your code might work. (Then again, it might not,
because there was no guarantee that nothing else used static
data.)

Ah...how refreshing. I see convergence in our thinking forthcoming. :)

Microsoft's approach to this was to provide two libraries: one for
single-threaded applications, one for multi-threaded applications. It
is the same approach I take. Fortunately, 98%+ of my classes are
already "thread-safe", meaning, they do not use global variables as
part of their implementation unless they have to. I have found that
most classes can fit this model, except for things like random number
generators, or classes that require massive global state to help it,
like Integer::is_prime() which works fastest if it is allowed to
maintain a global static array of small primes, say those less than
60,000. But that array is declared const and never changes, so it is
immune to requiring a critical section (spin-lock with failover to
mutex).
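
A sketch of the read-only table idea, assuming the C++11 memory model, under which concurrent reads of an object that nobody modifies are not a data race. The table here is a plain array of int cut off at 50 rather than 60,000, and is_small_prime and run_prime_table_demo are hypothetical names:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Read-only table, initialized statically and never written afterwards.
static const int small_primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23,
                                    29, 31, 37, 41, 43, 47 };

// Looks `n` up in the table. It only ever reads shared data.
bool is_small_prime(int n)
{
    for (int p : small_primes)
        if (p == n)
            return true;
    return false;
}

// Counts the primes below 50 from several threads at once, with no lock
// around the table. Returns the sum of the per-thread counts.
int run_prime_table_demo(int threads)
{
    std::atomic<int> total(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&total] {
            int found = 0;
            for (int n = 0; n < 50; ++n)
                if (is_small_prime(n))
                    ++found;
            total += found;
        });
    for (auto& th : pool)
        th.join();
    return total.load();
}
```

This holds for a plain array of int. Whether it also holds for a library type depends on what that type does internally when it is read, which is exactly the kind of guarantee that needs to be written down.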

Microsoft's approach, at least in part, has been to document
what you have to do to write thread safe code using their
development system. Using the correct version of the library is
part of it; I'm willing to bet that you also need specific
options to the compiler, or maybe a /D to define some
preprocessor symbol.

The problem there is that what you have to do will not be the
same as what you have to do with Sun CC, or with g++, under
Solaris. As I've stated repeatedly, Sun CC and g++ implement
different sets of rules, under Solaris.

I've had to deal with this. I've had to modify code that was
carefully designed to work under the Sun CC rules, because it
didn't work with g++. It's a very real problem.

Having the standard address threading will solve this.
Obviously, you still need to use the primitives. The difference
is that it will be the same primitives you will need on all
systems, and using them will give you the same guarantees.
(Note that the fact that a global instance never changes does
not mean that you can access it without a lock with g++, even
today. This is in conflict with the usual Posix rules, however,
and at least some of the g++ development team consider it an
error.)

But for your std::basic_string example, note that, if the
implementation of std::basic_string used a global variable, I would
never place the burden of supplying mutexes on the user of that
component. Again, I would follow Microsoft's approach, and provide a
library for single-threaded applications, and one for multi-threaded
applications. The single-threaded application library would not have
protection. The multi-threaded one would. This works very well today.

That's not been my experience (that it works very well today).
Of course, I have to support two systems (Solaris on Sparc and
Linux on PC), with a number of different versions of three
different compilers: Sun CC under Solaris, g++ under Solaris and
g++ under Linux. (Some of the lower level stuff also has to
work with VC++ 6.0 under Windows, but I'm not responsible for
that.)

Note that both Solaris and Linux aim for Posix
compatibility, at least where threads are involved. But Posix
doesn't define a binding for C++, and the authors of Sun CC and
of g++ "extrapolated" the C binding differently for C++.

Unless a library is thread safe, it cannot be used in
multithreaded code. Note that just grabbing a mutex at the
start of every function, and releasing it at the end, is neither
necessary nor sufficient to make a library thread safe. To make
a library thread-safe, you must document the guarantees that are
given: thus, for example SGI (and everyone else) guarantees that
you can create two separate instances of std::vector in two
different threads, and use them, without external
synchronization.
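
A sketch of why locking every function is not sufficient, assuming C++11. LockedStack and run_locked_stack_demo are hypothetical names. Every member function takes the mutex, yet a caller who writes `if (!s.empty()) s.pop();` would still race, because another thread can empty the stack between the two calls. What the caller needs is a documented operation that does the check and the pop as one unit:

```cpp
#include <cstddef>
#include <mutex>
#include <stack>
#include <thread>
#include <vector>

// Hypothetical wrapper around std::stack in which every member locks.
class LockedStack {
public:
    void push(int v)
    {
        std::lock_guard<std::mutex> lock(m_);
        data_.push(v);
    }
    // Safe to call, but the answer may be stale as soon as it is returned.
    bool empty() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return data_.empty();
    }
    // Checks and pops under a single lock. Returns false if nothing was left.
    bool try_pop(int& out)
    {
        std::lock_guard<std::mutex> lock(m_);
        if (data_.empty())
            return false;
        out = data_.top();
        data_.pop();
        return true;
    }
private:
    mutable std::mutex m_;
    std::stack<int> data_;
};

// Pushes `items` values, drains them from several threads, and returns how
// many elements were popped in total.
int run_locked_stack_demo(int threads, int items)
{
    LockedStack s;
    for (int i = 0; i < items; ++i)
        s.push(i);
    std::vector<int> popped(static_cast<std::size_t>(threads), 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&s, &popped, t] {
            int v;
            while (s.try_pop(v))
                ++popped[t];
        });
    for (auto& th : pool)
        th.join();
    int total = 0;
    for (int n : popped)
        total += n;
    return total;
}
```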

I *definitely* agree with the spirit of this paragraph. No programmer
should ever be burdened with throwing in mutexes to protect against
exclusivity issues that might or might not be imminent. I think where
we might disagree is in answering the question,

"What should be done about mutual exclusion issues?"

I feel that the double-library approach is optimal.

That's probably the best solution for an implementation, today.
You still have to define what is and what is not guaranteed in
the thread safe library, and I would very definitely like to see
this standardized.

Note too that it goes beyond the library. G++ pre 3.0 used
static data in its implementation of stack walkback in
exceptions.

If I did not state
this explicitly in my earlier posts, my apologies. I just presumed that
everyone who was writing multi-threaded applications was doing this, as
you cannot write them under Windows without doing this.

You need a special option to the compiler driver under Solaris
or Linux as well; this special option takes care of defining the
correct #defines, using the correct libraries, etc. But the
guarantees you get still aren't standard, and code which worked
perfectly well with Sun CC failed with g++ (and vice versa).

The double-library approach would not burden single-threaded
applications with unnecessary critical sections (not to mention extra
global state), and would still work for multi-threaded applications.

And of course, the compiler must also make similar guarantees.
If you're working on Intel architecture, for example, the
compiler only has a very few registers to work with, and will
spill to memory if the expression is sufficiently complicated.
Every compiler I know spills to stack, but there is nothing in
the C++ standard to require this, and if the compiler spills to
static memory, you can end up with code which may or may not
work, depending on the level of optimization.

I wrote in another response to your bringing up this issue that I would
find it very odd if a compiler writer solved the register overflow
problem by spilling to static memory. It is simply not necessary when
the stack is available. Furthermore, in general, it would break a
guarantee that we do have: support for recursion.

It doesn't affect recursion, since the compiler can arrange to
only do it when there are no function calls. But I agree, the
stack is the natural place to put them, at least on machines
with stacks and with decent based addressing modes. (It was
frequent for 8080 compilers to spill to static memory, but that
was because the 8080 had really bad support for based
addressing; to access a variable on the stack took something
like seven or eight instructions, and could only access bytes,
as opposed to a single instruction for 16 bit static values.)

Still, there is a very large difference between my personal
feeling that the only reasonable way for a compiler to do
something is x, and a guarantee in the standard that something
must work.

If F calls G, G calls H, and H calls F, then spilling to static memory
in F would result in a dead end. The stack has to be used.

Only at the moment when F calls G. Register spills are of very
short lifetime, never beyond the end of an expression, and it's
generally pretty easy to reorder the expression so that there
are no function calls while the spills are active.

Why don't you read what the experts are saying, instead of
arguing against strawmen of your own creation?

I am. The most I have seen so far, aside from the cache coherency
problem (which has been around for a long time), is that if two threads
operate against an unprotected global variable, there is a potential
for corruption. I did not say this. Others did.

Is it the expectation that C++ will somehow develop some universally
applicable lock-free mechanism?

What, may I ask, gives you this idea? I've not even seen it
suggested.

I am trying to figure out what people's expectations of C++ are, beyond
providing a thread-safe library.

I've repeated it several times: I want the behavior of my
program to be defined. I want it exactly specified when I need
locking, and when I don't.

Did you see my short example? (Did it even appear? I think
some of my postings are disappearing before they reach the
moderation site.) Basically:

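    #include <string>

    std::string globalButNeverModified ;

    void
    f()
    {
        // Reading a global that is never modified, without a lock:
        // guaranteed with Sun CC, doesn't work with g++...
        if ( globalButNeverModified[ 0 ] == 'c' ) {
            // ...
        }
        // First-call initialization of a local static, without a lock:
        // guaranteed with g++, doesn't work with Sun CC...
        static std::string localStatic ;
    }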

I'd like to know portably what I can count on, and what not.
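
For the local static half of the example, here is a sketch of what a portable guarantee looks like. It assumes a C++11 or later compiler, where the language itself specifies that concurrent callers wait while a function-local static is being initialized (a rule that postdates this post). CountedString, local_static and run_local_static_demo are hypothetical names used to make the single construction observable:

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Counts how many times the local static below is constructed.
std::atomic<int> g_constructions(0);

// Hypothetical helper type, so that the construction can be observed.
struct CountedString {
    std::string value;
    CountedString() : value("initialized once") { ++g_constructions; }
};

// C++11 and later: the first call initializes `instance`, and concurrent
// callers block until that initialization has completed.
const CountedString& local_static()
{
    static CountedString instance;
    return instance;
}

// Calls local_static() from several threads and returns the number of
// constructions that took place.
int run_local_static_demo(int threads)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([] { (void)local_static().value.size(); });
    for (auto& th : pool)
        th.join();
    return g_constructions.load();
}
```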

--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
