Re: std::string bad design????

From: "Le Chaud Lapin" <jaibuduvin@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: 9 Jan 2007 08:07:55 -0500
Message-ID: <1168323842.334923.294460@q40g2000cwq.googlegroups.com>

Lourens Veen wrote:

Duh. Neither will we. The point is that you could write a function
Bar, make it so that it uses a static local (global) variable,
_protect_it_using_a_mutex_, run multiple threads against it, and
watch the programme crash. I bet that that _would_ surprise you.
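Here is Veen's scenario written out concretely. This minimal sketch uses the C++11 threading library (std::mutex, std::lock_guard, std::thread), which postdates this 2007 post. The body of `Bar` and the helper `hammer_bar` are hypothetical. Under the C++11 memory model every access to the static local is ordered by the mutex, so the program is free of data races and its behaviour is defined. Under C++03, which did not mention threads at all, that guarantee had to come from the platform rather than from the standard. That is Veen's point.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical function in the style Veen describes: it keeps state in a
// static local variable and protects every access with a mutex.
int Bar() {
    static std::mutex m;        // constant-initialised, safe to share
    static int call_count = 0;  // state shared by every caller
    std::lock_guard<std::mutex> lock(m);
    return ++call_count;        // read-modify-write done under the lock
}

// Runs `threads` threads, each calling Bar() `calls` times.
void hammer_bar(int threads, int calls) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([calls] {
            for (int i = 0; i < calls; ++i) Bar();
        });
    for (auto& th : pool) th.join();
}
```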

That depends. A mutex is only useful if the concept of the class inherently
supports multi-threading. If it does not, then synchronization will
not help. In that case, there might still be a crash even if there is
never a race condition, because, again, the objects were never meant for
multiple threads.

However, there is nothing in the C++ standard that prevents this from
happening.

And it should not. That would be a class design issue.

Microsoft's approach to this was to provide two libraries: one for
single-threaded applications and one for multi-threaded applications.
It is the same approach I take. Fortunately, 98%+ of my classes are
already "thread-safe", meaning they do not use global variables as
part of their implementation unless they have to. I have found that
most classes can fit this model, except for things like random
number generators, or classes that require massive global state to
help them, like Integer::is_prime(), which works fastest if it is
allowed to maintain a global static array of small primes, say those
less than 60,000. But that array is declared const and never changes,
so it is immune to requiring a critical section (spin-lock with
failover to mutex).
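This is a minimal sketch of the read-only table described above. It assumes a free function rather than the author's (unshown) Integer class; `small_primes`, `make_small_primes` and this `is_prime` are hypothetical. It relies on the C++11 rule that a function-local static is initialised exactly once, even when several threads call the function concurrently. After that, threads only read the table. Under C++03, one common workaround was to build such a table before starting any threads.

```cpp
#include <cstdint>
#include <vector>

// Builds the table of primes below `limit` with a sieve of Eratosthenes.
static std::vector<std::uint32_t> make_small_primes(std::uint32_t limit) {
    std::vector<bool> composite(limit, false);
    std::vector<std::uint32_t> primes;
    for (std::uint32_t i = 2; i < limit; ++i) {
        if (composite[i]) continue;
        primes.push_back(i);
        for (std::uint64_t j = std::uint64_t(i) * i; j < limit; j += i)
            composite[j] = true;
    }
    return primes;
}

// Read-only table shared by all threads. Initialisation of a function-local
// static is thread-safe since C++11; afterwards the table is only read.
const std::vector<std::uint32_t>& small_primes() {
    static const std::vector<std::uint32_t> table = make_small_primes(60000);
    return table;
}

// Trial division by the table; exact for n < 60000 * 60000.
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint32_t p : small_primes()) {
        if (std::uint64_t(p) * p > n) return true;
        if (n % p == 0) return n == p;
    }
    return true;  // no prime factor below 60000, and n < 60000^2
}
```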

Really? Who says that accessing a const array from various threads at
the same time won't cause a crash? What if this system has two
processors, and your const array is stored in a ROM connected to a
bus shared between the processors? Two simultaneous reads could very
well mess up the bus's addressing logic and crash the system.

I am an electrical engineer...and I really do not know what to say
about this. :)

None of my friends have ever designed a computer where simultaneous
reads by two processors will "mess up the bus's addressing logic." I
will ask a few of them tomorrow to be sure, as my specialty is RF and
communications, not computer systems.

A more realistic example would be pumping the correct sequence of bytes
to the FSM at location 0 of an Intel 28F256 flash EEPROM, but frying
the device by being too slow, while another thread comes along, reads
the memory, and receives garbage. In that case, the fault would lie with
the software or the hardware, not the language.

Well, in practice it seems to mostly work. But it is undefined
behaviour, and there is no guarantee that a conforming compiler won't
mess it up.

I am sure someone will sooner or later show me an example where mutual
exclusion does not help (ultra-exotic hardware notwithstanding).

I *definitely* agree with the spirit of this paragraph. No
programmer should ever be burdened with throwing in mutexes to protect
against exclusivity issues that might or might not be imminent. I
think where we might disagree is in answering the question,

No, where you disagree is that you think that in current C++, you can
make something thread-safe by protecting it with a mutex. That isn't
the case: many implementations give additional guarantees that it
will work, but some may not, and they will still be conforming.

Show me.

Therefore, presently, any multithreaded C++ programme will at best
work on some conforming platforms, but given a random conforming
platform, there is no guarantee that it will work.

Show me.

_That_ is the problem, and that is what the standards committee is
trying to fix.

Microsoft provides multi-threaded libraries and synchronization
primitives.

There is a vast universe beyond Windows. More importantly, unless you
have control over the compiler and the OS, or unless the compiler and
the OS give additional guarantees beyond what the C++ standard
requires, it's impossible to give any guarantees at all about
multi-threaded applications, since at present anything to do with
threading is undefined behaviour as far as the C++ standard is
concerned.

Like what? The compiler honors sequence points. The OS provides
kernel-mode synchronization primitives. The hardware provides atomic
operations. These are the essential elements for building
synchronization libraries and, therefore, thread-safe code.
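The claim that hardware atomic operations are enough to build a lock can be illustrated with a minimal test-and-set spin-lock. This sketch uses std::atomic_flag, the C++11 type that exposes exactly this operation. `SpinLock` and `count_with_spinlock` are hypothetical names. The lock busy-waits indefinitely, so a practical lock would back off or fall back to an OS mutex, along the lines of the "spin-lock with failover to mutex" mentioned earlier in the thread.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spin-lock built directly on test-and-set (std::atomic_flag).
// Illustrative only: it never yields or sleeps while waiting.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value: keep spinning while
        // another thread already holds the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Usage: several threads increment a plain counter under the spin-lock.
long count_with_spinlock(int threads, int per_thread) {
    SpinLock sl;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&sl, &counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                sl.lock();
                ++counter;
                sl.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```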

Can you guarantee that if I took your code and compiled and ran it on
GNU/Linux that it would compile and work? Solaris? Novell DOS? AIX?
HP/UX? Symbian? QNX? BeOS? Custom embedded systems? A platform I
created myself with the only guarantee that the C++ compiler is
conformant?

Finally, a convergence of thought (I hope). The answer is *it
depends*. The reason is that I doubt that all the OS people have
developed as rich a set of OS synchronization primitives as Microsoft
has. Also, a few hardware platforms might simply decide not to provide
any atomic synchronization operations. On these latter platforms,
there would be nothing that anyone could do. No amount of thinking
would circumvent this issue. The answer would be "yes, provided that
you give me three things:

1. Hardware that supports atomic operations like test-and-set.
2. An OS that provides kernel-mode synchronization primitives. I listed
a few before, but the essential ones are event, mutex, semaphore,
waitable timer... the basics. I would need functions to wait on these
primitives, one at a time or as a group. The last criterion, waiting
as a group, is _crucial_ for "feel good" for the C++ programmer. IMHO,
it will be seen later that, without this last primitive, feel good
drops dramatically. I am happy Microsoft provides this. They do not
provide it on Windows CE, which is unfortunate, so this would be a
good example where my library would not port, no matter how hard I
tried, until the OS people fixed this. Note that my code would still
be correct; there would simply be a hole in the platform, but that
hole is in the kernel, not in my library.
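Standard C++ has no direct equivalent of waiting on an arbitrary group of kernel objects; on Win32, the author presumably means WaitForMultipleObjects. C++11's std::lock is a close analogue for the narrower case of acquiring several mutexes together, since it locks all of them using a deadlock-avoidance algorithm. This is a minimal sketch; `Account`, `transfer` and `opposite_transfers_balance` are hypothetical.

```cpp
#include <mutex>
#include <thread>

// Hypothetical account type; each balance has its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Acquires both mutexes as a group. std::lock avoids deadlock, so two
// transfers in opposite directions cannot block each other forever.
// (Calling it with the same account twice is not supported.)
void transfer(Account& from, Account& to, long amount) {
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> lf(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lt(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions. Naive one-at-a-time locking
// in argument order could deadlock here.
bool opposite_transfers_balance() {
    Account a, b;
    a.balance = b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance == 1000 && b.balance == 1000;
}
```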

Note also that I would not ask you to do anything to C++ proper. I
would take the compiler on your system, compile my code against your
OS, and of course, use the library providing these primitives in your
OS. The library function names could look as weird as you like, as
long as the basic functionality is there to be wrapped in C++ classes.
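To show platform primitives wrapped in C++ classes, here is a minimal sketch that assumes a POSIX system. There, the mutex functions are pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock and pthread_mutex_destroy. `PosixMutex` and `PosixLockGuard` are hypothetical names. The RAII guard releases the lock even if the protected code throws.

```cpp
#include <pthread.h>
#include <stdexcept>

// Thin C++ wrapper around the POSIX mutex; callers never see the C names.
class PosixMutex {
public:
    PosixMutex() {
        if (pthread_mutex_init(&m_, nullptr) != 0)
            throw std::runtime_error("pthread_mutex_init failed");
    }
    ~PosixMutex() { pthread_mutex_destroy(&m_); }
    PosixMutex(const PosixMutex&) = delete;
    PosixMutex& operator=(const PosixMutex&) = delete;

    void lock() {
        if (pthread_mutex_lock(&m_) != 0)
            throw std::runtime_error("pthread_mutex_lock failed");
    }
    void unlock() { pthread_mutex_unlock(&m_); }

private:
    pthread_mutex_t m_;
};

// Scope guard: locks on construction, unlocks on destruction.
class PosixLockGuard {
public:
    explicit PosixLockGuard(PosixMutex& m) : m_(m) { m_.lock(); }
    ~PosixLockGuard() { m_.unlock(); }
    PosixLockGuard(const PosixLockGuard&) = delete;
    PosixLockGuard& operator=(const PosixLockGuard&) = delete;

private:
    PosixMutex& m_;
};
```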

Are you sure? I'm sure it's possible to create a conforming compiler
that will break your multithreaded code. I already gave an example of
how that would work elsethread.

I will look again, but I did not see anything that would "break" my
code. The only comment that came close to breaking my code was the
mention of cache synchronization, which I knew about, but purposely
left out so as not to allow this thread to degenerate into a
conversation on computer hardware and esoteric contention resolution
algorithms.

If F calls G, G calls H, and H calls F, then spilling to static
memory in F would result in a dead end. The stack has to be used.

So build a second stack in static memory. That would support recursion
just fine, and still break threading.
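Veen's hypothetical compiler can be mimicked by hand. The functions below keep their saved values on a separate stack instead of in ordinary locals. `F`, `G`, `H`, `step`, `spill_stack` and `two_threads_agree` are hypothetical. Because `spill_stack` is `thread_local` (C++11), each thread gets its own copy, so both recursion and threading work. With a plain `static`, the recursion would still work in one thread, but concurrent callers would push and pop on the same vector. That is exactly the breakage Veen describes.

```cpp
#include <thread>
#include <vector>

// A hand-built "second stack" for the F -> G -> H -> F recursion.
// thread_local: one copy per thread. As a plain static it would be shared.
thread_local std::vector<long> spill_stack;

long F(int n);
long G(int n);
long H(int n);

// Shared body: spill n, call next(n - 1), then reload n from the spill stack.
long step(int n, long (*next)(int)) {
    if (n <= 0) return 1;
    spill_stack.push_back(n);
    long rest = next(n - 1);
    long saved = spill_stack.back();
    spill_stack.pop_back();
    return saved + rest;  // overall F(n) == n(n+1)/2 + 1
}

long F(int n) { return step(n, G); }
long G(int n) { return step(n, H); }
long H(int n) { return step(n, F); }

// Runs F(100) on two threads at once; each uses its own spill_stack.
bool two_threads_agree() {
    long a = 0, b = 0;
    std::thread t1([&a] { a = F(100); });
    std::thread t2([&b] { b = F(100); });
    t1.join();
    t2.join();
    return a == 5051 && b == 5051;
}
```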

Of course, now you're going to say that that is a stupid thing to do

No, I was going to say it seems like you're fishing, but... :)

because it breaks multithreading. But whether it's a stupid
implementation or not is not the point. The point is that it's a
_legal_ implementation with the current C++ standard, meaning that as
soon as you start using multithreading in your C++ programme, you
can't know what it means anymore. At best, you can know what it means
for a specific combination of OS, compiler and standard library. If
we go down that road then we might as well throw the standard out of
the window and create a separate language for each platform.

I disagree with this. If I can have the primitives I asked for above, I
can write portable multi-threading applications. And most importantly,
I think this can be done elegantly with libraries. A good example is
thread-cancellation with determinate-state. There is a very elegant
way to do this, and it is portable, and it requires no changes to C++.
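The post does not describe the author's cancellation technique. For comparison only, one widely used portable approach is cooperative cancellation: the worker polls a flag at points where its state is consistent, so it always stops in a determinate state. This is a minimal sketch using C++11 atomics; `Worker` is a hypothetical class. C++20 later standardised a similar mechanism as std::stop_token and std::jthread.

```cpp
#include <atomic>
#include <thread>

// Cooperative cancellation: the worker checks the flag only between whole
// units of work, so it never stops in the middle of one.
class Worker {
public:
    void start() {
        thread_ = std::thread([this] {
            while (!cancel_.load(std::memory_order_acquire)) {
                // One whole unit of work (stands in for real processing).
                completed_.fetch_add(1, std::memory_order_relaxed);
                std::this_thread::yield();
            }
        });
    }
    // Requests cancellation and waits for the worker to reach a safe point.
    void cancel() {
        cancel_.store(true, std::memory_order_release);
        if (thread_.joinable()) thread_.join();
    }
    long completed() const { return completed_.load(); }
    ~Worker() { cancel(); }

private:
    std::atomic<bool> cancel_{false};
    std::atomic<long> completed_{0};
    std::thread thread_;
};
```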

I am. The most I have seen so far, aside from the cache coherency
problem (which has been around for a long time), is that if two
threads operate against an unprotected global variable, there is a
potential for corruption. I did not say this. Others did.

And since C++ does not provide any way to protect that global
variable, there is nothing you can portably do about it.

I can protect the global variable with a mutex. The mutex I have now
is portable. If it does not port to a platform, it is because the OS
writer did not provide a kernel-mode mutex.

Not just providing a thread-safe library, providing a thread-safe
library with rigidly specified semantics. So that programmes that use
this library don't exhibit undefined behaviour. So that you can be
absolutely certain that such a programme will behave in a certain way
when compiled with every conforming compiler on every platform.

Ok, good, so now we are talking about the C++ _library_. Just to make
sure we are all on the same page, there are two kinds of libraries to
speak of here.

1. Synchronization library (class mutex, class event, class
semaphore,..etc)
2. Standard Library
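Parts of item 1 can be built from standard primitives. For example, a manual-reset event of the kind the author lists can be sketched with std::mutex and std::condition_variable (C++11). `Event` and `event_releases_waiter` are hypothetical names. The sketch omits timeouts and waiting on several events at once.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Manual-reset event: once set, current and future waiters proceed
// until reset() is called.
class Event {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    void reset() {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = false;
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate guards against spurious wake-ups.
        cv_.wait(lock, [this] { return signaled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Usage: a waiter thread blocks until the main thread sets the event.
bool event_releases_waiter() {
    Event ready;
    bool woke = false;
    std::thread waiter([&] { ready.wait(); woke = true; });
    ready.set();
    waiter.join();  // join makes the write to `woke` visible here
    return woke;
}
```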

I have already stated at the beginning of this post that if you design
a component that is inherently resistant to multi-threading, for
example, a class that is "a box with 1 banana in it and 10 monkeys all
want that same banana", no amount of synchronization is going to help
you, because the class was never meant to support more than 1 thread in
the first place.
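Here is one way to make the banana metaphor concrete. In `RacyBox`, every member function takes the lock, yet a caller who writes `if (box.has_banana()) box.take();` can still let two monkeys take the same banana, because the check and the take are separate operations. Redesigning the interface so that checking and taking happen in one locked call (`Box::try_take`) removes the race. All names here are hypothetical; this sketches the design point and is not the author's code.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Every member is locked, but the interface invites check-then-act races:
// two threads may both see has_banana() == true, and both call take().
class RacyBox {
public:
    bool has_banana() { std::lock_guard<std::mutex> l(m_); return bananas_ > 0; }
    void take()       { std::lock_guard<std::mutex> l(m_); --bananas_; }

private:
    std::mutex m_;
    int bananas_ = 1;
};

// Redesigned interface: checking and taking are one locked operation.
class Box {
public:
    bool try_take() {
        std::lock_guard<std::mutex> l(m_);
        if (bananas_ == 0) return false;
        --bananas_;
        return true;
    }

private:
    std::mutex m_;
    int bananas_ = 1;
};

// Ten "monkeys" race for the banana; exactly one try_take() succeeds.
int monkeys_that_got_banana() {
    Box box;
    std::atomic<int> winners{0};
    std::vector<std::thread> monkeys;
    for (int i = 0; i < 10; ++i)
        monkeys.emplace_back([&] { if (box.try_take()) ++winners; });
    for (auto& m : monkeys) m.join();
    return winners.load();
}
```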

It is impossible to describe the behaviour of such a library in C++'s
current machine model.

Most of the examples I have seen so far supporting this statement have
to do with the design of libraries, not C++.

Therefore, that model must be changed before
we can describe such a library. That doesn't mean that the semantics
of the language will change (in fact, they should _not_ change) or
that keywords will necessarily be added (in fact, that should be
avoided if possible), but the language in which those semantics are
expressed has to be made more expressive. And that extension has to
be generic enough to encompass all the systems out there, be
implementable on all of them, and yet it should be simple enough to
be useful.

If the OS people were to get together and figure out the parameters for
threads, whether named kernel-mode objects are universally useful (they
are heading that way), etc., then we can do our part, using a
Synchronization Library.

Elsewhere you said that you would expect students having done a
university level CompSci course at a good university to know the
basics of multithreading and synchronisation, and that you didn't
think any of us posting here knew anything about this. I _am_ such a
student. I'm currently finishing up an MSc degree in computer science
with a specialisation towards databases (and with excellent grades
too). I took (amongst others) courses on parallel processes,
concurrent and distributed programming, distributed operating
systems, and database transactions and processes. By your own
reasoning, I should know the basics of threading.

I would imagine. I cannot see why not. I *do* know that if I took _any_
of the computer scientists whom I went to school with, say 3rd- or
4th-year bachelor's students in the United States, and presented them
with some of the "Look what happens to this global variable when two
threads run against it..." examples, most of them would give me strange
looks.

If I gave them the examples of std::basic_string or map<> or whatever,
and gave them an exam and asked them what would happen if multiple
threads are operating against a local object, the first thing the C++
programmers would ask me is, "How is it implemented?" They would *not*
assume that it is thread safe, especially if I asked the question, for
they would know that the answer is probably not the obvious one -
otherwise I would not be asking the question. ;)

But most importantly, if I went to them and said, "Well, I designed
this library with a string class, and people use it, but their programs
crash because they are multi-threaded apps and the string class uses a
global variable.." there would be a few things they would want to know:

1. Did I know in advance that the string class might be used in a
multi-threaded app?
2. If so, why did I design it that way, knowing that the design is
inherently single-threaded?
3. Is there a way to rework it so that it is inherently multi-threaded
without protection?
4. What model do I have for providing protection for global instances
that is elegant for the programmer?
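Question 3 can be illustrated with a small formatting function in the spirit of C's asctime(), whose result lives in a static buffer. `format_id_shared` returns a pointer into one buffer shared by every caller, so a later call from any thread overwrites an earlier result. `format_id` is reworked to keep all state local and return by value, which makes it safe to call concurrently without any lock. Both functions are hypothetical examples.

```cpp
#include <cstdio>
#include <string>

// Single-threaded design: the result lives in one static buffer shared by
// every caller (and every thread). Each call overwrites the previous result.
const char* format_id_shared(int id) {
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "ID-%06d", id);
    return buffer;
}

// Reworked to be inherently thread-safe without protection: all state is
// local and the result is returned by value.
std::string format_id(int id) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "ID-%06d", id);
    return std::string(buffer);
}
```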

I got this far by carefully reading what others have to say, carefully
considering what they meant and what I might be missing or
misunderstanding to make me think otherwise, and either adjusting my
mental model accordingly, or asking for clarification if I can't
figure it out. I would respectfully request that you do the same.
It's a good way to learn.

Good advice. So I ask again: specifically, what is your vision of what
will become of C++ to support threading? Do you anticipate any changes
to the language proper (the grammar)?

-Le Chaud Lapin-

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]