Re: "Where is my C++ replacement?"
On 17.06.14 23.18, Lynn McGuire wrote:
> It would be nice if multithreading was totally
> automated in C++ like Golang. The programmer
> should be hidden away from the aspects of
> multithreading for application programs.
Sooner or later this always ends in a performance disaster. The
optimizer cannot reliably predict the timing of the outstanding tasks,
and in some cases it will fail badly. E.g. if the tasks are IO bound,
perhaps on a remote server, the optimizer will erroneously assume that
the system's resources are not yet exhausted and spawn further threads.
At the remote server this can turn into something like a DoS attack: it
may run out of resources and slow down or abort.
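This is something the developer has to bound explicitly, because a
generic optimizer does not know the capacity of the remote side. A
minimal C++20 sketch of such a cap (fetch() is just a made-up stand-in
for the remote call):

#include <chrono>
#include <semaphore>
#include <thread>
#include <vector>

// Stand-in for an IO-bound request to a remote server (hypothetical).
void fetch(int /*request_id*/)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

// At most 8 requests in flight, no matter how many tasks exist.
std::counting_semaphore<8> slots{8};

void worker(int request_id)
{
    slots.acquire();        // block until a slot is free
    fetch(request_id);
    slots.release();
}

int main()
{
    // In real code the tasks would come from a queue or pool; the
    // point here is only the explicit cap on concurrent requests.
    std::vector<std::jthread> pool;
    for (int i = 0; i < 100; ++i)
        pool.emplace_back(worker, i);
}   // jthreads join automatically

An automatic thread manager has no way to know that "8" is what this
particular server can take.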
Going this way you end up with software that does not break your server
as long as it runs alone, because that is the point where the developer
stops optimizing. But running many programs of this kind in parallel
will fail if each one is designed to eat up all available system
resources.
Let me give you an example from the Java language to illustrate the
point. It is about memory rather than CPU (and parallelism), but it
shows the principle. Java does the memory management for you; the
solution is the GC thread, which cleans up unused memory. This thread
runs at lower priority until the VM is close to running out of memory.
For a desktop application everything is fine. But let's look at a
server under heavy load or doing mass data processing.
As long as there are jobs to do, the JVM will not recover memory until
there is no memory left. Run several JVMs in parallel this way and all
of them will hold the maximum allowed amount of memory all the time,
even if only a small part of it is really in use. If this is
significantly more than the physical memory of the machine you end up
swapping, and the server almost stops working. All you can do is
restrict each of the n VMs to no more than an n-th of the system's
memory. Unfortunately this has the side effect that a JVM can no longer
grab additional memory for short periods. So a single larger
transaction makes the JVM crash, since out-of-memory conditions cannot
reasonably be handled in Java. (Unlike in programs written in other
languages, e.g. a database rollback in C++.) At best it only gets
incredibly slow because the GC has to clean up all the time.
BTDT. A Java-based integration server with 32 GB of memory and 8 CPUs
got stuck processing a single larger data block (~20 MB) because one
JVM ran out of memory (max 512 MB per JVM, because of the problem
above). A C++-based integration platform processed the same data in
about 15 seconds on a consumer-class notebook with 1 GB of RAM. Because
of the better memory management it did not even need 512 MB for the
processing; it took about 60 MB, and only for a few seconds.
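The reason is deterministic release: in C++ the memory for a data block
is given back as soon as the owning object goes out of scope, instead
of piling up until a GC decides to run. A rough sketch of the pattern
(the Record/parse/forward names are made up, not the actual platform):

#include <string>
#include <vector>

// Hypothetical record type for one data block.
struct Record { std::string key; std::string payload; };

// Made-up stand-ins for the real parsing and forwarding steps.
std::vector<Record> parse(const std::string& block)
{
    return { { "id", block } };
}

void forward(const std::vector<Record>& records)
{
    (void)records;   // would hand the records to the next stage
}

void process(const std::string& block)
{
    std::vector<Record> records = parse(block);  // allocated for this block only
    forward(records);
}   // records is destroyed here; the memory is returned immediately,
    // not when a GC eventually gets around to it

int main()
{
    process("one data block");
}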
Furthermore, the language concept of C++ does not allow unrestricted
parallelism. Memory access in C++ is full of potential race conditions;
only strictly functional languages do not share this problem. So the
(hidden) optimizer would have to synchronize access all the time, and
on modern architectures synchronization takes more time than computing
smaller things twice. Because of Einstein's rules you cannot
synchronize things that are more than about 10 cm apart in space within
one 1 GHz clock cycle. Atomic operations (lock-free algorithms)
sometimes come pretty close to this limit. Mutex-based algorithms (the
majority) are an order of magnitude slower, but more predictable.
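To make the difference concrete: the same shared counter, once lock
free and once mutex based. Both are correct; the numbers depend on the
hardware, so take this only as a sketch of the two patterns:

#include <atomic>
#include <mutex>
#include <thread>

std::atomic<long> atomic_counter{0};

long plain_counter = 0;
std::mutex counter_mutex;

void bump_lock_free()
{
    // a single atomic read-modify-write, no kernel involvement
    atomic_counter.fetch_add(1, std::memory_order_relaxed);
}

void bump_with_mutex()
{
    // lock/unlock around the update; slower, but the critical section
    // can grow arbitrarily without changing the pattern
    std::lock_guard<std::mutex> guard(counter_mutex);
    ++plain_counter;
}

int main()
{
    std::thread a(bump_lock_free);
    std::thread b(bump_with_mutex);
    a.join();
    b.join();
}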
And one of the principles of C++ is that you can write code with almost
no unnecessary runtime overhead while keeping type safety and
maintainability. This conflicts with your requirement.
Once you give things out of your hands, you need sufficiently large
resources to deal with the cases where it does not work out well. The
average case may be improved by automatic resource handling, but the
worst case is not. And the worst-case performance is responsible for a
significant part of the costs, mainly sizing costs and/or incident
costs in this case.
On the other side you may save some money if the developer does not
have to deal with that much complexity. But this immediately turns into
the opposite as soon as the developer has to cure even a few problems
caused by this infrastructure. Because the program has less control
over resource management, this kind of problem is really difficult to
fix. The break-even point is usually where system resources are really
cheap, and this holds true even if your application uses ten times as
much for some operations; in that case the saved developer time and
money is the dominating factor. But I know far more examples the other
way around, where the project people were always in a hurry and made
compromises to save some time or money, and in the end they paid twice
as much because it did not perform well and late changes had to be
made, preferably in production, with regressions.
I think at some point we will get to what you wish for. Whether that is
still called C++ is another question.
But it is very important that the language has an expression syntax
that tells the compiler about all the dependencies between different
data. This is essential for the compiler to do a good job. Most current
languages are not fully expressive on this point. This is also the
reason why no reasonable cross-compiler exists that can transform
multi-threaded programs from one language to another without functional
changes. (They exist for single-threaded applications.)
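A trivial C++ example of the missing information: nothing in this
signature tells the compiler whether the two ranges overlap, or whether
another thread touches the destination while the loop runs, so it
cannot safely split the loop across threads on its own. (The function
is made up for illustration.)

#include <cstddef>

// The compiler has to assume the iterations may depend on each other.
void scale(double* dst, const double* src, std::size_t n, double factor)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}

int main()
{
    double a[4] = {1, 2, 3, 4};
    double b[4];
    scale(b, a, 4, 2.0);       // independent ranges: parallel would be fine
    scale(a + 1, a, 3, 2.0);   // overlapping ranges: parallel would be wrong
}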
I.e. the language has to give variables transaction scopes. E.g. local
variables that may not be accessed by another thread. (The operating
system could support this by allocating the memory private to a thread,
at the cost of larger thread switch times.) Or a set of shared
variables that are only modified synchronized and transactionally (an
exception causes a complete rollback). But this includes /all/
referenced objects too, unless they are declared immutable (and
therefore implicitly thread-safe) or explicitly thread-safe. These are
only examples and surely not complete. But so far the C++ language
provides none of the features needed for automatic thread management.
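You can hand-write the two patterns in today's C++, which is roughly
what an automatic system would have to generate and verify for you.
Just a sketch with made-up names:

#include <mutex>
#include <stdexcept>
#include <vector>

// Thread-private: no other thread can ever see it.
thread_local int local_scratch = 0;

// Shared state, only ever modified "transactionally" under a lock.
std::vector<int> shared_data;
std::mutex shared_mutex;

void append_checked(int value)
{
    ++local_scratch;                          // per-thread bookkeeping, never shared
    std::lock_guard<std::mutex> guard(shared_mutex);
    std::vector<int> copy = shared_data;      // work on a copy
    copy.push_back(value);
    if (value < 0)
        throw std::invalid_argument("negative");  // rollback: shared_data untouched
    shared_data.swap(copy);                   // commit only if nothing threw
}

int main()
{
    append_checked(42);
    try { append_checked(-1); } catch (const std::exception&) { /* state unchanged */ }
}

The point is that nothing forces you to write it this way; the compiler
cannot check that every access to shared_data goes through the lock.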
The Occam language (1980s) had some of these restrictions. E.g.
variables needed to be thread-private or read-only, and this was
checked by the compiler. But it was a really ugly language, so I
preferred C for programming the Transputers (where starting a thread
was a single machine instruction) and handled the race conditions
myself.
Another necessary change when going your way is to split the bulky,
large applications into small state engines that communicate only over
documented and verifiable interfaces. This is a SOA-like approach. I
did something like this back in the 90s. Surprisingly it was in no way
complex: the code of most of the functional units fit on one or two
screens, and race conditions were almost gone. But each unit was a
thread, and most operating systems cannot reasonably deal with
thousands of (sleeping) threads. On that hardware a sleeping thread did
not consume any resources except its (small) stack memory.
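For illustration, one such unit in today's C++ (the names and the
trivial state are mine, not the original design): a small state engine
that owns its data and talks to the rest of the world only through a
message queue:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// One small state engine: owns its state, communicates only via messages.
class Counter
{
public:
    void post(std::string msg)
    {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            queue_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    void run()   // the whole logic of the unit fits on one screen
    {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !queue_.empty(); });
            std::string msg = std::move(queue_.front());
            queue_.pop();
            if (msg == "quit")
                return;
            ++count_;   // state is touched by this thread only: no races
        }
    }

private:
    std::queue<std::string> queue_;
    std::mutex mutex_;
    std::condition_variable ready_;
    long count_ = 0;
};

int main()
{
    Counter c;
    std::thread engine(&Counter::run, &c);
    c.post("tick");
    c.post("tick");
    c.post("quit");
    engine.join();
}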
Marcel