Re: The D Programming Language
Andrei Alexandrescu (See Website For Email) wrote:
James Kanze wrote:
I don't know quite what different definitions we could be using.
Undefined behavior occurs when the language specification places
no definition on the behavior. I don't know how you can easily
search for it, because it is the absence of a definition. Java
(and most other languages) don't use the term, or even specify
explicitly what they don't specify. So the response is rather
the opposite: unless you can find some statement in the language
specification which defines this behavior, it is undefined
behavior.
I was hoping I'd be spared searching online docs, but now it looks
like I have to, so be it.
There might be a terminology confusion here, which I'd like to clear
from the beginning:
1. A program "has undefined behavior" = effectively anything could
happen as the result of executing that program. The metaphor with the
demons flying out of one's nose comes to mind. Anything.
The example is meant to be taken humorously. Surely you don't
think that the C++ standard would be improved, and that we would
have eliminated all "undefined behavior", in any useful,
realistic sense, if we added a clause to the standard saying
that "in no case is a program allowed to cause demons to fly out
of the programmer's nose."
In practice, "undefined behavior" is always somewhat restricted;
in non-privileged code under Unix or Windows, for example, you
may get a core dump, but you won't corrupt the system or even
reformat the hard drive. The C++ standard prefers to not give
even these guarantees, because C++ is conceived for use in areas
where they don't apply---if you have undefined behavior in a
device driver, you might end up reformatting the hard disk.
Java can make concrete, specific limits, because it doesn't try
to be usable in such contexts. From the point of view of
someone developing application (non-privileged) software, C++
has some limits as well. That doesn't mean that it doesn't have
undefined behavior in such cases, at least not for any useful
meaning of the expression.
2. A program "produces an undefined value" = the program could produce
an unexpected value, while all other values, and that program's
integrity, are not violated.
In practice, in real programs, it's much more complicated.
"Values" interact, and the results of modifying values in the
wrong order, and seeing those modifications, can result in
behavior that the programmer cannot foresee. Not limited to
unexpected values, but including unexpected exceptions, etc. If
you violate the rules in Java, you cannot count on much in
practice, any more than if you violate them in C++. (You can
count on NOT getting a core dump, of course. Which I would
consider a defect, more than an advantage.)
The two are fundamentally different because in the second case you can
still count on objects being objects etc.; the memory safety of the
program has not been violated. Therefore the program is much easier to
debug.
Memory safety is only one part of "undefined behavior". Not
crashing when you have a serious error makes the program much
harder to debug---if there's a weakness here in C++, it's that
the crash is not guaranteed, not that it isn't forbidden. (But
practically speaking, guaranteeing the crash in such cases is not
implementable at reasonable cost.)
C++ allows programs with (1). We might also consider that it allows
programs with (2) under the name of "unspecified behavior" or
"implementation-dependent behavior". (There would be a subtle difference
there, but let's move on.)
There's a radical difference. As a practical programmer, there's
really not any significant difference between "unspecified
behavior" and "undefined behavior", unless there are serious
restrictions on "unspecified". Whereas I use implementation
defined behavior in just about every program I write.
My current understanding is that Java programs never exhibit (1),
If you mean that Java guarantees that a program will never make
demons fly out of your nose, you're probably right. If you mean
that the program will behave in a reliable and predictable
manner regardless of what I've coded, you're definitely wrong.
The question is, I think, just how unreliable and unpredictable
it has to be before we speak of "undefined behavior". I would
say that there are certain cases involving threading where the
behavior is so unreliable and unpredictable that I would
consider it "undefined". Whether you agree with the actual word
is really not the issue---the point is that for a practical
programmer, you're faced with the same issues.
(Don't get me wrong---I think there is far too much of this
problem in C++, and Java does handle it significantly better.
The only cases I can think of in Java where it is a problem do
involve threading, which is an extremely complex issue; in C++,
you can get similar problems with even the simplest, single
threaded code, e.g. by returning a pointer or a reference to a
local variable. Just because I refuse to accord Java the
absolute doesn't mean that I don't recognize that it represents
orders of magnitude improvement in most cases.)
and
might exhibit (2) only on values that can't be read atomically (which
remarkably are never pointers).
To find out whether my understanding is
correct, I looked up the language spec, which says after a discussion of
the memory model (see
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.3):
"Therefore, a data race cannot cause incorrect behavior such as
returning the wrong length for an array."
Which is true, but it is a useless guarantee. I can get the
wrong length from a java.util.Vector.
The possibly useful guarantee is that if I use the wrong length,
I still have defined behavior. It would be even more useful if
the guarantee was sensible; if the code was guaranteed to crash,
instead of just throwing an exception which can be caught and
ignored. (At least in my field of endeavour. I can quite
understand that there are cases where the exception, if it is
caught at a high enough level, might be appropriate. The trick
would be to define a type of exception which can only be caught
at a high enough level, so that lower level code can't mask its
errors and return wrong results.)
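To make the point about java.util.Vector concrete, here is a minimal
sketch, not taken from the original discussion. The class and method
names are hypothetical, and the "other thread" is simulated by an
ordinary call so that the interleaving is deterministic. It shows a
stale length being used, the resulting exception being caught and
ignored by low-level code, and a wrong result being returned.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration: a stale size() used after the Vector shrank.
final class StaleLengthSketch {

    // Low-level code that masks its own error: the exception is
    // swallowed and a partial sum is returned as if it were valid.
    static int sumIgnoringErrors(Vector<Integer> v, int staleSize) {
        int sum = 0;
        try {
            for (int i = 0; i < staleSize; i++) {
                sum += v.get(i);
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Caught and ignored: behavior is defined, but the result is wrong.
        }
        return sum;
    }

    static int demo() {
        Vector<Integer> v = new Vector<>(Arrays.asList(1, 2, 3));
        int staleSize = v.size();   // reader observes a length of 3
        v.remove(v.size() - 1);     // stands in for a concurrent removal
        return sumIgnoringErrors(v, staleSize);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3, not the 6 the reader expected
    }
}
```

Every individual call on the Vector is synchronized, and nothing here
is undefined in the Java sense; the caller still gets a number it has
no reason to distrust.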
Later on that page, there is a section "17.7 Non-atomic Treatment of
double and long" that discusses the exact issue we are talking about here.
"Some implementations may find it convenient to divide a single write
action on a 64-bit long or double value into two write actions on
adjacent 32 bit values. For efficiency's sake, this behavior is
implementation specific; Java virtual machines are free to perform
writes to long and double values atomically or in two parts.
For the purposes of the Java programming language memory model, a single
write to a non-volatile long or double value is treated as two separate
writes: one to each 32-bit half. This can result in a situation where a
thread sees the first 32 bits of a 64 bit value from one write, and the
second 32 bits from another write. Writes and reads of volatile long and
double values are always atomic. Writes to and reads of references are
always atomic, regardless of whether they are implemented as 32 or 64
bit values.
VM implementors are encouraged to avoid splitting their 64-bit values
where possible. Programmers are encouraged to declare shared 64-bit
values as volatile or synchronize their programs correctly to avoid
possible complications."
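The scenario in the quoted section can be modelled deterministically.
The following is only a sketch under stated assumptions: the class
TornLongSketch and its methods are hypothetical, and the two 32-bit
halves are written by explicit calls rather than by a real VM, so the
"torn" state is forced rather than observed in a genuine data race.

```java
// Hypothetical model of a 64-bit slot stored as two 32-bit halves.
final class TornLongSketch {
    private int high;
    private int low;

    void writeHigh(long value) { high = (int) (value >>> 32); }
    void writeLow(long value)  { low = (int) value; }

    // Reassembles whatever halves are currently stored.
    long read() {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Completes the first write, then lets the reader look after only
    // the high half of the second write has landed.
    static long tornRead(long first, long second) {
        TornLongSketch slot = new TornLongSketch();
        slot.writeHigh(first);
        slot.writeLow(first);
        slot.writeHigh(second);   // second write is only half done
        return slot.read();       // mixes halves of two different writes
    }

    public static void main(String[] args) {
        // Neither 0 nor -1 was ever "half written" by the program's logic,
        // yet the reader sees a value that is neither of them.
        System.out.println(Long.toHexString(tornRead(0L, -1L)));
        // prints ffffffff00000000
    }
}
```

Declaring the shared field volatile, as the quoted text recommends,
rules out this mixed value in real code.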
This section can be understood only if we know what a Java program does
once it's read an invalid (say, NaN) value. Will it crash?
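As a small sketch of what happens at the language level, assuming a
double whose two halves happen to form a NaN bit pattern (the class
and method names below are hypothetical): Java floating-point
arithmetic does not throw, so the NaN simply propagates through later
computations. This says nothing about what a particular VM or OS does
underneath.

```java
// Hypothetical illustration: a double assembled from two 32-bit halves
// that together form a NaN bit pattern.
final class NaNReadSketch {

    static double fromHalves(int high, int low) {
        long bits = ((long) high << 32) | (low & 0xFFFFFFFFL);
        return Double.longBitsToDouble(bits);
    }

    static boolean yieldsNaN() {
        // 0x7FF8000000000000 is a quiet NaN in IEEE 754 double format.
        double d = fromHalves(0x7FF80000, 0);
        double result = d * 2.0 + 1.0;   // no exception; NaN propagates
        return Double.isNaN(d) && Double.isNaN(result);
    }

    public static void main(String[] args) {
        System.out.println(yieldsNaN()); // prints true
    }
}
```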
Can the VM avoid crashing, if the OS decides that that is what
it wants to do?
More to the point, does the fact that a Java program cannot
crash (IF that is the case) mean that Java has no undefined
behavior, or is it more or less a specious guarantee, with about
as much meaning as if C++ added a guarantee that no C++ program
could make demons fly out of your nose? Do my programs suddenly
lose all undefined behavior if I set SIGILL, SIGBUS, SIGSEGV
and SIGFPE to ignore at the start?
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]