Re: abstract classes and dynamic binding

From: Tom Anderson <twic@urchin.earth.li>
Newsgroups: comp.lang.java.programmer
Date: Wed, 30 Jul 2008 12:48:59 +0100
Message-ID: <Pine.LNX.4.64.0807301136160.1661@urchin.earth.li>

On Tue, 29 Jul 2008, Peter Duniho wrote:

On Tue, 29 Jul 2008 11:51:59 -0700, Mark Space <markspace@sbc.global.net>
wrote:

Peter Duniho wrote:

On Mon, 28 Jul 2008 19:35:55 -0700, Stefan Ram <ram@zedat.fu-berlin.de>
wrote:

conrad <conrad@lawyer.com> writes:

How does the JVM know that

Unless specifically compiled with run-time type information (RTTI), C++
has no RTTI. And yet, it handles abstract classes just fine.

Well, the OP's question was specifically about the JVM. I think that's
what Stefan was answering. The Java (spec? byte code spec?) system
does store the type of all objects in their .class files. (Notice
Stefan never said "RTTI.").

I'm aware that all that information is available, and even said so in my
reply.

And that is how a JVM does its virtual dispatch.

"What" is "how"?

I don't think a JVM uses virtual method tables. I guess it might, post
JIT compilation,

Tangentially, i find this a slightly odd way of looking at things: "A JVM
does such-and-such. Except after JIT compilation.". That implies that
before compilation is in some sense the 'normal' situation, and after
compilation is a special case, whereas i'd say it was the other way round.

Indeed, i believe that the IBM VM doesn't do interpretation at all, and is
all-compiled - they have two compilers, the full-on one that's the
equivalent of Sun's compiler, and then a special one which produces crappy
code but runs really fast, which is used for initial compilation. The idea
is that the initial compiler is fast enough that compiling a method and
then running the compiled code is as fast as interpreting it. Doing this
simplifies their execution model a lot.

So, anyway, i'd say that JVMs do (AFAIK) use vtables, but might not, pre
compilation. But ultimately, this is just a terminological difference.

but it derives that info from the class types, so ultimately I think
Stefan's answer is right on.

I can't imagine _not_ using v-tables for the JIT-ed code. Looking things up
in the type itself would be comparatively so slow, it hardly seems like the
right approach. I can see how when the byte code is being interpreted, that
might be a more viable solution though.

Right. But the point Mark and i are making is that a vtable *is* run-time
type information.

If Stefan has some specific knowledge about every JVM that means that
they cannot use v-tables either during interpreted execution or in the
JIT-ed code, I'd love to know about it. It would be a good learning
experience for me. But I find it difficult to believe that Java is
sifting through the actual type information just to dispatch virtual
function calls, at least when executing JIT-ed code.

I'm not aware of a JVM that doesn't use vtables in compiled code. However,
i believe there are/were Smalltalk VMs that didn't. Smalltalk is an
untyped ('dynamically typed' in modern jargon) language, so, in the
absence of type inference, nothing about the type of a receiver is known
at a call site, and that means you can't use vtables. Vtables are based on
the idea that whatever the receiver is, it must be a subclass of the
declared class of the variable, and thus the part of its vtable that the
call site might use is laid out in a predictable way. So, no types, no
vtables. The general mechanism that these VMs used was, i believe, a
lookup in the class data structure - even in compiled code.
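
To make the "predictable layout" point concrete, here is a minimal sketch of vtable dispatch modelled in plain Java. It is not how any particular JVM lays things out; the names VtableSketch, Obj, Method and the SLOT_* constants are invented for illustration. The thing to notice is that the call site only needs a slot number fixed by the declared type, which is exactly what an untyped call site does not have.

```java
// Hypothetical model of vtable dispatch; not taken from any real JVM.
final class VtableSketch {
    // A "compiled method": takes the receiver, returns a result.
    interface Method { String invoke(Obj receiver); }

    // An object is a pointer to its class's vtable plus some state.
    static final class Obj {
        final Method[] vtable;
        final String name;
        Obj(Method[] vtable, String name) { this.vtable = vtable; this.name = name; }
    }

    // Slot numbers are fixed by the declaring class and never move in subclasses.
    static final int SLOT_DESCRIBE = 0; // declared by Animal
    static final int SLOT_SOUND = 1;    // declared by Animal
    static final int SLOT_FETCH = 2;    // added by Dog

    static final Method[] ANIMAL_VTABLE = {
        r -> r.name + " is an animal",
        r -> "..."
    };

    // A subclass vtable starts with the superclass layout, then appends new slots.
    static final Method[] DOG_VTABLE = {
        ANIMAL_VTABLE[SLOT_DESCRIBE], // inherited unchanged
        r -> "Woof",                  // overrides the sound slot
        r -> r.name + " fetches"      // new method, new slot
    };

    // A call site whose declared receiver type is Animal: one indexed load,
    // one indirect call, and no lookup by name.
    static String callSound(Obj animal) {
        return animal.vtable[SLOT_SOUND].invoke(animal);
    }

    public static void main(String[] args) {
        System.out.println(callSound(new Obj(ANIMAL_VTABLE, "Generic")));
        System.out.println(callSound(new Obj(DOG_VTABLE, "Rex")));
    }
}
```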

However, since this is quite slow, there was a cunning trick: inline
caches, and for the truly wizardly, polymorphic inline caches. Basically,
the idea is that the call site stores two bits of information: a pointer
to a class, and a pointer to the right method in that class. So, if you
have a call site that does foo.bar(), you might have pointers to the class
Baz and the method Baz.bar. Then, when execution reaches the call site,
the machine code does the equivalent of:

    fooClass = CLASS OF foo
    IF (fooClass == Baz):
        INVOKE Baz.bar WITH RECEIVER foo
    ELSE:
        barMethod = LOOKUP "bar" IN fooClass
        INVOKE barMethod WITH RECEIVER foo

Which is pretty quick. Also, it gives the compiler the opportunity to
inline the Baz.bar call.

If your compiler is smart enough to accurately predict the receiver type
at compile time, this is straightforward and effective. If it can't, then
you need to manage this cache at runtime, and this is where it gets hairy.
You could do it easily by storing the class and method pointers in
variables somewhere, and updating them on cache misses. However, what was
actually done was to write them into the code, and to recompile on cache
misses!
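
A rough sketch of the "easy" variant, where the cached class and method live in ordinary variables and are overwritten on a miss, might look like the following. It does not attempt the code-patching version. The toy class model (Klass, Obj, Method, CallSite) is an assumption made for the example, with a hash-map lookup standing in for the slow search through the class data structure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a monomorphic inline cache kept in variables.
final class InlineCacheSketch {
    interface Method { String invoke(Obj receiver); }

    // Toy class object: just a name and a selector -> method table.
    static final class Klass {
        final String name;
        final Map<String, Method> methods = new HashMap<>();
        Klass(String name) { this.name = name; }
    }

    static final class Obj {
        final Klass klass;
        Obj(Klass klass) { this.klass = klass; }
    }

    // The slow path: look the selector up in the receiver's class.
    static Method lookup(Klass k, String selector) {
        Method m = k.methods.get(selector);
        if (m == null) throw new IllegalStateException(k.name + " does not understand " + selector);
        return m;
    }

    // One call site, e.g. foo.bar(), with its own one-entry cache.
    static final class CallSite {
        final String selector;
        Klass cachedClass;   // null until the first call
        Method cachedMethod;
        int misses = 0;      // counted only so the behaviour can be observed

        CallSite(String selector) { this.selector = selector; }

        String invoke(Obj receiver) {
            Klass k = receiver.klass;
            if (k != cachedClass) {
                // Cache miss: do the slow lookup, then remember the answer.
                misses++;
                cachedMethod = lookup(k, selector);
                cachedClass = k;
            }
            return cachedMethod.invoke(receiver);
        }
    }

    static final Klass BAZ = new Klass("Baz");
    static final Klass QUX = new Klass("Qux");
    static {
        BAZ.methods.put("bar", r -> "Baz.bar");
        QUX.methods.put("bar", r -> "Qux.bar");
    }

    public static void main(String[] args) {
        CallSite site = new CallSite("bar");
        Obj foo = new Obj(BAZ);
        for (int i = 0; i < 3; i++) System.out.println(site.invoke(foo));
        System.out.println(site.invoke(new Obj(QUX)));
        System.out.println("misses: " + site.misses);
    }
}
```

Repeated calls with the same receiver class take the fast path after the first one; a receiver of a different class evicts the cached entry, which is why a call site that alternates between two classes defeats this scheme.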

This approach could then be extended to handle polymorphic call sites,
where the receiver type varies enough to defeat the above strategy. There,
you use a polymorphic inline cache: a series of if-then constructs that
check for various possible types. If you get a cache miss, you recompile
to add another one to the list. You have a limit on how many options you
have, to prevent ballooning of the code at call sites which range over a
vast number of receiver types, eg obj.toString() in
PrintStream.print(Object obj).
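
The polymorphic version can be sketched the same way, again with the cache held in data rather than recompiled into the code. The toy class model and the limit of four entries (MAX_ENTRIES) are assumptions picked for the example, not figures from any real VM. The loop over the entries plays the part of the chain of if-then checks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a polymorphic inline cache with a size limit.
final class PolymorphicCacheSketch {
    interface Method { String invoke(Obj receiver); }

    static final class Klass {
        final String name;
        final Map<String, Method> methods = new HashMap<>();
        Klass(String name) { this.name = name; }
    }

    static final class Obj {
        final Klass klass;
        Obj(Klass klass) { this.klass = klass; }
    }

    // Convenience for the demo: a class whose bar() reports the class name.
    static Klass klassWithBar(String name) {
        Klass k = new Klass(name);
        k.methods.put("bar", r -> name + ".bar");
        return k;
    }

    static Method lookup(Klass k, String selector) {
        Method m = k.methods.get(selector);
        if (m == null) throw new IllegalStateException(k.name + " does not understand " + selector);
        return m;
    }

    static final class CallSite {
        static final int MAX_ENTRIES = 4; // assumed limit, chosen arbitrarily
        final String selector;
        final Klass[] classes = new Klass[MAX_ENTRIES];
        final Method[] methods = new Method[MAX_ENTRIES];
        int size = 0;
        int slowLookups = 0; // counted only so the behaviour can be observed

        CallSite(String selector) { this.selector = selector; }

        String invoke(Obj receiver) {
            Klass k = receiver.klass;
            // Equivalent of the series of if-then type checks.
            for (int i = 0; i < size; i++) {
                if (classes[i] == k) return methods[i].invoke(receiver);
            }
            // Miss: do the slow lookup, and add an entry if there is room.
            slowLookups++;
            Method m = lookup(k, selector);
            if (size < MAX_ENTRIES) {
                classes[size] = k;
                methods[size] = m;
                size++;
            }
            return m.invoke(receiver);
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite("bar");
        for (int pass = 0; pass < 2; pass++) {
            for (String n : new String[] {"A", "B", "C", "D", "E", "F"}) {
                System.out.println(site.invoke(new Obj(klassWithBar(n))));
            }
        }
        System.out.println("entries: " + site.size + ", slow lookups: " + site.slowLookups);
    }
}
```

Once the cache is full, further receiver classes always take the slow path, which is the price of keeping the call site from ballooning.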

Clever, eh?

tom

--
A military-industrial illusion of democracy