Re: Identity of bytecode
Mike Schilling wrote:
Daniel Pitts wrote:
It is also possible that different versions of the Java compiler will
produce slightly different byte-code for the same source, however the
same compiler should produce the same output for the same input.
Probably, but there are no guarantees. "Here's a really sparse switch
statement. I could produce either a tableswitch or a sequence of
if_icmpeq's. I'll randomly select one or the other." probably isn't
the world's best compiler design, but it's perfectly legal.
Funny, that sounds just like the design of Java's other compiler - the
bytecode-to-machine-code compiler that routinely makes such decisions at
runtime, except that the choice isn't actually random.
Seems like a pretty good design - it makes for all sorts of lovely
optimizations, and perhaps more importantly, de-optimizations as the situation
demands, and has significantly improved Java's run-time performance since its
institution.
Up until now we've only talked about compilation to bytecode, which is
relatively static, and that raises a rather interesting question: could the
source compiler's choice of bytecode, say a tableswitch or not as above,
preclude the JVM's HotSpot compiler from making some wiser choices?
Or perhaps there is some flexibility even there, where the same bytecode (say
the if_icmpeq series) might turn into the local CPU equivalent of a
tableswitch due to the JVM's alertness, or not, dynamically, at different
times in the same program run.
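To make the question concrete, here is a minimal sketch of the same mapping
written two ways. The class and method names are made up for illustration.
With labels this sparse, javac typically emits a lookupswitch for the first
method and a run of compare-and-branch instructions for the second, though, as
noted above, nothing guarantees either choice.

```java
// Hypothetical example: class and method names are invented for illustration.
final class SwitchShapes {

    // Sparse case labels written as a switch statement. The source compiler
    // decides how to encode this (tableswitch, lookupswitch, or otherwise).
    static int viaSwitch(int key) {
        switch (key) {
            case 1:       return 10;
            case 1000:    return 20;
            case 1000000: return 30;
            default:      return -1;
        }
    }

    // The same mapping written as a chain of integer comparisons.
    static int viaIfChain(int key) {
        if (key == 1) return 10;
        if (key == 1000) return 20;
        if (key == 1000000) return 30;
        return -1;
    }

    public static void main(String[] args) {
        int[] keys = {1, 1000, 1000000, 42};
        for (int k : keys) {
            // Both forms must agree, whatever bytecode was chosen for each.
            System.out.println(k + " -> " + viaSwitch(k) + " / " + viaIfChain(k));
        }
    }
}
```

Compile it and run "javap -c SwitchShapes" to see which bytecode your compiler
actually picked for each method. What HotSpot later makes of either form is
exactly the open question.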
How sophisticated are HotSpot's optimizations these days? I'm having some
trouble getting a read on how clever hotspotting is in real life compared to
what the white papers say it should be, but the evidence is that something is
doing something.
I modified some old Linpack benchmark code off the net the other day, to make
it run multiple times in a loop, reporting its results for each loop of, say,
100-by-100 matrix calculations. Over ten iterations the reported speed
improved step by step from "20" to "686", using whatever the program thinks it
measures.
Setting -client instead of -server seems to dampen that a little. I'm
guessing that the acceleration is the result of hotspotting, but I haven't
ruled out cache or other effects yet. It's possible the optimizer lifted out
entire loop bodies due to lack of side effects, thus both validating and
invalidating the benchmark results.
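For anyone who wants to try the experiment, here is a rough sketch of its
shape. It is not the Linpack code: the kernel is a stand-in naive matrix
multiply, and the class and method names are invented. Each round is timed
separately, and the result is summed and printed so the optimizer cannot
discard the work as free of side effects.

```java
// Hypothetical stand-in for the modified benchmark; not the Linpack code.
final class WarmupDemo {

    // Multiplies two deterministic n-by-n matrices and returns the sum of
    // the product's elements, so the caller has a value to consume.
    static double multiplyChecksum(int n) {
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
            }
        }
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        double sum = 0.0;
        for (double[] row : c) {
            for (double v : row) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;      // matrix size, as in the 100-by-100 runs above
        int rounds = 10;  // number of timed repetitions
        double sink = 0.0;
        for (int r = 1; r <= rounds; r++) {
            long start = System.nanoTime();
            double checksum = multiplyChecksum(n);
            long elapsedMicros = (System.nanoTime() - start) / 1000;
            sink += checksum; // keep the result live
            System.out.println("round " + r + ": " + elapsedMicros + " us");
        }
        // Printing the accumulated value makes the work observable.
        System.out.println("checksum total: " + sink);
    }
}
```

Running the same class with -Xint (interpreter only) is one way to check
whether a round-over-round speed-up comes from compilation rather than from
cache or other effects; if the times stay flat there, hotspotting is the
likelier explanation.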
--
Lew