Re: Identity of bytecode
 
Mike Schilling wrote:
Daniel Pitts wrote:
It is also possible that different versions of the Java compiler will
produce slightly different byte-code for the same source, however the
same compiler should produce the same output for the same input.
Probably, but there are no guarantees.  "Here's a really sparse switch 
statement.  I could produce either a tableswitch or a sequence of 
if_icmpeq's.  I'll randomly select one or the other." probably isn't 
the world's best compiler design, but it's perfectly legal. 
Funny, that sounds just like the design of Java's other compiler - the 
bytecode-to-machine-code (JIT) compiler that routinely makes such decisions 
at runtime, except that the choice isn't actually random.
Seems like a pretty good design - it makes for all sorts of lovely 
optimizations, and perhaps more importantly, de-optimizations as the situation 
demands, and has significantly improved Java's run-time performance since its 
introduction.
Up until now we've only talked about compilation to bytecode, which is 
relatively static, and that does raise a rather interesting question:  Could 
the Java compiler's choice of bytecode, say a tableswitch or not, as above, 
preclude the JVM's HotSpot compiler from making some wiser choices?
Or perhaps there is some flexibility even there, where the same bytecode (say 
the if_icmpeq series) might turn into the local CPU equivalent of a 
tableswitch due to the JVM's alertness, or not, dynamically, at different 
times in the same program run.
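
For anyone who wants to look at the static half of that question, here is a 
minimal sketch.  The class and method names (SwitchShapes, dense, sparse) are 
made up for illustration.  With the javac versions I know of, closely packed 
case labels normally come out as a tableswitch and widely spaced ones as a 
lookupswitch, but as noted above nothing guarantees that choice.

```java
public class SwitchShapes {
    // Closely packed case labels: javac normally emits a tableswitch here.
    static int dense(int x) {
        switch (x) {
            case 0: return 10;
            case 1: return 11;
            case 2: return 12;
            case 3: return 13;
            default: return -1;
        }
    }

    // Widely spaced case labels: javac normally emits a lookupswitch here.
    static int sparse(int x) {
        switch (x) {
            case 1: return 10;
            case 1000: return 11;
            case 1000000: return 12;
            default: return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(dense(2) + " " + sparse(1000));
    }
}
```

Compile it and run "javap -c SwitchShapes" to see which instruction your 
compiler picked.  What HotSpot later does with either form at run time is a 
separate matter that this does not show.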
How sophisticated are HotSpot's optimizations these days?  While I'm having 
some trouble getting a read on how very clever hotspotting is in real life, 
compared to what the white papers say it should be, the evidence is that 
something is doing something.
I modified some old Linpack benchmark code off the net the other day, to make 
it run multiple times in a loop, reporting its results for each loop of, say, 
100-by-100 matrix calculations.  Over ten iterations the reported speed 
improved step by step from "20" to "686", using whatever the program thinks it 
measures.
Setting -client instead of -server seems to dampen that a little.  I'm 
guessing that the acceleration is the result of hotspotting, but I haven't 
ruled out cache or other effects yet.  It's possible the optimizer lifted out 
entire loop bodies due to lack of side effects, thus both validating and 
invalidating the benchmark results.
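
The shape of that experiment can be sketched as follows.  This is not the 
Linpack code; WarmupBench and multiply are hypothetical stand-ins that use a 
naive matrix multiply as the workload, and timing with System.nanoTime() is my 
assumption about how one might measure it.  The result of every run is summed 
and printed so that the work has a visible effect, which should make it harder 
for the optimizer to throw the loop bodies away.

```java
public class WarmupBench {
    // Hypothetical stand-in for the Linpack kernel: naive n-by-n matrix multiply.
    static double multiply(int n) {
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
            }
        }
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        // Sum every element of the product so the result depends on all the work.
        double sum = 0.0;
        for (double[] row : c) {
            for (double v : row) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double sink = 0.0;
        for (int run = 1; run <= 10; run++) {
            long start = System.nanoTime();
            sink += multiply(100);
            long elapsedMicros = (System.nanoTime() - start) / 1000;
            System.out.println("run " + run + ": " + elapsedMicros + " us");
        }
        // Use the accumulated result so the computation is not trivially dead code.
        System.out.println("checksum " + sink);
    }
}
```

Running it once with -client and once with -server, where the JVM supports 
both, gives the comparison described above.  Adding -XX:+PrintCompilation 
should show when HotSpot compiles multiply, which would help separate 
hotspotting from cache effects.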
-- 
Lew