Re: Hash table performance
Jon Harrop wrote:
My guess is that this is because the JVM is boxing every floating point
number individually in the hash table due to type erasure whereas .NET
creates a specialized data structure specifically for a float->float hash
table with the floats unboxed. Consequently, the JVM is doing enormously
numbers of allocations whereas .NET is not.
Is that correct?
No, not primarily. In my testing I found:
1. I think the JVM itself makes the most difference.
2. Next was synchronization.
3. Autoboxing appears to have an impact but it's less than either #1 or #2.
The first time I ran your program, it took over 79 seconds to execute.
During the course of testing the other algorithms, that time reduced to
about 13 seconds and stayed there. Very odd, but I assume that
something somewhere optimized the code.
Unlike a lot of the posters here, I'm running a 32 client JVM, which is
slow to optimize code. 64 bit systems run in -server mode by default,
which optimizes code before executing it.
The first place to look is your environment: memory, flags to the JVM,
which JVM version and vendor, other "system" variables, etc.
Somewhat surprising to me, just substituting a a HashMap for the
HashTable reduced the time to about half -- 6 seconds instead of 13.
Both autobox, the only difference is synchronization.
Lastly, removing the autoboxing by writing a specialized class saved
about 0.7 second from just using HashMap. The specialized version time
was 5.3 seconds. Note that I allocated a new object myself -- the
"Entry" for the HashMap -- each time a new double is added. I suspect
that any reasonable C# hash object must do the same, so you're not
losing as much time as you think by Java's auto object creation.
I'm curious to see your C# version, and see actual times for both
programs on your system. The results might tell us what is going on.
"32x" doesn't really give us much of a clue.
Output of the following:
<output>
run-single:
HashTable time: 12844
hashtable(100.0) = 0.01
HashMap time: 6016
hashtable(100.0) = 0.01
HashTest time: 5373
hashtable(100.0) = 0.01
BUILD SUCCESSFUL (total time: 29 seconds)
</output>
<code>
package cljp;
import java.util.HashMap;
import java.util.Hashtable;
import static java.lang.System.out;
public class HashTest
{
public static void main( String[] args )
{
Hashtbl.test();
HashM.test();
HashTest.test();
}
HashMap<Entry,Entry> hash = new HashMap<Entry,Entry>();
private static class Entry {
final double key;
final double value;
public Entry( double key, double value )
{
this.key = key;
this.value = value;
}
@Override
public int hashCode()
{
long bits = Double.doubleToLongBits( key );
bits ^= bits >>> 32;
return (int)bits;
}
@Override
public boolean equals( Object obj )
{
if( !(obj instanceof Entry ) ){
return false;
}
return key == ((Entry)obj).key;
}
}
public void put( double key, double value ) {
Entry e = new Entry(key, value );
hash.put( e, e );
}
public double get( double key ) {
Entry e = new Entry( key, 0.0 );
Entry valueEntry = hash.get( e );
if( valueEntry != null ) {
return valueEntry.value;
} else {
throw new IllegalArgumentException("Not found: "+key);
}
}
public static void test()
{
long start = System.nanoTime();
HashTest hashTest = new HashTest();
for( int i = 1; i <= 10000000; ++i ) {
double x = i;
hashTest.put( x, 1.0 / x );
}
long end = System.nanoTime();
out.println( "HashTest time: "+ (end-start)/1000000 );
out.println( "hashtable(100.0) = " +
hashTest.get( 100.0 ) );
}
}
class HashM
{
public static void test()
{
long start = System.nanoTime();
HashMap<Double,Double> hashM = new HashMap<Double, Double>();
for( int i = 1; i <= 10000000; ++i ) {
double x = i;
hashM.put( x, 1.0 / x );
}
long end = System.nanoTime();
out.println( "HashMap time: "+ (end-start)/1000000 );
out.println( "hashtable(100.0) = " +
hashM.get( 100.0 ) );
}
}
class Hashtbl
{
public static void test()
{
long start = System.nanoTime();
Hashtable hashtable = new Hashtable();
for( int i = 1; i <= 10000000; ++i ) {
double x = i;
hashtable.put( x, 1.0 / x );
}
long end = System.nanoTime();
out.println( "HashTable time: "+ (end-start)/1000000 );
out.println( "hashtable(100.0) = " +
hashtable.get( 100.0 ) );
}
}
</code>