Re: Optimization problem

From:
Tom Anderson <twic@urchin.earth.li>
Newsgroups:
comp.lang.java.programmer
Date:
Thu, 30 Apr 2009 23:07:40 +0100
Message-ID:
<alpine.DEB.1.10.0904302237460.26990@urchin.earth.li>
On Wed, 29 Apr 2009, tnorgd@gmail.com wrote:

> I would like to gather some opinions about a specific problem: how
> should I store the data in my program?
>
> In general, this is a package for geometric manipulations. The basic
> and heavily used class is:
>
> public class Point {
>     public final double x, y, z;
>     public final int pointId;
>     // ... and many methods, for example:
>     public double distanceTo(Point anotherPoint);
> }
>
> Points are gathered into shapes and these form a stage. This hierarchy
> is very natural for my problem since I need to move shapes, check
> which shape a given point belongs to, etc. So far everything works
> fine. But now we come to the problem.
>
> An important part of the program is bulk operations, e.g. the center
> of mass, which is a simple arithmetic average of all the points. The
> data structure described above is very slow for that purpose.


I don't see how you can possibly say it's "very slow". That implies you've
measured its speed, and either you've measured the speed of alternative
implementations, or you have a sensible idea of how fast it should be.
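
To make that concrete, measuring is cheap. Here's a minimal sketch of the
kind of timing i mean - the class and field names are placeholders, not
your actual API - with System.nanoTime() around a centre-of-mass pass over
plain point objects:

import java.util.ArrayList;
import java.util.List;

public class NaiveCentroidTiming {
    // Cut-down stand-in for the Point class above; the real one has more fields.
    static final class Point {
        final double x, y, z;
        Point(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<Point> points = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            points.add(new Point(Math.random(), Math.random(), Math.random()));
        }

        // Time the bulk operation a few times; the first runs include JIT
        // warm-up, so look at the later ones.
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            double sx = 0, sy = 0, sz = 0;
            for (Point p : points) {
                sx += p.x;
                sy += p.y;
                sz += p.z;
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("centre of mass (%f, %f, %f) in %d ns%n",
                    sx / n, sy / n, sz / n, elapsed);
        }
    }
}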

> I have nested loops iterating over thousands of objects. So far I have
> come up with the following solution:
>
> public class Point {
>     public final double[] coordinates; // <--- There we go!
>     // ....
> }
>
> And somewhere else I have a global array double[N][3], where N is the
> number of points, known in advance. Each Point instance actually holds
> a reference to a single row of this array. A method performing a bulk
> operation on points then takes this array as an argument. Some
> preliminary tests show that this is much faster, but I couldn't say how
> much without rewriting the whole code...


I'm surprised to hear that. And, in fact, some benchmarks of my own don't
reach the same conclusion - see below.
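
For reference, here's roughly how i read your N x 3 idea - the names are
mine, not from your code - with each Point holding one row of a shared
double[N][3], and the bulk operation walking the big array directly:

public class RowBackedPoints {
    // Each point is a view onto one row of the shared N x 3 array.
    static final class Point {
        final double[] coordinates; // length 3, shared with the global array

        Point(double[] row) { this.coordinates = row; }

        double getX() { return coordinates[0]; }
        double getY() { return coordinates[1]; }
        double getZ() { return coordinates[2]; }
    }

    public static void main(String[] args) {
        int n = 10_000;
        double[][] all = new double[n][3]; // the global N x 3 array
        Point[] points = new Point[n];
        for (int i = 0; i < n; i++) {
            all[i][0] = Math.random();
            all[i][1] = Math.random();
            all[i][2] = Math.random();
            points[i] = new Point(all[i]);
        }

        // The bulk operation takes the big array and ignores the Point objects.
        double sx = 0, sy = 0, sz = 0;
        for (double[] row : all) {
            sx += row[0];
            sy += row[1];
            sz += row[2];
        }
        System.out.printf("centre of mass (%f, %f, %f)%n", sx / n, sy / n, sz / n);
    }
}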

> SOME FACTS:
> - The number of points N is known in advance.
> - A typical stage has 300-1000 shapes, with around 10-20 points per
>   shape. The big array I am thinking of should take ~300 kB. Currently
>   I have to iterate over 10,000 objects.
>
> QUESTIONS:
> - What do you think in general about the global N x 3 array idea?
> - I was told that memory paging is a very important factor where
>   efficiency is concerned (and that is one of my problems: my 10,000
>   objects have a huge memory overhead). Do you think a 300 kB array of
>   doubles is small enough?


Paging won't be a problem. Cache interactions might be.

> - I am worried that when I allocate my double[N][3] array, my
>   coordinates (of type double[3]) will be sparsely allocated in memory
>   and I won't gain anything.


There's nothing in the spec that says they won't be, but in practice, they
won't be. Provided that you allocate them all at once.
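
That is, something along these lines - just an illustration of the two
allocation patterns; as i said, nothing in the spec guarantees where the
rows end up:

public class AllocateUpFront {
    public static void main(String[] args) {
        // Allocating the whole structure in one go: in practice the
        // 3-element rows come out of the heap back to back, so a later
        // sequential pass over them is cache-friendly.
        double[][] coords = new double[10_000][3];

        // The pattern to avoid: handing out rows one at a time, interleaved
        // with other allocations, which tends to scatter them around the heap.
        double[][] scattered = new double[10_000][];
        for (int i = 0; i < scattered.length; i++) {
            scattered[i] = new double[3]; // imagine other allocations in between
        }

        System.out.println(coords.length + " rows, " + scattered.length + " rows");
    }
}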

> In C++ I would put all the coordinates in a single double vector, a
> double* of size 3 x N. Then my coordinate object would also be a
> double*, pointing somewhere into the middle of the allocated memory. I
> could do something similar in Java:
>
> public class Point {
>     private final double[] memoryChunk; // reference to all coordinates, of size 3 x N
>     private final int offset;           // index of this point in the global array
>     public double getX() { return memoryChunk[offset]; }
>     public double getY() { return memoryChunk[offset + 1]; }
>     public double getZ() { return memoryChunk[offset + 2]; }
>     // ....
> }
>
> Do you think I will gain anything from this?


This will take up less memory than the two-dimensional array, and
more or less guarantees compactness.
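
And the bulk operations become a single stride over one contiguous array.
A sketch, again with my own names rather than yours:

public class PackedCentroid {
    public static void main(String[] args) {
        int n = 10_000;
        // One flat array of length 3 * N; point i lives at offset 3 * i.
        double[] packed = new double[3 * n];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = Math.random();
        }

        // Centre of mass as one sequential pass over contiguous memory.
        double sx = 0, sy = 0, sz = 0;
        for (int off = 0; off < packed.length; off += 3) {
            sx += packed[off];
            sy += packed[off + 1];
            sz += packed[off + 2];
        }
        System.out.printf("centre of mass (%f, %f, %f)%n", sx / n, sy / n, sz / n);
    }
}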

I wrote a simple benchmark to compare naive points, points represented as
an N x 3 array of arrays, and points represented as a single array:

http://urchin.earth.li/~twic/tmp/PointBench/

I used a single shape with a million points, so perhaps not so similar to
your situation. But the results, in nanoseconds, are:

            minimum       median   95th centile
naive     111049000    124352000      185673000
N x 3     131850000    149605000      206983000
packed     96216000    106892000      146708000

So, a packed representation is ~20% faster than a naive representation,
whilst the N x 3 representation is actually slower.

However, that's only when the points are filled in in order. I wouldn't
expect this to matter for the N x 3 or packed representations, but for the
naive points, allocating all the points in order has the effect of making
them contiguous and sequential in memory. I don't think this is realistic:
i presume points in your application will actually be allocated in a more
dispersed order, some over here, then some over there. So, if we instead
fill in the points in a random order, so that they aren't contiguous, we get:

            minimum       median   95th centile
naive     136890000    151244000      268689000
N x 3     131995000    148993000      201279000
packed     96196000    108204000      160004000

For the array representations, the times are basically identical. However,
for the object representation, there's a ~20% slowdown.
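
By "fill in points in a random order" i mean something like the following -
not the actual code from the benchmark above, just the idea: construct the
objects in a shuffled order, so that points which are adjacent by index are
no longer adjacent on the heap:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RandomFillOrder {
    static final class Point {
        final double x, y, z;
        Point(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> order = new ArrayList<>(n);
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order); // allocation order no longer matches index order

        // Points that are neighbours by index are now usually not neighbours
        // in memory, which is closer to how a real application allocates them.
        Point[] points = new Point[n];
        for (int i : order) {
            points[i] = new Point(Math.random(), Math.random(), Math.random());
        }
        System.out.println("allocated " + points.length + " points in shuffled order");
    }
}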

So, the conclusion is that the packed representation is at least 20%
faster than the naive object representation, and probably more like 40%
faster in a realistic situation. However, the N x 3 representation is
never faster.

But, crucially, note the spread of values in the above data: in all cases,
the width of the range of times greatly exceeds the difference between the
medians. Thus, differences in speed between different implementations may
not really be detectable in real use.

Obviously, this simple model may not apply very well to your code. You
have a lot of small shapes rather than one big one, so the gains from
organising shapes like this may be small. You'd have to apply one of these
approaches to a whole stage, as you indicated above. Nonetheless, i
imagine the general conclusion will be similar.
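
If it helps, the stage-level version i have in mind looks roughly like this -
a sketch with my own naming, not your actual classes - where the stage owns
one packed array and each shape is just a range of point indices into it:

public class PackedStage {
    final double[] coords;      // 3 * totalPoints, owned by the stage
    final int[] shapeStart;     // first point index of each shape
    final int[] shapeLength;    // number of points in each shape

    PackedStage(double[] coords, int[] shapeStart, int[] shapeLength) {
        this.coords = coords;
        this.shapeStart = shapeStart;
        this.shapeLength = shapeLength;
    }

    // Centre of mass of one shape: a contiguous slice of the packed array.
    double[] centreOfMass(int shape) {
        int from = shapeStart[shape] * 3;
        int to = from + shapeLength[shape] * 3;
        double sx = 0, sy = 0, sz = 0;
        for (int off = from; off < to; off += 3) {
            sx += coords[off];
            sy += coords[off + 1];
            sz += coords[off + 2];
        }
        int n = shapeLength[shape];
        return new double[] { sx / n, sy / n, sz / n };
    }

    public static void main(String[] args) {
        // Two toy shapes of two points each, packed end to end.
        double[] coords = { 0, 0, 0,  2, 2, 2,  1, 0, 0,  3, 0, 0 };
        PackedStage stage = new PackedStage(coords, new int[] { 0, 2 }, new int[] { 2, 2 });
        double[] c = stage.centreOfMass(1);
        System.out.printf("shape 1 centre of mass = (%f, %f, %f)%n", c[0], c[1], c[2]);
    }
}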

tom

--
I prefer gin now sleep doesn't want me anyway.
