Re: Memory Allocation in Java

From: Patricia Shanahan <pats@acm.org>
Newsgroups: comp.lang.java.help
Date: Sun, 27 Aug 2006 15:12:13 GMT
Message-ID: <hDiIg.10823$Qf.3838@newsread2.news.pas.earthlink.net>
Eric Sosman wrote:

Christopher Smith wrote:

Patricia Shanahan <pats@acm.org> wrote in
news:Ha7Ig.1997$bM.233@newsread4.news.pas.earthlink.net:

Eric Sosman wrote:

Christopher Smith wrote:

Hi All -

Problem: I have a large array of floating-point numbers I need to
search through. These results come from a brute-force grid search,
where the coordinates (x,y) are non-parametric test results.

The problem is that the lengths of x and y, and thus the size of the
grid, are quite large. The length is a minimum of 120,000 in both
directions on the grid, for a total of 14,400,000,000 possible
combinations. That obviously consumes a lot of memory -- somewhere
on the order of 500 MB, if 32-bit floating point.


   ... for suitable values of "somewhere on the order of."
120000 * 120000 * 4 = 57600000000 ~= 55000 MB ~= 54 GB. Are
you sure the dimensions you've given are correct?

   If the dimensions are correct, I hope you have a 64-bit
JVM and a pretty substantial machine to work with.


Good point. I didn't check the arithmetic on the memory size.

This raises a whole different set of issues. Maybe brute force is not
the way to go.

How many of the elements of the matrix are non-zero? Maybe this is a
case for sparse matrix techniques?
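
For example, if most cells were zero, something as simple as a hash map
keyed by the packed (row, col) index would do. A rough, untested sketch;
the class name and packing scheme here are made up for illustration:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sparse grid: stores only the non-zero cells, keyed
    // by (row * width + col) packed into a long. Missing cells read as 0.
    class SparseGrid {
        private final Map<Long, Float> cells = new HashMap<Long, Float>();
        private final long width;

        SparseGrid(long width) { this.width = width; }

        void set(long row, long col, float value) {
            if (value != 0.0f) {
                cells.put(row * width + col, value);
            }
        }

        float get(long row, long col) {
            Float v = cells.get(row * width + col);
            return (v == null) ? 0.0f : v;
        }
    }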

If dense, it might be better to keep it on disk. That raises the
question of whether it is possible to batch and sort updates to the
matrix to reduce the amount of I/O.
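
For example, one could buffer a batch of updates in memory, sort them
by file offset, and write them in a single mostly-sequential pass, so
the disk seeks move in one direction instead of jumping around at
random. A rough, untested sketch; the class and its interface are
invented for illustration:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Arrays;
    import java.util.Comparator;

    // Hypothetical batched writer for a row-major float matrix on disk.
    class BatchedUpdater {
        private static class Update {
            final long offset; // byte offset of the float in the file
            final float value;
            Update(long offset, float value) {
                this.offset = offset;
                this.value = value;
            }
        }

        private final Update[] buffer;
        private final long width;
        private int count = 0;

        BatchedUpdater(int capacity, long width) {
            buffer = new Update[capacity];
            this.width = width;
        }

        boolean isFull() { return count == buffer.length; }

        // Caller should check isFull() and flush() before adding more.
        void update(long row, long col, float value) {
            buffer[count++] = new Update((row * width + col) * 4L, value);
        }

        // Sort by offset, then apply in one forward pass over the file.
        void flush(RandomAccessFile file) throws IOException {
            Arrays.sort(buffer, 0, count, new Comparator<Update>() {
                public int compare(Update a, Update b) {
                    return a.offset < b.offset ? -1
                            : (a.offset > b.offset ? 1 : 0);
                }
            });
            for (int i = 0; i < count; i++) {
                file.seek(buffer[i].offset);
                file.writeFloat(buffer[i].value);
            }
            count = 0;
        }
    }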

Patricia


Thanks for straightening out my math.
I do have horsepower, but I don't want to tie up that many resources.

No, sparse matrix math won't work. Every field has a value.


    It's a startlingly large number of values; may I ask where
they all came from? Just curious, really.

    Amusing factoid: There are about 3.4 times as many fields
as there are distinct `float' values.

I guess divide and conquer is the right way to go. What I can do is
split the grid search into quadrants, process each quadrant, and report
and record the quadrant results (i.e., I'm searching for the max within
the grid). From there, it's just a matter of rolling through the
quadrants.
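
The roll-up at the end would be trivial; roughly (untested sketch, with
a made-up array holding each quadrant's recorded max):

    // Hypothetical final step: given the max found in each quadrant,
    // the grid max is just the best of the per-quadrant results.
    static int bestQuadrant(float[] quadrantMax) {
        int best = 0;
        for (int q = 1; q < quadrantMax.length; q++) {
            if (quadrantMax[q] > quadrantMax[best]) {
                best = q;
            }
        }
        return best;
    }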


    Could you explain the nature of this search a little more?
Simply "searching for the Max" in a big collection of numbers
requires very little memory; there's no need to retain a number
that's known to be non-maximal.


Indeed. If that is the typical operation, I would just stick the numbers
in any convenient order in a disk file, and scan it sequentially,
keeping the max so far and its location.
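
Something like this, assuming the floats were written in row-major
order with a DataOutputStream; an untested sketch:

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Sequential scan for the maximum: only the running max and its
    // index stay in memory, no matter how big the file is.
    public class MaxScan {
        public static void main(String[] args) throws IOException {
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(args[0])));
            float max = Float.NEGATIVE_INFINITY;
            long maxIndex = -1;
            long index = 0;
            try {
                while (true) {
                    float v = in.readFloat();
                    if (v > max) {
                        max = v;
                        maxIndex = index;
                    }
                    index++;
                }
            } catch (EOFException expected) {
                // normal end of scan
            } finally {
                in.close();
            }
            // Row-major: row = maxIndex / width, col = maxIndex % width.
            System.out.println("max=" + max + " at element " + maxIndex);
        }
    }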

For many other operations, there are known out-of-core algorithms
designed to do as much work as possible on a chunk while it is in
memory. Don't assume that the right algorithm when the problem fits in
memory is necessarily the right algorithm when it doesn't.

Patricia
