Re: Memory Allocation in Java
Eric Sosman wrote:
Christopher Smith wrote:
Patricia Shanahan <pats@acm.org> wrote in
news:Ha7Ig.1997$bM.233@newsread4.news.pas.earthlink.net:
Eric Sosman wrote:
Christopher Smith wrote:
Hi All -
Problem: I have a large array of floating-point numbers I need to
search. These results come from a brute-force grid search, where
the coordinates (x, y) are non-parametric test results.
The problem is that the lengths of x and y, and thus the size of the
grid, are quite large. The length is a minimum of 120,000 in both
directions on the grid, for a total of 14,400,000,000 possible
combinations, which obviously consumes a lot of memory -- somewhere
on the order of 500 MB, if 32-bit floating point.
... for suitable values of "somewhere on the order of."
120000 * 120000 * 4 = 57600000000 ~= 55000 MB ~= 54 GB. Are
you sure the dimensions you've given are correct?
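Incidentally, that arithmetic is easy to get wrong in Java itself: 120000 * 120000 overflows `int'. A quick back-of-the-envelope sketch (illustrative only, using `long' throughout):

```java
public class GridMemory {
    public static void main(String[] args) {
        long side = 120000L;                 // grid length in each direction
        long cells = side * side;            // would overflow a 32-bit int
        long bytes = cells * 4L;             // 4 bytes per 32-bit float
        System.out.println(cells);           // 14400000000
        System.out.println(bytes / (1L << 30)); // ~53 GiB (integer division)
    }
}
```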
If the dimensions are correct, I hope you have a 64-bit
JVM and a pretty substantial machine to work with.
Good point. I didn't check the arithmetic on the memory size.
This raises a whole different set of issues. Maybe brute force is not
the way to go.
How many of the elements of the matrix are non-zero? Maybe this is a
case for sparse matrix techniques?
If dense, it might be better to keep it on disk. That raises a whole
set of issues of whether it is possible to batch and sort updates to
the matrix to reduce the amount of I/O.
Patricia
Thanks for straightening out my math.
I do have horsepower, but don't want to tie up that many resources.
No, sparse matrix math won't work. Every field has a value.
It's a startlingly large number of values; may I ask where
they all came from? Just curious, really.
Amusing factoid: There are about 3.4 times as many fields
as there are distinct `float' values (14,400,000,000 / 2^32 ~= 3.4).
I guess divide and conquer is the right way to go. What I can do is
split the grid search into quadrants, process each quadrant, and report
and record each quadrant's result (i.e., I'm searching for the max within
the grid). From there, it's just a matter of rolling through the quadrants.
Could you explain the nature of this search a little more?
Simply "searching for the Max" in a big collection of numbers
requires very little memory; there's no need to retain a number
that's known to be non-maximal.
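A minimal sketch of that point, assuming the values can be generated or fetched one at a time (the `Grid' interface and the toy x*y grid below are hypothetical, just for illustration):

```java
public class RowwiseMax {
    interface Grid {                       // hypothetical value supplier
        float value(int x, int y);
    }

    // Memory use is O(1): every non-maximal value is discarded
    // as soon as it has been compared against the running max.
    static float maxOf(Grid g, int n) {
        float max = Float.NEGATIVE_INFINITY;
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++) {
                float v = g.value(x, y);
                if (v > max) max = v;
            }
        return max;
    }

    public static void main(String[] args) {
        // toy grid: value(x, y) = x * y on a 4x4 grid; max is 3*3 = 9
        System.out.println(maxOf((x, y) -> (float) (x * y), 4)); // 9.0
    }
}
```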
Indeed. If that is the typical operation, I would just stick the numbers
in any convenient order in a disk file, and scan it sequentially,
keeping the max so far and its location.
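That scan might look something like the following -- a sketch assuming the values were written with `DataOutputStream.writeFloat', so the file is just a sequence of raw big-endian 32-bit floats:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class DiskMax {
    // Sequentially scan a file of raw 32-bit floats, keeping only
    // the max so far and the index at which it occurred.
    public static void main(String[] args) throws IOException {
        float max = Float.NEGATIVE_INFINITY;
        long maxIndex = -1;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])))) {
            long i = 0;
            try {
                while (true) {
                    float v = in.readFloat();
                    if (v > max) { max = v; maxIndex = i; }
                    i++;
                }
            } catch (EOFException done) {
                // normal end of file: fall through with the result
            }
        }
        System.out.println("max=" + max + " at index " + maxIndex);
    }
}
```

The index is trivially converted back to grid coordinates as (i / width, i % width) if the file was written in row-major order.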
For many other operations, there are known out-of-core algorithms that
are designed to do as much work as possible on a chunk while it is in
memory. Don't assume that the right algorithm if the problem fits in
memory is necessarily the right algorithm if it doesn't.
Patricia