Bignums, object grind, and garbage collection

From:

John Ersatznom <j.ersatz@nowhere.invalid>

Newsgroups:

comp.lang.java.programmer

Date:

Fri, 15 Dec 2006 21:58:40 -0500

Message-ID:

<elvndo$2or$1@aioe.org>

I'm working on a Java project in which I'm wrangling with three
intertwined issues: bignum arithmetic, a need to optimize memory use/gc
behavior, and a need for speed. As such, I've got the following apparent
options, which each have caveats as well as good points:

1. Use BigInteger and BigDecimal
    Pros:
    * Already done for me.
    * Clean and safe.
    * Using code will be easy to understand by others.
    * 100% Pure Java; platform independent.
    Cons:
    * Since they're immutable, lengthy calculations will
      generate large numbers of "new BigFoo" and heaps of
      discarded objects for the GC to process.
    * I'm not sure the implementation is as fast as
      possible. In particular, it doesn't seem to be
      documented anywhere whether these classes switch to
      Karatsuba multiplication and then to FFTs at
      appropriate bit lengths or just stick with slow, O(n^2)
      long multiplication throughout.
2. Use something like the GNU math library via JNI; there are
    libraries out there that use sensible multiplication
    implementations at large bit sizes.
    Pros:
    * Already done for me save creating the wrapper classes
    * I can probably make my bignums mutable, so instances
      can be assigned to and otherwise recycled reducing
      newing and GCing.
    * Fast multiplies.
    Cons:
    * Non-immutable objects will make the using code harder
      to understand and more error-prone.
    * Not using BigDecimal and BigInteger will make the using
      code less clear.
    * Going through JNI is slow, so some inner loops would
      also have to be implemented in C and called down through
      JNI.
    * Not 100% Pure Java; will require compiling some C and
      some Java to build on any given box. The C can at least
      be highly portable, since all the UI/graphics/whatever
      would be done in Java.
3. Roll my own mutable bignum classes, presumably in Java.
    Pros:
    * I can definitely make my bignums mutable, so instances
      can be assigned to and otherwise recycled reducing
      newing and GCing.
    * I can use fast implementations for various bit sizes.
    * I can avoid JNI and have 100% Pure Java, platform
      independent.
    Cons:
    * Non-immutable objects will make the using code harder
      to understand and more error-prone.
    * Not using BigDecimal and BigInteger will make the using
      code less clear.
    * A lot of work to do myself, some of it doubtless wheel-
      reinventing.
    * May not be as fast as native code.
    * May not be as fast as BigDecimal and BigInteger for
      small bit lengths until the effects of object churn are
      felt in the form of lots of GC CPU use and a process
      size in the nine-figures.
    * Implementing operations will be tricky with no easy way to
      trap overflows. For efficient adds, there's a fairly
      obvious method using arrays of longs with 63 bits of bignum
      per long, using the sum becoming negative as a carry flag.
      OTOH, multiplies look like they'd prefer using ints with
      31 bits per int and intermediate longs, for the basic long
      division multiply or Karatsuba. (FFT would require a wholly
      different implementation and I'd cross that when I came to it;
      my early priority would be getting a working bignum using just
      long multiplication.)
4. Find some library or code that resulted from someone else doing 3.

The tradeoffs among the above seem to depend at least partially on the
vagaries of the VM, and in particular of the GC. It's not hard to
imagine a VM implementation smart enough to recycle objects of a
frequently used class under the hood. It still seems unlikely that it
can be as efficient as explicitly freeing or assigning to the bignums
under some circumstances.

My determinations so far are:
* Using JNI for anything is a last resort.
* The choice is really down to mutable bignums vs. BigInteger
   and BigDecimal.

If the bignums need to be mutable to get efficiently running code that
doesn't bloat up the process to hundreds of megs and spend more time in
GC than in my own code, then option 1 is out. On the other hand, if
getting decent multiplies at large bit sizes isn't going to happen with
BigFoo, then option 1 is out and I may as well make the darn things mutable.

Since I'm not aware of any pre-existing code suitable for 4, and JNI is
a last resort, it's basically down to 1 vs. 3.

What I'd like is:
a) People to weigh in on bignums and garbage collection:
    * Would code that generates billions of intermediate
      values during calculations but keeps none of them
      for very long be seriously bogged down and the
      memory requirements bloated by using the immutable
      bignums?
    * Are the multiplies in the standard BigFoos inefficient
      when the values are hundreds or even thousands of bits
      long?
b) In case either answer above is "yes" and I need to make my own:
    * How best to go about implementing the basic, non-FFT bignum?
      Use ints or longs? Tips on implementing multiplication
      (and subtraction!)?
    * What are sensible thresholds for switching implementations,
      in terms of bit lengths, to keep multiplies fast?
    * How best to implement a BigDecimal-alike given a
      BigInteger-alike?

Thank you.