Re: Strange performance problem in C++ program
mike3 wrote:
Yes, the digits must be shifted because the floating
point numbers are stored with an exponent to the base
two (binary). This avoids the problem of "wobbling
precision" that plagues using the digit size as the
exponent base. Furthermore I _know_ this can be made
fast (see the end of this message for why).
IIRC, the wobbling precision issue is about one digit in your
representation which stores less information in higher bases. That was a
big deal in the 60's but I hardly see why this is important if a 4000
bit floating value effectively has only 3980 bits of real precision? In
particular if you always have to bit-shift one operand for the extra 20
bits?
The reason for the conditional is because shifting
say a 32-bit "unsigned long" type left by 32 bits
(if "Digit" is defined as "unsigned long" like it is
in my program and we're on a 32-bit machine, DIGIT_SIZE
is 32 bits -- the "digits" take up and enitre machine word),
does _not_ produce zero like you'd expect. Try this:
So, why the heck do you do this in the inner loop? What about
if (bsh==0)
{
}
else
{
}
The mystifying thing was that when I had a more "C-like"
routine -- namely, the "BigFloat" object used pure
arrays, instead of std::vector, *and* the exponent/sign
combination were not encapsulated, plus the routine
was not a member function -- it performed much better.
(The drawback though was then boundschecking for exponents
and arrays had to be all over the place (in all sorts
of different routines), dramatically killing code
readability) And there is virtually no difference between
the algorithms used, no more or less shifts.
If "digits" is a std::vector, then you should not take the address of
the first element, but an iterator of course. The encapsulation
shouldn't introduce an overhead, provided that the compiler inlines the
calls. For the transformation into a member function: Your problem might
rather be, that you are simply running out of registers. Your core
addition loop for example works with (lenght-wsh-1), sumCarry, aPtr,
bPtr, rPtr which is much for a x86. It literally carries sumCarry. Now
compare with the following approach
while(true)
{
do
{
if (i==end) goto out;
carry=add_without_carry(a[i],b[i],res[i]);
i++;
}
while (!carry);
do
{
if (i==end) goto out;
i++;
carry=add_with_carry(a[i],b[i],res[i]);
}
while(carry);
}
out:
The compiler can easily put carry in a scratch register now and the
add_* routines can be coded with ahead knowledge of carry.
Also consider a storage layout, where all the digits are *within* the
(variable sized) bignum, because you save the extra pointers for the
digits in all your computations, plus the caching aspects of this.
And learn to read the assembler output by your compiler. For bignum
addition, the limiting factor usually is the memory transfer capability
of your system.