Re: Which is a better way to implement this algorithm?
On Apr 20, 1:43 am, "Ivan Vecerina"
<_INVALID_use_webfo...@ivan.vecerina.com> wrote:
"mike3" <mike4...@yahoo.com> wrote in message
news:0ab3cfe6-5a32-4986-a203-f7a2f29026fb@q24g2000prf.googlegroups.com...
: Hi.
: (Xposted to both comp.lang.c++ and comp.programming since I've got
: questions related to both C++ language and general programming)
:
: I've got the following C++ code. The first routine runs in like 65% of
: the time of the second routine. Yet both do the same thing. However,
: the second one seems better in terms of the way the code is written
: since it helps encapsulate the transformation in the inner loop better
: making it easier to read, at least in my opinion. Maybe it's not,
: maybe the former is "better" that way and I should go with it, but if
: the latter is "better" in that respect should I just ditch it anyway
: and tradeoff for performance since I want this thing to be fast???
As a first-time reader of both implementations, I find the first one
much easier to understand and maintain than the second one. Adding
levels of abstractions without a clear benefit only obfuscates code.
OK, then I'll go for the first. Especially considering it's faster...
: I'm using the simple "grade school" multiply algorithm. Note how the
: second routine more easily outlines this algorithm, while in the first
: it is a little more difficult to see. Which would you prefer, exactly?
The first one.
Hmm.
: Also, could I lose the typecasts in this code or do I need them?
I'll comment & review the first function below.
At some point, you do need to (at least implicitly) cast
an operand to the larger type, as this might not happen
automatically for some combinations of types, in C++.
So then in this case a cast is OK, right?
: First version (fastest):
: ---
: void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
: {
: /* Check to see if we're doing an in-place multiply */
: if(&a == this)
: {
: MulInPlace(b);
: return;
: } else if(&b == this) {
: MulInPlace(a);
: return;
: }
General class design: unless there is an established reason
not to do so, I would use operator overloading and implement
(as members of the Num class):
static Num operator *( const Num& a, const Num& b );
Num& operator *( const Num& a );
Maybe RawDigit::Mul is an internal private member, and the
above operations are provided? But then the special cases
(e.g. multiply in place) would best be handled in the
layers above...
I've tried overlaoded operators but one seems to have to construct
a temporary copy to hold results to return in the case of the binary
operators. This is a big problem for my app. as I have performance
concerns. Deallocating and reallocating memory billions of times
is a serious waste of performance. (These routines will be called
billions of times.)
Therefore I do the opposite and build these low-level routines first,
then build the overloaded operators using them, which can then be
used in all non-performance-critical areas of the program. If it turns
out that I don't need the MulInPlace() functionality to be integrated
with Mul(), I'll just remove it and document that Mul() does not
work in-place. Is that acceptable practice?
: /* Get lengths. */
: std::size_t rLength(GetLength());
: std::size_t aLength(a.GetLength());
: std::size_t bLength(b.GetLength());
:
: /* Zeroize this. */
: Zeroize();
Is the intent truly to implement modulo math, and to allow
the result to be truncated? If not the result should be
resized automatically, and the code is simplified.
So I can omit this functionality and just document that the
routine doesn't behave as most would expect? (If I have operands
of 3 differing lengths I'd usually expect a Mul() to respect
those lengths but then again your opinion may differ.)
Also the above just does nothing but clear the result buffer,
which you need as we're going to add (as in arithmetical addition)
numbers to it. You need that for the multiply algorithm to work.
And you need to get the lengths of the input operands because
you're going to loop over the digits, no? And you don't want to
read outside the buffer or read too few digits? I think you would,
but then again maybe I'm wrong...
: /* Do the multiplication. */
: TwoDigit64 carry(0);
Technically, the carry should only require 1 digit,
and can be defined within the outer loop below.
But I thought then that would create a performance-wasting conversion
to/from 32/64 again and again.
: for(std::size_t i(0);i<aLength && i<rLength;i++)
: {
: carry = 0;
:
: for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
: {
: TwoDigit64
: tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +
: digits[i+j] + carry);
: carry = tmp >> DIGIT_BITS;
: tmp -= carry << DIGIT_BITS;
: digits[i+j] = tmp;
: }
:
: if(i+bLength < rLength)
: digits[i+bLength] = carry;
: }
: }
Let's say that you have the following digit-related
definitions within your class:
typedef ... Digit;
typedef ... TwoDigit;
const unsigned digitBitCount = ...;
const Digit digitBitMask = 0xF...UL;
Assuming no truncation (because of adequate pre-allocation
of result digits), the same algorithm can be written as:
for( unsigned aPos = 0 ; aPos<aLength ; ++aPos )
{
Digit carry = 0;
TwoDigit const aDig = a.digits[aPos]; //NB: cast "hidden" here
for( unsigned bPos = 0 ; bPos<bLength ; ++bPos )
{
TwoDigit mul = aDig * b.digits[bPos]
+ this->digits[aPos+bPos]
+ carry;
this->digits[aPos+bPos] = mul & digitBitMask;
carry = ( mul >> dig=
itBitCount );
}
this->digits[aPos+bPos] = carry;
}
There are many correct ways to write this, and some are
probably better is some ways than this example. But I
hope that you will find it useful.
In any case, given the very acceptable complexity of this loop,
I would not bother breaking it up or adding layers of
encapsulation.
OK, then.