Re: is there a fast way to get every bit in a uint64_t
greghe@mac.com (Greg Herlihy) wrote (abridged):
Otherwise, counting the bits via a table look-up would almost
certainly be faster than testing each bit individually.
[...]
int count = 0;
count += kNumBits[n & 0xFF];
count += kNumBits[(n >>= 8) & 0xFF];
count += kNumBits[(n >>= 8) & 0xFF];
count += kNumBits[(n >>= 8) & 0xFF];
count += kNumBits[(n >>= 8) & 0xFF];
count += kNumBits[(n >>= 8) & 0xFF];
count += kNumBits[(n >>= 8) & 0xFF];
count += kNumBits[n >> 8];
return count;
It might be even quicker if you avoid side effects:
return
kNumBits[n & 0xFF] +
kNumBits[(n >> 8) & 0xFF] +
kNumBits[(n >> 16) & 0xFF] +
kNumBits[(n >> 24) & 0xFF] +
kNumBits[(n >> 32) & 0xFF] +
kNumBits[(n >> 40) & 0xFF] +
kNumBits[(n >> 48) & 0xFF] +
kNumBits[(n >> 56) & 0xFF];
That way the hardware is more likely to calculate the subexpressions in
parallel, or to reorder it, or mess around with it any way it likes. It
may be able to calculate the shifts while it's waiting for the table to
load into cache, for example.
Possibly faster and even less portable is to extract the bytes with a
cast:
const unsigned char *bytes = reinterpret_cast<unsigned char *>(&n);
return
kNumBits[ bytes[0] ] +
kNumBits[ bytes[1] ] +
kNumBits[ bytes[2] ] +
kNumBits[ bytes[3] ] +
kNumBits[ bytes[4] ] +
kNumBits[ bytes[5] ] +
kNumBits[ bytes[6] ] +
kNumBits[ bytes[7] ];
I've not timed (or even compiled) either of these, but I think they are
worth trying.
-- Dave Harris, Nottingham, UK.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]