Re: Efficient search in std::bitset

From:

Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>

Newsgroups:

comp.lang.c++

Date:

Sat, 24 Jan 2015 00:42:48 -0500

Message-ID:

<54c330d9$0$25111$c3e8da3$92d0a893@news.astraweb.com>

jononanon@googlemail.com wrote:

On Friday, January 23, 2015 at 11:32:33 PM UTC+1, jono...@googlemail.com wrote:

On Friday, January 23, 2015 at 11:24:28 PM UTC+1, jono...@googlemail.com wrote:

Hi there,

how would you implement an efficient search for the first bit conforming to a particular bit_state (true or false) in a std::bitset<N> ?

Here's my desired prototype:

template<size_t N>
size_t find(bitset<N> bs, size_t idx, bool state)

// Search for index i such that i >= idx and bs[i] == state

/* if no such index is found, return i = Bitset::npos

   namespace Bitset {
      size_t npos = numeric_limits<size_t>::max();
   };
*/

Here's my attempt:

//////////////////////////////////////
#include <bitset>
#include <stdexcept>
#include <limits>
#include <climits>

using namespace std;

namespace Bitset {
  size_t npos = numeric_limits<size_t>::max();
};

unsigned find_in_ulong(unsigned long lon, bool state)
/* only call this function if lon does definately have a bit that is set to state */
{
  constexpr unsigned char mask_byte_u = -1;

  const unsigned long fail_byte(state
                                ? 0U /* none set */
                                : mask_byte_u /* all set */);

  unsigned shift = 0;
  do {
    if ((lon & mask_byte_u) != fail_byte) {
      for (unsigned i = 0; i < CHAR_BIT; ++i) {
        if ((lon & 1) == state)
          return shift + i;
        lon >>= 1U;
      }
      // should never reach this line
    }
    lon >>= CHAR_BIT;
    shift += CHAR_BIT;
  } while (lon);
  throw runtime_error("state not found in find_in_ulong");
}

template<size_t N>
size_t find(bitset<N> bs, size_t idx, bool state)
{
  if (idx >= N)
    return Bitset::npos; // not found

  bs >>= idx;

  constexpr unsigned num_mask_bits = CHAR_BIT * sizeof(unsigned long);
  constexpr unsigned long mask_long_u = -1UL;

  const bitset<N> mask_long(mask_long_u);

  const unsigned long fail_long(state
                                ? 0U /* none set */
                                : mask_long_u /* all set */);
  while (idx < N) {
    unsigned long lon = (bs & mask_long).to_ulong();
    if (lon != fail_long)
      return idx + find_in_ulong(lon, state);
    bs >>= num_mask_bits;
    idx += num_mask_bits;
  }
  return Bitset::npos; // not found
}

//////////////////////////////////////

I reckon that the performance is fairly decent in terms of speed.

But I depend on a library implementation of std::bitset which has a clever and lightning-fast to_ulong() function. This is only possible if the design and implementation of std::bitset has a bool flag, which shows if high-bits outside of the unsigned long range are set (and the exception overflow_error is hence called); and if not shoots out that ulong as fast as possible. In other words: to_ulong() should not waste any processor cycles with checking bits!

A possible speed bottleneck, is the masking of the following: (bs & mask_long)

Why does std::bitset not have a way of chopping off unneeded higher bits, and just returning a u_long of the lower bits (without considering any higher bits that are set)? A function like: chop_ulong()

That would allow one to leverage more performance, wouldn't it ?

The performance of such an operation is too dependent on predominant usage and
data distribution to have a generic solution. To illustrated the point about
different data distributions (for simplicity, the below assumes the "particular
bit state" your are searching for is true and you will see that often the best
representation of the bit set is not std::bitset at all):

- the probability true and false is 0.5 at every position. The best algorithm is
probably the naive iteration (bitset representation should do)
- the first 'true' can appear at any position with equal probability; the
probability of true and false in any subsequent position is 0.5. You are
probably best off with never storing the useless beginning, as you hinted to
above. You could use some deque<bitset<sizeof(unsigned long) * CHAR_BIT> > and
store the initial position of 1 (which must be in the first bitset) with it.
- there are relatively very few "trues" and all you need is to iterate over
their indices in the bitset in order. You could even use std::set<size_t> that
stores indices of trues instead of the bitset. If you do not need to toggle bits
after creating bitset, you could also a sorted vector of indices of "true" in
this case.
- if your bitset fits in a single value of an integral type, but trues are
relatively rare there, there are a few binary-log algorithms you could use, some
of order of O(log2(sizeof(your-integral-type) * CHAR_BIT), see for example
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious. Or, you
could use a single platform-dependent instruction, e.g. for Intel:
http://web.itu.edu.tr/kesgin/mul06/intel/instr/bsf.html

etc.

On a side note, bitset is rarely good for efficiency as you do not have access
to the underlying memory; usually you are better off with your own
representation tailored for your needs.

std::vector<bool>, even though it has non-compile-time size, has IMHO a better
theoretical potential for out-of-the box optimization in the Library as the
Library could (I think) specialize algorithm such as std::find for its
const_iterator type using platform-specific tricks in implementation including
those mentioned above and your requested find (reformulated for vector<bool>)
can be implemented in a straightforward manner over std::find. I am not sure if
any implementation provides so optimized std::find() though.

Hope this will help,
-Pavel