Re: Generic compare function

From:

Seungbeom Kim <musiphil@bawi.org>

Newsgroups:

comp.lang.c++.moderated

Date:

Mon, 18 Jan 2010 08:43:19 CST

Message-ID:

<hj0d96$dbc$1@usenet.stanford.edu>

Anders Dalvander wrote:

std::map::find is usually built on-top of std::map::lower_bound. To
give a correct result std::map::find need one extra call to "less"
than std::map::lower_bound. std::map::lower_bound, which only need to
utilize "less", wouldn't gain anything from calling "compare" instead
of "less", as "less" is the only thing needed and is only called once
for every level in the map.

Given a map of 1,000,000 elements, that map would have the height of
20. std::map::lower_bound would need to call "less" 20 times, and
std::map::find would need to call "less" 21 times.

For the same sized compmap, utilizing a "compare" function,
compmap::lower_bound would need to call "less" 20 times, and
compmap::find would need to call "less" 20 times. The complexity of
compmap::find would on the other hand be greater as it cannot fully
rely on compmap::lower_bound.

std::map::find would call "less" at most 5% more times than
compmap::find would call "compare", if the map would contain 1,000,000
elements.

It's not only the number of calls to "less" on the top level that
matters. Consider std::map<T> and compmap<T>, where T is std::pair<A, B>.
Each call to std::less<T> eventually leads to something like this:

     // taken from GNU libstdc++
     bool operator<(const pair<A, B>& x, const pair<A, B>& y)
     {
         return x.first < y.first
                || (!(y.first < x.first) && x.second < y.second);
     }

Therefore, if x.first < y.first evaluates to false, whose probability
would be 50% under a uniform distribution assumption, the above function
should also evaluate y.first < x.first just to check whether the two
operands are equal and the second fields should be considered. We could
have checked it in the first place when we compared x.first and y.first.
This is what I call inefficiency.

On the other hand, if we used "compare", each call to compare<T> would
eventually lead to something like this:

     int compare(const pair<A, B>& x, const pair<A, B>& y)
     {
         if (int r = compare(x.first, y.first)) return r;
         return compare(x.second, y.second);
     }

and we would never have to compare x.first and y.first twice.

I used std::pair just as an easy example, but the same phenomenon can
happen in any structure where the lexicographical ordering ("compare
field A first, and field B second, ...") is needed.

     bool operator<(const pair<A, B, C>& x, const pair<A, B, C>& y)
     {
         return x.first < y.first
                || (!(y.first < x.first)
                    && (x.second < y.second
                        || (!(y.second < x.second)
                            && x.third < y.third)));
     }

     // I'm not even sure if I got that right...
     // This would be easier to write and understand:

     bool operator<(const pair<A, B, C>& x, const pair<A, B, C>& y)
     {
         if (x.first < y.first) return true;
         if (y.first < x.first) return false;
         if (x.second < y.second) return true;
         if (y.second < x.second) return false;
         return x.third < y.third;
     }

     // This is intuitive and efficient!
     int compare(const pair<A, B, C>& x, const pair<A, B, C>& y)
     {
         if (int r = compare(x.first, y.first)) return r;
         if (int r = compare(x.second, y.second)) return r;
         return compare(x.third, y.third);
     }

And the inefficiency could get higher with a deeper level of nesting.

--
Seungbeom Kim

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]