In article <aae2f3e6-16c1-4e60-a214-
90a9cc30a...@c58g2000hsc.googlegroups.com>,
Karev.Alexan...@gmail.com says...
> I'm comparing sorting algorithms for study purposes. I've
> compared STL std::sort and a hand-coded introsort, sorting
> arrays of millions (tens of millions) of integers. It was
> tested on random, sorted, and reverse-sorted arrays. In all
> tests on such huge arrays, std::sort is faster than my
> version by ~1/4, so it seems there is some call optimization.
While it's impossible to say for _sure_ without even knowing
which exact library implementation you're looking at, the
obvious guess would be that the library is using a "modified
introsort", which is essentially the same as a modified
quicksort, but (obviously enough) with the extra book-keeping
and switching of an introsort added.
The idea of a modified quicksort is pretty simple: below some
number of items, the "simple" sorts (insertion sort, selection
sort, bubble sort) are generally faster than a quicksort.
Therefore, when you reach that threshold, you're better off
switching to something else (and the threshold is rarely very
critical).
A further optimization is due to Robert Sedgewick: he noted
that you still have a fair amount of overhead repeatedly
calling the simple sort on all those small partitions. He also
noted that with an insertion sort (unlike anything else) the
speed doesn't really depend on the number of items being
sorted, but strictly upon the distances they need to be moved
during sorting -- therefore, rather than calling it once every
time a partition reaches a certain size threshold, you just
_stop_ doing more sorting when the size reaches the threshold,
and then when all the partitions have been reduced to that
size, you call an insertion sort _once_ for the entire
collection. Even though you're using an insertion sort on a
large collection, its speed is proportional to the sizes of the
partitions (when you stopped partitioning) rather than to the
size of the entire collection.
The difference this makes will depend _heavily_ upon the speed
of function calls. For example, it makes a MUCH larger
difference on an x86 than it does on a SPARC. Based on the 20%
difference you noted, it's a fair guess that you were probably
testing on an x86 or something very similar. More importantly,
that overhead also has an effect on the point at which you
should stop partitioning -- there's no one size that's right
for all architectures (though, as I mentioned above, the
threshold isn't usually critical).