In article <aae2f3e6-16c1-4e60-a214-
90a9cc30a...@c58g2000hsc.googlegroups.com>,
Karev.Alexan...@gmail.com says...
> I'm comparing sorting algorithms for study purposes. I've
> compared STL std::sort and a hand-coded introsort, sorting
> arrays of millions (tens of millions) of integers. It was
> tested on random, sorted, and reverse-sorted arrays. In all
> tests on such huge arrays, std::sort is faster than my
> version by ~1/4, so it seems there is some call optimization.
While it's impossible to say for _sure_ without even knowing
which exact library implementation you're looking at, the
obvious guess would be that the library is using a "modified
introsort", which is essentially the same as a modified
quicksort, but (obviously enough) with the extra book-keeping
and switching of an introsort added.
The idea of a modified quicksort is pretty simple: below some
number of items, the "simple" sorts (insertion sort, selection
sort, bubble sort) are generally faster than a quicksort.
Therefore, when you reach that threshold, you're better off
switching to something else (and the threshold is rarely very
critical).
A further optimization is due to Robert Sedgewick: he noted
that you still have a fair amount of overhead repeatedly
calling the simple sort on all those small partitions. He also
noted that with an insertion sort (unlike anything else) the
speed doesn't really depend on the number of items being
sorted, but strictly upon the distances they need to be moved
during sorting -- therefore, rather than calling it once every
time a partition reaches a certain size threshold, you just
_stop_ doing more sorting when the size reaches the threshold,
and then when all the partitions have been reduced to that
size, you call an insertion sort _once_ for the entire
collection. Even though you're using an insertion sort on a
large collection, its speed is proportional to the sizes of the
partitions (when you stopped partitioning) rather than to the
size of the entire collection.
The difference this makes will depend _heavily_ upon the speed
of function calls. For example, it makes a MUCH larger
difference on an x86 than it does on a SPARC. Based on the 20%
difference you noted, it's a fair guess that you were probably
testing on an x86 or something very similar. More importantly,
that overhead also has an effect on the point at which you
should stop partitioning -- there's no one size that's right
for all architectures (though, as I mentioned above, the
threshold isn't usually critical).