Re: vector::insert performance tip.

On Apr 30, 2:30 am, "Stuart Golodetz"
<stuart.golod...@NnOeSwP.AoMx.ac.uk> wrote:
> "James Kanze" <james.ka...@gmail.com> wrote in message
> news:1177871177.502002.135000@n59g2000hsh.googlegroups.com...
> > On Apr 29, 1:28 pm, kostas <skola...@gmail.com> wrote:
> > > [...]
> > > Both the vector expansion and the set traversal are amortized linear
> > > operations in the number of items.
> > Which means very little on a modern machine. The time it takes
> > to execute a single operation can vary significantly depending on
> > whether the required variables are in registers, in cache, in main
> > memory, or must be paged in. On the machines I use, something like
> > iter++, where iter is an std::set<>::iterator, can take anywhere from
> > less than a microsecond (everything in registers) to somewhere around
> > 10 milliseconds (if I get a page miss)---4 orders of magnitude.
> > [...]
> I'm not sure if I'm missing the point here, but for the purposes of
> complexity analysis, does it actually matter how long each individual
> instruction is taking?

You're missing the point that what the original poster measured
was execution time. My point is, precisely, that complexity has
little relationship to execution time. It's a useful indicator
as to whether something will scale, but even then, for example,
an O(n) algorithm will suddenly become much, much slower if page
faults occur, and for very large data sets, locality can become
as important a consideration as complexity.
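
To make the locality point concrete, something along the following
lines is the kind of naive measurement I have in mind (just a sketch,
untested as posted; the element count and the use of std::clock() are
arbitrary choices, and the ratio you see will depend entirely on the
machine):

    // Rough sketch, not tested as posted: two traversals with identical
    // complexity but very different memory behavior.
    #include <cstddef>
    #include <ctime>
    #include <iostream>
    #include <set>
    #include <vector>

    int main()
    {
        std::size_t const n = 10000000;      // big enough to spill out of cache
        std::vector< int > v;
        std::set< int >    s;
        for ( std::size_t i = 0; i < n; ++ i ) {
            v.push_back( int( i ) );
            s.insert( int( i ) );
        }

        unsigned long sum1 = 0;
        std::clock_t t0 = std::clock();
        for ( std::vector< int >::const_iterator it = v.begin();
                it != v.end(); ++ it ) {
            sum1 += *it;                     // contiguous memory: cache friendly
        }
        std::clock_t t1 = std::clock();

        unsigned long sum2 = 0;
        for ( std::set< int >::const_iterator it = s.begin();
                it != s.end(); ++ it ) {
            sum2 += *it;                     // pointer chasing: cache hostile
        }
        std::clock_t t2 = std::clock();

        std::cout << "vector: " << double( t1 - t0 ) / CLOCKS_PER_SEC
                  << "s, set: " << double( t2 - t1 ) / CLOCKS_PER_SEC
                  << "s (checksums " << sum1 << ", " << sum2 << ")\n";
        return 0;
    }

On anything with a deep memory hierarchy I would expect the set
traversal to come out considerably slower than the vector traversal,
even though both are linear in the number of elements.
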
> If we've got a constant (i.e. independent of the number of
> items) upper bound for the time taken for an instruction
> (however large), then an operation which requires a number of
> instructions that is linear in terms of the problem size will
> run in linear time.

No. That's only true if the time taken for an instruction is
independent of the size of the data set. Paging and caching
introduce a non-linear component into the execution time of
instructions, one which can depend on the size of the data set.
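
One way of seeing this (again just a sketch, untested as posted, with
arbitrarily chosen sizes): time a complete traversal of an
std::set<int> for several different sizes, and look at the average
cost per increment. If ++iter really had a constant cost k
independent of n, the per-increment figure would be roughly the same
for all three sizes; once the set no longer fits in the caches (and,
eventually, no longer in real memory), I would expect it to climb
noticeably.

    // Sketch, not tested as posted: average cost of one iterator
    // increment, measured over full traversals, for several set sizes.
    #include <cstddef>
    #include <ctime>
    #include <iostream>
    #include <set>

    int main()
    {
        std::size_t const sizes[] = { 1000, 100000, 10000000 };
        for ( std::size_t k = 0;
                k != sizeof( sizes ) / sizeof( sizes[ 0 ] ); ++ k ) {
            std::size_t const n = sizes[ k ];
            std::set< int > s;
            for ( std::size_t i = 0; i < n; ++ i ) {
                s.insert( int( i ) );
            }
            std::size_t const reps = 20000000 / n;  // comparable total work
            unsigned long sum = 0;
            std::clock_t start = std::clock();
            for ( std::size_t r = 0; r != reps; ++ r ) {
                for ( std::set< int >::const_iterator it = s.begin();
                        it != s.end(); ++ it ) {
                    sum += *it;     // keeps the loop from being optimized away
                }
            }
            double seconds = double( std::clock() - start ) / CLOCKS_PER_SEC;
            std::cout << n << " elements: "
                      << 1.0e6 * seconds / ( double( n ) * reps )
                      << " microseconds per increment (checksum "
                      << sum << ")\n";
        }
        return 0;
    }
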
> For example, if I've got an operation on n items that requires no
> more than (say) 2n instructions, with each instruction taking at most
> k milliseconds, then the maximum time taken on a problem of size n is
> 2nk milliseconds, which (for k not a function of n) is linear in n.
> The actual value of k isn't relevant to the complexity analysis, even
> if it's relevant to the person implementing the code.

Once k has any dependence on n, your complexity analysis no
longer makes reasonable predictions about how the execution time
grows with the size. Saying, for example, that ++i takes a maximum
of 10 milliseconds, and calculating bounds from that, doesn't mean
much if most of the time it only takes 1 or 2 microseconds, and
if it only takes more than 5 milliseconds once the number of
elements exceeds, say, 10 million. You do still get asymptotic
behavior with enough elements, but the number of elements
needed to get there is so high that it won't occur in actual
practice. For performance reasons, if nothing else: once you
start getting page faults, your performance will end up being
unacceptable anyway.
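
To put rough numbers on it, using the figures above: for a traversal
of a million elements, a worst-case bound of 10 milliseconds per
increment gives an upper bound of about 10000 seconds, whereas at the
typical 1 or 2 microseconds per increment the traversal actually
takes on the order of one or two seconds. The bound is formally
correct; it just overstates the cost you will actually see by a
factor of several thousand, which is why it tells you next to nothing
about the performance you will observe.
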
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34