Re: Boost.bind and performance penalties?
On May 26, 12:53 pm, Edek <edek.pieITAKNIECZY...@gmail.com> wrote:
Which reminds me: I do not understand why replacing vector offsets
with a vector of pointers should speed things up. A vector offset
boils down to pointer + offset.
Not quite:
1) ElementPointer = BasePointer + ElementOffset
2) Element = dereference(ElementPointer)
while pointer semantics is only half that:
A) Element = dereference(ElementPointer)
Not quite, if I understand your first post:
Don't take that post too literally; what *really*
goes on is something like (in vector semantics)
struct element_t {size_t a; size_t b; size_t c;};
std::vector<element_t> v;
size_t n,m;
m = v[v[v[n].a].b].c;
where one cannot predict any relation between n and v[n].x.
I think this can be more efficient if written along the
lines of something like
struct Element_t { Element_t* A; Element_t* B; size_t C; };
std::vector<Element_t> v;
size_t n, m;
m = v[n].A->B->C;
....
In this application the difference matters. The issue is a bit
more involved than what I posted first (there might be 3-5 levels
of indirection through pointers to pointers). Maybe rewriting to
pointer semantics instead of vector semantics saves me 30% run time;
maybe it saves 60%. I don't know until I rewrite.
I don't know. In such cases I usually assume the compiler will do
such things for you, if you do not prevent it: the generated assembly
does not match the .cpp file line by line. It is a simple
transformation. The compiler stops further transformations when faced
with something it cannot be sure about. Indirection might be one such
case, true; a 'blackbox' call certainly is another. I still see no
significant difference between returning an offset and returning a
pointer, at least in a loop.
Again, my first few posts were maybe too simplistic. The example
above may still be too simplistic, but it gives some impression of
what is actually going on.
Making everything a template, or at least inline, eliminates the
'blackbox' calls; fewer indirections should help too.
Can't do that; that's the nature of the problem to be solved.
But it's that order of magnitude we are talking about. I don't want
to blow those savings, or more, on an expensive function indirection.
Probably boost::bind is what you need to avoid. std::for_each or any
similar template would be better, I guess.
It's two separate problems:
1) Getting fast indirections
2) Applying functions without overhead.
Problem 1 is not really essential for problem 2, other than as
a demonstration of what kinds of overheads are damaging.
These are mostly rules of thumb. If you have a lot of time to spend on
programming, use expression templates, or inline at least. If you
cannot, use boost::bind and just don't worry.
Can't afford not to worry. Run-time is a critical factor here.
Still, measure first, make a simple test, measure again. There are more
things in the CPU than are dreamt of in programming.
The performance factors only kick in visibly at large volumes.
No such thing as a 'simple test': either rewrite the whole thing
or nothing at all.
Maybe. I use simple tests to learn the mechanisms, on large volumes. It
might be easier than rewriting the whole thing and learning later that
it was not the right way to go, which, by the way, does happen during
such changes; performance is too often heuristic-like. What is clearly
an advantage is a change like O(n^2) to O(log n), but we're not talking
about that now.
The present version of the application is a 'naive simple semantics'
version, which has already been shown to be too slow. The question
now is how to implement a fast version that also might be flexible
enough to be useful in other applications, where speed is not as
much of an issue.
Rune