Re: Problems with performance
On Tuesday, 26 February 2013 12:00:40 UTC, Alain Ketterlin wrote:
Seungbeom Kim <musiphil@bawi.org> writes:
On 2013-02-25 09:39, =D6=F6 Tiib wrote:
[...]
for(j = 1; j < Ny; j++)
{
for(k = 1; k < Nz; k++)
{
curl_h = cay[j]*(Hz[i][j][k] - Hz[i][j-1][k])
- caz[k]*(Hy[i][j][k] - Hy[i][j][k-1]);
// ...
// rest of the stuff
}
}
Now you replace it ... (assuming the arrays contain doubles) with that=
:
[...]
for(j = 1; j < Ny; j++)
{
double cay_j = cay[j];
double (&Hz_i_j)[Nz] = Hz[i][j];
double (&Hz_i_j_1)[Nz] = Hz[i][j-1];
double (&Hy_i_j)[Nz] = Hy[i][j];
for(k = 1; k < Nz; k++)
{
curl_h = cay_j*(Hz_i_j[k] - Hz_i_j_1[k])
- caz[k]*(Hy_i_j[k] - Hy_i_j[k-1]);
// ...
// rest of the stuff
}
}
I'm just speculating, but reflecting that references just
create aliases, I doubt that using references like this will
affect the generated code in any significant way.
It will, and a lot.
If it does, then there's something seriously wrong with the
quality of the compiler. Hoisting loop invariants was
a standard optmization technique twenty or thirty years ago.
In the fragment above you have more array reference
that actual computations (like + - etc.). Consider a loop like:
for ( i=... )
for ( j=... )
... T.at(i,j) ...
(I use at(i,j) to abstract away the effective access). Now assume T is a
linearized array. This access is equivalent to:
T.data[i*N+j]
Calculating i*N on every iteration of the j loop costs
something. What =D6=F4 suggests is moving this computation out of
the j loop.
Except that pretty much every compiler in the world will do this
for you. And typically, when the compiler does it, it can do it
better than if you try to do it by hand, since it knows the
finality of the references and pointers it creates.
Note that if the array is stored as a 1D array of pointers to 1D arrays
of doubles, the same argument applies. T[i][j] is:
*(*(T.rows+i)+j)
again, *(T.rows+i) could be hoisted out of the j loop (if that location
is not overwritten during the run of the loop -- unfortunately it is
difficult for a compiler to realize this, and that's where restrict is
useful).
There is a large amount of computation going on during array accesses,
and factoring as much as possible to put it out of the loop is one of
the major optimization opportunity.
(To the OP: the library you use just makes array accesses much more
costly than they should be, and probably prevents this kind of
optimization.)
(Creating a reference will not even trigger a prefetch, will it?)
No.
And if it actually does, the compiler must not have been doing a very
good job even at merely grasping common subexpressions.
Common subexpressions are important, loop-invariant code motion is even
more, because it reduces the amount of code by a factor equal to the
number of iterations of the loop.
On the other hand, value copying instead of just aliasing (as for cay_j
above) may have a better chance of improvement.
This is actually no different from extracting a partial array access:
same idea, same potential gain.
Extracting the value has an important benefit; when the compiler
must access the values, it has to take into account possible
aliasing. Thus, in `idx[i][j][k] = ...` and `D[i][j][k] = ...`,
the compiler will probably have to assume that anything
expression that references into any of the other arrays might
have different results.
This depends on the definitions of the other arrays. If e.g.
`Hz` and `idx` are both C style arrays—_not_ pointers
which point to the first element, but the actual data
definitions—then the compiler can know that they don't
alias one another. Otherwise, it has to assume the worst, and
reread the elements each time through the loop. Manually
hoisting the reads of `gj3[j]` etc. out of the inner most loop
might make a significant difference, because the compiler
probably cannot do this (since if all it has are pointers, it
has to assume that one of the other assignments in the innermost
loop might modify this value). Depending on the dimensions, it
might actually be worth copying `Hz[i][j]` et al. into one
dimensional local arrays (which the compiler can see arn't being
modified).
Even better, of course, would be if the compiler has extensions
which allow you to tell it that there isn't any aliasing. C++11
didn't adopt C's `restrict` keyword, but this is one place where
it could help enormously (and some C++ compilers might support
it as an extension).
--
James