Re: object on stack/heap performance problems
On 2007-07-01 14:51, orobalage@gmail.com wrote:
Hi!
I was developing some number-crunching algorithms for my university,
and I put the processor into a class.
While testing, I found a quite *severe performance problem* when the
object was created on the stack.
I uploaded a test archive here: http://digitus.itk.ppke.hu/~oroba/stack_test.zip
Inside you'll find the number cruncher class (CNN in cnn.h and
cnn.cpp), as well as two test files: test_slow.cpp and test_fast.cpp.
They differ ONLY in where the processor object is created. In one, it
is created on the stack, in the other, it is created on the heap. Yet,
when I call the member function process(), the performance difference
is 5x!!!
Can someone with a higher knowledge of object layout and whatsoever,
tell me why this is happening?
Here are the results I get when I compile and run your code with
Visual C++ Codename Orcas Express Beta 1 (Visual C++ 2008):
Debug:
heap: 12868
stack: 13118
Release:
heap: 38666
stack: 4383
In the release build the stack version is about 8.8 times faster than
the heap version. I have not used any profilers or the like, but there
is some stuff in your code that I find highly dubious, especially the
allocation for the RowMatrix. From what I can understand of the code you
do some "magic" to make sure the data is aligned properly, but does it
work? Are you sure your computer (or the one it will run on) really works
best with 32-byte boundaries? This also makes your code totally
unportable. I had to change
data = (float*) ((((long)(real_data))+31L) & (-32L));
to
data = (float*) ((((long long)(real_data))+31L) & (-32L));
before my compiler would let it through, and I'm still not sure what you
are trying to achieve with it.
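
For what it's worth, the cast problem comes from long being only 32 bits
wide on 64-bit Windows, so it cannot hold a pointer there. A rough sketch
of a more portable rounding is below. It assumes a compiler that provides
std::uintptr_t, and the names align_up and AlignedFloats are made up for
illustration; they are not from the posted code. Only real_data and data
mirror the original variable names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round a raw pointer up to the next multiple of `alignment`, which must be
// a power of two. std::uintptr_t is an integer type wide enough to hold a
// pointer, so the cast does not truncate on 64-bit targets.
inline float* align_up(void* raw, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    return reinterpret_cast<float*>((addr + mask) & ~mask);
}

// Hypothetical helper (not from the original code): over-allocates with
// malloc, exposes an aligned pointer, and frees the ORIGINAL pointer.
struct AlignedFloats {
    void*  real_data;  // what malloc returned; this is what must be freed
    float* data;       // aligned view into the same block

    explicit AlignedFloats(std::size_t count, std::size_t alignment = 32)
        : real_data(std::malloc(count * sizeof(float) + alignment - 1)),
          data(real_data ? align_up(real_data, alignment) : nullptr) {}

    ~AlignedFloats() { std::free(real_data); }

    // Copying would lead to a double free, so it is disabled in this sketch.
    AlignedFloats(const AlignedFloats&) = delete;
    AlignedFloats& operator=(const AlignedFloats&) = delete;
};
```

With this, AlignedFloats buf(100); gives buf.data on a 32-byte boundary
with room for 100 floats. Whether 32 bytes is actually the best boundary
for a given machine is a separate question that only measurement can
answer.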
Another thing that strikes me is that you use malloc. While I'm no
expert, I think this will cause your program to use two heaps, one for
new'ed memory and one for malloc'ed, and this might slow things down.
I'm not sure what your number-crunching algorithm is supposed to do, so
I can't give you any better advice than to try to make the RowMatrix
simpler and try again.
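
To make "simpler" a bit more concrete, here is a minimal sketch of a
row-major float matrix that leaves allocation to std::vector. It is only
an assumption about what a stripped-down version could look like: the
class name SimpleRowMatrix and its members are invented for illustration
and do not come from cnn.h, and it deliberately drops the manual 32-byte
alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified matrix: one contiguous, zero-initialised buffer,
// stored row by row. std::vector handles allocation and cleanup.
class SimpleRowMatrix {
public:
    SimpleRowMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0f) {}

    // Element access; the asserts catch out-of-range indices in debug builds.
    float& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    float at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Pointer to the first element of row r, for tight inner loops.
    float* row(std::size_t r) {
        assert(r < rows_);
        return data_.data() + r * cols_;
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<float> data_;
};
```

If the timings for the stack and heap variants come out the same with
something like this in place of RowMatrix, that would point at the
hand-rolled allocation rather than at where the CNN object lives.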
--
Erik Wikström