Re: object on stack/heap performance problems
 
On 2007-07-01 14:51, orobalage@gmail.com wrote:
Hi!
I was developing some number-crunching algorithms for my university,
and I put the processor into a class.
While testing, I found a quite *severe performance problem* when the
object was created on the stack.
I uploaded a test archive here: http://digitus.itk.ppke.hu/~oroba/stack_test.zip
Inside you'll find the number cruncher class (CNN in cnn.h and
cnn.cpp), as well as two test files: test_slow.cpp and test_fast.cpp.
They differ ONLY in where the processor object is created. In one, it
is created on the stack, in the other, it is created on the heap. Yet,
when I call the member function process(), the performance difference
is 5x!!!
Can someone with a higher knowledge of object layout and whatsoever,
tell me why this is happening?
Here are the results when I compile and run your code with Visual C++ 
Codename Orcas Express Beta 1 (Visual C++ 2008):
Debug:
   heap:   12868
   stack:  13118
Release:
   heap:   38666
   stack:  4383
That makes the stack version about 8.8 times faster in the release 
build. I have not used any profilers or such, but there are some things 
in your code that I find highly dubious, especially the allocation for 
the RowMatrix. From what I can understand of the code, you do some 
"magic" to make sure the data is aligned properly, but does it work? Are 
you sure your computer (or the one it will run on) really works best with 
32-byte boundaries? This also makes your code totally unportable: I had 
to change
   data      = (float*) ((((long)(real_data))+31L) & (-32L));
to
   data      = (float*) ((((long long)(real_data))+31L) & (-32L));
before my compiler would let it through, and I'm still not sure what you 
are trying to achieve with it.
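
If the intent is to round real_data up to the next 32-byte boundary, a 
portable version might look roughly like the sketch below. It assumes the 
block comes from malloc with 31 spare bytes and that a C++11 compiler is 
available; the names align_up and AlignedBuffer are made up for 
illustration and are not from your archive. It uses std::uintptr_t 
instead of long, because long is only 32 bits on 64-bit Windows and 
cannot hold a pointer there.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round p up to the next multiple of `alignment`.
// `alignment` must be a power of two (32 in the code under discussion).
inline float* align_up(void* p, std::size_t alignment) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<float*>((addr + mask) & ~mask);
}

// Hypothetical stand-in for the storage part of RowMatrix.
struct AlignedBuffer {
    void*  real_data;  // pointer returned by malloc; this is what gets freed
    float* data;       // 32-byte aligned pointer inside that block

    explicit AlignedBuffer(std::size_t count)
        // 31 extra bytes leave room to move forward to the next boundary.
        : real_data(std::malloc(count * sizeof(float) + 31)),
          data(real_data ? align_up(real_data, 32) : nullptr) {}

    ~AlignedBuffer() { std::free(real_data); }

    // Copying would free the same block twice, so forbid it.
    AlignedBuffer(const AlignedBuffer&) = delete;
    AlignedBuffer& operator=(const AlignedBuffer&) = delete;
};
```

With something like AlignedBuffer buf(1024); the address in buf.data is a 
multiple of 32 and lies at most 31 bytes past buf.real_data. Whether 
32-byte alignment actually helps on a given machine is a separate 
question that only measuring can answer.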
Another thing that strikes me is that you use malloc. While I'm no 
expert, I think this may cause your program to use two heaps, one for 
new'ed memory and one for malloc'ed, which might slow things down.
I'm not sure what your number-crunching algorithm is supposed to do, so 
I can't give you any better advice than to try to make the RowMatrix 
simpler and try again.
-- 
Erik Wikström