Re: Performance of exported template classes

From:

"Alexander Grigoriev" <alegr@earthlink.net>

Newsgroups:

microsoft.public.vc.language

Date:

Thu, 11 Mar 2010 19:50:21 -0800

Message-ID:

<#eRwkcZwKHA.5036@TK2MSFTNGP02.phx.gbl>

Normally, the STL-generated code can get heavily optimized and inlined. But
if you export the code, the no-inline functions will be used.

<gast128@hotmail.com> wrote in message
news:09ae418f-3610-4ef5-8df2-d41d7e45eed5@g19g2000yqe.googlegroups.com...

Hello all,

this may be a difficult to explain problem, and I need some assembly
to show the difference. In a DLL we export some STL containers to
minimize code bloat, like:

template class __declspec(dllexport) std::vector<int>;
typedef std::vector<int> int_vector;

In a simple test probgram I see now a huge difference in performance.
The c++ function is as follows (same as std::fill, but this is just
example):

void PrfMemoryIterator(int_vector* pVector, int nValue, size_t nLoop)
{
  for (size_t n = 0; n != nLoop; ++n)
  {
     const int_vector::iterator itEnd = pVector->end();

     for (int_vector::iterator it = pVector->begin(); it != itEnd; +
+it)
     {
        *it = nValue;
     }
  }
}

In the assembly code somehow exception handling has been put in, and
this gets updated in the loop, which is major performance issue (see
'//! <- difference'):

void PrfMemoryIterator(int_vector* pVector, int nValue, size_t nLoop)
{
00401D30 push 0FFFFFFFFh
00401D32 push offset __ehhandler$?PrfMemoryIterator@@YAXPAV?
$vector@HV?$allocator@H@std@@@std@@HI@Z (403718h)
00401D37 mov eax,dword ptr fs:[00000000h]
00401D3D push eax
00401D3E mov dword ptr fs:[0],esp
00401D45 sub esp,4Ch
00401D48 mov eax,dword ptr [___security_cookie (406270h)]
00401D4D xor eax,esp
00401D4F push edi
00401D50 mov edi,ecx

<snip>

     for (int_vector::iterator it = pVector->begin(); it != itEnd; +
+it)
00401D7D lea ecx,[esp+4]
00401D81 push ecx
00401D82 mov ecx,ebx
00401D84 call dword ptr
[__imp_std::vector<int,std::allocator<int> >::begin (404004h)]
00401D8A mov eax,dword ptr [esp+4]
00401D8E cmp eax,dword ptr [esp+8]
00401D92 je PrfMemoryIterator+79h (401DA9h)
     {
        *it = nValue;
00401D94 mov dword ptr [eax],esi
00401D96 mov eax,dword ptr [esp+4] //! <- difference
00401D9A mov ecx,dword ptr [esp+8] //! <- difference
00401D9E add eax,4
00401DA1 cmp eax,ecx
00401DA3 mov dword ptr [esp+4],eax //! <- difference
00401DA7 jne PrfMemoryIterator+64h (401D94h)

However if we not export the STL containers, the generated code is
different:

void PrfMemoryIterator(int_vector* pVector, int nValue, size_t nLoop)
{
00401F60 sub esp,44h
00401F63 mov eax,dword ptr [___security_cookie (406290h)]
00401F68 xor eax,esp
00401F6A push edi
00401F6B mov edi,ecx

<snip>

     for (int_vector::iterator it = pVector->begin(); it != itEnd; +
+it)
00401F86 mov eax,dword ptr [ebx+4]
00401F89 cmp eax,ecx
00401F8B je PrfMemoryIterator+39h (401F99h)
00401F8D lea ecx,[ecx]
     {
        *it = nValue;
00401F90 mov dword ptr [eax],esi
00401F92 add eax,4
00401F95 cmp eax,ecx
00401F97 jne PrfMemoryIterator+30h (401F90h)

I use vstudio 2003 here, but I noticed something similar with the
_SECURE_SCL option in vstudio 2008, which also makes a difference from
a performance perspective .

Can anyone help? It is probably somewhere in the exception handling
corner, however why would this make a difference when using exported
classes or not?

Thx in advance.