Re: Contiguous allocation of multi-dimensional vector

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Sat, 11 Aug 2007 05:05:54 -0700

Message-ID:

<1186833954.964509.285520@k79g2000hse.googlegroups.com>

On Aug 9, 5:49 pm, nw <n...@soton.ac.uk> wrote:

We've been having a discussion at work and I'm wondering if anyone
here would care to offer an opinion or alternative solution. Aparently
in the C programming HPC community it is common to allocate
multidimentional arrays like so:

int *v_base = (int *) malloc(1000000*sizeof(int));
int **v = (int **) malloc(1000*sizeof(int *));

for(int i=0;i<1000;i++) {
v[i] = v_base+(i*1000);
}

The logic behind this is that the entire multi-dimensional array will
be contiguous in memory and therefore it is more likely that the whole
array will be loaded in to cache, improving access time.

The following workaround I believe will simulate similar behaviour
using the STL:

vector<int> v_base(1000000);
vector<int *> v;

for(int i=0;i<1000;i++) {
v.push_back(&v_base[0]+(i*1000));
}

However will obviously break if you use insert or remove anywhere, and
doesn't seem like a great idea.

You mean that it makes it impossible for v[0] to have a
different number of elements than v[1]. Depending on the
application, that could be an advantage, rather than a
disadvantage.

The standard way of allocating this two dimensional array as I
understand it is:

vector<vector<int> > v(1000,vector<int>(1000));

It depends on the use. The *standard* way of handling
multidimensional arrays in C++ is to write your own class to do
so, with an overloaded operator[] which returns a "proxy" on
which operator[] is also defined. (Note that in this case, int*
would be an adequate proxy.) Then, you can do pretty much
whatever works in the implementation; I'd probably just use a
one dimension vector, and calculate the indexes.

So I guess my question is this: Is there any advantage to allocating
the vector contiguously

There certainly could be, for some applications. To begin with,
you'll use less total memory---if the inner dimension is 1000,
it's probably not distinctive, but for something like:
std::vector< std::vector< int > > v( 1000000,
std::vector< int >( 5 ) ) ;
the difference could be enormous.

or should the normal STL allocation method
place the vectors contiguously in memory? Is there likely to be much
vector overhead placed between the 1d vector<int>s such as the size of
the array?

Who knows? It could easily vary, even from one run to the next.

My gut feeling is that the normal STL way will produce similar
results, and the tests I've done seem to back this up, but I'd be
interested in hearing other peoples opinions.

The real question is whether whatever method you feel most
comfortable with is fast enough. If so, don't bother looking
any further. Personally, for a mathematical matrix, I prefer
the single dimension array, calculating the index. It makes it
impossible to violate the invariant that all of the sub-arrays
have the same size. But if that caused performance problems
(e.g. because multiplication was very slow on my machine), I'd
try something else.

Just make sure that whatever you actually do is behind the
firewall of a class definition, so you can change it at will
without affecting client code.

--
James Kanze (GABI Software) email:james.ka...@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34