Re: pointer arithmetic and std::vector

From: "Alf P. Steinbach" <alfps@start.no>
Newsgroups: comp.lang.c++
Date: Fri, 03 Jul 2009 09:10:04 +0200
Message-ID: <h2kb6u$vpn$1@news.eternal-september.org>

* aaragon:
> * Ian Collins:
>> * aaragon:
>>> Hello all,
>>> I have the following problem. Let's say you have three std::vector
>>> objects and you need to access the contiguous memory as if they were
>>> part of a regular array. That is, I would like to access each vector
>>> using pointer arithmetic such that the beginning of each vector is
>>> obtained by an offset or displacement from the first vector.
>>
>> The address of the first element (as you wrote) will work, provided the
>> vector isn't empty.
>>
>> The only real use I can think of for doing this is to pass the address
>> to a C interface. For C++, use iterators.
>
> Indeed I am using a C interface. I need to send objects through MPI,
> and I don't want to copy the vector into a temporary buffer. One can
> create an MPI data type that points to non-contiguous data contained
> in an array. My objective is to trick that function and use the
> pointer difference as if the non-contiguous data (two vectors) were
> part of the same array. So you think this works always?

For general usage you need to be clear about what you can assume.

First, a std::vector's storage is guaranteed to be contiguous. This guarantee
was accidentally omitted in C++98, but that omission was rectified in C++03.
C++03 was a set of corrections to the original C++98 standard (C++03 was
"Technical Corrigendum 1", shortened TC1, and happily it's the only one).

There's also such a guarantee in practice for std::string, but it is not
present in C++98 (even with the C++03 corrections applied). It holds in
practice because all extant implementations give std::string a contiguous
buffer, and because C++0x will have this guarantee.
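
As a minimal sketch of what the contiguity guarantee buys you, assuming a
C++11 or later compiler (for nullptr): the helper names address_by_offset,
offsets_match_elements and copy_through_raw_buffer are made up for this
illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Returns the address of element i, computed from the address of element 0.
// Precondition: i < v.size(), which also implies that v is not empty.
inline int* address_by_offset(std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    int* const first = &v[0];   // valid only because v is non-empty
    return first + i;           // contiguity: same address as &v[i]
}

// Checks, element by element, that offset arithmetic and indexing agree.
inline bool offsets_match_elements(std::vector<int>& v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        if (address_by_offset(v, i) != &v[i]) { return false; }
    }
    return true;
}

// Copies a C string into a std::string by writing through &s[0], the way a
// C interface would fill a caller-supplied buffer. This relies on the
// string's buffer being contiguous.
inline std::string copy_through_raw_buffer(const char* src)
{
    assert(src != nullptr);
    std::string s(std::strlen(src), '\0');
    if (!s.empty()) { std::memcpy(&s[0], src, s.size()); }
    return s;
}
```

For any vector v, offsets_match_elements(v) should return true, and
copy_through_raw_buffer("abc") should yield a string equal to "abc". Note
that both helpers avoid forming &x[0] on an empty container.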

However, when you obtain a raw pointer or reference to a buffer you need to be
clear about lifetime issues.

If the buffer is reallocated then your pointer or reference becomes invalid. And
if the buffer is swapped with another vector's buffer then you're maneuvering in
dangerous straits.
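
To make the swap case concrete, here is a small sketch (the function name
pointer_follows_buffer_after_swap is hypothetical). With std::vector::swap
the buffers change owners, so a pointer taken before the swap stays valid
but afterwards refers to an element of the *other* vector:

```cpp
#include <cassert>
#include <vector>

// Takes a pointer into a's buffer, swaps a and b, and reports whether the
// old pointer now addresses the first element of b.
// Precondition: both vectors are non-empty.
inline bool pointer_follows_buffer_after_swap(std::vector<int>& a,
                                              std::vector<int>& b)
{
    assert(!a.empty() && !b.empty());
    int* const p = &a[0];   // points into the buffer currently owned by a
    a.swap(b);              // the buffers change owners, no elements move
    return p == &b[0];      // the pointer followed the buffer, not the name
}
```

So code that remembers "the address of a's data" silently ends up looking
at b's data after the swap.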

Happily std::vector buffer reallocation only happens (that is, it *may* happen,
but need not happen) when the vector's capacity is increased.

For example, if size() == capacity(), then a push_back will increase the
capacity by reallocating the buffer, and your pointer/reference becomes invalid.
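
One way to stay on the safe side is to reserve the capacity up front, so
that later push_back calls cannot move the buffer. A sketch, with made-up
helper names (fill_without_reallocation, next_push_back_reallocates) and
assuming C++11 or later for nullptr:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends the values 0 .. n-1 without letting the buffer move: the capacity
// is reserved first, so none of the push_back calls can reallocate.
// Returns the buffer address (null if the vector ends up empty), which
// stays valid until the capacity next grows.
inline int* fill_without_reallocation(std::vector<int>& v, std::size_t n)
{
    v.reserve(v.size() + n);
    const std::size_t reserved = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back(static_cast<int>(i));
    }
    assert(v.capacity() == reserved);   // no reallocation took place
    return v.empty() ? nullptr : &v[0];
}

// True when the vector is full, i.e. when one more push_back has to
// reallocate and thereby invalidate raw pointers into the buffer.
inline bool next_push_back_reallocates(const std::vector<int>& v)
{
    return v.size() == v.capacity();
}
```

Any pointer obtained before a call for which next_push_back_reallocates(v)
is true must be re-fetched after the push_back.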

Steering clear of such pitfalls, your offset-based code should work in practice
on all modern systems, but it's not supported by the standard's guarantees (the
standard only defines pointer subtraction within a single array). For
example, on an old MS-DOS or 16-bit Windows system you might have one vector's
buffer in one segment and another vector's buffer in another segment. And then,
except for the i86 "large" model of memory handling, your code would fail.

And for that reason, and for the reason of not being "too smart" (generally
ungood), it's probably better to put your data in a single vector than to put it
in three separate vectors. And in that connection, it's really very unclear what
purpose the offset arithmetic has. Why do you convert pointers to offsets and
then back to pointers, at all?
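
A sketch of the single-vector layout, under the assumption that the three
blocks hold doubles and that their sizes are known when the storage is
created. The class name PackedBlocks and its members are invented for this
example; data() on std::vector requires C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three logical arrays stored back to back in one std::vector, so that the
// offsets between them are ordinary, well-defined index arithmetic.
class PackedBlocks
{
public:
    PackedBlocks(std::size_t n0, std::size_t n1, std::size_t n2)
        : data_(n0 + n1 + n2)
    {
        offsets_[0] = 0;
        offsets_[1] = n0;
        offsets_[2] = n0 + n1;
        offsets_[3] = n0 + n1 + n2;   // one past the last block
    }

    // Pointer to the first element of block k (k = 0, 1 or 2).
    double* block(std::size_t k)
    {
        assert(k < 3);
        return data_.data() + offsets_[k];
    }

    // Number of elements in block k.
    std::size_t block_size(std::size_t k) const
    {
        assert(k < 3);
        return offsets_[k + 1] - offsets_[k];
    }

    // Start of the whole buffer, e.g. for handing to a C interface.
    double* data() { return data_.data(); }
    std::size_t total_size() const { return data_.size(); }

private:
    std::vector<double> data_;
    std::size_t offsets_[4];
};
```

With PackedBlocks p(2, 3, 1), the expression p.block(2) - p.block(0) is 5,
and the whole thing can be passed to a C interface as p.data() together
with p.total_size(), with no temporary copy, since the data lives in the
single buffer from the start.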

Cheers & hth.,

- Alf