Re: vectors and user-defined objects

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Wed, 16 Jan 2008 00:48:47 -0800 (PST)

Message-ID:

<413d9473-beac-4f2c-b289-668a32a5a61b@i3g2000hsf.googlegroups.com>

On Jan 15, 1:18 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:

In article
<32bbeb00-f6a6-4b9d-9c04-a1671ace8...@i12g2000prf.googlegroups.com>,
James Kanze <james.ka...@gmail.com> wrote:

On Jan 14, 6:19 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:

In article
<ea77bb89-125d-4d44-b447-de6cae4a6...@j78g2000hsd.googlegroups.com>,

Jim <j...@astro.livjm.ac.uk> wrote:

On Jan 14, 1:56 pm, jkherci...@gmx.net wrote:

Jim wrote:

Just wondering which is better

vector<record *> r;
r.push_back(new record(x,y));

Although Kai-Uwe talks about it in his point (b), I think this
needs to be highlighted, as this is fundamentally wrong: the
above is a memory leak.

It depends.

In most cases where I've seen this, r would either have static
lifetime or be a singleton, never destructed. The constructor
of record would insert it into r, and the destructor removes it.
It's actually a very, very common idiom for entity objects.
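A minimal sketch of the idiom described above, for readers who haven't met it. The names Record and registry() are made up for illustration, and the "= delete" syntax assumes a C++11 compiler; none of this is from the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entity type: it has identity, so it is never copied.
class Record {
public:
    // The constructor inserts the new object into the registry...
    Record(int x, int y) : x_(x), y_(y) { registry().push_back(this); }

    // ...and the destructor removes it again.
    ~Record() {
        std::vector<Record*>& r = registry();
        r.erase(std::remove(r.begin(), r.end(), this), r.end());
    }

    Record(const Record&) = delete;
    Record& operator=(const Record&) = delete;

    // Created on first use and deliberately never destructed, so the
    // vector itself never needs a deleting loop.
    static std::vector<Record*>& registry() {
        static std::vector<Record*>* instance = new std::vector<Record*>;
        return *instance;
    }

    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_;
    int y_;
};
```

Here the vector does not own anything: each Record is deleted by whatever application logic decides that the entity has ceased to exist, and the registry only tracks the ones currently alive.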

Note that the OP's code was:

r.push_back(new record(x,y));

That's very different from what you are suggesting above with
a static vector and record inserting themselves in it for
reference.

Yes. I feel fairly sure that the OP didn't understand the
difference between entity objects and value objects, for
example. My response wasn't really geared to his problem, but
to the more general suggestions (i.e. a delete must be
associated with the destruction of the vector). In my
experience, such is rarely the case. Precisely because you do
use value semantics if the object lifetime is dependent on the
container lifetime.

The above code should never exist in isolation from its deleting
loop. If attempting to compare the two approaches, you must
include the deleting loop in the comparison.

STL containers have value semantics and may copy their own
content freely around. STL containers do not know if they are
holding pointers or objects and will never call "delete" on
their content.

And? That's usually exactly what is wanted. If the container
owns the objects, you'd usually use values (and the objects
cannot have identity, because the container copies them). If
the container doesn't own the objects, and the objects have
identity, then you need pointers. There are cases where you'd
want to delete the objects because the container is going out of
scope, but they are fairly rare.

My point is that the code above should not be considered in
isolation. It is only correct under a set of specific
circumstances, of which you have mentioned some examples, or if
somewhere in the code there's a loop that deletes every single
pointer in the vector before it goes out of scope. (Or, if
you are lazy and this is a short-lived program, you could rely
on the OS freeing all the memory when the program exits, but
that's not an approach I would recommend.)

And my point is that when the above isn't correct, you should
probably be using a container of values. I think we're just
looking at it from different angles.

So the original question should be: "Which is better:"

vector<record *> r;
r.push_back(new record(x,y));
// ... do stuff
while (!r.empty())
{
    delete r.back();
    r.pop_back();
}

Which is, IMHO, simply wrong. If, for some reason, you need a
collection of pointers to dynamically allocated objects whose
lifetime is associated with that of the container (something
which is very, very rare), then the *only* correct solution
involves using a container of boost::shared_ptr, or something
similar. (Note, however, that in this case, the automatic
reaction shouldn't be: "use boost::shared_ptr", but "use values,
not pointers".)

OR:

vector<record> r;
r.push_back(record(x,y));

// do stuff

These two code snippets are much more directly comparable
although we are still missing the definition of record and the
first snippet is not exception safe. Looking at them now, I
would say that even a C++ newbie would notice the additional
complexity required in the first case. If we add exception
safety to the first snippet, it becomes even more complicated.
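To make the exception safety point concrete, here is a small sketch of the value version failing part way through "do stuff". The Record type, its "live" counter and the function names are hypothetical, introduced only so the effect can be observed.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical value type that counts its live instances, copies included.
struct Record {
    static int live;
    int x;
    int y;
    Record(int x_, int y_) : x(x_), y(y_) { ++live; }
    Record(const Record& other) : x(other.x), y(other.y) { ++live; }
    ~Record() { --live; }
};
int Record::live = 0;

// Stands in for "do stuff" going wrong after the vector has been filled.
void fillThenThrow() {
    std::vector<Record> r;
    r.push_back(Record(1, 2));
    r.push_back(Record(3, 4));
    throw std::runtime_error("something failed while doing stuff");
}

// Returns the number of Record objects still alive after the failure.
int liveAfterFailure() {
    try {
        fillThenThrow();
    } catch (const std::runtime_error&) {
        // Stack unwinding has already destroyed r and its elements.
    }
    return Record::live;
}
```

With vector<record *> and a trailing deleting loop, the same throw would skip the loop and leak both objects unless the code were wrapped in a try block or a guard object.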

OK. If I understand you correctly, you weren't suggesting the
first as a good solution, but simply showing why you should
favor values. I'll admit that I tend to consider the issue from
an even more fundamental level: if you have values (at the
conceptual level), then use values. And things that disappear
with the container are almost always values. Even without the
added complexity and the missing exception safety, using
pointers when you have values is conceptually wrong.

IMNSHO, that should be your default solution. I.e. unless
you know that this is not suitable for this particular
case, use this form.

Let me see if I've got that straight. You should use this
form except when you shouldn't use it:-). (I'm being a bit
facetious there---I basically agree with you.)

Exactly. Use the default solution unless you know that you
shouldn't use it in this case. e.g.:

- Use standard containers with value semantics unless you know you
  must use pointers.
- Don't (micro-)optimize unless you know you must.
- Use local variables unless you know you must use a global.
- Use automatic objects unless you know you must use the free store.
- Make reference arguments const unless you know they must not be.
- Use standard containers unless you know they don't meet your
  requirements.
- Use vector by default unless you know you shouldn't in this case.
- Drive on the normal side of the road unless you know this is a
  one-way street and you can safely drive in the other lane.

(I am sure someone will jump on me for these :-)

Not at all. I think it's a very good list, and very well
presented.

What I might add is that you should design first, and whether an
object is an entity or a value (or some other category) should
be determined by the design. In a certain sense, you don't need
the first rule, since the role of the container and the role of
what it contains has been determined by design. It's a good
rule, but it's a rule at the coding level, when the decision
should have been made upstream, at the design level.

I'm tempted to say the same thing for rules 3 and 4, except
there are "working variables" which aren't handled at the design
level, but are introduced during coding. For all others,
lifetime of the object is a design decision.

Don't get me wrong. All of these default solutions can be
broken. In fact, they very often must be broken. But if one
does, one should be able to explain why.

I think that learning the various "standard" categories of
objects should have precedence. Until you understand the
difference between value types and entity types, it doesn't make
sense to discuss the question.

That's also what I mean. Kai-Uwe's answer was very extensive
and quite correct, but for learners, I quite like a practical
and simpler approach of considering a "standard" solution
first and deviating from it if you know you must. Actually,
even for experienced developers, I think it's good to first
consider the obvious default solution before going for a more
complex pattern.

I think the problem with regards to learners is that there is
too much to learn at once. Logically, you can't write good C++
(or good anything) until you know something of design. But it's
very difficult to teach C++ and design at the same time---you're
throwing too much at them at once. And teaching design before
they know a programming language would be, I suspect, very
frustrating---the student learns a lot of principles that he
can't apply until the next semester (by which time he'll have
forgotten them if he hasn't used them).

As for experts, I agree that having some rules of thumb is a
good idea, but I suspect that they will rarely have occasion
to apply some of the rules, since they are coding
rules, and the decision will have been made at design time,
before coding starts.

(I might add that there is one that I think needs some
elaboration: use vector by default. I would argue that you
should never use a standard class for a fundamental abstraction
of your application. You should design a class with a narrow
interface, which provides exactly what you need, and no more.
That way, you maintain maximum freedom to modify the
implementation. You should, however, use vector by default in
the implementation, just as you should use it by default when
you incidentally need a container, i.e. the container is not
part of the design.

And of course, generally, you should treat strings as a
more or less primitive type, and not a container. If you want a
container of char, then it's vector<char> by default, but most
of the time, conceptually, you don't want a container of char,
but a string.)
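As an illustration of the narrow-interface suggestion above, a sketch of an application-level class that uses vector only in its implementation. RecordLog, Record and the three member functions are hypothetical; a real design would expose whatever operations the application actually needs, and no more.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value type held by the collection.
struct Record {
    int x;
    int y;
};

// Hypothetical application abstraction: clients see only these three
// operations, so the vector can be replaced later without touching them.
class RecordLog {
public:
    void add(const Record& r) { records_.push_back(r); }
    std::size_t size() const { return records_.size(); }
    // Bounds-checked access; throws std::out_of_range on a bad index.
    const Record& at(std::size_t i) const { return records_.at(i); }

private:
    std::vector<Record> records_;  // implementation detail, vector by default
};
```

Client code that only calls add(), size() and at() would not change if the implementation later moved to a deque, a sorted container, or a database.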

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34