Re: Iterating a std::vector vs iterating a std::map?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Wed, 25 Nov 2009 01:46:28 -0800 (PST)
Message-ID: <6eec4e18-648c-41e4-9483-ddb714114fae@o10g2000yqa.googlegroups.com>

On Nov 24, 4:37 pm, Juha Nieminen <nos...@thanks.invalid> wrote:

James Kanze wrote:

As a general rule, you shouldn't use standard classes, or
classes from any general library, as part of the application
abstractions. You should always define your own classes,
with the exact interface you need (and no more). The
standard classes only appear in the implementation of these
classes, so you can swap them in or out as needed (or even
replace them with a custom implementation, if none of the
standard classes meets your needs).

While in certain large projects it's a good idea to abstract
implementation details away as much as possible, in smaller
projects trying to abstract everything away can be more work
than it's worth.

"As a general rule" doesn't mean "absolutely with no
exceptions". In general, in all large projects and in most
small projects, it's worth defining your application specific
abstractions. If you're writing an editor, you don't use
std::string for your text buffer, you use TextBuffer, a class
that you've designed. (Of course, at least at the beginning,
TextBuffer will use std::string in its implementation.)
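
To make that concrete, here is a minimal sketch of what such a TextBuffer might look like at the beginning. The member names (insert, erase, length, contents, myText) are hypothetical, chosen only for illustration; a real editor would define whatever operations it actually needs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical editor text buffer: the interface is what the editor
// needs, not everything std::string happens to offer.
class TextBuffer
{
public:
    // Insert text at a character offset; offsets past the end append.
    void insert(std::size_t pos, const std::string& text)
    {
        if (pos > myText.size()) {
            pos = myText.size();
        }
        myText.insert(pos, text);
    }

    // Remove up to `count` characters starting at `pos`.
    // Positions at or past the end are ignored.
    void erase(std::size_t pos, std::size_t count)
    {
        if (pos < myText.size()) {
            myText.erase(pos, count);
        }
    }

    std::size_t length() const { return myText.size(); }

    // Copy of the whole contents, e.g. for writing to a file.
    std::string contents() const { return myText; }

private:
    // Implementation detail. Because no client sees it, it could later
    // be replaced by some other data structure.
    std::string myText;
};
```

Since client code sees only these four functions, replacing the std::string member later requires no change outside the class.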

For example, suppose you have some function or member function which
takes, let's say, a std::vector as parameter:

class Something
{
public:
    void foo(const std::vector<unsigned>& values);
};

This is very unabstract code. It hard-codes both the data
container and the element type. One would think that it's a
good idea to abstract that away a bit, for example:

class Something
{
public:
    typedef std::vector<unsigned> ValueContainer;
    void foo(const ValueContainer& values);
};

Now if all the outside code uses Something::ValueContainer
instead of std::vector<unsigned>, everything is well? Maybe,
except that that typedef doesn't enforce anything. You can
still see that what's being used is really a
std::vector<unsigned>, and nothing stops you from using that
type directly and passing instances of it to Something::foo().

First, it depends. Is this ValueContainer fundamentally part of
your application abstraction? If so, it should be a class, and
not a typedef. A class which exposes the interface defined by
the abstraction in your application, and not the interface
defined by std::vector. If not, if the abstraction is precisely
that of std::vector, then std::vector is fine.

Having said that, it's sometimes a bit delicate. I have more
than a few cases where the abstraction does wrap a standard
container, like std::vector, *and* includes iterators. In such
cases, it's very tempting to use a (member) typedef to define
the iterators, which does lock me into random access iterators,
even if the abstraction doesn't require them. I've thought
about creating a template to "downgrade" iterators, but I've not
gotten around to it.
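
For what it's worth, one possible shape for such a "downgrading" template is sketched below. This is only an illustration, not code that exists anywhere: the names ForwardOnly, ValueContainer::add and myIter are hypothetical, and the adapter simply forwards the forward-iterator operations while leaving out everything else.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical adapter: wraps any iterator but exposes only the
// forward-iterator operations (no --, no +, no -, no []).
template <typename Iter>
class ForwardOnly
{
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef typename std::iterator_traits<Iter>::value_type value_type;
    typedef typename std::iterator_traits<Iter>::difference_type difference_type;
    typedef typename std::iterator_traits<Iter>::pointer pointer;
    typedef typename std::iterator_traits<Iter>::reference reference;

    ForwardOnly() : myIter() {}
    explicit ForwardOnly(Iter iter) : myIter(iter) {}

    reference operator*() const { return *myIter; }
    pointer operator->() const { return &*myIter; }

    ForwardOnly& operator++() { ++myIter; return *this; }
    ForwardOnly operator++(int)
    {
        ForwardOnly tmp(*this);
        ++myIter;
        return tmp;
    }

    friend bool operator==(const ForwardOnly& lhs, const ForwardOnly& rhs)
    {
        return lhs.myIter == rhs.myIter;
    }
    friend bool operator!=(const ForwardOnly& lhs, const ForwardOnly& rhs)
    {
        return !(lhs == rhs);
    }

private:
    Iter myIter;   // the wrapped iterator, never exposed
};

// The abstraction wraps a std::vector, but clients only ever see a
// forward iterator.
class ValueContainer
{
public:
    typedef ForwardOnly<std::vector<unsigned>::const_iterator> const_iterator;

    void add(unsigned value) { myValues.push_back(value); }
    const_iterator begin() const { return const_iterator(myValues.begin()); }
    const_iterator end() const { return const_iterator(myValues.end()); }

private:
    std::vector<unsigned> myValues;
};
```

With this, an expression like `c.begin() + 2` fails to compile in client code, so the std::vector could later become a std::list without breaking anyone.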

Moreover, even if the outside code would strictly adhere to
always using Something::ValueContainer, exactly which member
functions of that type is it allowed to use? Since it is a
std::vector, all of the std::vector functions are free to be
used, making it more difficult to change the type of
ValueContainer later to something else. You quickly notice
that, in fact, you didn't abstract anything away at all with
that typedef. You are just using an alias.

In sum: typedef doesn't define a new abstraction. What else is
new?

So if you want to truly abstract the type away you have to do
it like:

class Something
{
public:
    class ValueContainer;
    void foo(const ValueContainer& values);
};

and then you define Something::ValueContainer as a class which
contains the necessary member functions for passing the
numerical values to Something::foo().

The problem? Now Something::ValueContainer will be way more
limited than std::vector is. For example, the calling code
might benefit from things like random access with that data
container, so if ValueContainer doesn't provide it, it hinders
what the code can do.

That's not the problem. That's rather exactly what we're trying
to achieve. My code guarantees a certain functionality for
ValueContainer, and only that functionality. Client code can't
use more than I guarantee (in theory, anyway), so I'm free to
change the implementation any way I want, as long as I maintain
that functionality.

It's called encapsulation, and it's essential if you want to be
able to optimize the code later. Or evolve it in any other way.
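
As a rough sketch of what that buys you, consider a ValueContainer that guarantees only three operations. The names add, count, sum, Impl and mean are hypothetical, and the switch from std::vector to std::list is just an assumed example of a later change.

```cpp
#include <cstddef>
#include <list>
#include <numeric>

// Hypothetical container abstraction guaranteeing only add(), count()
// and sum(). Nothing in the public interface reveals the container.
class ValueContainer
{
public:
    void add(unsigned value) { myValues.push_back(value); }
    std::size_t count() const { return myValues.size(); }
    unsigned sum() const
    {
        return std::accumulate(myValues.begin(), myValues.end(), 0u);
    }

private:
    // Assume an earlier version used std::vector<unsigned> here.
    // Changing this one line is the whole migration.
    typedef std::list<unsigned> Impl;
    Impl myValues;
};

// Client code: written against the guaranteed interface only, so it
// compiles unchanged whichever container Impl names.
unsigned mean(const ValueContainer& values)
{
    if (values.count() == 0) {
        return 0;
    }
    return values.sum() / static_cast<unsigned>(values.count());
}
```

Had mean() been written against std::vector directly, say with operator[], the same change would have meant touching every caller.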

Of course this is intentional: By not providing random access
you are more free to change the internal data structure to
something else if needed (e.g. std::list). However, by being
too careful like this, you are now hindering the outside code.

You want to define a contract for the outside code. Ideally,
you'll define the contract in such a way that the outside code
can do whatever it needs to do, and you maintain all the
necessary freedom to implement the code however you want. Even
using a typedef to std::vector "hinders" outside code: it can't
hold an iterator into the container over an insertion, for
example. It's up to you, as the designer, to decide what you
want (or need) to support, and what you don't.
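
The iterator point can be illustrated with a small sketch. The function names here are hypothetical. With std::list, an iterator held across insertions stays valid; with std::vector, any push_back may reallocate and invalidate it, so the client has to remember an index instead.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// std::list: insertion never invalidates existing iterators, so the
// client may hold one across any number of push_back calls.
unsigned heldAcrossListInsert()
{
    std::list<unsigned> values;
    values.push_back(42);
    std::list<unsigned>::const_iterator held = values.begin();
    for (unsigned i = 0; i != 1000; ++i) {
        values.push_back(i);
    }
    return *held;               // still refers to the first element
}

// std::vector: push_back may reallocate, which invalidates every
// iterator into the vector. An index survives where an iterator
// would not.
unsigned heldAcrossVectorInsert()
{
    std::vector<unsigned> values;
    values.push_back(42);
    std::size_t held = 0;       // an index, not an iterator
    for (unsigned i = 0; i != 1000; ++i) {
        values.push_back(i);
    }
    return values[held];
}
```

Either contract is usable. The point is that the choice of std::vector already dictates one of them to the client.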

What you could do is to add a way to initialize a
Something::ValueContainer with a set of values (from a
container or an iterator range). However, what you have done
now is effectively move the abstraction problem from Something
to Something::ValueContainer, achieving only little advantage.
You could as well have done that with Something::foo()
directly.

It might also introduce an inefficiency because now values
need to be copied around instead of the calling code just
using the same container for both whatever it needs to do (e.g.
requiring random access) and making Something read from it, so
the values don't need to be copied anywhere just for
Something::foo() to read them.

I'm not sure I fully understand what you're getting at in the
above two paragraphs, but one thing is sure: *IF* performance is
ever an issue, you'd better hide all implementation details
behind such an abstraction, and limit client code as much as
possible, or you'll be overly constrained in your optimization
possibilities.

Don't forget, if you've not provided random access, and it later
becomes evident that client code needs it, you can always add
it. Adding functionality causes no problems. Removing it, on
the other hand, breaks client code. So you don't want to
provide anything you're not sure the client needs.

--
James Kanze