Re: Passing multi-dimensional arrays to functions

From:
Seungbeom Kim <musiphil@bawi.org>
Newsgroups:
comp.lang.c++.moderated
Date:
Thu, 28 Mar 2013 01:06:59 -0700 (PDT)
Message-ID:
<kivu06$6oh$1@usenet.stanford.edu>
On 2013-03-27 08:09, Öö Tiib wrote:

On Wednesday, 27 March 2013 13:10:02 UTC+2, Seungbeom Kim wrote:

On 2013-03-26 19:40, Öö Tiib wrote:

That factor is not too important ... more idiomatic would be to
assume that typedefs are always done ...:

     typedef std::vector<double> Column;
     typedef std::vector<Column> Matrix;
     Matrix::size_type sz = matrix.size();


I often do that, too. And that turns a one-line declaration of an
array ('double matrix[M][N];') into three lines. Which is not too
bad, but while I like typedefs that yield meaningful abstractions,
this is rather just to replace repeating long names with repeating
short names, and/or typing too much on a single line with typing
reasonable amounts on more lines, which I feel to be like a
necessary evil.


The typedefs are part of a class that has such a multidimensional
relation. A multidimensional relation is complex and expensive.
Expensive things create a desire to refactor, for example to
consider replacing Column with 'std::map<int,double>'. Without the
typedefs it is hard to measure the performance of such alternatives.
Typedefs together with 'auto' often make it almost effortless.


I agree with you in general, but not with your example. And I should
have made it clearer that my criticism was aimed more at the
intermediate typedef for Column than at the one for the entire Matrix.

A matrix is neither necessarily a vector of columns nor necessarily a
vector of rows, and it may be thought of as both. If you decide to
represent a Matrix with a vector of Columns, that's an implementation
detail rather than a meaningful abstraction to be published, and you
need the typedef just to reduce some typing.

For example, if you were using built-in arrays, you wouldn't need the
intermediate typedef: instead of

     typedef double Column[N];
     typedef Column Matrix[M];

you would just write

     typedef double Matrix[M][N];

because even without having Column typedefed, there's no more
verbosity, and the typedef serves no other purpose.

For another example, you may want to switch your representation of
Matrix to a vector of Rows instead; then your Column typedef is no
longer useful and you have to rewrite many parts of your code anyway.
You mentioned switching Column to 'std::map<int, double>', but what
if you want to switch the entire Matrix to 'std::map<std::pair<int,
int>, double>' or 'std::map<int, std::map<int, double>>'? Again the
Column typedef is no longer useful. Such switching can happen, but it
cannot be made effortless just by having more typedefs.
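
To make that concrete, here is a rough sketch (the names Matrix1 and
Matrix2 are made up for the example) of the two representations; note
that nothing written against the Column typedef survives the switch:

     #include <map>
     #include <utility>
     #include <vector>

     typedef std::vector<double> Column;
     typedef std::vector<Column> Matrix1;     // dense: vector of Columns

     typedef std::map<std::pair<int, int>, double> Matrix2;  // sparse

     // Element access differs completely, so code written against
     // Column (e.g. 'Column& c = m[i];') must be rewritten anyway:
     //   dense:  m1[i][j]
     //   sparse: m2[std::make_pair(i, j)]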

The point of this story is that the verbosity of data structures such
as 'std::vector<std::vector<double>>' often encourages us to introduce
typedefs that may alleviate the verbosity but don't serve as
meaningful abstractions, and that we wouldn't need (all of) them if
more concise data structures were available.

....also 'auto' was modified exactly because of that:

     auto it = matrix.end();


This is great, as long as you can use C++11 (which is, sadly, not
always true).


The feature is in the language. There will be a constantly diminishing
number of poor souls who can't use it. The most helpful suggestion for
them is to migrate. What else can we suggest?


It's not always that they don't want to migrate, as I implied with the
word "sadly." There are situations where such syntactic improvements
don't justify the cost of migration.

(And I don't want to have to say that we have to upgrade to a C++11
compiler in order to use a 2D array with such great ease. :D)

in exchange for the nice access syntax 'm[i][j]'.


That is the nicest but rarely the optimal access syntax.


Is it suboptimal because of the underlying representation of a
vector of vectors, or is there anything fundamental in the access
syntax that prevents the access through it from being optimal?


Even with a raw array it is commonly suboptimal. Say we have 'T
m[X][Y];'; the address of 'm[i][j]' is at byte offset 'i *
sizeof(T[Y]) + j * sizeof(T)' from 'm'. That can lose 20% of
performance in edge cases (which rarely matters, of course).

The algorithms that traverse multidimensional relations are usually
quite sequential, so iterators (or pointers, in the case of a C array)
are usually closer to optimal (one addition per increment). The
standard algorithm library, for example, often performs quite close
to optimal (and takes iterators as a rule).


It's not only that it rarely matters, but if we traverse a container
sequentially, as we'd do with an iterator, I don't think many decent
compilers will actually generate those multiplications and additions
verbatim. What I've heard is that vector traversals with indexing and
iterators will generate equally optimal code.

And if you represent a matrix with a vector of rows, how do you do
column-wise traversal, or even a diagonal one with iterators? How do
you traverse an upper triangular matrix? These things are so much
easier with indexing, and an iterator that's tied to the incidental
choice of the internal representation cannot solve them.
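
For example, with indexing these traversals are trivial (just a
sketch, assuming a dense matrix stored as a vector of rows):

     #include <cstddef>
     #include <vector>

     typedef std::vector<std::vector<double>> Matrix;  // vector of rows

     // column-wise traversal
     double column_sum(const Matrix& m, std::size_t j)
     {
         double s = 0;
         for (std::size_t i = 0; i < m.size(); ++i)
             s += m[i][j];
         return s;
     }

     // diagonal traversal (assumes a square matrix);
     // an upper-triangular walk is just 'for (j = i; ...)'
     double trace(const Matrix& m)
     {
         double s = 0;
         for (std::size_t i = 0; i < m.size(); ++i)
             s += m[i][i];
         return s;
     }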

IMHO the choice of container should always be an implementation
detail. In the OOP sense a two-dimensional container is just a
1-to-M*N relation. It is not good to expose the carriers of relations
for any passer-by to adjust.


I'm not sure if I implied anything opposite.


I was more concerned with the appeal of the 'm[i][j]' syntax, which
is usually sub-optimal anyway, so the only thing I would consider it
for is exposing the relation in an interface.


In cases that I'm talking about (and that I think we're talking
about), the multidimensional relation is part of the interface.

And my question was more about the comparison with std::vector: even
for numeric matrices, I have seen many people recommend std::vector
but not as many recommend std::valarray. Why is that?


I was just answering your question ("why not valarray?"). My
impression is that it is not generic enough when discussing
arrays. It is not feature-rich enough when discussing computations
with dense matrices.


I haven't yet found anything that std::vector has and std::valarray
lacks that is necessary for implementing numerical vectors or
matrices. On the other hand, std::valarray allows optimization through
expression templates, which the GNU C++ library indeed seems to
implement.
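
For instance, here is a minimal sketch (my own toy code, not from any
library) of a dense matrix on top of std::valarray, with slices giving
row and column views:

     #include <cstddef>
     #include <valarray>

     class Matrix {
     public:
         Matrix(std::size_t rows, std::size_t cols)
             : data_(0.0, rows * cols), rows_(rows), cols_(cols) {}

         double& operator()(std::size_t i, std::size_t j)
             { return data_[i * cols_ + j]; }

         // whole row/column as a slice, usable for slice-wise arithmetic
         std::slice_array<double> row(std::size_t i)
             { return data_[std::slice(i * cols_, cols_, 1)]; }
         std::slice_array<double> column(std::size_t j)
             { return data_[std::slice(j, rows_, cols_)]; }

     private:
         std::valarray<double> data_;
         std::size_t rows_, cols_;
     };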

My guess is that people are just more familiar with std::vector.

I agree that Boost has many interesting things, but I'm not so sure
about your "the sooner, the better" part. Novices already have too
many things to learn from the beginning, don't they?


The OP was puzzled by passing the C array 'int xxarray[nsamp][nsamp];'
and pointers. That array-to-pointer decay is among the hardest and
most error-prone things in C. He would have far fewer problems if he
were told "we do not use those here" and advised to pass
'Xxarray& xxarray' instead, where 'typedef boost::multi_array<int,2>
Xxarray'.


I can agree with that; the irregularity of built-in arrays can be
quite confusing for novices. (And some may think they can do the same
thing much more easily in C... :-( )
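
Something along those lines would indeed be friendlier; a rough sketch
(untested, reusing the names from your example):

     #include <cstddef>
     #include <boost/multi_array.hpp>

     typedef boost::multi_array<int, 2> Xxarray;

     void fill(Xxarray& xxarray)          // no decay, no pointer puzzles
     {
         for (std::size_t i = 0; i < xxarray.shape()[0]; ++i)
             for (std::size_t j = 0; j < xxarray.shape()[1]; ++j)
                 xxarray[i][j] = 0;
     }

     int main()
     {
         int nsamp = 100;
         Xxarray xxarray(boost::extents[nsamp][nsamp]);
         fill(xxarray);
     }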

We'd have to pick one and start from it, of course. What I don't
like is that it may not be very obvious from the beginning which
one is the best, and that switching is costly.


Then do not switch to the best; continue using the suboptimal one
until it actually hurts. What do you want, then? Do you want a magic
container that profiles itself and switches to the perfect container
under the hood?


Of course not. I think I'm just complaining about the absence of
anything like boost::multi_array in the standard. ;-) (And std::array
should also be extended to multiple dimensions, which can be done
easily with variadic templates.)
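
Something like the following would already do, I think (a sketch under
C++11, untested; the names are made up):

     #include <array>
     #include <cstddef>

     template<typename T, std::size_t N, std::size_t... Rest>
     struct multi_array_helper {
         typedef std::array<typename multi_array_helper<T, Rest...>::type,
                            N> type;
     };

     template<typename T, std::size_t N>
     struct multi_array_helper<T, N> {
         typedef std::array<T, N> type;
     };

     template<typename T, std::size_t... Dims>
     using multi_array = typename multi_array_helper<T, Dims...>::type;

     multi_array<double, 3, 4> m;   // same layout as 'double m[3][4]'
     // and m[i][j] works as usual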

People often use vectors of vectors not because that's optimal or a
precise model for the problem (it's often not; it models a jagged
array), but because it's in the standard, readily available, and what
they're familiar with. boost::multi_array or other custom classes can
solve those problems, but they're not standard. We need something in
the standard that any beginner can pick up and use with relative ease.

I also have a lot of uses of C arrays for immutable containers,
like ...

     T const A[][N] = {{ /*...*/ }, /*...*/ };

Note that the key here is immutability, and that the compiler
calculates the count of columns thanks to aggregate initialization
(no such luxury for non-aggregates). If things become mutable, then
a C array is dangerously low-level.


I don't see why immutability is the key for safety here.


If you take and eyeball the historical records of effort spent in a
large project on issues that involved stomping on memory, then you
start to see.

Raw dynamic arrays (or raw pointers to them) are the usual cause of
stomping on memory (besides often being suboptimal). Containers have
the checked at(), and the rest of the operations and the iterators
are checked in debug builds.

Immutability is key since you can't use an immutable array to stomp
on memory. It is always there (static duration), you do not write to
it, and it can be pre-sorted. Searching a pre-sorted array is
O(log N) with binary search, or O(log log N) on average with
interpolation search ... so it is hard to do much better.
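
For instance, such a pre-sorted immutable table and lookup might look
like this (the data here is made up):

     #include <algorithm>

     // pre-sorted, immutable lookup table
     int const primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };

     bool is_small_prime(int x)
     {
         return std::binary_search(
             primes, primes + sizeof primes / sizeof *primes, x);
     }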


Writing out-of-bounds may certainly have more immediate and evident
abnormal behavior, but reading out-of-bounds is still a bug and can
produce erroneous output. I'm not sure if I'm missing anything here.

--
Seungbeom Kim

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]
