Re: Overhead of subscript operator for STL maps

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Sat, 18 Oct 2008 00:26:20 -0700 (PDT)

Message-ID:

<57b41bc6-3f9e-4ec3-a823-94b5f777bd6f@y21g2000hsf.googlegroups.com>

On Oct 18, 1:59 am, Stephen Horne <sh006d3...@blueyonder.co.uk> wrote:

On Fri, 17 Oct 2008 17:05:27 GMT, Juha Nieminen

<nos...@thanks.invalid> wrote:

Stephen Horne wrote:

I realise that scripting languages do it, but they get to
handle the [] differently depending on whether its on the
left side of an assignment or not. They still complain if
you try to read a non-existent key.

Actually in many languages (such as for example PHP)
relational maps work exactly that way. That is, you add an
element to the map by "indexing" it with the key and
assigning the element. This is a very common idiom eg. in
PHP.

Just a few general comments, since I basically agree with your
main point.

I didn't know that, and on my first read of this I thought I'd
overgeneralised a principle from basically just Python to
scripting languages more generally. On second read, you are
saying precisely what I said - PHP is a scripting language,
and it is doing what I said is characteristic of scripting
languages.

I don't think that the difference is some much scripting or not,
but rather the fact that the languages in question have a
radically different type system than C++: either everything is
the same type, or (e.g. lisp) all typing is dynamic, and there
is a null type of some sort. Typically, in such cases, you
don't have to declare a variable; you just use it, and if it
doesn't already exist, it is created with a default value (empty
string in AWK, probably the null value in lisp, etc.).

Maybe I worded my post awkwardly. To clarify...

C++ allows key:data insertion using "container [key] = data;".
That surprised me. I realise that scripting languages *also*
do "container [key] = data;", but they get to handle the
subscripting differently.

Basically, scripting languages don't do a two stage process of
insert a key:default in the [], then overwrite the default
with the data in the assign.

I suspect that most do, although many implementations are
possible.

They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for overriding
a subscripted read and a subscripted write. When C++ handles
the subscripting operation it doesn't even know that there is
an assign.

It could. Operator[] could return a proxy.

For example...

  std::map<int,int> mymap;
  int var;

  var = mymap [1];

Did the programmer really intend for var to get a random junk
value that was also assigned into mymap in a key=1:data=junk
pair?

It's not random junk. It's mapped_type(). The "default
initialized" object. In the case of an int, it's 0.

Python would know better since the assignment is on the wrong
side of the assignment, I assume PHP would as well, but C++
cannot since when it handles the operator[], the method
doesn't know how the returned reference will be used. It
returns a reference that might be used for a read or a write,
or which may not be used at all.

Again, it's not at all certain that other languages use this
information, any more than C++ does. And you can do it in C++
as well by means of a proxy. The difference is more closely
related to the stricter type system, and the fact that you
always have to declare variables before use. In C++, the
semantics of operator[] in std::map are inconsistent with the
way the basic language works. C++ is designed for robustness;
languages like Python or PHP for rapid development (of less
critical applications).

The two normal variants of operator[] aren't read and write -
they are const and non-const as follows...

const data_t& container::operator[] const { ... }
data_t& container::operator[] { ... }

The first shouldn't modify the container, and returns a
reference that will normally prevent the referenced value from
being modified (with various get-out clauses - mutable,
const_cast, ...). This version is called whenever the
container is considered const.

Independently of all that: any "normal" operator[] should be
callable on const objects. And the non-const version shouldn't
have different semantics from the const one.

There's also the point that if you overload an operator, it
should behave in some way like the built-in
operator---otherwise, it's operator overload abuse. The
built-in operator[] never adds elements to an array.

[...]

It might be "obvious" to you because you are used to think
like that. However, in many languages (such as PHP) it's
obvious that you are building a relational map by indexing
it. That is, when you say:

myMap[key] = value;

First off, my criticism of that is that there's already a way
to express that intent in C++ - the insert method.

The semantics aren't the same. std::map<>::insert corresponds
to an insertion. It will never overwrite an existing value.
And it has a return code which indicates whether the insertion
took place or not.

When you already have a way to express that intent, why make
the intent of another notation less clear (and reduce the
opportunities for error checking to catch problems) by mixing
the two intents together.

insert cannot be used to access an existing key. Why allow
operator[] the dual role of both accessing existing keys and
inserting new ones?

Because in some contexts, that's what you want?

[...]

Secondly, it's not something I simply got used through using
some other language at all, unless it was an old pre-standard
version of C++. I can't even name a language that uses the
semantics I suggest. Ada, Modula 2, Pascal, C, Basic and
Assembler don't have standard associative containers to the
best of my memory.

They all do have a built-in operator[], however, which never
inserts. With the exception of Basic and Assembler, they all
have string typing, and require variables to be declared.

[...]

std::map has that exact same idea (although it's not as
flexible as PHP because the type of the key and the value are
fixed). As exemplified by Erik, this can be quite handy in
some cases, for example because you can write things like:

words[word]++;

One single command adds a new 'word' to the map and
increments its value.

You won't get the result you expect, or at least if you do
it's only a fluke.

Yes you will.

Try the following...

#include <map>
#include <iostream>

void Garbage ()
{
  char garbage [1024];
  for (int i = 0; i < 1024; ++i) garbage [i] = 0x55;
}

void Test ()
{
  int a;

  std::cout << a << std::endl;
}

int main()
{
  Garbage ();
  Test ();

  return 0;
};

What does that have to do with it? I don't see any std::map,
and I don't see any use of int which ressembles that of
std::map. The equivalent would be something as if Test did:

std::cout << int() << std::endl ;

And that's guaranteed to output 0.

What result do you expect from Test? If you say zero, you're
wrong. The default constructor for int is trivial - it does
nothing.

From the C++ standard: "To default-initialize an object of type
T means: [...] -- otherwise [POD, not an array], the storage for
the object is zero initialized." And "An object whose
initializer is an empty set of parentheses, i.e. (), shall be
default-initialized." And finally, the specification of T&
operator[]( key_type const& x ) in map: "Returns
(*((insert(make_pair(x, T()))).first)).second."

Note the T() in that last expression.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34