Re: Overhead of subscript operator for STL maps
On Oct 18, 12:22 pm, Stephen Horne <sh006d3...@blueyonder.co.uk>
wrote:
On Sat, 18 Oct 2008 00:26:20 -0700 (PDT), James Kanze
<james.ka...@gmail.com> wrote:
[...]
but rather the fact that the languages in question have a
radically different type system than C++: either everything is
the same type, or (e.g. lisp) all typing is dynamic, and there
is a null type of some sort. Typically, in such cases, you
don't have to declare a variable; you just use it, and if it
doesn't already exist, it is created with a default value (empty
string in AWK, probably the null value in lisp, etc.).
I *think* I get the point - that you shouldn't magically
create an item in a container if you wouldn't magically create
a plain variable?
That's more or less it. In AWK, something like
a = 43
or even
x = a
is legal, even if the variable a has never been mentioned
before. So it makes perfect sense, and fits perfectly into the
language, to be able to do the same thing with something like
"a[i]". C++, however, is a different language, and what fits
perfectly in AWK (or perl, or PHP, or whatever) doesn't
necessarily fit into C++.
Basically, scripting languages don't use a two-stage process
of inserting a key:default pair in the [], then overwriting the
default with the data in the assignment.
I suspect that most do, although many implementations are
possible.
I'm confident that Python (in effect - and probably in
practice) transforms the AST during bytecode compilation. From
the described semantics of PHP, it sounds like it does the
same. Beyond that, I'm possibly overgeneralising, but I still
think the AST transform will be more common than proxies.
How they implement it is more or less irrelevant. The point is
that (at least in AWK or perl---I don't know the others) because
of the point mentioned above, that you can use a variable which
hasn't been defined in just about any context, you must have a
perfectly acceptable and well defined "default" value to use.
About the only time it might make a difference is if you know
you'll never actually "insert" the default value, use a test for
this default value to determine that the element isn't in the
map, and have a lot of misses: the map will occupy a lot more
memory than necessary.
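To make the memory point concrete, here is a minimal sketch of the test-against-default idiom next to a find-based lookup. The function names and the choice of 0 as the "never inserted" default are assumptions made for the illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, int> Table;

// Test-against-default idiom: a miss is detected by comparing the
// result of operator[] with the default value (0, assumed here never
// to be stored on purpose). Side effect: every miss inserts the key.
std::size_t count_hits_with_subscript(Table& m,
                                      const char* const* keys,
                                      std::size_t n)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i != n; ++i) {
        if (m[keys[i]] != 0) {
            ++hits;
        }
    }
    return hits;
}

// Same question asked with find: the map is never modified, so it
// can be passed by const reference.
std::size_t count_hits_with_find(const Table& m,
                                 const char* const* keys,
                                 std::size_t n)
{
    std::size_t hits = 0;
    for (std::size_t i = 0; i != n; ++i) {
        if (m.find(keys[i]) != m.end()) {
            ++hits;
        }
    }
    return hits;
}
```

With one real entry and two misses, both functions report one hit, but after the subscript version the map holds three elements instead of one.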
The reason is to support overriding of subscripting for user
containers. Separate override methods for read and write make
a lot of sense.
Certainly, in certain contexts. Given the overall expression
syntax of C++, I don't think it could reasonably be made to
work, but it's certainly convenient in languages where it can be
made to work.
Returning a proxy, OTOH, would be a major pain in a scripting
language given the type system (you're trying to put the
reference into the container for a write - not to copy the
object over some previous object while leaving the reference
intact) and the resulting efficiency issues.
Returning a proxy in C++ isn't a perfect solution; you can't
really handle cases like a[ i ].x (either lvalue or rvalue use)
very well, and it requires a lot of boilerplate to handle things
like a[ i ] *= 3. But it's "good enough" most of the time.
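For what it's worth, a rough sketch of the classical proxy, assuming a hypothetical IntTable class in which a missing key reads as 0 and only a write inserts. Note the extra boilerplate needed just for *=, and that nothing here helps with a[ i ].x.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical container, for illustration only.
class IntTable {
public:
    class Proxy {
    public:
        Proxy(IntTable& owner, const std::string& key)
            : owner_(owner), key_(key) {}

        // rvalue use: look the key up without inserting it.
        operator int() const { return owner_.read(key_); }

        // lvalue use: this is the only place an element is created.
        Proxy& operator=(int value)
        {
            owner_.write(key_, value);
            return *this;
        }

        // a[i] = a[j] must copy the value, not the proxy.
        Proxy& operator=(const Proxy& other)
        {
            return *this = static_cast<int>(other);
        }

        // One of the many compound assignments that need boilerplate.
        Proxy& operator*=(int factor)
        {
            return *this = static_cast<int>(*this) * factor;
        }

    private:
        IntTable& owner_;
        std::string key_;
    };

    Proxy operator[](const std::string& key) { return Proxy(*this, key); }
    int operator[](const std::string& key) const { return read(key); }
    std::size_t size() const { return data_.size(); }

private:
    int read(const std::string& key) const
    {
        std::map<std::string, int>::const_iterator it = data_.find(key);
        return it == data_.end() ? 0 : it->second;
    }

    void write(const std::string& key, int value) { data_[key] = value; }

    std::map<std::string, int> data_;
};
```

Reading t["x"] on an empty table yields 0 and leaves size() at 0; t["a"] = 14 followed by t["a"] *= 3 leaves a single element.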
They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for
overriding a subscripted read and a subscripted write. When
C++ handles the subscripting operation it doesn't even know
that there is an assign.
It could. Operator[] could return a proxy.
Yes, but even then operator[] doesn't know what will happen to
the proxy - it's still a two step process. In fact, the
reference that it does return *is* a (trivial) proxy.
Yes:-). What the classical proxy does is allow "trapping"
specific lvalue uses, and handling them specially. What we'd
really like is for the proxy to act as a smart reference, with a
user defined conversion for lvalue to rvalue use. Since we
can't overload operator.(), we can't do this, and even with
operator.(), it would require a lot of boilerplating (which most
proxies don't do) to cover *all* lvalue uses. (Some lvalue
uses, of course, like unary & or binding to a reference, you
usually don't want to support, since doing so would mean that
you do lose control of all of the references to the object.)
[...]
C++ is designed for robustness;
That principle took second place to efficiency concerns or
following a C related tradition more than once, but in
general, yes.
Yes. C++ represents a lot of compromises.
The first shouldn't modify the container, and returns a
reference that will normally prevent the referenced value from
being modified (with various get-out clauses - mutable,
const_cast, ...). This version is called whenever the
container is considered const.
Independently of all that: any "normal" operator[] should be
callable on const objects.
I don't think I understand the point here.
A "normal" (ie non-const) operator[] cannot be callable for
const objects because it is non-const, because it needs to
return a non-const reference, and so on.
If you're simply saying that...
And the non-const version shouldn't
have different semantics from the const one.
... the two versions should be as similar as possible, with
the constness of the returned reference as the only
difference, I think I agree.
What I meant is that if you supply a user operator[], you should
ensure (somehow) that it is usable on a const object, since the
built-in operator[] is usable on const objects. The usual means
would be to supply two operator[], one const, and one not. In
which case, they should have the same semantics---having one
which e.g. raises an exception if the indexed element isn't
there, and the other which inserts it, just doesn't go.
There's also the point that if you overload an operator, it
should behave in some way like the built-in
operator---otherwise, it's operator overload abuse. The
built-in operator[] never adds elements to an array.
Yes, but the enemy ;-) can cite precedent on this one.
There's no "enemy". There are legitimate arguments for both
sides. In this case, I happen to think that the arguments
against operator[] in std::map, at least with its current
semantics, are stronger. But I'll still point out arguments the
other way if I see them. IMHO, it's not a killer, because
std::map doesn't depend on operator[] for its usability. If you
don't like the semantics of its operator[] (and I don't, except
in special cases), then just don't use it (which I don't, except
in special cases).
In Python, subscripting a dictionary can insert a new key, but
subscripting a list will not insert a new index - not even if
you subscript at the len() position. There's a history of
treating associative different to sequence, key subscripting
different to position subscripting.
In Python, every type has a default constructor (I think), and
there is no const (I think). Which makes the situation
different enough from C++ that the analogy doesn't really hold.
The semantics aren't the same. std::map<>::insert
corresponds to an insertion. It will never overwrite an
existing value. And it has a return code which indicates
whether the insertion took place or not.
IOW there's a third reasonable intent, which still shouldn't
be confounded with read. My containers call it "write", though
it might equally be called "set".
I think that the real problem is that at an application level,
things like std::map are used to implement various idioms. In
some, like data bases, distinguishing insertion from update is
fundamental. (They're two different verbs in SQL.) In others,
it's not, and in a few, not distinguishing them at all is very
convenient.
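As a small sketch of keeping the two verbs apart on top of std::map: the helper names insert_new and update_existing are made up for the example. std::map<>::insert already reports through the bool in its return value whether the insertion took place.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, int> Table;

// "INSERT": adds the key only if it is absent. Returns false, and
// leaves the existing value untouched, if the key is already there.
bool insert_new(Table& m, const std::string& key, int value)
{
    std::pair<Table::iterator, bool> result =
        m.insert(std::make_pair(key, value));
    return result.second;
}

// "UPDATE": changes the value only if the key is present. Returns
// false, and does not create an element, if the key is missing.
bool update_existing(Table& m, const std::string& key, int value)
{
    Table::iterator it = m.find(key);
    if (it == m.end()) {
        return false;
    }
    it->second = value;
    return true;
}
```

An idiom that doesn't care about the difference can still write m[key] = value; the helpers are for the contexts where confusing the two would be an error.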
When you already have a way to express that intent, why
make the intent of another notation less clear (and reduce
the opportunities for error checking to catch problems) by
mixing the two intents together.
insert cannot be used to access an existing key. Why allow
operator[] the dual role of both accessing existing keys
and inserting new ones?
Because in some contexts, that's what you want?
In those relatively unusual cases (compared with always-reads,
always-replaces and always-inserts), you should express that
particular intent. It's more robust that way.
I more or less agree. In general, if it weren't possible to
make the distinction, std::map would be more or less unusable in
a lot of (probably most) cases. But that's not really the case.
As I said, just wrap std::map in an application specific class
(which is a good idea regardless), and define operator[] of that
class to provide the appropriate semantics for its particular
use, using find, insert and erase. The "good" thing about
operator[] is that you never have to use it, and that you rarely
even are tempted to use it.
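Something along these lines, as a sketch only: Inventory and its member names are invented for the example, and throwing std::out_of_range on a missing key is just one possible choice of semantics. The point is that the const and non-const operator[] behave the same way, and that insertion and removal are separate, explicit operations built on insert and erase.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical application-specific wrapper around std::map.
class Inventory {
private:
    typedef std::map<std::string, int> Map;

public:
    // Explicit insertion: fails (returns false) if the item exists.
    bool add(const std::string& item, int quantity)
    {
        return items_.insert(std::make_pair(item, quantity)).second;
    }

    // Explicit removal: returns false if there was nothing to erase.
    bool remove(const std::string& item)
    {
        return items_.erase(item) != 0;
    }

    // Access to an existing item only; never inserts.
    int& operator[](const std::string& item)
    {
        Map::iterator it = items_.find(item);
        if (it == items_.end()) {
            throw std::out_of_range("Inventory: unknown item " + item);
        }
        return it->second;
    }

    // Same semantics on a const object; only the constness of the
    // returned reference differs.
    const int& operator[](const std::string& item) const
    {
        Map::const_iterator it = items_.find(item);
        if (it == items_.end()) {
            throw std::out_of_range("Inventory: unknown item " + item);
        }
        return it->second;
    }

    std::size_t size() const { return items_.size(); }

private:
    Map items_;
};
```

A different application might want the missing key to read as a default, or to be a precondition violation rather than an exception; the wrapper is where that decision gets made once.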
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34