Re: A style question on const char* vs. std::string

From: "James Kanze" <james.kanze@gmail.com>
Newsgroups: comp.lang.c++.moderated
Date: 10 Dec 2006 18:08:31 -0500
Message-ID: <1165760581.881589.280800@l12g2000cwl.googlegroups.com>

Al wrote:

Zeljko Vrba wrote:

Apart from that he gave no other argumentation for
why using std::string would be good.
Can you provide some guidelines?

I think you have come across a classical problem: Can a variable have
(or "hold") a non-existent state, or not?

Just a reminder: as you say, this is a classic problem, and in
C++, has a classic solution: Fallible. I first learned about it
from Barton and Nackman (well over ten years ago), and I'm not
convinced that it was particularly new then. Data bases had
been supporting null values for years before that.

(Note that I don't think that this is really too relevant to the
original poster's problem. All of his entries held valid
strings. In fact, that could be a legitimate argument for him
to use std::string, instead of char const*.)

With const char*, or any other pointer type, the answer is yes. The
non-existent state is NULL. Notice that this is /different/ from a blank
state, which in this case is the empty string (""). In other words:

"" != NULL

Blank isn't perhaps the correct word, but I think it's
universally acknowledged that an empty string is a valid string
value, and not the same thing as a non-existent string. (Note
that std::string does not accept null pointers in its
constructors---you can't construct a string from a not-string.)
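To make the three states concrete, here is a minimal sketch. The names classify, toString and the enum are invented for illustration; the only library facts relied on are that a null char const* differs from a pointer to "", and that handing a null pointer to the std::string constructor is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Three distinct states of a C-style string (hypothetical names).
enum StringState { Absent, Empty, NonEmpty };

StringState classify(char const* s)
{
    if (s == NULL) {
        return Absent;                  // no string at all
    }
    return std::strlen(s) == 0 ? Empty : NonEmpty;
}

// std::string has no "absent" state, and constructing one from a null
// pointer is undefined behavior, so check before converting.
std::string toString(char const* s, std::string const& fallback)
{
    if (s == NULL) {
        return fallback;
    }
    return std::string(s);
}

void classifyDemo()
{
    assert(classify(NULL) == Absent);
    assert(classify("") == Empty);      // empty, but a valid string
}
```

The fallback parameter only illustrates that the caller has to decide what "no string" means once the value lives in a std::string.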

However, with std::string, that state /cannot/ be expressed directly, so
if you need it, you must figure out a work around.

Nothing to figure out. Fallible has been more or less standard
for years. My own (extended) implementation is available at
http://kanze.james.neuf.fr/code-en.html. (As one might guess,
in the Basic subsystem---it's hard to imagine anything more
basic.) I don't think I've written an application in well over
ten years which didn't use it. (The most recent extension, to
allow a more complex error code, actually corrupts the basic
abstraction; it was dictated by a practical need in a recent
application to know why there was no value. I'd suggest not
using it until more or less forced to. There's also a
performance hack to avoid unnecessary copying when dealing with
large types, like std::vector; again, I'd only use it if
necessary.)
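For readers who have not met the idiom, a stripped-down sketch of a Fallible-style class follows. It is not the implementation at the URL above: the member names (isValid, value, elseDefaultTo) and the lookupNickname function are assumptions made for this example, and this simplified version requires T to be default-constructible. It has neither the error-code extension nor the copy-avoidance hack.

```cpp
#include <cassert>
#include <string>

// A value of type T plus a flag saying whether the value exists.
template <typename T>
class Fallible
{
public:
    Fallible() : myIsValid(false), myValue() {}
    explicit Fallible(T const& value) : myIsValid(true), myValue(value) {}

    bool isValid() const { return myIsValid; }

    // Asking an invalid Fallible for its value is a programming error.
    T const& value() const
    {
        assert(myIsValid);
        return myValue;
    }

    // Returns a copy of the held value, or of the default when there is none.
    T elseDefaultTo(T const& defaultValue) const
    {
        return myIsValid ? myValue : defaultValue;
    }

private:
    bool myIsValid;
    T    myValue;
};

// Hypothetical lookup: an unknown user yields "no value", which is
// distinct from a known user whose nickname is the empty string.
Fallible<std::string> lookupNickname(std::string const& user)
{
    if (user == "al") {
        return Fallible<std::string>(std::string());
    }
    return Fallible<std::string>();
}
```

With this, a std::string can carry the "not there" state without resorting to a pointer or a magic value. In later C++ (C++17 and up), std::optional fills much the same role.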

You can use
std::string*, for instance, or you could consider the string "__NULL__"
to mean NULL (that's a bad idea, btw).

A very bad idea.

This problem pops up often in the SQL database world, where the
distinction is clear in some systems, but murky in others.

I thought SQL required the distinction. Clearly.

Another person has proposed to use std::map<type_a, std::string>, one of the
arguments being that "it's safer". Namely, referencing an uninitialized element
in the map (some_map[blah]) will insert and return a default-constructed value
type. When the value type is const char*, its default is NULL, and this is
what gets returned to the caller, resulting in a crash as soon as the value is
used. Since the mapping is supposed to be 1:1, and referencing an
uninitialized element is a programmer's error, I believe that crashing is a
GOOD thing and an early indication of faulty program.
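The behavior described here can be seen in a few lines. This is only a sketch; the function name lookupInserted is made up for the example. It relies on the documented behavior of std::map::operator[], which inserts a value-initialized element (a null pointer for char const*) when the key is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Reports whether looking up `key` with operator[] grew the map.
bool lookupInserted(std::map<int, char const*>& table, int key)
{
    std::size_t before = table.size();
    char const* value = table[key];     // inserts a null pointer if absent
    (void)value;                        // the value itself is not used here
    return table.size() != before;
}

void lookupInsertedDemo()
{
    std::map<int, char const*> table;
    table[1] = "one";
    assert(!lookupInserted(table, 1));  // existing key: nothing inserted
    assert(lookupInserted(table, 2));   // missing key: silently added
    assert(table[2] == NULL);           // and its value is a null pointer
}
```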

Yes, I completely agree that silently adding items to a hash/map by
default is not a good idea, and can hide logic defects. The default
should be to throw an exception if the item doesn't exist.

The problem is that there is no one good default. Offhand, I'd
say that an exception is rarely the right answer, but aborting
often is. Other times, you'll want to return a maybe value,
with the concept of null, and still other times (in my
experience, rarely), you'll want to insert as std::map does.

The answer, of course, is to use std::map (or
std::unordered_map, in the future) as a low level implementation
class, and wrap it however you want.
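As one possible shape for such a wrapper, here is a sketch that picks the "abort on a missing key" policy. The class name StrictTable and its interface are assumptions for the example, not an existing library; a different application would wrap the same std::map with a different policy (a maybe value, or insertion).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <utility>

// Hypothetical wrapper: looking up a missing key is treated as a
// programming error, never as a request to insert.
template <typename Key, typename Value>
class StrictTable
{
public:
    void add(Key const& key, Value const& value)
    {
        myMap.insert(std::make_pair(key, value));
    }

    bool contains(Key const& key) const
    {
        return myMap.find(key) != myMap.end();
    }

    // Aborts when the key is absent; the table is never modified.
    Value const& get(Key const& key) const
    {
        typename std::map<Key, Value>::const_iterator it = myMap.find(key);
        if (it == myMap.end()) {
            std::abort();
        }
        return it->second;
    }

    std::size_t size() const { return myMap.size(); }

private:
    std::map<Key, Value> myMap;
};
```

Because get is const and goes through find, a bad key cannot quietly grow the table the way operator[] would.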

The problem that writers of hashes/maps must face is that of the
inefficiency of a double-lookup.

That's a red herring. I usually cache the last value found, so
the double lookup isn't an issue, but even without the cache,
it's rarely an issue. The whole point of using such tables is
to make lookup cheap enough to not affect program throughput.

They must figure an interface that
allows them to avoid this extra cost whenever possible.

At the lowest level, it's probably worth having an interface
which allows avoiding it---std::map<>::find serves the purpose
quite well. At the user level, it's hard to imagine it making a
difference, especially if you cache the last lookup.
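A sketch of the single-lookup interface, assuming nothing beyond std::map<>::find itself. The helper name findOrNull is invented for the example. One call to find answers both "is it there?" and "what is it?", where a test followed by operator[] would search twice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Returns a pointer to the mapped value, or NULL if the key is absent.
// The map is searched once and is never modified.
template <typename Key, typename Value>
Value const* findOrNull(std::map<Key, Value> const& table, Key const& key)
{
    typename std::map<Key, Value>::const_iterator it = table.find(key);
    if (it == table.end()) {
        return NULL;
    }
    return &it->second;
}

void findOrNullDemo()
{
    std::map<int, std::string> table;
    table[1] = "one";
    std::string const* hit = findOrNull(table, 1);
    assert(hit != NULL && *hit == "one");
    assert(findOrNull(table, 2) == NULL);
    assert(table.size() == 1);          // the failed lookup added nothing
}
```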

--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]