Re: A style question on const char* vs. std::string

From: "Greg Herlihy" <greghe@pacbell.net>
Newsgroups: comp.lang.c++.moderated
Date: 10 Dec 2006 13:48:04 -0500
Message-ID: <1165741475.991443.159770@n67g2000cwd.googlegroups.com>

Zeljko Vrba wrote:

> Hello! Consider a hypothetical situation:
>
> std::map<type_a, const char*> some_map;
>
> type_a is an enum type. some_map is filled with strings on program
> startup; it is supposed to be a 1:1 mapping for each enum element. I am
> aware that this is better done with a static array of strings or
> boost::array. However, this is not at the core of the question, so
> please bear with me.
>
> Another person has proposed to use std::map<type_a, std::string>, one of
> the arguments being that "it's safer". Namely, referencing an
> uninitialized element in the map (some_map[blah]) will insert and return
> a default-constructed value type. When the value type is const char*,
> its default is NULL, and this is what gets returned to the caller,
> resulting in a crash as soon as the value is used. Since the mapping is
> supposed to be 1:1, and referencing an uninitialized element is a
> programmer's error, I believe that crashing is a GOOD thing and an early
> indication of a faulty program.
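To make the behavior described in the question concrete: std::map::operator[] inserts a value-initialized element when the key is absent, and returns a reference to it. The sketch below assumes a three-value enum standing in for type_a (`red`, `green`, `blue` and `demo_missing_key` are hypothetical names) and leaves one enumerator unfilled, so a lookup of that key yields a null const char* in one map and an empty std::string in the other.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the poster's type_a; only two of its three
// enumerators get a name below, simulating a forgotten entry.
enum type_a { red, green, blue };

// Shows what some_map[blah] does for a key that was never added.
void demo_missing_key() {
    std::map<type_a, const char*> ptr_map{{red, "red"}, {green, "green"}};
    std::map<type_a, std::string> str_map{{red, "red"}, {green, "green"}};

    // operator[] inserts a value-initialized element for the missing key
    // and returns a reference to it.
    const char* p = ptr_map[blue];        // a null pointer
    const std::string s = str_map[blue];  // an empty string

    assert(p == nullptr);  // dereferencing p would be undefined behavior
    assert(s.empty());     // s is safe to use, merely empty
    assert(ptr_map.size() == 3 && str_map.size() == 3);  // both maps grew
}
```

Either way the map silently grows by one entry; the difference is only in what the caller gets back.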

Your colleague's argument is sound. Using std::strings instead of C
string pointers for values in the enum map would in fact make the
program both safer and more maintainable. Let's consider the current
program. If this program were to crash by dereferencing a NULL pointer
that it found in the enum map (dereferencing a NULL pointer is undefined
behavior, so a crash is by no means guaranteed or portable), then the
bug would have to be in the line of code that dereferenced the pointer,
and not in the code that added the NULL pointer to the map in the first
place. We can be confident in this diagnosis simply by reasoning that if
NULL were not a legal value, then the program would not still be running
at the point where it crashed. Instead, an assert in the code that added
the value would have tested for a NULL pointer value and would have
aborted the program before the NULL value could have been added to the
map. Since no such assert exists, we can reasonably conclude that a NULL
pointer must be valid as a value in the enum map.
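A minimal sketch of the insertion-time check described above might look like the following. It assumes the map is filled through a single helper function; `add_name`, `enum_names`, and the enumerators are hypothetical names, not taken from the original program. Note that assert is compiled out when NDEBUG is defined, so the check documents the constraint everywhere but enforces it only in debug builds.

```cpp
#include <cassert>
#include <map>

// Hypothetical enum standing in for type_a.
enum type_a { red, green, blue };

// Hypothetical map of enum names, as in the original question.
std::map<type_a, const char*> enum_names;

// Adds one entry. The asserts state (and, in debug builds, enforce)
// that neither NULL nor an empty string is a legal value in this map.
void add_name(type_a key, const char* name) {
    assert(name != nullptr && "NULL is not a legal enum name");
    assert(name[0] != '\0' && "an empty string is not a legal enum name");
    enum_names[key] = name;
}
```

With a check like this in place, a NULL value can never reach the map through add_name, and a maintenance programmer reading the function learns the constraint directly from the code.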

Good programming is more than simply writing instructions for a machine
to execute; programming also involves communicating the intent of those
instructions to other programmers. Without knowing the intended behavior
of a program, a maintenance programmer will not know the correct fix for
a bug in the program. A great deal of a programmer's intent can be
communicated simply by adhering to a set of programming conventions.

One of those conventions is that undefined behavior is never a
deliberate programming aim; it is only ever an error. So fixing a
program with undefined behavior requires changing the code that directly
leads to the undefined behavior, not fixing code somewhere else in the
program. Otherwise, how would a maintenance programmer know to look
there, wherever that somewhere else is supposed to be? In particular, it
is not a reasonable programming policy for code in one part of the
program to have undefined behavior, just as long as the purpose of this
undefined behavior is to "teach" some other code elsewhere in the same
program "a lesson" about what it means to be correct.
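Conversely, if NULL really is a legal value, then the fix belongs on the line that uses the pointer. A sketch of such a use site follows; `name_of` and its `"<unknown>"` fallback are hypothetical. It looks the key up with std::map::find, which (unlike operator[]) never inserts, and checks for NULL in the same place that would otherwise dereference the pointer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical enum standing in for type_a.
enum type_a { red, green, blue };

// Returns the name for `key`, or `fallback` when the key is missing or
// its value is NULL. find() leaves the map unchanged, and the NULL check
// happens before the pointer is ever dereferenced.
std::string name_of(const std::map<type_a, const char*>& names,
                    type_a key,
                    const std::string& fallback = "<unknown>") {
    auto it = names.find(key);
    if (it == names.end() || it->second == nullptr) {
        return fallback;
    }
    return std::string(it->second);
}
```

Because the map is taken by const reference, operator[] could not be used here even by accident; the compiler would reject it.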

In short, the only sensible way to avoid the chaos of code policing
other code (but not itself) is to have each module of a program be
responsible for testing itself for correctness. In this case, the
appropriate point to test whether a map entry value is valid is while
the code responsible for adding the value to the map is executing, and
not at some arbitrary distance (whether measured in time or in lines of
code executed) beyond that point.

Now, the question that your colleague has raised is essentially this
one: if a NULL pointer is not a legal value in this enum map, then why
allow NULL pointers as potential map values in the first place? In
other words, each one of the failure states that a NULL pointer crash
"detects" is a failure state that a std::string replacement would
eliminate completely. To illustrate this point, I am providing the
following graphic:

              Comparison of Failure States
               for Enum Map String Values

                      C-string    std::string
    NULL (0)             X
    Empty ("")           X             X

As the diagram above makes clear, a C string pointer used as the enum
map value has just as many illegal empty string values as a std::string
has. A C string pointer then adds a like number of illegal NULL pointer
values, which a std::string cannot even hold. In fact, having fewer
failure states is the reason why a map with std::string values is
"safer" than a map whose values are character pointers.
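The table can also be read as code: validating a C string value takes two checks, one for each X in its column, while validating a std::string takes only one. The overloaded `is_valid_name` functions below are hypothetical and serve only to illustrate that count.

```cpp
#include <cassert>
#include <string>

// A const char* value has two failure states: NULL and "".
bool is_valid_name(const char* name) {
    return name != nullptr     // failure state 1: NULL pointer
        && name[0] != '\0';    // failure state 2: empty string
}

// A std::string value has only one failure state: "".
// A std::string object cannot be NULL, so there is nothing else to check.
bool is_valid_name(const std::string& name) {
    return !name.empty();
}
```

A string literal argument selects the const char* overload, since array-to-pointer decay is a better match than constructing a std::string.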

Now, someone may well object to this analysis on the grounds that
counting failure states does not make sufficient allowance for how
"likely" a program is to reach a particular failure state. The argument
would be that even though a C string pointer has twice as many failure
states as a std::string, the chance of reaching any one of those failure
states is less than half as "likely" as reaching any one of the
std::string failure states.

First, I would note that all failure states are equally bad for a
program to enter, so how likely it was to enter a failure state does
nothing to help a program that has already failed. But more importantly,
as I noted above, a program must enforce its own constraints itself. A
program that fails to do so will not communicate those constraints
effectively to future maintenance programmers. Therefore the program
must check values as they are added to the enum map, no matter their
type. And since an empty std::string is just as easy to test for as a
NULL pointer value (not to forget the pointer-to-an-empty-string value
that a NULL pointer crash fails to detect), we can expect the likelihood
of either failure to be about the same. So with all other factors being
equal, the better program is the one with fewer ways for things to go
wrong, and that would be the one your colleague suggested.
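For comparison, here is a sketch of the map your colleague suggested, with the insertion-time check the argument calls for. `add_name` and `enum_names` are hypothetical names, and the enum is a stand-in for type_a. A single assert covers the single failure state a std::string value has.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical enum standing in for type_a.
enum type_a { red, green, blue };

// The colleague's suggestion: std::string values instead of const char*.
std::map<type_a, std::string> enum_names;

// Adds one entry. An empty string is the only illegal value a
// std::string can hold, so one assert covers every failure state.
void add_name(type_a key, const std::string& name) {
    assert(!name.empty() && "an empty string is not a legal enum name");
    enum_names[key] = name;
}
```

If a later lookup does hit a missing key through operator[], the result is an empty string rather than a null pointer, so using it is not undefined behavior. One caveat remains: constructing a std::string from a null const char* is itself undefined behavior, so a caller passing a NULL pointer to this add_name still has a bug, located at the call site.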

Greg
