Re: Why am I so stupid?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sun, 7 Jun 2009 04:04:12 -0700 (PDT)
Message-ID: <50cbfacd-4e3b-4de5-a7a3-362059760249@l32g2000vba.googlegroups.com>

On Jun 5, 9:24 pm, none <n...@none.none> wrote:

> James Kanze wrote:

>> What are you using in C?

> Just stdio. fopen(), fread(), fclose()... then basically
> just marching a char* along the buffer, doing strncmp() and
> similar where needed.

The equivalent in C++ would be to read the file into a string,
then march along using iterators. Most of what is in <string.h>
can be done just as well using functions in <algorithm>. This
solution has, of course, the advantage of allowing unlimited
look-ahead. And the disadvantage of only working if the entire
file fits into memory.
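For illustration only, here is a minimal sketch of that approach in standard C++. The names readAll and countOccurrences are hypothetical and come from no library. The whole input is read into a std::string through std::istreambuf_iterator, and std::search does the job that strncmp() at each position does in the C version:

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for testing without a file
#include <string>

// Read the entire stream into a std::string. This only works if the whole
// input fits into memory, but it allows unlimited look-ahead afterwards.
std::string readAll(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Count the non-overlapping occurrences of `keyword` by marching an iterator
// along the buffer; std::search stands in for strncmp() at each position.
std::size_t countOccurrences(std::string const& text, std::string const& keyword)
{
    if (keyword.empty())
        return 0;
    std::size_t count = 0;
    std::string::const_iterator pos = text.begin();
    for (;;) {
        pos = std::search(pos, text.end(), keyword.begin(), keyword.end());
        if (pos == text.end())
            break;
        ++count;
        pos += keyword.size();   // skip past this match
    }
    return count;
}
```

With a real file, readAll would be passed a std::ifstream, preferably opened with std::ios::binary so that no newline translation takes place.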

A frequent compromise is to read line by line. For line
oriented input (where a "line" is significant in parsing), it
also provides a convenient resynchronization point.
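A sketch of line-by-line reading with std::getline, assuming a hypothetical "key=value" line format (Entry and parseLines are illustrative names). A malformed line is recorded by its line number and skipped, so parsing resumes cleanly at the next line:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one "key=value" pair per line.
struct Entry { std::string key; std::string value; };

// Reads the input one line at a time. A line without '=' (or with an empty
// key) is an error: its number is recorded and parsing resynchronizes at
// the start of the next line.
std::vector<Entry> parseLines(std::istream& in, std::vector<int>& badLines)
{
    std::vector<Entry> result;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0) {
            badLines.push_back(lineNumber);
            continue;                       // resume at the next line
        }
        Entry e = { line.substr(0, eq), line.substr(eq + 1) };
        result.push_back(e);
    }
    return result;
}
```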

There are a few things which aren't readily supported. Numeric
conversions, for example: there's no real equivalent to strtod
which works with iterators. The most "obvious" solution is to
manually find the end of the text you want to convert, then use
the two iterators to create a string, use that to initialize an
istringstream, and finally read from that. Most of the time,
however, I'll use a regular expression to validate the entire
line, then read the entire line from a single istringstream,
possibly (usually, in fact) using user-defined extraction
operators. Alternatively, I've written an iterator-based input
streambuf and istream, which support extracting the current
iterator and setting it, so it's easy to move between istream
and iterators, using whichever is most convenient for the next
step.
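The "obvious" solution might look something like the following. It is a sketch, not a full strtod replacement. readDouble and isNumberChar are hypothetical names, leading whitespace is not skipped, and the scan for the end of the number is deliberately crude, leaving it to the istringstream to reject malformed runs:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Characters that may appear in a floating-point literal (simplified: this
// also accepts malformed runs such as "1-2"; the conversion rejects those).
inline bool isNumberChar(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0
        || c == '.' || c == '+' || c == '-' || c == 'e' || c == 'E';
}

// Iterator-based analogue of strtod: find the end of the numeric text by
// hand, copy [current, numEnd) into a string, initialize an istringstream
// with it, and read the double from that. On success, `current` is advanced
// past the number; on failure, neither `current` nor `result` is changed.
template <typename FwdIter>
bool readDouble(FwdIter& current, FwdIter end, double& result)
{
    FwdIter numEnd = current;
    while (numEnd != end && isNumberChar(*numEnd))
        ++numEnd;
    std::string text(current, numEnd);
    std::istringstream in(text);
    double value;
    // Fail if nothing converts, or if the run has trailing junk ("1.2.3").
    if (text.empty() || !(in >> value)
            || in.get() != std::char_traits<char>::eof())
        return false;
    result = value;
    current = numEnd;
    return true;
}
```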

One final point: the C++ equivalent of <ctype.h> is found in
<locale>. Unlike the iterator idiom used in <algorithm> (which
is just moderately awkward), it is extremely verbose and awkward
to use. The first thing you should probably do is define a
couple of functional objects (predicates corresponding to the
isxxx functions) which use it, which you can use with the
standard algorithms.
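One possible shape for such a predicate, as a sketch. CTypeIs is a hypothetical name, not a standard class. It wraps std::ctype<char>::is() from <locale> and keeps a copy of the locale, so the facet stays alive as long as the predicate does:

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Predicate object corresponding to the isxxx functions of <ctype.h>:
// CTypeIs(std::ctype_base::digit) plays the role of isdigit, etc.
class CTypeIs
{
public:
    explicit CTypeIs(std::ctype_base::mask mask,
                     std::locale const& loc = std::locale())
        : myMask(mask)
        , myLocale(loc)
        , myCType(&std::use_facet<std::ctype<char> >(myLocale))
    {
    }

    bool operator()(char c) const { return myCType->is(myMask, c); }

private:
    std::ctype_base::mask myMask;
    std::locale myLocale;               // keeps the facet alive
    std::ctype<char> const* myCType;
};
```

Usage with the standard algorithms is then straightforward, e.g. `std::find_if(s.begin(), s.end(), CTypeIs(std::ctype_base::digit))` to find the first digit.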

> Much of the stdio library is written in highly optimized
> assembly, so it's hard to beat the efficiency.

Both stdio and iostream deal with IO, and in any quality
implementation, it is the IO which should be the bottleneck.
The functions in <algorithm> are all templates, which means that
(in practice, today, at least), the compiler has direct access
to the source code, and can inline it when appropriate. On the
whole, in a quality implementation, I would expect the functions
in <algorithm> to outperform those in <string.h> or <stdlib.h>.
(But to be honest, the only one I've measured was sort, which
does outperform qsort on the implementations I've tested. But
if performance really is a problem, you might want to measure.)
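The difference in a nutshell, as a sketch (compareInts, sortWithQsort and sortWithStdSort are hypothetical names): qsort receives its comparison as a function pointer and calls it from precompiled library code, whereas std::sort is instantiated in the calling translation unit, so the comparison is visible to the compiler and can be inlined. Whether that makes a measurable difference on a given implementation is exactly what measuring would tell you.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// qsort: the comparison is called through a function pointer from inside
// the library, which the compiler generally cannot inline.
int compareInts(void const* lhs, void const* rhs)
{
    int a = *static_cast<int const*>(lhs);
    int b = *static_cast<int const*>(rhs);
    return (a > b) - (a < b);   // avoids the overflow risk of a - b
}

void sortWithQsort(std::vector<int>& v)
{
    if (!v.empty())
        std::qsort(&v[0], v.size(), sizeof(int), compareInts);
}

// std::sort: a template instantiated here, so the comparison (a lambda)
// is visible to the compiler and is a candidate for inlining.
void sortWithStdSort(std::vector<int>& v)
{
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}
```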

> Maybe it was a mistake to think that moving from that to C++
> and STL was the right thing to do. All I ever hear is that
> "char*" is the most dangerous thing ever invented and one of
> the greatest failures of mankind.

C's handling of arrays, in general, is a bit of a disaster, and
C uses what is probably the worst possible implementation of
string. But C has a long history, and a lot of parsers have
been written in it. So it may have slightly better support in
the standard library for certain types of parsing. Still, I do
a lot of parsing in C++, and I've not found it a problem. At
the start, I did need to design a few tools: CTypeFunctor, as
mentioned above, or the iterator-based istream. Or my own
RegularExpression class (decidedly pre-Boost, but I still use it
because it has some features particularly useful for parsing,
it's very fast, using a DFA, and it has support for generating
statically initialized tables, so you don't have to parse the
regular expression at runtime); the real plus in using C++ for
parsing is that such tools, once written, are an order of
magnitude easier to use than in C.

> Ok, fine, so let's all be safe and use string and iostream.
> Well, you can't do all the things you used to do with char*.
> Some of them have replacements, sort of, and some don't.

And there are other functionalities which weren't present in C.
It's a somewhat different idiom, and in some cases, requires a
slightly different approach. I'll admit that for all but the
most trivial parsing, I use regular expressions (and did even
back in C), and it's a lot easier to have a regular expression
class, which manages all of the necessary memory automatically,
than it was to use regular expressions in C.
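A sketch of the validate-then-extract approach described earlier, using the standard std::regex (C++11) in place of a home-grown regular expression class. The record format and the names Record and parseRecord are assumptions for illustration. The std::regex object owns its compiled state, so there is no equivalent of the regcomp()/regfree() pairing needed with POSIX regular expressions in C:

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical line format: "<identifier> <int> <int>", e.g. "point 3 -4".
struct Record { std::string name; int x; int y; };

// Validate the entire line with one regular expression first; once it is
// known to be well formed, a single istringstream extracts the fields.
// On failure, `out` is left unchanged.
bool parseRecord(std::string const& line, Record& out)
{
    static std::regex const format(
        R"(\s*[A-Za-z_][A-Za-z0-9_]*\s+-?[0-9]+\s+-?[0-9]+\s*)");
    if (!std::regex_match(line, format))
        return false;
    std::istringstream in(line);
    Record r;
    if (!(in >> r.name >> r.x >> r.y))
        return false;                   // e.g. an integer out of range
    out = r;
    return true;
}
```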

> This is the key failing of the STL, in my opinion. If you're
> going to make something new, and that new thing is intended to
> be considered a "standard" that supersedes some old thing,
> then it MUST provide all the functionality of the old thing.

Even when that functionality was broken? Surely you don't think
C++ needs something like strtok.
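What replaces strtok in practice is usually a few lines over std::string, as in this sketch (split is a hypothetical name). Like strtok, it treats any run of delimiter characters as a single separator, but it leaves its input unmodified and keeps no hidden static state, so it can be used on several strings at once:

```cpp
#include <string>
#include <vector>

// Split `text` into tokens separated by any run of characters from
// `delimiters`. The input is not modified, unlike with strtok().
std::vector<std::string> split(std::string const& text,
                               std::string const& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        std::string::size_type end = text.find_first_of(delimiters, start);
        // If end == npos, substr takes everything up to the end of text.
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```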

> If the attitude toward the STL was "Here's a bunch of new
> containers that you can use *in addition to* your familiar old
> stdio tools," then great. But that is NOT the attitude at
> all. The attitude is that somehow stdio is horrible and
> should be avoided at all costs and should be REPLACED by the
> STL. Ok, but if that's what you (not you personally, but the
> ISO or SGI or whoever the hell thought STL was a good idea)
> want, then do the work to make it an actual "replacement."

> I have come across a (very) few problems that were easier to
> solve with STL than without. std::map and std::vector come in
> handy often. Unfortunately, far more often, STL only makes
> simple things unnecessarily difficult.

There's some truth in what you're saying, and the two iterator
idiom is far from ideal, most of the time. But using the
standard library, in C++, is still generally an order of
magnitude easier than using <string.h> and company in C.

> Microsoft, for example, takes a lot of criticism for things
> like MFC. "Why waste the effort making CString when there is
> already std::string?" I can't say that Microsoft has done any
> better than STL, but I can say that I understand why they
> didn't just jump right on STL and adopt it.

Do you? The main reason they didn't jump on the STL bandwagon
is that STL didn't exist (or at least wasn't known) when MFC was
developed. There are a lot of libraries out there in this
situation, and they all have their own string, vector, map, etc.

> Maybe I'm just not an OO guy at heart, I don't know. I'm OK
> with an "int" just being a block of bits in memory and not
> twelve layers of inheritance. Yes, I know that an "int" is
> still just an "int" in C++. I'm just making a point about the
> logic behind OO.

> But I WANT to be an OO guy, or at least to give it a chance.
> I've been "giving it a chance" for years and what I get in
> return, mostly, is consistent disappointment.

Don't worry about OO here. OO is a tool, and not the only one
C++ supports. OO is very useful in what more complex parsers
produce---things like parse trees, for example; it's rather
irrelevant for most parsing issues (which are basically
procedural). And there's practically no OO (at least in the
classical sense) in <algorithm>. If I look at my parser tools,
about the only "OO" component in them is streambuf (and my custom
streambufs); the rest is still pretty procedural. (Thus, for
example, in RegularExpression, the nodes in the parse tree are
polymorphic, but the parser itself is a classical recursive
descent parser, without the slightest hint of OO, and once I've
got the parse tree, it's rapidly converted into an NFA, which is
then converted, either lazily or on request, into a DFA; neither
step makes the slightest use of OO.)
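To illustrate what "recursive descent without the slightest hint of OO" means, here is a sketch for a toy arithmetic grammar. The grammar and every name in it are assumptions, unrelated to RegularExpression's actual code. There is one free function per grammar rule, recursing through an iterator, and no overflow checking:

```cpp
#include <cctype>
#include <string>

// Toy grammar (assumed for illustration):
//   expr   := term   { '+' term }
//   term   := factor { '*' factor }
//   factor := digits | '(' expr ')'
// On a syntax error, `ok` is set to false.
typedef std::string::const_iterator Iter;

long parseExpr(Iter& p, Iter end, bool& ok);

long parseFactor(Iter& p, Iter end, bool& ok)
{
    if (p != end && *p == '(') {
        ++p;
        long value = parseExpr(p, end, ok);
        if (p != end && *p == ')')
            ++p;
        else
            ok = false;                 // missing closing parenthesis
        return value;
    }
    if (p == end || !std::isdigit(static_cast<unsigned char>(*p))) {
        ok = false;
        return 0;
    }
    long value = 0;
    while (p != end && std::isdigit(static_cast<unsigned char>(*p)))
        value = value * 10 + (*p++ - '0');
    return value;
}

long parseTerm(Iter& p, Iter end, bool& ok)
{
    long value = parseFactor(p, end, ok);
    while (ok && p != end && *p == '*') {
        ++p;
        value *= parseFactor(p, end, ok);
    }
    return value;
}

long parseExpr(Iter& p, Iter end, bool& ok)
{
    long value = parseTerm(p, end, ok);
    while (ok && p != end && *p == '+') {
        ++p;
        value += parseTerm(p, end, ok);
    }
    return value;
}

// Parses the whole string; returns false on a syntax error or trailing junk.
bool evaluate(std::string const& text, long& result)
{
    Iter p = text.begin();
    bool ok = true;
    long value = parseExpr(p, text.end(), ok);
    if (!ok || p != text.end())
        return false;
    result = value;
    return true;
}
```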

Where C++ beats C in parsing is not its support for OO, it is
the encapsulation. Thus, in C, my RegularExpression was a
struct, which required explicit initialization and release.
And I'd never even found a good means of merging (or'ing)
regular expressions; that had to wait for C++, with classes and
operator overloading. (FWIW, my RegularExpression class
supports things like:

    RegularExpression decimal( "[1-9][0-9]*", 10 ) ;
    RegularExpression octal( "0[0-7]*", 8 ) ;
    RegularExpression hexadecimal( "0[xX][0-9a-fA-F]+", 16 ) ;
    RegularExpression number( decimal | octal | hexadecimal ) ;

Matching number will return 10, 8 or 16, depending on which is
matched, and -1 if there is no match.)
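The standard std::regex (C++11) has no operator for combining already compiled expressions, but the effect of the example can be roughly emulated, as in this sketch: the three patterns are joined textually, each in its own capture group, and the group that matched identifies the base. matchNumber is a hypothetical name, and this is an approximation, not how RegularExpression works internally:

```cpp
#include <regex>
#include <string>

// Returns 10, 8 or 16 depending on which alternative matches the whole of
// `text`, and -1 if none does. Note that "0" alone counts as octal, as in C.
int matchNumber(std::string const& text)
{
    static std::regex const number(
        "([1-9][0-9]*)|(0[0-7]*)|(0[xX][0-9a-fA-F]+)");
    std::smatch m;
    if (!std::regex_match(text, m, number))
        return -1;
    if (m[1].matched)
        return 10;                      // decimal
    if (m[2].matched)
        return 8;                       // octal
    return 16;                          // hexadecimal
}
```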

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34