Re: Why am I so stupid?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Fri, 5 Jun 2009 04:38:57 -0700 (PDT)

Message-ID:

<0c547570-7dc9-4de1-a723-ae4e1671b6f9@h2g2000yqg.googlegroups.com>

On Jun 5, 10:50 am, p...@informatimago.com (Pascal J. Bourguignon)
wrote:

none <n...@none.none> writes:

James Kanze wrote:

The answer to your problem, of course, is to use istream
for all of your parsing. After extracting the float,
continue extracting using the same istream (istringstream,
etc.).

Ok, you (and others) have sold me on the concept of using
the istringstream for parsing, instead of working directly
with the string.

But istringstream makes the assumption that everything is
separated by whitespace, at least in the case of the >>
operator.

A typical lexer/parser only knows about "tokens." So I
might have this list of tokens:

"(", ")", "+", "-", "*", "/", "sqrt", "sin", "cos", ...

and I might want to parse an input stream that looks like this:

"(1.2 * -sqrt(7.5e3))"

In other words, I can't rely on tokens to be separated by
whitespace. I need to "peek" before I actually extract, and
I might need to peek at more than one character -- for
example, to match the "sqrt" token.

I understand the theory of parsing an input sequence using a
grammar, and I have one that works beautifully in C using a
recursive-descent approach. I'd like to bring it up to date
by using an istringstream, or whatever STL construct is
appropriate. I'm finding the low-level nuts and bolts, like
tokenization, very difficult.

[...]

But specifically, you shouldn't care whether your source is
represented by a string or an istringstream or a list of
characters or whatever.

Yes and no. For some simple parsing jobs, istream contains 90%
of your parser, already implemented; other sequences might not.
For a relatively simple lexer, it might even be reasonable to
define a type Token, and write a >> operator which reads tokens;
I wouldn't recommend this for something like C++ (even without
the preprocessor), but if e.g. his language only uses numbers
(all of which are required to start with a digit), symbols (all
of which must start with an alpha) and a small set of single
character operators or punctuation, it could be an appropriate
solution.
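
As a rough sketch of what such a >> operator might look like for
that restricted language (the Token layout, the set of operator
characters and the tokenize helper are all assumptions made up
for illustration, not anything from the earlier posts):

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token type: the names are illustrative only.
struct Token
{
    enum Kind { number, symbol, punctuation } ;
    Kind        kind ;
    std::string text ;     // spelling of a symbol or punctuation character
    double      value ;    // meaningful only when kind == number
} ;

std::istream& operator>>( std::istream& source, Token& dest )
{
    // The sentry skips leading whitespace and fails at end of input.
    std::istream::sentry guard( source ) ;
    if ( ! guard ) {
        return source ;
    }
    // peek() yields an unsigned char value or EOF, so the result can be
    // passed directly to the <cctype> functions.
    int next = source.peek() ;
    dest.text.clear() ;
    dest.value = 0.0 ;
    if ( std::isdigit( next ) ) {
        // Numbers must start with a digit; istream does the conversion.
        dest.kind = Token::number ;
        source >> dest.value ;
    } else if ( std::isalpha( next ) ) {
        // Symbols start with an alpha and continue with alphanumerics.
        dest.kind = Token::symbol ;
        while ( std::isalnum( source.peek() ) ) {
            dest.text += static_cast< char >( source.get() ) ;
        }
    } else if ( std::string( "()+-*/" ).find( static_cast< char >( next ) )
                    != std::string::npos ) {
        // Assumed set of single character operators and punctuation.
        dest.kind = Token::punctuation ;
        dest.text = static_cast< char >( source.get() ) ;
    } else {
        // Anything else is an error in this toy language.
        source.setstate( std::ios::failbit ) ;
    }
    return source ;
}

// Hypothetical helper: read tokens until extraction fails.
std::vector< Token > tokenize( std::string const& input )
{
    std::istringstream   source( input ) ;
    std::vector< Token > result ;
    Token                token ;
    while ( source >> token ) {
        result.push_back( token ) ;
    }
    return result ;
}
```

With this, tokenize( "(1.2 * -sqrt(7.5e3))" ) should split the
example expression without relying on whitespace between tokens.
The leading '-' comes back as a separate punctuation token, so
deciding whether it is unary or binary is left to the parser.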

For that matter, it's also possible to define a container or an
accumulator such that your entire parser is invoked by:
    std::copy( std::istream_iterator< Token >( source ),
               std::istream_iterator< Token >(),
               std::back_inserter( parseTree ) ) ;
or
    parseTree = std::accumulate(
                    std::istream_iterator< Token >( source ),
                    std::istream_iterator< Token >(),
                    ParseTree() ) ;
I'll admit that this looks more like obfuscation than anything
else to me, though. (But who knows? Maybe in some specific
cases...)
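
For what it's worth, here is a minimal sketch of the plumbing the
accumulate form needs in order to compile. Everything in it is a
hypothetical stand-in: Token is just a whitespace-separated word,
and ParseTree merely records the tokens in order, where a real one
would do the actual grammar work in operator+.

```cpp
#include <istream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, deliberately trivial token: one whitespace-separated word.
struct Token
{
    std::string text ;
} ;

std::istream& operator>>( std::istream& source, Token& dest )
{
    return source >> dest.text ;
}

// Hypothetical stand-in for a parse tree: it only records the tokens.
struct ParseTree
{
    std::vector< std::string > nodes ;
} ;

// std::accumulate folds with operator+ by default, so "adding" a token
// to the tree is where a real parser would be driven from.
ParseTree operator+( ParseTree tree, Token const& token )
{
    tree.nodes.push_back( token.text ) ;
    return tree ;
}

ParseTree parse( std::istream& source )
{
    return std::accumulate(
               std::istream_iterator< Token >( source ),
               std::istream_iterator< Token >(),
               ParseTree() ) ;
}
```

The operator+ is found by argument dependent lookup, since Token
and ParseTree are in the same namespace as the operator.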

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34