Re: peek() vs unget(): which is better?

From: "kanze" <kanze@gabi-soft.fr>
Newsgroups: comp.lang.c++.moderated
Date: 17 May 2006 11:05:33 -0400
Message-ID: <1147873052.622734.238390@j33g2000cwa.googlegroups.com>

Carl Barron wrote:

In article <e4bkdg$4$1@news.Stanford.EDU>, Seungbeom Kim
<musiphil@bawi.org> wrote:

I'm writing a simple lexer. It has to determine when to
stop reading for the current token, and it seems to have
basically two options:

(1) peek(), and if valid for the current token, get() and continue
(2) get(), and if not valid for the current token, unget() and
continue

Which is better? Or are they equally good?
It seems to me that (1) makes the code more cluttered and incurs two
unformatted input function calls per character. But I have read
somewhere that unget() is not guaranteed to work across buffer
boundaries, so I suspect (2) is rather unsafe though simple. Is this
correct?
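
For concreteness, the two options described above might look roughly like the sketch below. The function names read_word_peek and read_word_unget are hypothetical, and "a token" is assumed here to mean a run of alphabetic characters; neither detail comes from the original question.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Option (1): peek(), and only get() the character if it belongs to the token.
// Hypothetical helper; a "token" is assumed to be a run of letters.
std::string read_word_peek(std::istream& in)
{
    std::string word;
    for (;;) {
        int c = in.peek();                        // look without consuming
        if (c == std::char_traits<char>::eof() || !std::isalpha(c))
            break;                                // not ours: leave it in the stream
        word += static_cast<char>(in.get());      // second call consumes it
    }
    return word;
}

// Option (2): get(), and unget() if the character does not belong to the token.
std::string read_word_unget(std::istream& in)
{
    std::string word;
    char ch;
    while (in.get(ch)) {
        if (!std::isalpha(static_cast<unsigned char>(ch))) {
            in.unget();                           // put back the one char just read
            break;
        }
        word += ch;
    }
    return word;
}
```

With input such as "abc123", both versions return "abc" and leave '1' as the next character to be read. They differ in how they leave the stream state when the token runs up to end of file: in.get(ch) failing in option (2) sets failbit as well as eofbit.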

   You are not using formatted input, so I'd drop down to the
   streambuf level. Since the input is read sequentially in
   loops, a std::istreambuf_iterator<char> provides an input
   iterator [one pass through the input].

That's an interesting idea. I'd probably keep the istream at
the interface level, however, and make sure I set failbit (and
eofbit) in it when appropriate. At least, I would in more or
less generic code which I expected other people to use -- if the
code is within a project, and I know that only the lexer will be
used to read from the file, it probably isn't worth bothering
about.
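
A minimal sketch of what "keep the istream at the interface and set the state bits yourself" could look like follows. The helper read_digits and its contract (read one run of decimal digits, fail if there is none) are assumptions made for illustration, not something taken from the thread.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: takes an istream at the interface, works on the
// underlying streambuf, and reports the outcome through the stream state.
bool read_digits(std::istream& in, std::string& out)
{
    typedef std::char_traits<char> traits;
    out.clear();

    std::istream::sentry guard(in, true);    // true: do not skip whitespace
    if (!guard)
        return false;                        // stream was already unusable

    std::streambuf* sb = in.rdbuf();
    std::ios_base::iostate state = std::ios_base::goodbit;

    int c = sb->sgetc();                     // current char, not consumed
    while (c != traits::eof() && std::isdigit(c)) {
        out += traits::to_char_type(c);
        c = sb->snextc();                    // advance, then look at the next char
    }
    if (c == traits::eof())
        state |= std::ios_base::eofbit;      // we saw end of input
    if (out.empty())
        state |= std::ios_base::failbit;     // nothing could be extracted
    in.setstate(state);
    return !out.empty();
}
```

Callers can then test the stream in the usual way (if (!in) ...), exactly as they would after a formatted extraction.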

My real question, however, is what the istreambuf_iterator buys
you compared to using the streambuf functions sgetc(), sbumpc()
and snextc()? Particularly as you are using the old, C-style
functions from <ctype.h>, which can be passed the results of the
streambuf functions directly, and handle EOF implicitly. (If
you're using std::ctype<char>::is(), it becomes more a question
of taste, since you have to test for end of file separately
anyway. And while I don't particularly like the two iterator
idiom here, it's probably a lot better known amongst "average"
C++ programmers than streambuf is, and the actual names of the
streambuf functions don't make things any easier for those that
don't know it. On the other hand, it's still one extra level of
abstraction which doesn't really do anything.)
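
To illustrate the point about <ctype.h> handling EOF implicitly, here is a small sketch. The function read_identifier and its notion of an identifier (letters, digits and underscore) are assumptions for the example. It relies on two facts: the streambuf functions return either EOF or the character converted as an unsigned char, and the <ctype.h> functions accept exactly that range and return false for EOF.

```cpp
#include <ctype.h>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: collects letters, digits and '_' from a streambuf.
// No explicit end-of-file test is needed: isalnum(EOF) is false, and EOF
// never compares equal to '_', so the loop stops at end of input by itself.
std::string read_identifier(std::streambuf* sb)
{
    std::string id;
    int c = sb->sgetc();                 // current char or EOF, not consumed
    while (isalnum(c) || c == '_') {
        id += static_cast<char>(c);
        c = sb->snextc();                // advance, return the new current char
    }
    return id;
}
```

The character that ended the identifier is still in the buffer afterwards, so nothing has to be put back.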

   Using istreambuf_iterators allows simple for loops to
   implement the scanning. The only time you need to put back a
   char is if the loops below exit with begin != end [begin ==
   end means you have either an EOF or an input error (bad
   disk, etc.)].

That's also true for the peek()/get() idiom, if used correctly.
There's an almost 100% correspondence:

     iterator    istream      streambuf

       *in       in.peek()    in->sgetc()
       *in++     in.get()     in->sbumpc()
       *++in     ---          in->snextc()

Of course, most of the time, you'll probably end up just
incrementing, e.g. ++in, or ignoring the return value of
in.get() or in->sbumpc(). In this sense, the iterator is
perhaps marginally clearer -- but I still prefer using a
sentinel value for EOF, and not having to test for it
separately.
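
The correspondence in the table above can be sketched as the same loop written three times, once per column. The function names (digits_iter, digits_stream, digits_buf) and the choice of "a run of digits" as the thing being scanned are hypothetical, picked only to keep the three versions comparable.

```cpp
#include <cctype>
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Iterator column: end of input needs its own test (it != end).
std::string digits_iter(std::istreambuf_iterator<char>& it)
{
    std::istreambuf_iterator<char> end;
    std::string s;
    while (it != end && std::isdigit(static_cast<unsigned char>(*it))) {
        s += *it;
        ++it;
    }
    return s;
}

// istream column: peek() returns EOF as a sentinel, and isdigit(EOF) is false.
std::string digits_stream(std::istream& in)
{
    std::string s;
    while (std::isdigit(in.peek()))
        s += static_cast<char>(in.get());
    return s;
}

// streambuf column: sgetc() and snextc() also use the EOF sentinel.
std::string digits_buf(std::streambuf* sb)
{
    std::string s;
    for (int c = sb->sgetc(); std::isdigit(c); c = sb->snextc())
        s += static_cast<char>(c);
    return s;
}
```

Given "123abc", all three return "123" and leave 'a' as the current character. Only the iterator version has to compare against an end iterator before it can look at the character.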

--
James Kanze GABI Software
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
