Re: peek() vs unget(): which is better?
In article <e4bkdg$4$1@news.Stanford.EDU>, Seungbeom Kim
<musiphil@bawi.org> wrote:
> I'm writing a simple lexer. It has to determine when to stop reading
> for the current token, and it seems to have basically two options:
> (1) peek(), and if valid for the current token, get() and continue
> (2) get(), and if not valid for the current token, unget() and continue
> Which is better? Or are they equally good?
> It seems to me that (1) makes the code more cluttered and incurs two
> unformatted input function calls per character. But I have read
> somewhere that unget() is not guaranteed to work across buffer
> boundaries, so I suspect (2) is rather unsafe though simple. Is this
> correct?
> Comments about any other part of the implementation are welcome, too.
> Thank you in advance.
You are not using formatted input, so I'd drop down to the streambuf.
Since the reading is sequential and done in loops, a
std::istreambuf_iterator<char> gives you an input iterator [one pass
through the input], and simple for loops are enough to implement the
scanning. Dereferencing the iterator only looks at the current char
[sgetc()]; it is ++ that consumes it [sbumpc()]. So when one of the
loops below exits with begin != end, the offending char is still unread
in the buffer and nothing has to be put back. [begin == end means
either you have an eof or an input error, bad disk etc...]
For example:

#include <streambuf>
#include <string>
#include <iterator>
#include <cctype>

const int STRING_TOKEN = 256;
const int INT_TOKEN = 257;
const int EOF_TOKEN = 258;

// Reads one token from sb. For STRING_TOKEN and INT_TOKEN the token
// text is stored in value; any other char is returned as its own code.
int lexer(std::streambuf *sb, std::string &value)
{
    value.clear();
    std::istreambuf_iterator<char> begin(sb), end;
    // skip initial whitespace
    while (begin != end
           && std::isspace(static_cast<unsigned char>(*begin)))
        ++begin;
    if (begin != end)
    {
        if (std::isalpha(static_cast<unsigned char>(*begin)))
        {
            value += *begin;
            for (++begin;
                 begin != end
                 && std::isalnum(static_cast<unsigned char>(*begin));
                 ++begin)
                value += *begin;
            // the invalid char was only peeked at, so it is still unread
            return STRING_TOKEN;
        }
        else if (std::isdigit(static_cast<unsigned char>(*begin)))
        {
            value += *begin;
            for (++begin;
                 begin != end
                 && std::isdigit(static_cast<unsigned char>(*begin));
                 ++begin)
                value += *begin;
            // the invalid char was only peeked at, so it is still unread
            return INT_TOKEN;
        }
        else
        {
            // other chars + - etc.: consume the char, otherwise the
            // next call would return the same char again
            const int c = static_cast<unsigned char>(*begin);
            ++begin;
            return c;
        }
    }
    return EOF_TOKEN; // end of input
}
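
For comparison, here is a rough sketch of the two options from the
question, written at the std::istream level. The names read_word_peek,
read_word_unget and options_agree are made up for illustration and are
not part of any library; the sketch only reads one run of alphanumeric
chars and assumes a stream where a single unget() after a get() works
[a std::istringstream here].

```cpp
#include <cassert>   // assert, for the usage line below
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Option (1): peek(), and get() only once the char is known to belong
// to the current token. Hypothetical helper, for illustration only.
std::string read_word_peek(std::istream &in)
{
    std::string word;
    for (;;)
    {
        const int c = in.peek();              // look, do not consume
        if (c == std::char_traits<char>::eof() || !std::isalnum(c))
            break;                            // char stays in the stream
        word += static_cast<char>(in.get());  // now consume it
    }
    return word;
}

// Option (2): get(), and unget() if the char does not belong to the
// current token. Hypothetical helper, for illustration only.
std::string read_word_unget(std::istream &in)
{
    std::string word;
    char c;
    while (in.get(c))   // at end of input this fails and sets failbit
    {
        if (!std::isalnum(static_cast<unsigned char>(c)))
        {
            in.unget();   // sets badbit if no putback position exists
            break;
        }
        word += c;
    }
    return word;
}

// Hypothetical check: both options should yield the same token and
// leave the same char next in the stream.
bool options_agree(const std::string &input)
{
    std::istringstream a(input), b(input);
    const std::string wa = read_word_peek(a);
    const std::string wb = read_word_unget(b);
    return wa == wb && a.get() == b.get();
}
```

With that, something like assert(options_agree("abc123+rest")); should
hold. The two differ mainly in stream state at end of input: option (2)
leaves failbit set after the failed get(), so the caller may need
in.clear() before going on.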
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]