Re: operator>> for numbers: stream state after failed read

From: "James Kanze" <james.kanze@gmail.com>
Newsgroups: comp.std.c++
Date: Wed, 28 Mar 2007 09:02:53 CST
Message-ID: <1175072952.413553.41430@p15g2000hsd.googlegroups.com>

On Mar 28, 1:31 am, heples...@gmail.com wrote:

I posted this a week ago on comp.lang.c++, but did not get a
response---I hope to be more successful here.

Summary:
Does the C++ standard require
std::basic_istream<>::operator>>(double&) to leave the input stream
untouched in case of a read failure?

How could it possibly do that? You've got to try to read in
order to have the failure, and once you've gotten the failure,
you've changed the state.

Details:
I noticed an unexpected behavior of operator>>() for numbers (double,
int) when reading from cin. I would like to ask for expert
clarification on whether I am misunderstanding the rules of the game,
or whether my library implementation has a bug. I tested this on g++
4.1.1 under Linux, g++ 3.4.5 MinGW and cxx 7.1 under Tru64 Unix
(behavior there is slightly different than described below). I checked
Josuttis "The C++ Standard Library" and the C++ standard ch 22.2.2 and
27.6.1, but haven't been able to get anything useful out of them.

The problem is as follows: I would like to read sequences like "1 2
+" by first trying to read into a double, and if that fails try to
read into a char, see sample program below (the program reads only a
single number/symbol). This works fine as long as any non-number token
is not a symbol that could be the first symbol in a number, i.e. the
plus sign, the minus sign or the decimal point. If the symbol is one
of those three, the program simply hangs. If I change the locale to,
e.g., Norwegian, it will hang on the decimal comma instead of the
decimal point.

My interpretation is that the operator reads +, -, or ., then tries to
read the next digit, which it does not find, and then raises the
failbit and returns WITHOUT putting +, -, or . back into the input
stream. Should this/must this be so?

I think that that's the way it is supposed to work. The
algorithm is described in some detail in §22.2.2.1.2, but in
general, if I understand it correctly, all characters that could
be part of a number are first accumulated. (The current draft
seems to have lost an important sentence here; the original
standard says that characters are accumulated as long as they
are "allowed as the next character of an input field of the
conversion specifier", but there's nothing at all in N2134
concerning when a character is accumulated.)

Note that not all implementations actually behave this way,
however. Given "1.0e-x" and reading a double, Sun CC (with both
the Rogue Wave and the STLport libraries) and VC++ fail, in
accordance with the standard, with the next character to be read
being x. g++ succeeds, with the next character to be read also
being x. (Whatever happened to the e-?)

A work-around presumably is to read to a string, place it into a
stringstream and then extract from the latter.

That's the usual procedure anytime you have to deal with
variations in the format. In all but the simplest cases, in
fact, I'll use regular expressions to check the format up front;
transactional integrity is a lot easier if you don't do any
assignments before knowing that everything is correct.

--
James Kanze (GABI Software) mailto:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
