Re: Issue #309, reposted after mis-posting to comp.lang.c++

From: "kanze" <kanze@gabi-soft.fr>
Newsgroups: comp.std.c++
Date: Thu, 14 Sep 2006 13:01:05 CST
Message-ID: <1158244405.376163.11200@i42g2000cwa.googlegroups.com>
Alberto Ganesh Barbati wrote:

jimreesma@gmail.com wrote:

In the course of writing software for commercial use, I
constructed std::ifstream's based on user-supplied pathnames
on typical POSIX systems.

It was expected that some files that opened successfully
might not read successfully -- such as a pathname which
actually referred to a directory. Intuitively, I expected
the streambuffer underflow() code to throw an exception in
this situation, and recent implementations of libstdc++'s
basic_filebuf do just that (as well as many of my own custom
streambufs).


Hmmm... If I open a directory as if it were a file, I expect the
operation to fail immediately, before any read operation
attempt. For example, on Win32 the stream gets failbit set in
the ctor. I don't know POSIX systems very well, but I would
expect the same. Apparently, I'm wrong about this...


Yep. The open normally works. The read may or may not work; if
the file system is mounted locally, you can read a directory
just like any other file.
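
Something like the following sketch shows the situation at the
system-call level (POSIX-specific; whether the read fails, and
with which errno, depends on the system and the file system --
EISDIR is what a modern Linux reports):

    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>

    int main()
    {
        int fd = ::open("/tmp", O_RDONLY);       // opening a directory
                                                  // for reading succeeds
        if (fd < 0) {
            std::perror("open");
            return 1;
        }
        char buf[256];
        ssize_t n = ::read(fd, buf, sizeof buf);  // the read is what fails
        if (n < 0) {
            std::printf("read failed: %s\n", std::strerror(errno));
        } else {
            std::printf("read %zd bytes\n", n);   // possible on some systems
        }
        ::close(fd);
        return 0;
    }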

Anyway, your problematic scenario could still occur, for
example when opening a file of size 0.


Not really. If I understand correctly, his problem occurs
because the read returns a hard error, and not 0 bytes read.

The obvious solution in his case is simply to do a peek()
immediately after the open, and then check badbit. Supposing,
of course, that the implementation of filebuf handles this
correctly. (Of course, it still means that he can read data
from an open directory. He'll probably get a format error
fairly quickly, of course, if he expects any predetermined
format.)
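
A minimal sketch of that approach (hedged: whether badbit ends
up set here, rather than an exception propagating, is exactly
the implementation dependence discussed below; the function
name is purely illustrative):

    #include <fstream>

    bool open_for_reading(std::ifstream& in, char const* path)
    {
        in.open(path);
        if (!in.is_open()) {
            return false;   // the open itself failed
        }
        in.peek();          // force a first read attempt
        if (in.bad()) {
            return false;   // hard error, e.g. the path was a directory
        }
        in.clear();         // peek() sets eofbit on an empty file
        return true;
    }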

BTW, underflow() can fail by either throwing (as suggested by
footnote 275) or by simply returning traits::eof().


I think the intended behavior is for it to throw if it
encounters an error, and to only return EOF if it encounters end
of file.
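
A minimal sketch of the two failure modes, assuming a POSIX
read() as the underlying source (the class and its name are
just for illustration): end of file is reported by returning
traits::eof(), a hard error by throwing, as footnote 275
suggests.

    #include <ios>
    #include <streambuf>
    #include <unistd.h>

    class fd_inbuf : public std::streambuf {
        int  fd_;
        char buf_[4096];
    public:
        explicit fd_inbuf(int fd) : fd_(fd) {}
    protected:
        virtual int_type underflow()
        {
            ssize_t n = ::read(fd_, buf_, sizeof buf_);
            if (n < 0) {                // hard error: EISDIR, EIO, ...
                throw std::ios_base::failure("read error");
            }
            if (n == 0) {               // genuine end of file
                return traits_type::eof();
            }
            setg(buf_, buf_, buf_ + n);
            return traits_type::to_int_type(*gptr());
        }
    };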

So you should not *expect* it to throw. It might occur on a
particular implementation and in particular cases but it's not
required by the standard.


It's true that an implementation is not required to test for
possible hardware errors. His problem, however, if I understand
it correctly, is that systems are detecting the possible
hardware errors, but reporting them differently.

I also intuitively expected that the istream code would
convert these exceptions to the "badbit" set on the stream
object, because I had not requested exceptions. I refer
to 27.6.1.1 P4.


Notice that if failbit is set, then every operation fails
immediately, without having the chance to set badbit. So, for
example, on Win32 badbit won't be set.

Moreover, if underflow() fails without throwing (and we saw
that this case could actually happen), then badbit won't be
set anyway, regardless of issue #309.


How true. In fact, the standard makes no guarantee as to
whether we can distinguish hard errors from end of file or not.

Again, I think his problem is that the open succeeds, and then
the first read fails with a hard error. This is standard
behavior under Unix when attempting to use the normal reads on a
directory on a file system which is remote mounted. (Arguably,
what is broken here is Unix, and not C++. An open for read
succeeds on a "file" which cannot be read.)

All he's asking for, I think, is that when the system detects a
hard error during a physical read, the behavior of iostreams be
consistent. The current situation would seem to be that the
behavior depends on whether the error is first detected in the
constructor of the sentry object, or later in the actual
operator>> code, and that in the first case, the behavior isn't
consistent across implementations.

I agree with his argument. Globally, I would expect that
anytime the system detects a hard error (manifested by the
system read request returning -1 under Unix), filebuf would
raise an exception, and anytime an exception occurs during
input---any type of input---badbit be set.
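
A small test case makes the inconsistency concrete (the class
name is illustrative; which branch you see below -- badbit
quietly set, or the exception escaping operator>> -- is
precisely what differed between implementations and what issue
#309 is about):

    #include <exception>
    #include <iostream>
    #include <streambuf>

    class failing_buf : public std::streambuf {
    protected:
        virtual int_type underflow()
        {
            throw std::ios_base::failure("simulated hard read error");
        }
    };

    int main()
    {
        failing_buf buf;
        std::istream in(&buf);
        int i = 0;
        try {
            in >> i;        // the sentry's whitespace skip (or the
                            // extraction itself) hits underflow()
        } catch (std::exception const& e) {
            std::cout << "exception escaped: " << e.what() << '\n';
        }
        std::cout << "badbit set: " << in.bad() << '\n';
        return 0;
    }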

However, this was not the case on at least two
implementations -- if the first thing I did with an istream
was call operator>>( T& ) for T among the basic arithmetic
types and std::string. Looking further I found that the
sentry's constructor was invoking the exception when it
pre-scanned for whitespace, and the extractor function
(operator>>()) was not catching exceptions in this
situation.


Hmmm... if the sentry is trying to parse the whitespaces, then
clearly failbit was not set... However the sentry will set
failbit | eofbit and you can check failure with fail() after
that. No need to check badbit.


Except that he wants to treat badbit differently. If he gets
failbit on the first input, that means an empty file, or a
format error (depending on eofbit). If he gets badbit on the
first input, that probably means he opened a file he shouldn't
have.
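
In code, the distinction he is after looks something like this
(a sketch only; how reliably badbit shows up in the hard-error
case is the open question):

    #include <fstream>
    #include <iostream>

    void read_first_value(char const* path)
    {
        std::ifstream in(path);
        int value;
        if (!in.is_open()) {
            std::cerr << "cannot open " << path << '\n';
        } else if (in >> value) {
            std::cout << "read " << value << '\n';
        } else if (in.bad()) {
            std::cerr << "hard read error -- probably not a regular file\n";
        } else if (in.eof()) {
            std::cerr << "empty file\n";
        } else {
            std::cerr << "format error\n";
        }
    }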

So, I was in a situation where setting 'noskipws' would
change the istream's behavior even though no characters
(whitespace or not) could ever be successfully read.

Also, calling .peek() on the istream before calling the
extractor() changed the behavior (.peek() had the effect of
setting the badbit ahead of time).


I can't give you an answer, but let me ask you this: why are
you worried about badbit? My experience is that checking
fail() (that is either failbit or badbit) is the right thing
to do 99.9% of the times.


But isn't that experience conditioned by the fact that you
can't reliably do more? Wouldn't it be better if you handled
hard read errors differently from end of file? Consider a
server, reading a configuration file. If for some reason there
is a read error on the disk, do you want it to start, without
any error message, but with the wrong configuration, or do you
want it to abort, signaling a read error in the configuration
file?

A lot depends on context, but in general, the more information,
the better: it is easier to ignore excess information than to
process information you don't have.
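
Concretely, for the configuration-file case (a sketch under the
assumption that the implementation does report the hard error
through badbit):

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    bool load_config(char const* path, std::vector<std::string>& lines)
    {
        std::ifstream in(path);
        if (!in) {
            std::cerr << "cannot open " << path << '\n';
            return false;
        }
        std::string line;
        while (std::getline(in, line)) {
            lines.push_back(line);
        }
        if (in.bad()) {
            // hard error: don't start with a partial configuration
            std::cerr << "read error in " << path << '\n';
            return false;
        }
        return true;    // clean end of file: the whole file was read
    }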

I don't really bother about which of the two bits is set (most
of the time it's either failbit or both). In fact, as I
showed you above, using bad() usually means relying on
implementation-defined behaviour, so the code would be
unportable regardless of issue #309.

Perhaps you could motivate your concerns by providing a use
case where checking for badbit rather than failbit really can
make a difference. To be convincing, it would be better if you
provide an example that does not depend on
implementation-defined behaviour.


Since any detection of hard errors is to some degree system
dependent, some implementation-defined behavior is bound to be
involved. In general, in this sort of situation, the intent of
implementation-defined behavior isn't that the implementation do
just anything; the intent is that it do the most it can, without
the standard requiring something impossible for certain
implementations. After that, it is a quality of implementation
problem.

And I find it hard to imagine any serious software where you
don't distinguish badbit (although it's bloody hard to
test---how do you force a read error on the disk?). If you
cannot read the input, you don't want to just silently ignore
the fact, and say that everything went right. (I've actually
had a colleague lose data because a program didn't test badbit
when writing. The program was a typical Unix filter program,
copying the input file to standard out, with a little
transformation. The disk was full, but the program said that
the copy was fine, so he deleted the input file, and moved the
output over to replace it.)
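
The check the filter was missing is roughly this (a sketch; the
"little transformation" itself is elided):

    #include <iostream>
    #include <string>

    int main()
    {
        std::string line;
        while (std::getline(std::cin, line)) {
            std::cout << line << '\n';      // transformation would go here
        }
        std::cout.flush();
        if (std::cin.bad() || !std::cout) { // hard read error, or a write
                                            // failed (e.g. disk full)
            std::cerr << "error copying input to output\n";
            return 1;                       // caller must not delete the input
        }
        return 0;
    }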

--
James Kanze GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
