Re: Why do you deserve a better IO library

From:

"kanze" <kanze@gabi-soft.fr>

Newsgroups:

comp.lang.c++.moderated

Date:

6 Jun 2006 17:54:22 -0400

Message-ID:

<1149597242.074962.16220@j55g2000cwa.googlegroups.com>

psyko wrote:

In this article I'll try to summarize the reasons that lead me
to dislike IOStream as it is now, and to hanker for a better
IO library for C++. The arguments presented are of two
kinds:
- Cosmetic Problems: these have to do with bad names, bad
conventions, etc. Basically, these can be solved without major
rework.
- Design Problems: these are the fundamental problems with the
design of IOStream. Solving these may mean coming up with a
whole new library.

IOStream isn't without its problems, but it's a considerable
improvement (easier to use, more intuitive, much more flexible,
and much, much safer) over what we had in C. Realistically,
however, I doubt that there will be any significant changes in
it, for the simple reason that this sort of streamed IO has
practically lost its relevance for most programs: human readable
IO is mostly via a GUI, and iostreams are not really the answer
to binary IO. If I look at any of my recent applications,
iostream use has been limited to log files (for which it is about
the only solution, but needs to be wrapped), and ostringstream,
to format various types. And getline for reading configuration
files (which are then parsed by a number of different parsing
tools, including occasional use of istringstream to convert
individual values into the correct types).

The arguments are numbered for easy reference.

It's probably worth noting that a number of your points (not all
of them) reflect more your misunderstanding than any defect
(except maybe lack of good documentation) in iostream itself.
The naming conventions in iostream aren't always ideal -- in the
case of the streambuf interface, they are frankly horrible --
but that's no excuse for not reading the documentation.

Cosmetic Problems:
^^^^^^^^^^^^^^^^^^

1- Very complicated (and overlapping) state access functions,
all with strange names: rdstate(), clear(), setstate().

Historical reasons, of course. At least partially -- why the
committee didn't adopt some sort of coherent naming convention
for the functions it added is beyond me.

2- fail() tests for both eofbit (which is a rather expected
cause of failure) and failbit (which is less expected). In order
to test for fail alone you have to come up with
rdstate()&failbit.

That's simply false. fail() is one of the rare well-named
functions; it returns true if and only if a previous IO attempt
failed. It does NOT return true if only eofbit is set (but you
almost never want to test eofbit). Failbit is a bit of a
misnomer, in that it doesn't cover all cases of failure --
failure is either failbit or badbit. But it's not a real
problem, because you practically never use the names for the
actual bits (unless you are writing a >> or a << operator which
uses the streambuf directly, in which case, you have to set the
correct bits in case of error).

The real problem is in the other direction: unlike fail, good()
does take eofbit into consideration, which means that it is
practically unusable. Worse, it is NOT the opposite of fail()
or of bad(), as the name might suggest. In practice, of course,
you almost never call either of these two.
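
A minimal sketch of how these state functions behave in practice,
assuming a standard-conforming library. The function name
demonstrate_state_functions is only illustrative; it reads twice
from a string that holds a single integer and checks the state
after each attempt.

```cpp
#include <cassert>
#include <sstream>

// Illustrative helper (not part of any library): reads two ints from a
// string that holds only one, checking the stream state after each read.
void demonstrate_state_functions()
{
    std::istringstream in("42");
    int value = 0;

    in >> value;          // succeeds, but reaches end of input while parsing
    assert(value == 42);
    assert(!in.fail());   // the read itself did not fail
    assert(in.eof());     // eofbit is set: the extractor saw the end
    assert(!in.good());   // good() is already false, although nothing failed

    in >> value;          // nothing left to read
    assert(in.fail());    // now the operation failed (failbit)
    assert(!in.bad());    // not a hard IO error, so badbit stays clear
}
```

After the first read, fail() and good() are both false, which is
why good() cannot be treated as the opposite of fail().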

Still, I would agree that the error handling needs some cleaning
up. At an even deeper level, there is a fundamental problem in
that the streambuf interface doesn't provide for different types
of returns: end of file, error, etc. In practice, today, this
is easy to work around -- an error in the streambuf triggers an
exception, and an exception causes badbit to be set. It would
also be nice if there were separate fail bits for failure due to
eof, and failure due to a format error -- as it stands, there
are certain ambiguous cases where you cannot distinguish between
the two.
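
The workaround described above can be sketched as follows. This
is a hedged illustration, not code from any real library:
FailingBuf is a hypothetical streambuf that simulates a device
error by throwing from underflow(). With the default exception
mask, the istream catches the exception and records it as badbit.

```cpp
#include <cassert>
#include <istream>
#include <stdexcept>
#include <streambuf>

// Hypothetical streambuf that simulates a hard read error.
class FailingBuf : public std::streambuf
{
protected:
    int_type underflow() override
    {
        throw std::runtime_error("simulated read error");
    }
};

void demonstrate_badbit_from_exception()
{
    FailingBuf buf;
    std::istream in(&buf);

    // The exception thrown by the streambuf is caught inside get();
    // it is not propagated because exceptions() is empty by default.
    std::istream::int_type ch = in.get();

    assert(ch == std::istream::traits_type::eof());
    assert(in.bad());     // the error was recorded as badbit
    assert(in.fail());    // fail() covers both failbit and badbit
}
```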

4- Input operations set eofbit _and_ failbit on EOF, which
makes it impossible to make stream classes throw only when
an operation fails because of a stream error, and not because
of EOF.

In practice, I can't think of a reasonable case where you'd want
an exception in case of fail. About the only case where I would
find exceptions reasonable is when badbit is set. (Note that
the failbit is only set in the case of "expected" failure: end
of file, or an illegal format in a text file. Any real IO
errors should cause badbit to be set. Because the streambuf
interface doesn't provide any immediate way of distinguishing
between an IO error and end of file, however, and the fact that
exceptions are a relatively new feature -- newer than iostream,
at any rate -- filebuf in a lot of implementations will still
never throw an exception. So badbit will never be set on input,
and you have absolutely no way whatsoever of distinguishing
between a real end of file, and a read error on the disk.)
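
As a sketch of the "exceptions only for badbit" policy, assuming
the underlying streambuf does report hard errors, the exception
mask can be restricted to badbit. An ordinary end of input then
ends the loop through failbit without throwing. The function name
is illustrative only.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

void demonstrate_badbit_only_exceptions()
{
    std::istringstream in("1 2 3");
    in.exceptions(std::ios::badbit);   // throw only on hard errors

    int sum = 0;
    int value = 0;
    while (in >> value) {              // ends on failbit | eofbit, no throw
        sum += value;
    }

    assert(sum == 6);
    assert(in.eof());
    assert(in.fail());
    assert(!in.bad());
}
```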

3- The use of implicit conversion to void* (or to bool) does
more harm than good. It is not really clear what we are
testing for in 'if(cin) { ... }'

You're testing whether all of the previous input or output
succeeded or not. I agree that requiring separation between the
actual IO and the test for success would probably result in
cleaner code, but there seems to be a lot of resistance to the
extra verbiage. So while I would prefer that the standard idiom
be something like:

std::cin >> someInt ;
if ( std::cin.succeeded() ) { ... }

most people seem to prefer to put it all in the if.
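
Since no succeeded() member exists in the standard library, a
rough equivalent of the separated style can be sketched with a
free function. The helper below is hypothetical; it simply wraps
!fail(), which is also what the implicit conversion tests.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

// Hypothetical helper standing in for a succeeded() member;
// the standard library has no such function.
bool succeeded(const std::ios& stream)
{
    return !stream.fail();
}

void demonstrate_separate_test()
{
    std::istringstream in("17 oops");
    int someInt = 0;

    in >> someInt;
    assert(succeeded(in));
    assert(someInt == 17);

    in >> someInt;                 // "oops" is not an integer
    assert(!succeeded(in));

    // Testing the stream directly gives the same answer.
    assert(succeeded(in) == static_cast<bool>(in));
}
```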

4- Unformatted input functions handle exceptions specially (I
hate special cases)

Explain? I'm not aware of the slightest difference between
unformatted and formatted functions with regards to error
handling and exceptions.

5- Format flag manipulation is unnecessarily complicated by
strange names (and too much overlap in functionality):
setf(f), setf(f, m), unsetf(f), flags(), flags(f), (even with
io manipulators) setiosflags(f), resetiosflags()

The format flags are designed to be basic tools; you don't
typically use any of the standard manipulators (except maybe
std::setw, or in quick test programs); you use custom,
application specific manipulators. Manipulators are text
markup, and anyone having to deal with text formatting knows
that logical markup is to be preferred by far to physical
markup. (If you're writing a web page, you don't use
<i>...</i>, do you?)
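
A minimal sketch of such a logical manipulator, under the
assumption that the application wants monetary amounts shown in
fixed notation with two decimals. The names amount and
formatAmount are invented for the example.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical application-specific manipulator: says what the value
// is (an amount), not how it should physically be formatted.
std::ostream& amount(std::ostream& out)
{
    out.setf(std::ios::fixed, std::ios::floatfield);
    out.precision(2);
    return out;
}

// Illustrative helper that formats one value with the manipulator.
std::string formatAmount(double value)
{
    std::ostringstream out;
    out << amount << value;
    return out.str();
}

void demonstrate_logical_markup()
{
    assert(formatAmount(3.14159) == "3.14");
    assert(formatAmount(2.5) == "2.50");
}
```

If the house style for amounts changes, only the manipulator has
to be edited, not every output statement.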

As for the overlap, it is more or less natural, as different
sets of functions address different use cases: setf(f)/unsetf(f)
for specific boolean flags, setf(f,m) for larger fields, and
flags for saving and restoring the state. On the whole, it
makes the class easier to use.
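
The three use cases can be sketched side by side. This is only
an illustration; the function name is an assumption.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

void demonstrate_flag_functions()
{
    std::ostringstream out;
    const std::ios::fmtflags saved = out.flags();   // save everything

    out.setf(std::ios::showbase);                   // single boolean flag
    out.setf(std::ios::hex, std::ios::basefield);   // value within a field
    out << 255;
    assert(out.str() == "0xff");

    out.unsetf(std::ios::showbase);                 // clear the boolean flag
    out << ' ' << 255;
    assert(out.str() == "0xff ff");

    out.flags(saved);                               // restore everything
    out << ' ' << 255;
    assert(out.str() == "0xff ff 255");
}
```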

6- All these basic_ prefixes look bad. Why don't we simply
have template versions named std::stream<T=char>, with a
default template parameter set to char?

Historical reasons. I think it's pretty clear now that
templating the iostream stuff was a bad idea. (But your
solution means that there would be no simple name for wistream,
etc.)

7- To say nothing about members of streambuf: putback, unget
get, eback, gptr, egptr, setp, pbase, pptr, gbump, uflow,
snextc, sbumpc, sgetc, sputc, sputn, showmanyc, pbackfail...
etc

The function names in streambuf *are* pretty awful.

Design Problems:
^^^^^^^^^^^^^^^^
1- The simple fact of #including <iostream> incurs some
overhead on the generated code. Can you bear it?

On the generated code? What? All <iostream> should contain is
a couple of forward declarations and some extern's.

2- std::cout is an instance of std::ostream, and std::ostreams
have a seekp() member that allows traversing the character
sequence. But it is meaningless to seekp() forward with
std::cout.

That pretty much depends on what cout is connected to, doesn't
it? And of course, you don't know that until runtime.

So seekp() isn't defined for the standard output? OK! Then
either:
- std::cout shouldn't be an instance of std::ostream, or
- std::ostream shouldn't contain a seekp() member
Which leads (respectively) to two questions:
- std::cout should be an instance of which class then?
- Then where to put seekp()?

In general, supporting seek in a class named stream is a hack,
present for historical reasons. Once you accept that seek is
supported, however, you have to accept the fact that it may
fail, because you don't know what actual device your input or
output is connected to.
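
A hedged sketch of what "seek may fail" looks like at run time.
DiscardBuf is a hypothetical sink that does not override
seekoff() or seekpos(), so it behaves like a non-seekable device;
isSeekable is an illustrative helper, not a standard function.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical sink: discards all output and inherits the default
// seekoff()/seekpos(), which report failure.
class DiscardBuf : public std::streambuf
{
protected:
    int_type overflow(int_type ch) override
    {
        return traits_type::not_eof(ch);
    }
};

// Illustrative run-time check: tellp() returns pos_type(-1) when the
// underlying streambuf cannot report a position.
bool isSeekable(std::ostream& out)
{
    return out.tellp() != std::ostream::pos_type(std::ostream::off_type(-1));
}

void demonstrate_seek_failure()
{
    std::ostringstream text;
    text << "abc";
    assert(isSeekable(text));

    DiscardBuf sink;
    std::ostream out(&sink);
    out << "abc";
    assert(!isSeekable(out));

    out.seekp(0);          // same static type, but the seek fails
    assert(out.fail());
}
```

Both objects are std::ostream as far as the compiler is
concerned; only the attached streambuf decides whether the seek
works.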

A worse problem is the different definition of seek
depending on whether the file is text or binary. Again,
necessary for historical reasons.

3- Upon failure, IOStream (particularly the file stream classes)
throws instances of ios_base::failure.

Since when? I've never seen an exception from iostreams. Ever.

But this isn't much help. If you try to open a file and
get an ios_base::failure, what can you do? You can't output
the result of the what() member because its content is
undefined and surely not in the "right" natural language.
Indeed, you have absolutely no idea of _what_ happened:
file_not_found? permission_access_error? bad_filename?
system_error? disk_error? would have been more informative
things to throw. Providing an error report in the form of a
simple character string (in some arbitrary language) is simply
not enough.

Propose an alternative. That can be implemented on all possible
systems.

I think this is the crux of the problem here. IOstreams error
reporting stinks; there's no doubt about it. But what can you
require, portably?

5- The use of virtual inheritance is not justified in my
opinion. Has anyone ever handled a stream through a
basic_ios<>* pointer? Aggregation would have been a better
option.

Sorry, the virtual inheritance is both necessary and natural
here. And I'd phrase the question differently: have you ever
seen an application that didn't have some code which manipulated
ios*'s? At the very least, you need it in your IOSave class, or
whatever you call it; it also typically occurs in some of your
manipulators.
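
A sketch of a state saver in the spirit of the IOSave class
mentioned above. The interface shown here is an assumption, not
the actual class. It takes a std::ios&, so the same class works
for input, output, and bidirectional streams.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <sstream>

// Assumed interface: saves formatting state on construction and
// restores it on destruction, through the common std::ios base.
class IOSave
{
public:
    explicit IOSave(std::ios& stream)
        : stream_(stream)
        , flags_(stream.flags())
        , precision_(stream.precision())
        , fill_(stream.fill())
    {
    }
    ~IOSave()
    {
        stream_.flags(flags_);
        stream_.precision(precision_);
        stream_.fill(fill_);
    }
    IOSave(const IOSave&) = delete;
    IOSave& operator=(const IOSave&) = delete;

private:
    std::ios&          stream_;
    std::ios::fmtflags flags_;
    std::streamsize    precision_;
    char               fill_;
};

void demonstrate_iosave()
{
    std::stringstream out;   // an iostream: one shared std::ios base
    {
        IOSave guard(out);
        out << std::hex << std::setfill('0') << std::setw(4) << 255;
    }
    out << ' ' << 255;       // hex and fill have been restored
    assert(out.str() == "00ff 255");
}
```

Because the ios base is virtual, std::stringstream converts to
std::ios& unambiguously even though it inherits from both
istream and ostream.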

4- streambuf is a kind of "super class" that has members for
everything (input, output, seek in both directions ... etc),
even though most of its instances can't actually support all
the operations.

Agreed. It probably would have been better if there had been
several different interfaces, with mixin's used to create the
concrete instances. Of course, that would have meant a lot of
multiple inheritance. (But see my comments on the next
question.)

5- streambuf is actually mixing two completely unrelated
concepts: the concept of a buffer for IO, and the concept of
an external source/sink of data, which makes extending the
library to support new kinds of stream unnecessarily difficult.

Three, not two: the standard filebuf also handles code
conversion. And you're right, it's conceptually wrong.

It's important, however, to put at least one thing in context.
If I were designing streambuf today, for today's machines, it
wouldn't have buffering implicit in the base class, and filebuf
wouldn't have code conversion. Rather, both would be handled,
as necessary by filtering streambufs. I frequently chain four
or five filtering unbuffered streambufs, despite the fact that
this means four or five virtual function calls for every
character extracted or inserted into the buffer, without
performance problems on today's machines. On the machines I
used 15 or 20 years ago, I think I'd definitely have seen the
slowdown. One of the design requirements in the early days was
that reading or writing a single character must be a simple,
inlinable function. Which imposes buffering in the base class,
whether you need it or not.
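
A minimal sketch of an unbuffered filtering streambuf of the
kind described above. UppercaseBuf is a hypothetical example
filter; it sets up no put area, so every character goes through
the virtual overflow() and is forwarded to the next streambuf in
the chain.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>

// Hypothetical output filter: converts each character to upper case
// and forwards it to the destination streambuf. No internal buffer.
class UppercaseBuf : public std::streambuf
{
public:
    explicit UppercaseBuf(std::streambuf* dest) : dest_(dest) {}

protected:
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);   // nothing to write
        }
        const unsigned char raw =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        return dest_->sputc(static_cast<char>(std::toupper(raw)));
    }

    int sync() override
    {
        return dest_->pubsync();
    }

private:
    std::streambuf* dest_;
};

void demonstrate_filtering_streambuf()
{
    std::ostringstream sink;
    UppercaseBuf filter(sink.rdbuf());
    std::ostream out(&filter);

    out << "abc " << 12 << std::flush;
    assert(sink.str() == "ABC 12");
}
```

Filters of this shape can be chained by passing the address of
one filter as the destination of the next.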

Epilogue:
^^^^^^^^^

Design problems are (of course) more important than cosmetic
ones. And I think the most important problem is the fact that
instances don't support all the operations exported by
classes.

That is, I fear, inevitable. How can it be otherwise if the
support for the operation depends on user input? If I specify
"/dev/tty" as a filename, for example, seek doesn't work. But
there's absolutely no way to know at compile time whether I will
specify this filename or not.

For example, if I write a prototype like:

void encrypt(std::istream&, std::ostream&);

Can encrypt() seek the streams?

Can it even write binary data to the ostream? There's
certainly more justification for separate stream types for
binary and text data than for seekable/non-seekable. AND... the
program generally can know whether the file should be opened in
binary or not.

If you want to extract all of these features into the
inheritance tree, however, you're going to end up with something
extremely complex. And that still requires some form of
run-time checking -- even if you change the encrypt interface to
take the names of files, and it is responsible for opening them
(in order to ensure that the output file is binary, for
example), it will still have to check dynamically whether
seeking works or not, because this depends on the filename I
give.

Given that it is declared to take std::streams and that
std::streams export seek members, the answer would normally be
'yes'. But suppose another person looks at the prototype and
asks himself "Can I pass std::cout as a second argument?",
"Given that std::cout is an std::ostream (as required by the
prototype)....." you get the idea. This kind of problem
shouldn't exist, otherwise why are we bothering with classes
and OO?

You've effectively raised a complicated issue here. One that
has no simple solution. With your prototype above, you'll have
to document a lot of restrictions concerning the output file, at
least -- e.g. it must be opened in binary mode if it is a
filebuf.

Right off the bat, I'd say that one thing is essential: the
ability to ask a file whether it is in binary mode or not. (But
this only makes sense for filebuf -- stringbuf, etc., don't have
this distinction.) Ideally, we should be able to change this on
the fly, but it's not possible under some OS's (e.g. IBM
mainframes, etc.).

I'd be glad to hear your opinion.

I think you're approaching the problem backwards. I think that
there is a serious problem with iostream, in that it tries to be
something for everyone. I think we've reached a point with
regards to IO that one size doesn't fit all, and I seriously
doubt that we can design a single class which will handle
network protocols, data base accesses, GUI displays and
keyboards, and simple input and output streams, and still be
simple and conceptually elegant. I think, too, that there is a
serious lack of any requirements specification concerning what
is wanted in a replacement (or the replacements). IMHO, the
<iostream.h> by Jerry Schwarz did an exceptionally good job of
meeting its requirements specifications, such as they were
twenty years ago -- my only serious criticism of it, as it was
then, and for the requirements at that time, is the function
names in streambuf. Since then, handling for binary files on
systems where binary files and text files are different (or
where the text file representation doesn't exactly conform to
the internal requirements of C++), and even more so, trying to
load internationalization, especially support for multiple code
sets, on its back, have not improved it. Trying to modify a
class to do something it wasn't initially designed for never
does.

I think that it is time to start thinking about a replacement.
As it stands, iostream is still, today, a pretty good solution
for mono-lingual streamed text input and output (what it was
designed for). But that's not all of our IO needs today, not by
far. But the place to start would be by defining the
requirements.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
