Re: iostream replacement

From: fmatthew5876 <fmatthew5876@googlemail.com>
Newsgroups: comp.lang.c++.moderated
Date: Mon, 28 Jan 2013 19:05:08 -0800 (PST)
Message-ID: <319a38e5-8a4d-4fb4-9570-2fd5bc24d4f1@googlegroups.com>

On my system write(2) writes 10,000,000 floats in 0.7 seconds.
printf requires 6 seconds, and operator<< 9 seconds.

You'll have to show how you performed this test first. If you just did
cout << f;
or
printf("%f", f);

10,000,000 times then it doesn't invalidate my point about function calls
and locks.

the machine. Granted, sometimes that's not true. But if 1000% overhead
for formatted I/O versus binary is acceptable, why would 1500% be
unacceptable?

It's unacceptable if your application's run time is bounded by parsing file
formats, which some are. We use C++ for efficiency; it should be one of the
primary goals of any interface, even if it's I/O formatting.

On the third hand, if printf(3) is so doggone good, why not just use
it? Why re-implement it with variadic templates?

The problem with C-style FILE* I/O is, first, that it's not encapsulated in a
class, so you cannot use RAII to close your files automatically unless you
write your own class wrapper or use something like a scope guard. Second,
FILE* is not an extensible interface that could support, say, sockets, memory
buffers, or other forms of I/O.
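
For reference, the "write your own class wrapper" option can be quite small
in C++11. This is only a sketch: FileCloser, UniqueFile, open_file and
write_line are names made up for illustration, not part of any library.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Deleter that closes a FILE* when the owning unique_ptr is destroyed.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept {
        if (f) std::fclose(f);
    }
};

// Hypothetical alias: an owning handle for a C stdio stream.
using UniqueFile = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: opens a file; the result is empty if fopen fails.
inline UniqueFile open_file(const char* path, const char* mode) {
    return UniqueFile(std::fopen(path, mode));
}

// Writes one line. The file is closed on every return path without
// an explicit fclose call.
inline bool write_line(const char* path, const std::string& text) {
    UniqueFile f = open_file(path, "w");
    if (!f) return false;
    return std::fprintf(f.get(), "%s\n", text.c_str()) >= 0;
}
```

This fixes the RAII complaint but not the extensibility one: the handle
still only wraps what fopen can open.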

The reason to implement it as a variadic template is type safety. Yes,
most modern compilers will type-check printf for you, but that's an example
of the compiler including a specific hack for a commonly used interface. We
should be able to write our own interfaces that use format strings.

Check out this json parsing library:
http://www.digip.org/jansson/doc/2.4/apiref.html#parsing-and-validating-values

Their json_unpack() method takes a format string much like printf and saves us
from writing hundreds of lines of parsing spaghetti.

A template version of printf also doesn't need letter codes anymore.

Take a look at the D language version of formatted output:
http://dlang.org/phobos/std_format.html

You can basically say printf("%s %s %s %s", 1, 1.0, "foo", my_obj) and it
works. Why? Because with variadic templates the compiler already knows the
type. The D version gives the ability to specify something like "%4.5f" if
you want more specific formatting.

The "%s" notation even supports arbitrary types with a .toString() method,
giving you the same automatic flexibility you have with defining
custom operator<<().

All of this is possible in C++11.
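
To make that concrete, here is a rough C++11 sketch of a "%s"-only formatter
built on variadic templates. The names tprintf_impl and tsprintf are invented
for this post, it leans on operator<< internally purely to keep the example
short, and it does not attempt width/precision specifiers like "%4.5f".

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: no arguments left. Copy the rest of the format string and
// reject any placeholder that has no matching argument.
inline void tprintf_impl(std::ostringstream& out, const char* fmt) {
    for (; *fmt; ++fmt) {
        if (fmt[0] == '%' && fmt[1] == '%') { out << '%'; ++fmt; }
        else if (fmt[0] == '%') throw std::invalid_argument("too few arguments");
        else out << *fmt;
    }
}

// Recursive case: copy text up to the next "%s", print one argument using
// whatever operator<< its static type provides, then recurse on the rest.
template <typename T, typename... Rest>
void tprintf_impl(std::ostringstream& out, const char* fmt,
                  const T& value, const Rest&... rest) {
    for (; *fmt; ++fmt) {
        if (fmt[0] == '%' && fmt[1] == '%') { out << '%'; ++fmt; }
        else if (fmt[0] == '%' && fmt[1] == 's') {
            out << value;
            tprintf_impl(out, fmt + 2, rest...);
            return;
        }
        else if (fmt[0] == '%') throw std::invalid_argument("unsupported specifier");
        else out << *fmt;
    }
    throw std::invalid_argument("too many arguments");
}

// Hypothetical entry point: returns the formatted text as a string.
template <typename... Args>
std::string tsprintf(const char* fmt, const Args&... args) {
    std::ostringstream out;
    tprintf_impl(out, fmt, args...);
    return out.str();
}

// A user-defined type only needs some way to print itself.
struct Point { int x; int y; };
inline std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}
```

With this, tsprintf("%s %s %s %s", 1, 1.5, "foo", Point{2, 3}) needs no
letter codes, and an argument count mismatch is reported instead of being
undefined behavior. A real implementation would want to check the format
string at compile time where possible.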

In the first case it's two atomic printf calls; in the second it's 5,
with no guarantee that the output will not be
interleaved.

Yes, but printf helps only to the extent it accepts more arguments. If
you use printf more than once to complete the I/O of a single object,
you're back in the same ditch.

That's a general limitation of stdio and iostreams. If you want a really big
operation to be atomic you have to do the formatting yourself and write it
all out in one step.

Still, printf makes it very easy to guarantee atomicity for the
majority of simple cases. Logging is a good example where an atomic printf
would shine.
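
A sketch of the "format it yourself and write it out in one step" approach
for logging. vformat and log_line are hypothetical names. It assumes the
POSIX behavior where each stdio call locks the FILE for the duration of that
call, so a single fwrite of a complete line should not be interleaved with
other threads' output on the same FILE.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats printf-style arguments into a std::string.
inline std::string vformat(const char* fmt, va_list ap) {
    va_list copy;
    va_copy(copy, ap);                      // first pass only measures
    int n = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (n < 0) return std::string();        // encoding error
    std::vector<char> buf(static_cast<std::size_t>(n) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, ap);
    return std::string(buf.data(), static_cast<std::size_t>(n));
}

// Builds the whole log line in memory, then hands it to stdio in one call.
inline bool log_line(std::FILE* out, const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    std::string line = vformat(fmt, ap);
    va_end(ap);
    line += '\n';
    return std::fwrite(line.data(), 1, line.size(), out) == line.size();
}
```

Usage is the same as printf: log_line(stderr, "%s: %d items", name, count).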

OK, so "abuse" is any use not immediately reflected in the language.
Ergo operations not defined by the language cannot use operators.

That definition precludes innovation.

"Innovative" uses of operator overloading lead to hard to understand and
unreadable code. Operators are nothing more than syntactic sugar. Operators
might look nice on the surface but they obscure what the actual code is doing
when it comes time to do debugging and you have to untangle the magic
syntax to its bare elements.

I have never once confused the use of operator<< for I/O versus
bit-shifting. I bet you haven't either. IIRC it was chosen over
operator< specifically to avoid confusion.

The only reason we don't confuse operator<< with bit shift is because its
use is ubiquitous in the standard library. If this was Bob's io library
and not the STL this use of overloading would have very few proponents.

Hmm, for me the opposite. I define my I/O operations and then just use
them.

If I'm working with a well-known format such as XML then I'm using an
XML parsing library. If I'm working with a binary format then I'm writing
endian-converted fixed-size ints directly to the stream. I don't find
iostream operators particularly helpful in either of these cases.

We've focussed on printf, but compare operator>> to fscanf. The error
reporting in fscanf is so scanty that the function goes unused except
when the input is known to have been produced by fprintf. istreams
are commonly used with input of indeterminate origin. Errors are
easier to handle because each input is a separate operation.

Not convinced? Count the variations on atol(3) in your libc.

I'm not convinced. atol() and friends are pretty horrible. You talk
about the failures of error checking in scanf(), but consider atol().
It just returns a long. You have no way to know whether that was
the value read from the string or whether there was a parsing error. I never
use these functions.

scanf() at least tells you how many arguments were parsed. In most cases you
only need to know that all of them were parsed, in which case scanf works
just fine.
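
To illustrate the difference, a small sketch. parse_long and parse_pair are
hypothetical helpers; strtol is used because, unlike atol, it reports where
parsing stopped and whether the value overflowed.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>

// Parses the whole string as a base-10 long; returns false on any error.
// atol("abc") and atol("0") both return 0, so atol cannot do this.
inline bool parse_long(const char* s, long& out) {
    errno = 0;
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (end == s) return false;         // no digits consumed
    if (*end != '\0') return false;     // trailing junk
    if (errno == ERANGE) return false;  // value out of range for long
    out = v;
    return true;
}

// sscanf returns the number of conversions that succeeded, which is
// enough when all you need to know is "did everything parse?".
inline bool parse_pair(const char* s, int& x, int& y) {
    return std::sscanf(s, "%d,%d", &x, &y) == 2;
}
```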

ISTM your "hex" disagreement is with <iomanip>. If you think it aids
clarity, nothing prevents you from implementing a manipulator such as

cout << fmt("0x%02x") << ch;

What if I want to print 2 of them with a space in between followed by another
space and a float?

Do I do this?

cout << fmt("0x%02x ") << ch << fmt("0x%02x ") << cd << f;

or this?

cout << fmt("0x%02x") << ch << " " << fmt("0x%02x") << cd << " " << f;

or how about this?

cout << fmt("0x%02x 0x%02x %f") << ch << cd << f;

You can debate about which one is better, I hate them all.

Also in general I don't like the concept of IO manipulators. Now the
stream object has to keep track of all the manipulator states.

First there's the space overhead in the stream object even if you aren't using
manipulators. Second, how can you possibly create new ones when the set of
possible manipulators is coupled to the stream implementation? Some kind
of god-awful dynamic polymorphism? No thanks.

The stream should not care about formatting. The stream should only care
about you feeding it bytes. The formatting should be done externally.
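
Roughly what I have in mind, as a sketch only. MemorySink, put_hex_byte and
put_text are invented names. The only thing the formatters assume about a
sink is that it has a write(const char*, size_t) member.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// A sink that only accepts bytes. It holds no formatting state at all.
struct MemorySink {
    std::string data;
    void write(const char* bytes, std::size_t n) { data.append(bytes, n); }
};

// Formatting lives outside the sink. Any type with a compatible write()
// works here, with no virtual functions involved.
template <typename Sink>
void put_hex_byte(Sink& sink, unsigned char value) {
    char buf[8];
    int n = std::snprintf(buf, sizeof buf, "0x%02x",
                          static_cast<unsigned>(value));
    if (n > 0) sink.write(buf, static_cast<std::size_t>(n));
}

template <typename Sink>
void put_text(Sink& sink, const std::string& s) {
    sink.write(s.data(), s.size());
}
```

Adding a new formatter means writing another free function. The sink does
not need to know it exists, and a sink that never formats anything pays
nothing for it.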

= One format =

User defined types don't need to provide a sequence. As I said
earlier the printed format varies greatly depending on context.

Vary perhaps, greatly no. I doubt many objects need a log, XML, json,
binary, and something else.

It may be rare for objects to support so many simultaneous formats, but
when I'm writing an IO operation I'd like to at least have some clue
from the code itself which one is being used.

Even if you're right, though, that's not a knock on iostreams. The
"context" you refer to is in fact state, and that state must be captured
somewhere. The state-determined format must be executed somewhere, and
that somewhere surely suggests a function.

That is to say, somewhere you'll need e.g. asJson() and asXml(), and
somewhere you'll need outputMode (implicitly or explicitly). None of
that changes by switching to a printf-style function.

printf isn't meant for handling arbitrary file formats in one function call.
It's meant for simple string formatting and possibly also as a building block
for a parser of a larger format.

asJson() or asXml() is a hell of a lot more descriptive than just
fout << my_object. Also, you might be embedding your object within some node
of an XML document. operator<<() doesn't help you there either, as you have
to make absolutely sure you printed all of the previous parts of the
document first.
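
For example, with named free functions the call site says which format is
in play. This is a sketch with a made-up Point type and made-up as_json and
as_xml functions, not anybody's real API.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct Point { int x; int y; };

// The format is visible in the function name at every call site.
inline std::string as_json(const Point& p) {
    std::ostringstream out;
    out << "{\"x\": " << p.x << ", \"y\": " << p.y << "}";
    return out.str();
}

// Layout context (here: indentation depth) is an ordinary argument
// rather than state stored in a stream.
inline std::string as_xml(const Point& p, std::size_t indent = 0) {
    std::ostringstream out;
    out << std::string(indent, ' ')
        << "<point x=\"" << p.x << "\" y=\"" << p.y << "\"/>";
    return out.str();
}
```

Because these return strings, the caller decides where in the enclosing
document the text goes.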

How about indenting XML tags? What's the solution there? More I/O manipulators?

Requiring user defined types to define all of their printing formats
also creates a coupling between the object and all of its supported
io formats.

which is good ...

Well I admit this is actually kind of a hard problem. Imagine I want to add a
plugin to an application to add JSON file support. It means I need to extend
the class types of the application because my JSON file parser may need to
modify their private data members as it is constructing the objects.

I don't know the right answer to this problem. If anyone does I'd love to hear it.

If I want to use your class and serialize it in a format
you didn't originally support I'm already stuck rolling my own
serialization code using printf or other string conversion functions.

which is bad, which is why you say "stuck". Which is why it's better
if you can extend a class's functionality by extending the class.

If external representation format isn't a property of the object, what
*is* it a property of? I guess you'd say "of the stream" because the

It's a property of the object that may need to be extended. It most surely is
not a property of the stream.

only other choice is "global state". I'm not sure there's an objective
answer, but I am sure that extending the stream is harder than
extending one's own class, simply because of the inherent complexity of
I/O.

= Extensibility =

Suppose we do get a std networking library that uses the iostream
interface. When would it ever make sense to use operator<< on a
socket?

The stream metaphor and operator<< can in fact be used quite easily
with network I/O.

My dbstreams library uses operator<< to send a query to a DBMS:

    query("select ...");
    dbstream db(...);
    db << query;
    while(db >> data) { ... }

This just looks like syntactic sugar to me. I don't see how using stream
operators in this example is enabling anything you couldn't do with ordinary
functions.

To be fair, that's just syntax. A dbstream is not an iostream, in part
because there's no provision in iostreams for timeouts. Whereas in
sendto(2) the timeout is an argument to the function, in the dbstreams
library the timeout is a property of a stream. A timeout sets failbit,
analogous to that of an iostream. A dropped connection sets badbit.

The syntax is easy enough. The metaphor holds up. The iostream
definition is only limited, not contradicted.

As a mere user of the iostream library, I can't add timeouts to it.
But that's only a limitation of the library as currently defined. I
see no reason it couldn't be extended to handle timeouts etc.,
nor any disadvantage to doing so.

Another limitation is you can't control locking. glibc has functions like
fread_unlocked() which are great if you have a file you know you'll only
access from one thread.
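
With stdio you at least get to choose. A POSIX-only sketch (flockfile,
funlockfile and putc_unlocked come from POSIX <stdio.h>, not ISO C or C++;
write_block is a made-up name):

```cpp
#include <stdio.h>   // POSIX: flockfile, funlockfile, putc_unlocked
#include <string>

// Takes the stream lock once, writes every byte with the unlocked
// variant, then releases the lock. Other threads using the same FILE
// cannot interleave output inside the block, and the per-character
// calls do not pay for locking each time.
inline bool write_block(FILE* out, const std::string& text) {
    bool ok = true;
    flockfile(out);
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (putc_unlocked(static_cast<unsigned char>(text[i]), out) == EOF) {
            ok = false;
            break;
        }
    }
    funlockfile(out);
    return ok;
}
```

If the file is only ever touched by one thread, the flockfile/funlockfile
pair can be dropped. As far as I know, standard iostreams offer no comparable
switch.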

Rather, the iostream library is underappreciated as a model, and I/O is
hard enough that no one is willing to add networked I/O to it. That is
our loss, though, not the fault of iostreams.

Networked I/O will surely be part of the standard library in the future.
Sockets are incredibly complicated, though, so it will be no easy task.
The massive complexity of iostreams does not make it any easier.

If streams were just a basic low level interface, adding socket support would
be much more feasible. IOstreams for example have all this cruft for dealing
with locales. Since when do you care about the locale on your local machine
when sending a message across the network?
