Re: iostream replacement

From: "James K. Lowden" <jklowden@speakeasy.net>
Newsgroups: comp.lang.c++.moderated
Date: Thu, 31 Jan 2013 01:59:26 CST
Message-ID: <20130131023137.41c80e14.jklowden@speakeasy.net>

On Mon, 28 Jan 2013 19:05:08 -0800 (PST)
fmatthew5876 <fmatthew5876@googlemail.com> wrote:

Other parts of this thread have been interesting because your "better
printf" idea has been divided into

1. iostream extensibility, and
2. formatting.

As you say, these are orthogonal. printf could be implemented using
variadic templates in C++11 today. If and when a better iostream is
defined, printf should keep on rolling.
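To make the variadic-template claim concrete, here is a minimal sketch of a type-safe formatter in C++11. The names format_to and format are made up for illustration, and a bare '%' stands in for every conversion; it is not a proposal for the real printf grammar, just a demonstration that the argument count and types can be checked without touching iostreams.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: no arguments left. Any '%' still present must be a literal "%%".
inline void format_to(std::ostream& os, const char* fmt) {
    for (; *fmt; ++fmt) {
        if (*fmt == '%') {
            if (fmt[1] != '%')
                throw std::invalid_argument("format: too few arguments");
            ++fmt;  // skip the first '%' of "%%"; the second is printed below
        }
        os << *fmt;
    }
}

// Recursive case: print up to the next '%', insert one value, recurse.
// The value is written with its own operator<<, so its type is never guessed.
template <typename T, typename... Rest>
void format_to(std::ostream& os, const char* fmt,
               const T& value, const Rest&... rest) {
    for (; *fmt; ++fmt) {
        if (*fmt == '%') {
            if (fmt[1] == '%') { ++fmt; os << '%'; continue; }
            os << value;
            format_to(os, fmt + 1, rest...);
            return;
        }
        os << *fmt;
    }
    throw std::invalid_argument("format: too many arguments");
}

// Convenience wrapper (hypothetical) that returns the formatted string.
template <typename... Args>
std::string format(const char* fmt, const Args&... args) {
    std::ostringstream os;
    format_to(os, fmt, args...);
    return os.str();
}
```

Something like format("% + % = %", 1, 2, 3) then yields the obvious string, and a mismatch between placeholders and arguments throws instead of reading garbage off the stack.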

I agree that it would be nice if iostreams were more amenable to
extension.

ISTM operator<< is needed for something like ostream_iterator.

But if 1000% overhead for formatted I/O versus binary is
acceptable, why would 1500% be unacceptable?

It's unacceptable if your application's run-time is bounded by parsing
file formats

I'm not buying it. Your application is bounded by parsing, and you've
identified operator>> as the culprit, and you're going for 30%
improvement?

Given that situation, it's time to write a real parser, maybe using real
tools.

The reason to implement it as a variadic template is for type safety.

.....

All of this is possible in C++11.

Yes. You don't need to change iostreams if you really like the printf
lifestyle.

Their json_unpack() method takes a format string much like printf and
saves us from writing hundreds of lines of parsing spaghetti.

Eh, maybe. The examples are simple, and would be as simple with
operator>>, modulo our respective preferences. It doesn't scale very
well though: after 10 or so parameters, json_unpack is going to get
pretty unwieldy, not to mention error prone because you, Mr. Programmer,
have to match up the format string with the variables.

From TFM:

"Testing for equality of two JSON values cannot, in general,
be achieved using the == operator."

Feh.

If you want a really big operation to be atomic you have to do the
formatting yourself and write it all out in one step.

Not at all. You just have to lock the resource and unlock it. Anyone
using threads is used to that.
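A sketch of what "lock and unlock" looks like with C++11's std::mutex, assuming one mutex guards the shared stream. The names log_mutex and write_record are hypothetical. The point is only that several separate insertions become one atomic unit without pre-formatting anything into a string.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical: a single mutex guarding whatever stream the threads share.
std::mutex log_mutex;

// Every insertion between lock and unlock reaches the stream as one unit,
// provided all writers to that stream go through the same mutex.
void write_record(std::ostream& os, int id, const std::string& text) {
    std::lock_guard<std::mutex> hold(log_mutex);  // locked here
    os << "[thread " << id << "] ";
    os << text;
    os << '\n';
}   // unlocked here, when hold goes out of scope
```

The number of operator<< calls inside the guarded region doesn't matter, which is why the size of the operation has no bearing on atomicity.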

Your argument in this area amounts to "printf is good for medium size
stuff". You admit that for one or two arguments operator<< is about
the same, and for large operations you'll have to make multiple calls
anyway. I'm just not that gob-smacked by the "works better for some
cases I think are common" argument.

Also, you seem to assume the user-defined type should have some kind of
asString() function, that it should make a string of itself and that
string is what should be printed. All in the name of atomicity, I
suppose. Sounds awful to me, and doesn't work for input at all.

Defining operator<< to write the type directly to the stream is more
efficient and less work, not to mention more tractable.
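For instance, with a hypothetical two-member type called point, writing directly to the stream needs no intermediate string, and the mirror-image operator>> covers input, which an asString() approach cannot. This is only a sketch; a real type would choose its own delimiters and error handling.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical user-defined type, for illustration only.
struct point {
    int x;
    int y;
};

// Output: members go straight to the stream; no temporary string is built.
std::ostream& operator<<(std::ostream& os, const point& p) {
    return os << p.x << ' ' << p.y;
}

// Input: the same shape in reverse. On malformed input the stream's
// failbit is set by the underlying int extraction.
std::istream& operator>>(std::istream& is, point& p) {
    return is >> p.x >> p.y;
}
```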

Still, with printf it makes it very easy to guarantee atomicity for
the majority of simple cases. Logging is a good example where an
atomic printf would shine.

So you say. If we have multiple threads writing to the same log, I
don't see much call for RAII; you might as well open the log once and
be done with it. (It's a shared resource, after all.) Once you're
there, you might as well use syslog, because you get nice
standardized facilities and the ability to log across hosts. You
could still use variadic templates with it to gain type safety.
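Roughly what I have in mind for syslog plus variadic templates, assuming a POSIX system with <syslog.h>. The helpers append, join_message and log_msg are invented names. The arguments are joined with operator<< and the result is handed to syslog(3) through a fixed "%s", so no caller-supplied format string ever reaches it.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <syslog.h>  // POSIX; assumed available

// End of recursion: nothing left to append.
inline void append(std::ostream&) {}

// Append each argument in turn using its own operator<<.
template <typename T, typename... Rest>
void append(std::ostream& os, const T& first, const Rest&... rest) {
    os << first;
    append(os, rest...);
}

// Build the whole message as one string.
template <typename... Args>
std::string join_message(const Args&... args) {
    std::ostringstream os;
    append(os, args...);
    return os.str();
}

// One brief call per log statement; syslog sees only "%s" and the message.
template <typename... Args>
void log_msg(int priority, const Args&... args) {
    const std::string msg = join_message(args...);
    syslog(priority, "%s", msg.c_str());
}
```

A call would look like log_msg(LOG_INFO, "user ", uid, " logged in"), which stays short enough not to interrupt the main logic.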

I admit I once wrote a syslog wrapper to make it work more like
iostreams. It wasn't very satisfying. You want a logging statement to
be brief because it interrupts the main logic, and iostreams don't fit
that bill.

Operators might look nice on the surface but they obscure what the
actual code is doing when it comes time to do debugging and you have
to untangle the magic syntax to its bare elements.

There's nothing about using named functions that ensures clarity, and
nothing about overloaded operators that impedes it.

The idea that user-defined operators in and of themselves "obscure what
the actual code is doing" is utterly bogus. The *only* thing standing
between clarity and confusion is care, whether or not the function has
a name. Example:

    foo::operator void*() vs. foo::getStatus()

(with apologies to Rob Pike)

    if( !foo ) { ... }
    if( !foo.getStatus() ) { ... }

Monty, I'll take door #1.

Too much C++ code nowadays looks like a Russian novel written in Java.
User-defined operators permit the C++ programmer to achieve terse
expression by defining something of a DSL.

There's an urban legend that someone out there cleverly defined
operator- to send mail home to his mother or somesuch. Like Yeti:
maybe it exists, but there's no need to worry.

Acknowledged, debuggers are terrible with overloaded operators, as are
code browsers (at least those I've used). I blame that on *underuse*
of overloaded operators. If more people used them, they'd have better
support.

Not convinced? Count the variations on atol(3) in your libc.

scanf() at least tells you how many arguments were parsed. In most
cases you only need to know that all of them were parsed, in which
case scanf works just fine.

I don't think I made my point clearly. I'm asserting that the very
existence of so many lately introduced variations of atol(3) is evidence
that sscanf is unused, given their overlapping functionality. By
contrast, no such thing has happened with iostreams.

The stream should not care about formatting. The stream should only
care about you feeding it bytes. The formatting should be done
externally.

There's nowhere else to keep formatting information for built-in
types. In your scenario, formatting choices are restated on every I/O.
I can't just say, once, that integers are in hex and floats are 7.2. I
have to say it every time.

The overhead is minuscule and you don't have to use it. (You have
binary I/O.) OK, you're carrying around an extra 8 bits or so, per
stream. That's a win because you get to not repeat yourself in your
code.
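A short illustration of saying it once. The function name report is made up; the manipulators are standard. Base, floatfield and precision persist in the stream until changed. Field width is the one exception: setw applies only to the next insertion, so it is left out here.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formatting choices are stated once, then apply to every later insertion
// on the same stream.
std::string report(int a, int b, double x, double y) {
    std::ostringstream os;
    os << std::hex << std::fixed << std::setprecision(2);  // said once
    os << a << ' ' << b << ' ' << x << ' ' << y;           // no restating
    return os.str();
}
```

With the printf style, the equivalent of "%x" and "%.2f" has to appear at every call site instead.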

It may be rare for objects to support so many simultaneous formats,
but when I'm writing an IO operation I'd like to at least have some
clue from the code itself which one is being used.

Why? The operation is "put to stream". The whole idea is not to
care. Or do you not like overloaded operators generally, because you
prefer not to engage Koenig lookup?

asJson() or asXml() is a hell of a lot more descriptive than just
fout << my_object.

We already agreed IIRC that state belongs in the object, therefore that
formatting state belongs in the object. I'm beginning to see code like

    if( foo.mode() == foo::xml_mode )
        foo.asXml(cout);
    else
        foo.asJson(cout);

instead of

    cout << foo;

and my head doth spin.
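One way to keep the choice of format as state, so that the call site stays a plain insertion, is the per-stream storage that ios_base already provides through xalloc() and iword(). This is a sketch under my own assumptions: the type item, the manipulators as_json and as_xml, and the output shapes are all invented for illustration.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type with two possible external representations.
struct item {
    std::string name;
    int value;
};

// One slot of per-stream storage, allocated on first use.
inline int format_index() {
    static const int index = std::ios_base::xalloc();
    return index;
}

// Manipulators that record the choice in the stream. iword() starts at 0,
// so JSON is the default here.
inline std::ostream& as_json(std::ostream& os) { os.iword(format_index()) = 0; return os; }
inline std::ostream& as_xml(std::ostream& os)  { os.iword(format_index()) = 1; return os; }

// The operation is still just "put to stream"; the stream knows the format.
std::ostream& operator<<(std::ostream& os, const item& it) {
    if (os.iword(format_index()) == 1)
        return os << "<item name=\"" << it.name << "\">" << it.value << "</item>";
    return os << "{\"" << it.name << "\": " << it.value << "}";
}
```

The mode is set once per stream, e.g. out << as_xml, and every later out << thing follows it without an if/else at each call site.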

How about indenting xml tags? What's the solution there? More io
manipulators?

Isn't it well established that the solution to XML is not to use it?

Surely it's doubly well known that XML is a document exchange format,
and that formatting isn't necessary for one machine to parse another's
output?

Can it be that even today there's no nice XML viewer that will do the
trivial work of temporarily indenting a document for on-screen
viewing? Not counting emacs, that is?

If I should ever find myself wasting my and my employer's time
twice -- first by using XML and again by formatting it -- I would
consider a few things, some nontechnical. One technical option would
be to wrap ostream in a wrapper that understood XML well enough to
insert leading whitespace.

When I have wanted indented output of my own classes, I defined a
little tabs class in operator<< and tracked indentation something like

    os << tabs(n) << data;
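I no longer have that code, so what follows is a reconstruction of the idea rather than the original: a tiny tabs type whose operator<< emits n tab characters. Tracking the current depth is left to the caller.

```cpp
#include <ostream>
#include <sstream>

// Carries an indentation depth; exists only so operator<< can be overloaded.
struct tabs {
    int n;
    explicit tabs(int depth) : n(depth) {}
};

// Emit n tab characters, then hand the stream back for further insertions.
std::ostream& operator<<(std::ostream& os, const tabs& t) {
    for (int i = 0; i < t.n; ++i)
        os << '\t';
    return os;
}
```

A container's operator<< can then write os << tabs(depth + 1) << child for each member, and the nesting takes care of itself.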

= Extensibility =

When would it ever make sense to use operator<< on a socket?

    query("select ...");
    dbstream db(...);
    db << query;
    while(db >> data) { ... }

This just looks like syntactic sugar to me. I don't see how using
stream operators in this example is enabling anything you couldn't do
with ordinary functions.

Quite so. Operators *are* ordinary functions. :-)

You suggested operator overloading is somehow not appropriate for use
with network I/O. I'm demonstrating to the contrary.

Inserting a row, btw, will be very familiar to you:

    dbstream& operator<<( dbstream& db, const foo& data ) {
        db << data.name << data.value << data.etc;
        return db;
    }

    db << data;

Updates are more complex because updates are more complex.

Networked IO is something that will surely be part of the standard
library in the future.

May we all live to see the day. Somehow C++11 passed on it.

Sockets are incredibly complicated, though; it will be no easy task.
The massive complexity of IO streams does not make it any easier.

I was reading Rob Pike again the other night, cf.
http://doc.cat-v.org/bell_labs/good_bad_ugly/slides.pdf

    #include <u.h>
    #include <libc.h>
    fd = dial(netmkaddr(argv[1], "tcp", "discard"), 0, 0, 0);
    if(fd < 0) sysfatal("can't dial %s: %r", argv[1]);

Network I/O could be simplified. The same forces that never wrote
dial(3) didn't extend iostreams to the network.

--jkl

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]