Re: delimiter for istringstream

From:

"James Kanze" <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++.moderated

Date:

31 Dec 2006 04:36:33 -0500

Message-ID:

<1167526859.765438.20980@a3g2000cwd.googlegroups.com>

Denise Kleingeist wrote:

James Kanze wrote:

Denise Kleingeist wrote:

    int_type underflow() {
      if (this->gptr() == this->egptr()) {
        std::streamsize size = this->myStream->sgetn(this->myBuffer, 1024);
        this->setg(this->myBuffer, this->myBuffer,
                       std::transform(this->myBuffer + 0,
                                      this->myBuffer + size, ':', ' '));

Now that's clever,

Yes, I think so, too :-)

hiding the transformation in an argument to
setg, so that the reader won't see it unless he carefully
analyses each step in detail.

Well, the transformation is on a line of its own! Yes, it is still the
argument of a function, but then, this is typical use in functional
programming

I know, and if the context was one of functional programming, I
wouldn't have said anything. I actually often write C++
functions using a functional style, with only a single return
statement in the function. Had this been the case here, I might
even use the return value of std::transform myself. (Weighing
against this is the fact that many programmers don't realize
that it has a return value. Like me, for example, until I saw
your code.)

and with
template meta programming I'm quite used to this style of programming.

Certainly. In template meta programming, it's all that is
available. And even in non template meta programming, it has a
lot to recommend it. But the example here doesn't use a
functional programming style---both transform and setg are
called not for their values, but for their side effects. And
generally, a single statement should have a single effect:
change a single variable, etc.
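
To make the one-effect-per-statement point concrete, here is a minimal
sketch of a buffered filter written that way. It is not the code from
this thread: the names ColonToSpaceBuf, mySource and
parseColonSeparated are hypothetical, and std::replace is swapped in
for the std::transform call, since it does the in-place substitution
directly.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical buffered filter: reads blocks from a source streambuf
// and presents them with every ':' replaced by ' '.
class ColonToSpaceBuf : public std::streambuf {
public:
    explicit ColonToSpaceBuf(std::streambuf* source) : mySource(source) {
        assert(source != nullptr);
    }

protected:
    int_type underflow() override {
        if (gptr() == egptr()) {
            // Step 1: refill the buffer from the source.
            std::streamsize size = mySource->sgetn(myBuffer, sizeof myBuffer);
            if (size <= 0) {
                return traits_type::eof();
            }
            // Step 2: transform the block in place.
            char* end = myBuffer + size;
            std::replace(myBuffer, end, ':', ' ');
            // Step 3: publish the new get area.
            setg(myBuffer, myBuffer, end);
        }
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* mySource;
    char myBuffer[1024];
};

// Hypothetical helper: parses integers separated by ':'.
std::vector<int> parseColonSeparated(const std::string& text) {
    std::istringstream source(text);
    ColonToSpaceBuf filter(source.rdbuf());
    std::istream in(&filter);
    std::vector<int> result;
    for (int value; in >> value; ) {
        result.push_back(value);
    }
    return result;
}
```

Each of the three statements in underflow changes one thing, so the
transformation cannot be overlooked when reading the call to setg.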

Note too that function chaining is definitely NOT in the STL
style; the STL goes out of its way to make it difficult and
unnatural.

There is another thing in template programming which makes function
chaining a good idiom: often, the return type from a function is not at
all easy to spell and sometimes it is actually outright impossible!

Agreed, and in such cases, you have to weigh the obfuscation of
multiple side effects against the added complexity of specifying
a complicated and non-intuitive return type. Depending on the
case, the balance may shift one way or the other. But it's
never a case of saying: this is good. It's simply choosing the
lesser of two evils.

In this case, the return type is char*, so the argument really
doesn't hold.

[...]

Not having to spell out types tends to make implementations more stable
against changes and generalization. Sure, neither is the case or the
need here (although the filter can reasonably easily be extended to
cope with arbitrary character types), but it is general practice in
template programming.

It's a necessary evil in some forms of template programming.
Such template programs are not noted for their readability,
however, at least not by anyone I've talked to. Most of the
time, admiration is reserved for the fact that it could be done
at all, and not for the clarity of the way it was done.

Don't make a virtue out of a necessary evil. Also: C++ is a
multiparadigm language. Don't apply standard idioms from one
paradigm in a different paradigm; it will only create confusion.

In general, unless there is a definite reason for doing
otherwise, I tend to avoid bufferisation in a filtering
streambuf.

Wow! Your background must be rather different than mine! I typically
need to get at least decent performance out of stream buffers, and
unbuffered stream buffers are slow even with segmentation-unaware
library implementations, and really unacceptable compared to
algorithms that take segmentation into account.

I've never found the lack of bufferization in a *filtering*
streambuf to make a measurable difference. You definitely want
buffering at the lowest level, before going to the system, but
an extra virtual function call or two per character typically
isn't enough above the level of noise to be measurable.
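
For readers who haven't seen one, a filtering streambuf without
bufferisation might look roughly like the following. This is only a
sketch under assumptions: the names UnbufferedColonFilter, mySource,
myChar and readColonSeparated are hypothetical, and the "buffer" is a
single character, so each character costs one virtual call to
underflow plus one sbumpc on the source.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical unbuffered filter: fetches exactly one character at a
// time from the source and maps ':' to ' '.
class UnbufferedColonFilter : public std::streambuf {
public:
    explicit UnbufferedColonFilter(std::streambuf* source)
        : mySource(source), myChar('\0') {
        assert(source != nullptr);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) {
            // The current character has not been consumed yet.
            return traits_type::to_int_type(*gptr());
        }
        int_type c = mySource->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            return traits_type::eof();
        }
        myChar = traits_type::to_char_type(c);
        if (myChar == ':') {
            myChar = ' ';
        }
        // The get area is the single character just read.
        setg(&myChar, &myChar, &myChar + 1);
        return traits_type::to_int_type(myChar);
    }

private:
    std::streambuf* mySource;
    char myChar;
};

// Hypothetical helper: parses integers separated by ':'.
std::vector<int> readColonSeparated(const std::string& text) {
    std::istringstream source(text);
    UnbufferedColonFilter filter(source.rdbuf());
    std::istream in(&filter);
    std::vector<int> result;
    for (int value; in >> value; ) {
        result.push_back(value);
    }
    return result;
}
```

Because the filter never reads more than one character ahead, the
source stays usable by other code between reads.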

Chopping even inner loops into tiny basic blocks by having a virtual
call for each character makes unbuffered stream buffers rather
expensive. However, I'm admittedly sticking with one filter once it is
in place and normally don't use stream buffers at different levels: in
this case, buffering indeed becomes a problem. However, since I
generally need the performance, I tend to make my stream buffers
buffered (it seems to be an oxymoron anyway to have unbuffered [stream]
buffers).

The name is suggestive:-). But then, what name in streambuf
is well chosen---sgetc leaves the character in the sequence,
for example. And adding separate buffering on top of a stringbuf
will slow things down.

If I ever ran into a performance problem, I wouldn't hesitate to
buffer. But it seems very much like premature optimization.
And it does restrict flexibility. One typical trick: a comment
stripping filtering streambuf which doesn't know about quoted
strings. When you encounter the quote character, you go behind
the filtering streambuf, and read from the original source until
the end of the quote. Other times, I'll just filter subsets of
a stream (or add in an additional filter for a subset). In the
initial case here, what is to prevent the block of data from
being just part of a larger stream---in such a case, the obvious
solution is a filtering streambuf which also knows how to detect
the end of the block, and returns EOF when it does. He can then
use an istream_iterator to read just the relevant part of the
file, and resume reading the rest after the istream_iterator has
seen EOF.

(Obviously, both techniques fall into the category of "clever",
and would definitely require heavy commenting:-). But depending
on the application, I've found both very useful at times.)
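
A rough sketch of the end-of-block variant, under stated assumptions:
the block is taken to end at a single terminator character (';' in the
usage below), and the names BlockFilter, myTerminator, myDone and
splitBlock are all hypothetical. Because the filter is unbuffered, the
source is left positioned just after the terminator, so reading can
resume there.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical filter: passes characters through until it sees the
// terminator, then reports EOF from that point on.
class BlockFilter : public std::streambuf {
public:
    BlockFilter(std::streambuf* source, char terminator)
        : mySource(source), myTerminator(terminator),
          myChar('\0'), myDone(false) {
        assert(source != nullptr);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        if (myDone) {
            // Never touch the source again once the block has ended.
            return traits_type::eof();
        }
        int_type c = mySource->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())
            || traits_type::eq_int_type(
                   c, traits_type::to_int_type(myTerminator))) {
            myDone = true;
            return traits_type::eof();
        }
        myChar = traits_type::to_char_type(c);
        setg(&myChar, &myChar, &myChar + 1);
        return c;
    }

private:
    std::streambuf* mySource;
    char myTerminator;
    char myChar;
    bool myDone;
};

// Hypothetical helper: reads the integers in the leading block, then
// the rest of the first line from the original source.
std::pair<std::vector<int>, std::string> splitBlock(const std::string& text) {
    std::istringstream source(text);
    BlockFilter filter(source.rdbuf(), ';');
    std::istream block(&filter);
    std::istream_iterator<int> first(block), last;
    std::vector<int> numbers(first, last);
    std::string rest;
    std::getline(source, rest);  // resumes just after the terminator
    return std::make_pair(numbers, rest);
}
```

The istream_iterator sees an ordinary EOF at the end of the block,
while the underlying stream itself is still in a good state.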

In fact:

And of course, buffering is also
more complicated if the transformation is not one to one, which
is often the case (although not the case here).

Dealing with n-to-m transformations with n != 1 || m != 1 for
unbuffered stream buffers is definitely a pain, and having a buffer
around definitely makes handling these transformations easier!

I'm not sure I see how. If you'll look at my web site
(http://kanze.james.neuf.fr/code-en.html---you'll have to browse
into the code, in Util/IO/FilteringInputStream/examples/gb and
Util/IO/FilteringOutputStream/examples/gb), there are a number
of filters with n-to-m mappings, all of which are very simple,
and none of which use a buffer. More recently, I have had to
implement a UTF-8/ISO 8859-n mapping in a streambuf. In this
case, performance considerations do suggest buffering, and while
the buffer won't be installed until the first couple of lines
have been read, once installed, it won't be removed. And I
found supporting the buffering added complexity. (Lucky I don't
need to support seeking!)
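
To give an idea of what a simple unbuffered n-to-m filter can look
like, here is a sketch that strips comments (n characters in, zero
out). It is not the code from my site: the names CommentStripper,
mySource, myChar and stripComments are hypothetical, and the comment
syntax ('#' up to, but not including, the end of line) is an
assumption chosen for the example.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered filter: drops everything from '#' up to the
// end of the line, but passes the newline itself through.
class CommentStripper : public std::streambuf {
public:
    explicit CommentStripper(std::streambuf* source)
        : mySource(source), myChar('\0') {
        assert(source != nullptr);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        const int_type eof = traits_type::eof();
        const int_type newline = traits_type::to_int_type('\n');
        int_type c = mySource->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::to_int_type('#'))) {
            // Skip the comment; stop at the newline or at end of input.
            while (!traits_type::eq_int_type(c, eof)
                   && !traits_type::eq_int_type(c, newline)) {
                c = mySource->sbumpc();
            }
        }
        if (traits_type::eq_int_type(c, eof)) {
            return eof;
        }
        myChar = traits_type::to_char_type(c);
        setg(&myChar, &myChar, &myChar + 1);
        return c;
    }

private:
    std::streambuf* mySource;
    char myChar;
};

// Hypothetical helper: returns the text with comments removed.
std::string stripComments(const std::string& text) {
    std::istringstream source(text);
    CommentStripper filter(source.rdbuf());
    std::istreambuf_iterator<char> first(&filter), last;
    return std::string(first, last);
}
```

All of the n-to-m logic sits in the loop inside underflow; there is no
buffer state to keep consistent with the source position.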

Again, needing the current position for whatever reason (e.g. because
the underlying stream buffer is used independently or because seeking
is used) makes the buffering somewhat harder to use. In fact, in many
n-to-m transformations seeking or independent use of the underlying
stream buffer may be impossible anyway...

Quite. My filtering streambuf's don't generally support
seeking. Of course, my filtering streambuf's are generally
designed for use with text streams, and seeking in a text stream
isn't that easy to begin with. (Isn't "seeking" in a "stream" a
bit of an oxymoron, anyway:-)?)

--
James Kanze (Gabi Software) email: james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
