Re: Linux programming, is there any C++?
On Feb 20, 11:36 pm, Jeff Schwab <j...@schwabcenter.com> wrote:
James Kanze wrote:
You might want to take a look at OSE
(http://ose.sourceforge.net). IMHO, a lot better designed and
easier to use than the STL. Above all, a different approach.
I tend to use the STL mainly as low level tools, over which I
build the library I actually use. OSE is usable directly.
Seems like it has special support for Python. Speaking of stuff that's
"in the air," it sure seems like Python is rapidly becoming the de facto
standard scripting/dynamic language for interfacing to programs written
in C++. Now I just have to convince the clients that they don't really
want all that legacy code they've written in a half dozen other
scripting languages, and that it's time to learn yet another...
I've heard a lot of good things about Python. On the other
hand, I learned scripting back before even perl existed. Since
scripting is not an important enough part of my activity to
justify effort to continuously learn new things, and what I know
suffices for what I do, I still use mainly grep, awk and sed.
(Even before templates were added to the language, people
were simulating them with macros.)
I don't know about "most people," but there was a relatively advanced
technique that I have used in C called XInclude:
#define ELEM_T int
#include "list.h"
#undef ELEM_T
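For concreteness, here is a minimal sketch of what such a "list.h" might
contain (the header's contents are my guess, not the actual file); it
builds a type-specific struct name by token pasting, and relies on the
includer defining ELEM_T first:

    /* list.h -- expects ELEM_T to be defined before inclusion. */
    #define LIST_PASTE2(a, b) a##b
    #define LIST_PASTE(a, b)  LIST_PASTE2(a, b)  /* two levels, so ELEM_T expands */
    #define LIST_T LIST_PASTE(list_, ELEM_T)

    typedef struct LIST_T {
        ELEM_T          value;
        struct LIST_T*  next;
    } LIST_T;

    /* Clean up, so the header can be included again with another ELEM_T. */
    #undef LIST_T
    #undef LIST_PASTE
    #undef LIST_PASTE2

Including it once per element type then produces list_int, list_double,
and so on, each a distinct static type.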
It's a far cry from what C++ templates give you. Googling
XInclude just turns up an XML inclusion mechanism.
Googling XInclude -XML also fails to turn up
the XInclude pattern. It's still not an especially
well-known practice.
Try googling for <generic.h>:-).
Lots of <generic.h>s, but none that look like precursors to templates.
I was thinking of headers that defined a bunch of type-specific
structures and functions by being included multiple times, each time
with a set of macros representing a different static type. They mention
it briefly here:
http://en.wikipedia.org/wiki/C_preprocessor#Token_Concatenation
Is it that long ago that no one still has explanations of how
to use it? Basically, <generic.h> (part of the standard library
which came with CFront) defined macros which allowed you to
write things like:
#define MyClassdeclare(T) \
...
#define MyClassdefine(T) \
...
The user then wrote:
declare( MyClass, T )
and got the declaration for MyClass for type T, and
define( MyClass, T )
to get the implementation. (It may have been implement, rather
than define. It's been awhile.)
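For concreteness, a rough sketch of how that machinery fit together,
filling in the "..." above (the exact macro names in CFront's <generic.h>
may have differed; the token-pasting idea is the point, and declare and
implement are used here, as in the historical header):

    // Two-level paste, so that macro arguments get expanded before pasting.
    #define name2_aux(a, b) a##b
    #define name2(a, b)     name2_aux(a, b)

    // The <generic.h> entry points: declare(MyClass, int) expands to
    // MyClassdeclare(int), implement(MyClass, int) to MyClassimplement(int).
    #define declare(Class, T)   name2(Class, declare)(T)
    #define implement(Class, T) name2(Class, implement)(T)

    // The class author writes the per-type declaration and definition as macros:
    #define MyClassdeclare(T)           \
        class name2(MyClass, T) {       \
        public:                         \
            void push(T value);         \
        };

    #define MyClassimplement(T)         \
        void name2(MyClass, T)::push(T value) {}

    // The user then writes:
    declare(MyClass, int)      // declares class MyClassint
    implement(MyClass, int)    // defines void MyClassint::push(int)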
If I were going to write something with a GUI, I'd probably give
wxWidgets a trial. On the other hand, GUI's are something that
Java actually does quite well. (More because Swing is well
designed, than because of anything in the language itself.)
I like Swing, too, although the handful of GUI experts I know
still seem wary of it. I haven't used wxWidgets, but I hear
mixed reviews.
I've only taken a quick glance, and didn't particularly like
what I saw, but it wasn't enough to fairly judge. The fact
remains that in practice, it and Qt are the only widely used
libraries, and Qt requires a pre-processor.
The only other "iterators" I was using at the time were
hateful little C-style things that were intended to work like
this:
some_lib_iter* iter = some_lib_create_iter(some_lib_some_container);
while (!some_lib_iter_done(iter)) {
    some_item* item = (some_item*)some_lib_iter_next(iter);
    // ...
}
By the way, I'm currently using a recently written,
professional, industry-specific C++ library that supports
almost the same idiom, and I still don't like it.
It's very close to the USL idiom:-). And the Java one. And
yes, combining advancing and accessing in a single function is
NOT a good idea.
Do you use istream_iterator?
At times. Most of the time, however, my input requires somewhat
more complex parsing than you can get from an istream_iterator.
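What it does handle nicely is simple whitespace-separated input with no
real grammar; a throwaway sketch:

    #include <iostream>
    #include <iterator>
    #include <vector>

    int main() {
        // Read whitespace-separated doubles from stdin until EOF or a
        // parse failure.  (Extra parentheses avoid the vexing parse.)
        std::vector<double> values(
            (std::istream_iterator<double>(std::cin)),
            std::istream_iterator<double>());
        std::cout << values.size() << " values read\n";
    }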
[...]
So how do you write a function which returns a range,
I don't think I've ever needed to.
It would seem to occur naturally fairly often as a result of
functional decomposition. I was using the GoF iterator pattern
long before I'd heard of the STL, with filtering iterators and
functions returning custom iterators as part of the package.
If I did, I'd probably follow the STL approach of returning a
std::pair (like std::equal_range).
Which can't be used directly as an argument for the next
function, so you can't chain.
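To make the chaining point concrete, a small sketch: std::equal_range
hands back a pair of iterators, but the standard algorithms take
(first, last), so the pair has to be taken apart by hand at every step.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        for (int i = 0; i < 12; ++i)
            v.push_back(i / 3);             // 0 0 0 1 1 1 2 2 2 3 3 3

        std::pair<std::vector<int>::iterator,
                  std::vector<int>::iterator> r
            = std::equal_range(v.begin(), v.end(), 1);

        // There is no std::count(r, 1); the pair must be unpacked first.
        std::cout << std::count(r.first, r.second, 1) << '\n';   // 3
    }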
and use the return value of that function as the argument to
a function which takes a range? Or how do you use the
decorator pattern on an iterator, to provide a filtering
iterator?
That, I've done, and with some success. I didn't come across
any particular problems (or if I did, they're so subtle I
still don't see them). You have the outer, decorating
iterator, and the inner iterator whose type is a template
parameter. Intercept all increment/dereference/etc. calls,
and provide whatever delegation and decoration are necessary.
No fuss, no muss. Clean, simple client code.
Except that the increment operator will typically need to
advance the decorated iterator more than once, and it needs to
know the end to avoid running past it.
Try writing an iterator which will iterate over the odd values
in a container of int, for example. Or one that will iterate
over the values outside a given range in a container of double.
In general, a filtering iterator must contain both the current
and the end iterators of what it's iterating over.
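A minimal sketch of that (the names are mine, and it deliberately isn't
a full STL-conforming iterator): a filter over the odd values has to
carry the end as well as the current position, because advancing may
mean skipping several elements, and the skip must know where to stop.

    #include <iostream>
    #include <iterator>
    #include <vector>

    template <typename Iter>
    class OddFilterIter {
        Iter current;
        Iter end;       // needed: skipping must not run past the end
        void skipEven() {
            while (current != end && *current % 2 == 0)
                ++current;
        }
    public:
        OddFilterIter(Iter first, Iter last)
            : current(first), end(last) { skipEven(); }
        bool done() const { return current == end; }
        typename std::iterator_traits<Iter>::value_type
        operator*() const { return *current; }
        OddFilterIter& operator++() { ++current; skipEven(); return *this; }
    };

    int main() {
        std::vector<int> v;
        for (int i = 0; i < 10; ++i)
            v.push_back(i);
        for (OddFilterIter<std::vector<int>::iterator> it(v.begin(), v.end());
             !it.done(); ++it)
            std::cout << *it << ' ';        // prints 1 3 5 7 9
        std::cout << '\n';
    }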
I guess you're not a big fan of STL-style iterators, but I
still love them.
I guess if I'd never known anything else, they wouldn't seem so
bad.
[...]
Seriously, the problem with std::string is that it is sort of a
bastard---it's too close to an STL container of charT to be an
effective abstraction of text, and it adds a bit too much which
is text oriented to be truly an STL container.
I don't see those as contradictory goals. Any representation
of text is effectively a container of characters.
And there is no basic type in C++ which represents a character.
Text is hard. Very hard, since it was designed by and for
humans, not machines. And humans are a lot more flexible than
machines. (There's also the fact that text is in two
dimensions, rather than one, and that it is graphical. I'm not
sure to what degree a string class should take that into
account, however.)
In its defense: even today, I'm not sure what a good abstraction
of text should support.
Right, there still does not seem to be any widespread agreement on that.
It's probably a good idea to keep the C++ standard string class
interface minimal, until C++ developers know what they really want.
Agreed. My real complaint about std::string is that it is too
heavy, not that it is missing features. I'd rather see it as
"just" an STL container. But then, what separates it from
std::vector<char>? Suppose we provide an overloaded operator+=,
operator+ and a replace function for vector, and all of the rest
of the functionality as external functions. (If I consider my
pre-standard string class, only two functions---other than
construction, assignment and destruction, of course---weren't
implemented in terms of other functions. Everything I did with
the string was defined in terms of replace or extract.)
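A rough sketch of what that could look like (the free-function names and
signatures are only illustrative, not a worked-out proposal; operator+
is omitted for brevity):

    #include <cstring>
    #include <vector>

    typedef std::vector<char> Text;

    inline Text& operator+=(Text& lhs, char const* rhs) {
        lhs.insert(lhs.end(), rhs, rhs + std::strlen(rhs));
        return lhs;
    }

    // Replace the half-open range [pos, pos + len) with the given text.
    inline void replace(Text& s, Text::size_type pos, Text::size_type len,
                        char const* with) {
        s.erase(s.begin() + pos, s.begin() + pos + len);
        s.insert(s.begin() + pos, with, with + std::strlen(with));
    }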
Case-insensitive compare is covered in plenty of
introductory C++ texts, because it's one of the easiest
things to show people.
Case insensitive compare is one of the most difficult
problems I know. Just specifying it is extremely difficult.
Depends what you mean by it. What most new C++ developers
mean by it is a pretty simple idea, and a FAQ. Plenty of
string classes that have alleged case-insensitive comparison
functions actually provide only the toupper-each-char
implementation.
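For reference, the naive version in question looks something like this
(shown only to make the objection below concrete; it is not a correct
case-insensitive comparison for real text):

    #include <cctype>
    #include <string>

    bool ciEqual(std::string const& a, std::string const& b) {
        if (a.size() != b.size())
            return false;
        for (std::string::size_type i = 0; i != a.size(); ++i) {
            // toupper each char and compare -- the "FAQ" approach.
            if (std::toupper(static_cast<unsigned char>(a[i]))
                    != std::toupper(static_cast<unsigned char>(b[i])))
                return false;
        }
        return true;
    }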
The problem is that toupper-each-char isn't implementable. At
least for any usable definition of toupper. What's toupper('ß')
supposed to return?
The real problem with case insensitive comparison, of course, is
that it isn't defined. You can't write a function to implement
it, because you don't know what that function really should do.
(And of course, what it should do depends on the locale. In
France, 'ä' compares equal to 'A', in Germany, it should collate
as "AE". Except, of course, that in France, it would compare
greater than 'A' if the two strings were otherwise equal. And
in Germany, there are actually several different standards for
ordering.)
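For what it's worth, the standard library's hook for this kind of
locale-dependent ordering is the collate facet; this is only a sketch,
not a solution to the problems just described, and whether the named
locales exist at all depends on the platform:

    #include <locale>
    #include <string>

    // Returns <0, 0 or >0, like strcmp, according to the named locale's
    // collation rules (e.g. "de_DE" or "fr_FR"; construction may throw).
    int localeCompare(std::string const& a, std::string const& b,
                      char const* localeName) {
        std::locale loc(localeName);
        std::collate<char> const& coll =
            std::use_facet<std::collate<char> >(loc);
        return coll.compare(a.data(), a.data() + a.size(),
                            b.data(), b.data() + b.size());
    }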
If you're talking about an industrial-strength, portable
implementation, then of course it gets complicated, as do all
natural-language related issues.
As you say: natural-language related issues. That's the
problem.
If you have a copy of Effective STL handy: The simple case is
covered by Item 35, and the complicated case is Appendix A,
which is the Matt Austern article from the May 2000 C++
Report.
http://lafstern.org/matt/col2_new.pdf
This article is getting a little long in the tooth; has
anything really changed? The only new info I've seen is
library-specific documentation (ICU and Qt).
Well, Matt does seem to ignore the fact that toupper and tolower
not only aren't bijections, they aren't even one to one. As I
said, in German, toupper( 'ß' ) must return a two character
sequence. The article also ignores the fact that many characters
require two units to be represented---even in char32_t (32 bit
Unicode). And frequently, a single character will have several
possible representations, using different numbers of units: in
Unicode, "\u00D4" and "\u004F\u0302" must compare equal. (Both
represent a capital O with a circumflex accent.)
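A small illustration of that last point, using C++11's char32_t strings
for convenience: a code-point-wise comparison doesn't see the canonical
equivalence, so any notion of equality has to go through normalization
first.

    #include <iostream>
    #include <string>

    int main() {
        std::u32string precomposed(1, char32_t(0x00D4));   // U+00D4
        std::u32string decomposed;                         // U+004F U+0302
        decomposed += char32_t(0x004F);
        decomposed += char32_t(0x0302);
        std::cout << std::boolalpha
                  << (precomposed == decomposed) << '\n';  // false
    }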
[...]
(Note that you're certainly
not alone in this. The toupper and tolower functions in C and
in C++ all suppose a one to one mapping, which doesn't
correspond to the real world, and every time I integrated my
pre-standard string class into a project, I had to add a
non-const []---although the class supported an lvalue substring
replace:
s.substring( 3, 5 ) = "abcd" ;
was the equivalent of
s = s.replace( 3, 5, "abcd" ) ;
.)
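A rough sketch of how such an lvalue substring can be done with a proxy
class (the class and member names here are guesses, not the original
library's):

    #include <string>

    class TextString {
        std::string data;
    public:
        explicit TextString(std::string const& s) : data(s) {}

        // Proxy returned by substring(); assignment to it rewrites the
        // owner's text, possibly changing its length.
        class SubstringRef {
            TextString&            owner;
            std::string::size_type pos;
            std::string::size_type len;
        public:
            SubstringRef(TextString& o, std::string::size_type p,
                         std::string::size_type l)
                : owner(o), pos(p), len(l) {}
            SubstringRef& operator=(std::string const& text) {
                owner.data.replace(pos, len, text);
                return *this;
            }
        };

        SubstringRef substring(std::string::size_type pos,
                               std::string::size_type len) {
            return SubstringRef(*this, pos, len);
        }
        std::string const& str() const { return data; }
    };

    // usage:  TextString s(std::string("0123456789"));
    //         s.substring(3, 5) = "abcd";   // s.str() == "012abcd89"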
Whether toupper and tolower are correct is a completely
orthogonal issue to whether it makes sense for the string
class to have array-style character indexing.
The question is: when could you use a non-const [] on a string,
if even for case conversions, it's wrong? Is there ever a case
where you can guarantee that replacing a single char with
another single char is correct? (There may be a few, e.g.
replacing the characters in a password---required to be US
ASCII---with '*'s. But they're very few.)
(And of course, the [] operator of std::string gives you
access to the underlying bytes, not the characters.)
But that makes sense for that particular abstraction, because
std::string is a typedef meant to represent the common case of
characters that fit within bytes.
It's such a common case that it doesn't exist in the real world.
If the idea of a character is too complex to be represented by
a char or wchar_t, then it merits its own, dedicated type,
with support for conversions, normalization, etc.
You said it above: it's a natural-language related issue. Thus,
by definition, extremely difficult and complicated.
That's a sometimes-true but fundamentally misleading
statement. If you have a character type that serves better
than char or wchar_t, you're free to instantiate basic_string
with it, specialize char_traits for it, and generally define
your own character type.
Are you kidding? Have you ever tried this?
Yes, and it seemed to work well. It never got released in
production code though, because there just wasn't any need for
it.
You mean you redefined everything necessary, all of the facets
in locale, etc., and everything necessary for iostream to work?
But that's not the problem. I usually use UTF-8, which fits
nicely in a char. But [] won't return a character.
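Concretely: with UTF-8 in a std::string, s[i] is a byte, not a
character, and even counting code points means looking at the lead
bytes by hand (a minimal sketch; continuation bytes have the form
10xxxxxx):

    #include <cstddef>
    #include <iostream>
    #include <string>

    std::size_t codePointCount(std::string const& s) {
        std::size_t n = 0;
        for (std::string::size_type i = 0; i != s.size(); ++i) {
            if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
                ++n;    // not a continuation byte, so a new code point starts
        }
        return n;
    }

    int main() {
        std::string s("\xC3\xA9t\xC3\xA9");                  // "été" in UTF-8
        std::cout << s.size() << " bytes, "
                  << codePointCount(s) << " code points\n";  // 5 bytes, 3
    }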
What do you mean? std::basic_string::operator[] returns a
reference-to-character, as defined by the character and traits types
with which basic_string was instantiated.
No. basic_string::operator[] returns a reference to charT.
(With the requirement that charT be either char, wchar_t or a
user defined POD type.) A character is something more
complicated than that.
The lack of a real Unicode character type in the standard
library is a valid weakness, but not a fundamental limitation
of the std::basic_string.
Even char32_t will sometimes require two char32_t for a single
character: say a q with a hacek.
I'll take your word for that example. :) Characters just
aren't all the same size anymore.
http://www.joelonsoftware.com/articles/Unicode.html
(Someone else who's only scratched the surface of the
problem:-). You might want to look at the technical reports at
the Unicode site, or get Haralambous' book.)
And by the way, I was relating my own experience. At the time
I first used std::string, the characters I needed to represent
fit very comfortably into bytes, and the [] operator did
provide correct access to them.
Take a look at my .sig. It should be obvious that this is not
the case for me.
Your sig looks fine to me, accented characters and all. It's
actually a nice proof of concept, since it includes three
different (Western) languages.
Except that in Unicode, some of the characters in it have
several different representations, some of which require a
sequence of code points. (I actually referred to it simply as an
indication that I do have to deal with multiple languages and
non-ASCII characters, on a daily basis.)
But even in English, if you're dealing with text, how often do
you replace a single letter, rather than a word?
Admittedly, not often. It's just not something that comes up
a lot. If I'm accessing an individual character, chances are
good that I'm actually iterating over the characters in a
string. This kind of code is usually just buried in low-level
library functions. If a library is going to support strings
and substrings, then some code somewhere has to work at this
level. There's no getting around it.
Even if the standard library provided lots of Unicode-friendly
string support, indexed character access would still be
important.
Note that I'm not against it for read-only access. You often
have to scan, code point by code point, to find something. But
it's almost always a mistake to replace single code points,
without the provision for changing the number of code points.
[...]
The more you learn, the more C++ rewards you. I remember
someone I used to work with, who had a morbid fear of C++,
taking one look at a typical C++ reference book and laughing
derisively (yes, derisively, just like an arrogant Bond
villain). "How do they expect anybody to learn all that?" he
asked. The answer is that you don't have to learn it all
before you can use it.
But there's no real point in using it otherwise.
Huh? Do you really think you know every nook and cranny of
the standard off the top of your head, including the standard
libraries?
Not every nook and cranny. But I do expect anyone using C++ to
have at least an awareness of what it can do.
In my experience, most C++ developers have no idea what the
language can do. They use it as a sort of "C with classes,"
replacing function-pointers with virtual functions, but
otherwise writing glorified C code.
In which case, they'd probably be better off in Java.
I've encountered developers like that, but I've also worked in
shops that insisted on quality code.
My point is just that if your goal is to just learn a
minimum, and start hacking code, C++ probably isn't the
language for you.
Oh, I think it is. Suppose you start with <insert
language-of-the-month here>. "Wow," you say, "this is really
neat! LotM lets me print 'hello world' with just a single
line!" Or (this one is in vogue now): "Look how much stuff I
can do with Excel macros! I'm going to implement all my
business logic using them. Instead of writing applications,
I'll give everybody macro-heavy spreadsheets to fill in."
Sooner or later, that person needs to write a real,
non-trivial program, at which point the knowledge they gleaned
from "Learn Language X in 24 Seconds" becomes worse than
useless. It becomes baggage. Writing very small programs in
C++ is harder than writing them in some other languages, but
the point of newbie hacking isn't just to get something
working, but to lay the groundwork for harder tasks that lie
ahead.
The problem is that C++ has enough gotchas that code written
without some basic understanding will contain subtle errors.
Note that my personal opinion is that programming is a complex
profession, that you can't learn in a week or two.
Independently of the language. I don't consider the effort
needed to learn the "necessary minimum" in C++ excessive.
Although it's probably more than is needed for the necessary
minimum in Java (for example), the fact is that in both cases,
it's only a small percentage of everything you need to know in
order to write correct programs.
[...]
I would have liked to see a more Smalltalk-heavy industry.
All modern dynamic languages seem to me like convoluted
imitations of Smalltalk. I'm not a Smalltalk expert, and it
doesn't seem to have much of a fan base anymore (like the Lisp
cult), but the syntax was so clean, and you could port it to a
new bare-hardware platform in a Summer. What happened? Was
it the licensing? Why is Java the server-side "safe bet,"
rather than Smalltalk?
Smalltalk got a bad reputation for performance. And of course,
static type checking (a la C++ or Java) does improve program
reliability, by a couple of orders of magnitude.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34