Re: Will interest in C++ be revived after the Java fallout?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Fri, 28 Jan 2011 03:43:23 -0800 (PST)
Message-ID: <aec0c985-fc9d-48d9-abd3-9ad8a3048e9b@g1g2000prb.googlegroups.com>

On Jan 27, 10:28 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:

On Jan 27, 1:30 am, James Kanze <james.ka...@gmail.com> wrote:

On Jan 26, 11:13 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:

On Jan 26, 12:29 pm, Rui Maciel <rui.mac...@gmail.com> wrote:

Regarding the Unicode bit, that is indeed a thorn in C++'s side, and in
C's too. Nonetheless, the new C and C++ standards already incorporate
C's Unicode TR.

Links please? I'm curious how bad the support is.

I'm not familiar with the TR, but if I had to do serious text
handling in Unicode in C or C++ (or in Java), I'd probably use
ICU. (I can't be sure; I've never had to do serious text
handling in Unicode. But I've heard good things about it from
people who have used it. And I do know that the native support
in C, C++ and Java is insufficient.)

Indeed. My current company uses ICU handling for its server-side data
layer.

I'm reasonably familiar with Java, and while its interface is not
ideal, I find it much more usable and complete than C++'s standard
library in this regard.

I'm not sure. I don't find the Java stuff that easy to use, and
used correctly, C++'s locale stuff can be pretty complete. The
big difference, I suspect, is documentation: if there's one
place where Java beats C++ (and every other language I know)
hands down, it's documentation: it's very easy to find tutorial
trails and the API documentation for just about anything that
Java supports, whereas I wouldn't even know where to point you
for good explanations of C++'s locale support.

The only minor caveat I can think of offhand is that I don't know how
portable Java's encodings are in practice. However, at least Java
requires a conforming implementation to support the UTF encodings.

The next version of C++ will require some support for UTF-8,
UTF-16 and UTF-32. But I don't think that requirement has
propagated into anything concrete in locale---I agree that that
is a bother.
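
As a rough illustration of what that language-level support looks
like, here is a minimal sketch. It assumes a compiler that implements
the u8, u and U string literal prefixes as they were eventually
published in C++11. The helper code_units and the functions built on
it are made up for this example; they only count code units in a
literal and say nothing about what the locale facets provide.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the standard library): number of code
// units in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N])
{
    return N - 1;
}

// U+00E9 (e with acute accent): two code units in UTF-8, one in UTF-16.
inline std::size_t e_acute_utf8()  { return code_units(u8"\u00e9"); }
inline std::size_t e_acute_utf16() { return code_units(u"\u00e9"); }

// U+1F600 lies outside the BMP: a surrogate pair in UTF-16, one unit in UTF-32.
inline std::size_t emoji_utf16() { return code_units(u"\U0001F600"); }
inline std::size_t emoji_utf32() { return code_units(U"\U0001F600"); }

// Small self-check that exercises the helpers above.
inline void check_code_units()
{
    assert(e_acute_utf8() > e_acute_utf16());
    assert(emoji_utf16() > emoji_utf32());
}
```

The literals fix the encoding of the source text only; collation, case
mapping and conversion at run time still go through locale.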

In short, for any library purporting to supply Unicode "support", I
/want/ at least:
2- Actual good collation and normalization support.

The framework is present in C++, although whether a complete
implementation for Unicode is provided depends on the C++
implementation, and the framework is not particularly simple or
convenient to use.

Well, yes and no. When you collate Unicode strings, you don't do
lexicographic comparisons of the actual string data. Instead, you
convert the string according to some rather complex and locale
specific rules to a bit string, and then you do simple lexicographic
comparisons on the resulting bit string. You'd have to implement that
logic from basically the ground up in C++ with just the standard
library at your disposal AFAIK.

C++ supports lexicographic comparison in any locale it supports.
If the implementation supports Unicode locales (and from a QoI
point of view, it surely should), then you can compare two
strings, or convert them into something which will give correct
results using bitwise comparison, or get a hash of a string
which is compatible with the comparison. See the collate facet
in locale.
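
The three operations just mentioned map onto the compare, transform
and hash members of std::collate. The following is only a sketch: the
wrapper names are made up for this example, and it defaults to the
classic "C" locale because that is the only one guaranteed to exist.
Whether a Unicode-aware locale is available, and what it is called,
depends on the implementation.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical wrappers around std::collate<char>; the names are not
// standard. All default to the classic "C" locale.

// Three-way comparison under the collation rules of loc
// (negative, zero or positive, like strcmp).
int collate_compare(const std::string& a, const std::string& b,
                    const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

// Sort key: comparing two keys with plain operator< gives the same
// ordering as collate_compare on the original strings.
std::string collate_key(const std::string& s,
                        const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.transform(s.data(), s.data() + s.size());
}

// Hash compatible with the comparison: strings that compare equal
// under loc produce the same hash value.
long collate_hash(const std::string& s,
                  const std::locale& loc = std::locale::classic())
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.hash(s.data(), s.data() + s.size());
}

// Small self-check: the key ordering agrees with compare.
inline void check_collate()
{
    assert((collate_compare("a", "b") < 0) ==
           (collate_key("a") < collate_key("b")));
}
```

Passing a different std::locale as the last argument is where an
implementation's Unicode collation, if it has any, would come into
play.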

4- Not implemented by a retarded monkey. An example of an
implementation by a retarded monkey is one using a virtual function
call per encoding unit for translation between encodings.

But you can't get away from a virtual function call per string,
which is what C++'s ctype and codecvt require.

What support C++ provides is through the various locale facets,
and there isn't (at least at present) any requirement that
facets supporting Unicode are provided (although one would
expect it from a QoI point of view). The interface to facets is
a bit twisted, I'm afraid, but I've not seen anything simpler in
other languages.

I bring this point up because it's my understanding from my colleagues
that at least one version of ICU, the one my company uses, actually
does virtual function calls per code unit for some of its important
operations.

When I said my company uses ICU, I meant that my company uses a house
modified version of ICU which reworked the code to avoid the costly
virtual function calls per code point. My predecessors ran profilers,
which told them that an excessive portion of time was spent in those
virtual calls. It was a really big speedup to rework it as they did.

I'm not completely surprised. IIUC, ICU originated in the Java
world, where virtual function calls are the standard. At least
in most cases (the ones I've looked at), the C++ interface in
facets has two versions of the functions, one which takes
characters, and one which takes strings. (Regretfully, strings
represented by a pair of char const*---because the functions are
virtual, templates can't be used.)
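
To make the two versions concrete, here is a small sketch using
std::ctype<char>::toupper, which has both a single-character overload
and one taking a pointer range. The function names are invented for
this example, and the classic "C" locale is used so that it does not
depend on any particular locale being installed.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Per-character form: each call to toupper(char) forwards to the
// virtual do_toupper, so this costs one virtual call per character.
std::string upper_per_char(const std::string& s,
                           const std::locale& loc = std::locale::classic())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    std::string result;
    result.reserve(s.size());
    for (std::size_t i = 0; i != s.size(); ++i) {
        result += ct.toupper(s[i]);
    }
    return result;
}

// Range form: the whole buffer is passed as a pair of pointers and
// converted in place, so there is one virtual call per string.
std::string upper_per_range(std::string s,
                            const std::locale& loc = std::locale::classic())
{
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    if (!s.empty()) {
        ct.toupper(&s[0], &s[0] + s.size());
    }
    return s;
}

// Small self-check: both forms must agree.
inline void check_upper()
{
    assert(upper_per_char("facet") == upper_per_range("facet"));
}
```

Both functions give the same result; the difference is only in how
many times the virtual function is entered.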

--
James Kanze