Re: Is this String class properly implemented?

From:

James Kanze <james.kanze@gmail.com>

Newsgroups:

comp.lang.c++

Date:

Tue, 19 May 2009 03:10:46 -0700 (PDT)

Message-ID:

<aa543561-df5e-4936-a6c4-96872eb300e8@m24g2000vbp.googlegroups.com>

On May 19, 7:10 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On May 15, 6:40 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On May 10, 2:28 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On May 8, 3:02 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On May 2, 12:10 pm, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On Apr 29, 9:18 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

Millions of posts on USENET seem to contradict that statement.

In what way? USENET doesn't require, or even encourage,
ASCII.

But the underlying protocol is NNTP, and while I don't know
for sure, I have an inkling that it is still a 7-bit protocol
(?). But that wasn't my point. I was suggesting that most
USENET posts in threaded discussion groups are ASCII (by
nature of the characters in use by the posts).

And I'm simply pointing out that that is false.

I don't believe you.

It's easy enough to verify. I often have problems with postings
because they contain characters which aren't present in ISO
8859-1 (which is the only encoding for which fonts are
installed on my machines at work).

[...]

(At home, I use UTF-8, and everything works.) Which doesn't
have things like opening and closing quotes.

I agree: you foreigners are messing things up. ;)

Opening and closing quotes are part of English. At least, part
of the English used by people who've gotten beyond kindergarten.

My postings are in either ISO 8859-1 or UTF-8, depending
on the machine I'm posting from.

You can call it what you want, but if it contains only ASCII
characters, then I consider it an ASCII post.

But that's never the case for mine.

You mean your tagline?

I don't have a "tagline". In fact, I don't know what you mean
by a "tagline". My .sig uses accented characters, because it
contains my address. I'll also occasionally use characters
outside of the 96 basic characters in the body of my postings:
things like a section reference (§) when quoting the standard,
for example, or a non-breaking space.
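
[Editorial sketch, not part of the original exchange.] The
disagreement above is about whether a post "contains only ASCII
characters". A minimal C++ sketch of that test, assuming the post
body is available as raw bytes: the function name
is_seven_bit_ascii and the sample strings are hypothetical and
chosen only for illustration. A section sign (byte 0xA7 in ISO
8859-1, bytes 0xC2 0xA7 in UTF-8) or a non-breaking space is
enough to make the test fail.

```cpp
#include <cassert>
#include <string>

// Returns true when every byte is in the 7-bit ASCII range (0x00-0x7F).
// Hypothetical helper, not taken from the thread.
bool is_seven_bit_ascii(const std::string& bytes)
{
    for (unsigned char byte : bytes) {
        if (byte > 0x7F) {
            return false;
        }
    }
    return true;
}

// Illustrative checks on made-up sample strings.
void check_ascii_examples()
{
    assert(is_seven_bit_ascii("See section 5.2"));
    // Section sign encoded in ISO 8859-1 (single byte 0xA7).
    assert(!is_seven_bit_ascii("\xA7" "5.2"));
    // Section sign encoded in UTF-8 (bytes 0xC2 0xA7).
    assert(!is_seven_bit_ascii("\xC2\xA7" "5.2"));
}
```

The check works on bytes rather than characters, so it gives the
same answer for ISO 8859-1 and UTF-8 input: both encodings use
bytes above 0x7F for anything outside ASCII.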

If I had UTF-8 everywhere, I'd also quote correctly.

[...]

I'm not sure what you mean by "it's not English".

It's not English because English has only 26 letters, without
diacritics.

So the Merriam-Webster Dictionary is not English (since it
contains diacritics on some words, and uses opening and closing
quotes, and a lot of other characters other than the 26
letters).

"Naïve" is a perfectly good English word.

The naturalized word 'naive' has been accepted into the
English language but the way you encoded it is still a foreign
word.

Not according to Merriam-Webster. But of course, you know more
about English than the standard dictionaries.

And English uses quotes and dashes (which aren't available
even in ISO 8859-1)

You mean like dash as a separate character from minus?

A minus sign, a hyphen, an n-dash and an m-dash are four
separate characters. Because I don't have the dashes in ISO
8859-1, I simulate them with -- and ---, but it's really a hack.
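
[Editorial sketch, not part of the original exchange.] The point
about four separate characters can be made concrete with Unicode
code points. The sketch below assumes only that ISO 8859-1 matches
the first 256 Unicode code points and ASCII the first 128; the
names Repertoire and smallest_repertoire are hypothetical. The
ASCII "-" is U+002D (hyphen-minus), while the dedicated hyphen,
en dash, em dash and minus sign all lie beyond ISO 8859-1.

```cpp
#include <cassert>

// Smallest of the three character sets discussed in the thread that
// contains a given code point. Hypothetical names, for illustration.
enum class Repertoire { Ascii, Latin1, BeyondLatin1 };

constexpr Repertoire smallest_repertoire(char32_t code_point)
{
    return code_point < 0x80  ? Repertoire::Ascii
         : code_point < 0x100 ? Repertoire::Latin1
                              : Repertoire::BeyondLatin1;
}

constexpr char32_t hyphen_minus = 0x002D;  // the ASCII '-'
constexpr char32_t section_sign = 0x00A7;  // section sign, in ISO 8859-1
constexpr char32_t hyphen       = 0x2010;  // dedicated hyphen
constexpr char32_t en_dash      = 0x2013;
constexpr char32_t em_dash      = 0x2014;
constexpr char32_t minus_sign   = 0x2212;

// Illustrative checks.
void check_dash_examples()
{
    assert(smallest_repertoire(hyphen_minus) == Repertoire::Ascii);
    assert(smallest_repertoire(section_sign) == Repertoire::Latin1);
    assert(smallest_repertoire(hyphen) == Repertoire::BeyondLatin1);
    assert(smallest_repertoire(en_dash) == Repertoire::BeyondLatin1);
    assert(smallest_repertoire(em_dash) == Repertoire::BeyondLatin1);
    assert(smallest_repertoire(minus_sign) == Repertoire::BeyondLatin1);
}
```

This is why -- and --- are stand-ins in an ISO 8859-1 post: the
only dash-like character that encoding shares with ASCII is
U+002D.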

and other various symbols like § not available in ASCII in
its punctuation.

Symbols are not word elements. The code page concept is
symbols.

Nor are blanks. Are you saying that the encoding shouldn't
support blanks either?

Not to mention that a lot of groups handle mathematical
topics, and mathematics uses a lot of special symbols.

Separate code pages.

What the hell is a "code page"?

Claiming that unnaturalized words are rationale for
"Unicode everywhere" is ludicrous (for lack of a better
word that escapes my mind right now).

It has nothing to do with unnaturalized words (and I don't
see where "naïve" is unnaturalized). It has to do with
recognizing reality.

Reality is that 'naive' is a naturalized English word and your
encoding is a foreign word:

Not according to any of the dictionaries I've consulted. All
give "naïve" as a perfectly correct, native American English
spelling.

My point was made just above. No need to drag locales into
the discussion. (My "locale" speaks English as the only
language (which has only 26 letters, BTW)).

And what does the number of letters have to do with it?

Everything: I program in a spoken language and a programming
language. I chose my targets or at least know them: that is
the context of the software development.

The context of software development is that each programming
language defines a set of characters it accepts. Fortran used
the fewest, I believe---it was designed so that you could get six
6-bit characters in a word. C and C++ require close to a
million.

French also has only 26 letters.

That's misleading: French has diacritics, English does not.

Your talk about letters is what is misleading. I'm just
pointing out that it's irrelevant.

[...]

'naive' has been naturalized into the English language and
does not have/does not require (unless one feels romantic?)
an accent. You were taught French, not English.

Merriam-Webster disagrees with you.

Ah! I mentioned Webster long ago in this thread and discounted
any relevance:

Merriam-Webster is irrelevant to what is correct American
English use?

[---]
If you don't know English well, that's your problem.

You mean if I don't want to accept bastardization/perversion
it's my problem.

I mean that if you don't want to accept generally accepted,
standard usage, it's your problem. A serious one, at that,
symptomatic of a serious social maladjustment.

[...]

I have to, because my comments where I work now have to be in
French, and French without accents is incomprehensible. The
need is less frequent in English, but it does occur.

Simplify your life: use English (for SW dev at least)!

If you've ever tried to understand English written by a
non-native speaker, you'll realize that it's much simpler to let
them use French (or German, when I worked there).

Exceptional case.

Native English speakers represent less than 5% of the world's
population, which means that being a native English speaker is
the exceptional case.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34