Re: Is this String class properly implemented?

From: James Kanze <james.kanze@gmail.com>
Newsgroups: comp.lang.c++
Date: Sun, 10 May 2009 04:50:10 -0700 (PDT)
Message-ID: <7a6c9b29-6bb5-4311-8669-63b3b35693ae@x6g2000vbg.googlegroups.com>

On May 10, 2:28 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On May 8, 3:02 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On May 2, 12:10 pm, "Tony" <t...@my.net> wrote:

James Kanze wrote:

On Apr 29, 9:18 am, "Tony" <t...@my.net> wrote:

James Kanze wrote:

There is a huge volume of programs that can and do use no
text. However, I don't know of any program today that uses
text in ASCII;

You must be thinking of shrink-wrap-type user-interactive
programs rather than in-house development tools, for example.

No. None of the in house programs I've seen use ASCII, either.

text is used to communicate with human beings, and ASCII
isn't sufficient for that.

Millions of posts on USENET seem to contradict that statement.

In what way? The USENET doesn't require, or even encourage
ASCII. My postings are in either ISO 8859-1 or UTF-8, depending
on the machine I'm posting from. I couldn't post them in ASCII,
because they always contain accented characters.

I gave the example of development tools: parsers, etc.

Except that the examples are false. C/C++/Java and Ada
require Unicode.

To be general they do. One could easily eliminate that
requirement and still get much work done. I'm "arguing" not
against Unicode, but that the ASCII subset, in and of itself,
is useful.

It's certainly useful, in certain limited contexts. Until
you've seen a BOM or an encoding specification, for example, in
XML. (Although technically, it's not ASCII, but the common
subset of UTF-8 and the ISO 8859 encodings.)

Practically everything on the network is UTF-8. Basically,
except for some historical tools, ASCII is dead.

Nah, it's alive and well, even if you choose to call it a
subset of something else. Parse all of the non-binary group
posts and see how many non-ASCII characters come up (besides
your tagline!).

Just about every posting, in some groups I participate in.

I don't get what you mean: an ASCII text file is still an
ASCII text file no matter what font the user chooses in
Notepad, e.g.

First, there is no such thing as an ASCII text file.

Then what is a file that contains only ASCII printable
characters (throw in LF and HT for good measure)?

A file that doesn't exist on any of the machines I have access
to.

At the lowest level, a file is just a sequence of bytes (under
Unix or Windows, at least). At that level, text files don't
exist. It's up to the programs reading or writing the file to
interpret those bytes. And none of the programs I use interpret
them as ASCII.
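As a minimal sketch of the point that the bytes mean nothing until a program interprets them: the byte 0xEF is the letter ï in ISO 8859-1, but on its own it is only the start of a multi-byte sequence in UTF-8. The function name `latin1ToUtf8` is hypothetical, chosen for illustration; it assumes the input really is ISO 8859-1, which the bytes themselves cannot tell you.

```cpp
#include <string>

// Treat each input byte as an ISO 8859-1 code point and re-encode it as UTF-8.
// Bytes below 0x80 are common to both encodings and pass through unchanged.
// (Hypothetical helper; the choice of source encoding is an assumption.)
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string result;
    for (std::string::size_type i = 0; i != bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80) {
            result += static_cast<char>(c);
        } else {
            // Code points 0x80..0xFF need two bytes in UTF-8: 110xxxxx 10xxxxxx.
            result += static_cast<char>(0xC0 | (c >> 6));
            result += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return result;
}
```

Called on the five bytes `"na\xEFve"`, this yields the six bytes `"na\xC3\xAFve"`. A pure ASCII input comes back unchanged, which is why the common subset is so easy to mistake for "ASCII".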

For that matter, under Unix, there is no such thing as a
text file. A file is a sequence of bytes.

And if the file is opened in text mode?

It depends on the imbued locale. (Text mode or not.)

How those bytes are interpreted depends on the application.

So the distinction between text and binary mode is .... ?

Arbitrary. It depends on the system. Under Unix, there isn't
any. Under Windows, it's just the representation of '\n' in the
file. Under other OS's, it's usually a different file type in
the OS (and a file written in text mode can't be opened in
binary, and vice versa).
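One way to see the difference on a given system, as a rough sketch: write the two characters `"a\n"` through a stream opened in the mode under test, then count the bytes that actually landed in the file by reading it back in binary mode. The helper name `bytesOnDisk` and the scratch file names are hypothetical, and the sketch assumes the current directory is writable.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Write "a\n" using the given open mode, then count the bytes in the file
// by reading it back in binary mode. The scratch file is removed afterwards.
// (Hypothetical helper for illustration only.)
std::size_t bytesOnDisk(const std::string& path, std::ios_base::openmode mode)
{
    {
        std::ofstream out(path.c_str(),
                          mode | std::ios_base::out | std::ios_base::trunc);
        out << "a\n";
    }   // the stream is flushed and closed here

    std::size_t count = 0;
    {
        std::ifstream in(path.c_str(),
                         std::ios_base::in | std::ios_base::binary);
        char c;
        while (in.get(c)) {
            ++count;
        }
    }
    std::remove(path.c_str());
    return count;
}
```

With `std::ios_base::binary` the count should be 2 everywhere. Without it, one would expect 2 under Unix and 3 under Windows, where `'\n'` is written as CR LF.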

Internally, the program is still working with ASCII strings,
assuming English is the language (PURE English that recognizes
only 26 letters, that is).

Pure English has [...]

_I_ was giving the definition of "Pure English" in the context
(like a glossary). How many letters are there in the English
alphabet? How many?

The same as in French, German or Italian: 26. However, in all
four of these languages, you have cases where you need accents,
which are made by adding something to the representation of the
letter (and require a distinct encoding for the computer)---in
German, there is even a special case of ß, which can't be made
by just adding an accent (but which still isn't a letter).

Surely I wasn't taught umlauts in gradeschool.

I was taught to spell naïve correctly (although I don't know if
it was grade school or high school).

You are arguing semantics and I'm arguing practicality: if I
can make a simplifying assumption, I'm gonna do it (and eval
that assumption given the task at hand)!

[...]

(Aside Trivia: The "failure" of Sun has been attributed in
part to the unwillingness to move to x86 while "the industry"
went there. Very ancient indeed!).

Where did you get that bullshit?

This week's trade rags (it's still around here, so if you want
the exact reference, just ask me). It makes sense too: Apple
moved off of PowerPC also probably to avoid doom. I'm a Wintel
developer exclusively right now also, so it makes double sense
to me.

Whatever. The fact remains that 1) Sun does produce processors
with Intel architecture---the choice is up to the customer, and
2) Sun and Apple address entirely different markets, so a
comparison isn't relevant. (The ability to run MS Office on a
desktop machine can be a killer criterion. The ability to run
it on a server is totally irrelevant.)

[...]

Do you think anyone would use MS Office or Open Office if they
only supported ASCII?

I was talking about a simpler class of programs and libraries
even: say, a program's options file and the ini-file parser
(designated subset of 7-bit ASCII).

Apparently there is a semantic gap in our "debate". I'm not
sure where it is, but I think it may be in that you are
talking about what goes on behind the scenes in an OS, for
example, and I'm just using the simple ini-file parser using
some concoction called ASCIIString as the workhorse.

All of the ini-files I've seen do allow accented characters.

Programs assign semantics to those ones and zeros.
Even at the hardware level---a float and an int may contain the
same number of bits, but the code uses different instructions
with them. Programs interpret the data.
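The float/int remark can be made concrete with a small sketch. It assumes a 32-bit `float` in IEEE 754 format, which is typical but not guaranteed by the language; the function name `bitsOf` is made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float occupies 32 bits (and, for the value noted below,
// uses the IEEE 754 single-precision format).
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Copy the object representation of a float into an integer of the same
// size. std::memcpy avoids the undefined behavior of a pointer cast.
// (Hypothetical helper for illustration only.)
std::uint32_t bitsOf(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under that assumption, `bitsOf(1.0f)` is `0x3F800000`, whereas converting the value with `static_cast<std::uint32_t>(1.0f)` gives `1`. The bits are the same in memory either way; what differs is how the code treats them.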

Which brings us back to my point above---you don't generally
control how other programs are going to interpret the data you
write.

If you say so. But if I specify that ini-files for my
program may contain only the designated subset of 7-bit ASCII,
and someone puts an invalid character in there, expect a nasty
error box popping up.
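A check of that kind might look roughly like the following sketch. The "designated subset" is not spelled out in the thread, so this assumes it means the printable characters 0x20 to 0x7E plus HT; the function name `firstInvalidByte` is hypothetical.

```cpp
#include <string>

// Return the index of the first byte that is outside the assumed subset
// (printable 7-bit ASCII, 0x20..0x7E, plus horizontal tab), or
// std::string::npos if the whole line is acceptable.
// (Hypothetical helper; the exact subset is an assumption.)
std::string::size_type firstInvalidByte(const std::string& line)
{
    for (std::string::size_type i = 0; i != line.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        const bool printable = c >= 0x20 && c <= 0x7E;
        if (!printable && c != '\t') {
            return i;
        }
    }
    return std::string::npos;
}
```

A line such as `"name=value"` passes, while `"na\xEFve=1"` is rejected at index 2, so any user who types an accented character gets the error.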

As long as you're the only user of your programs, that's fine.
Once you have other users, you have to take their desires into
consideration.

That would have been toward the end of the 1980s. And I
haven't seen a program in the last ten years which didn't
use symbols and have comments in either French or German.

But you're in/from France right? Us pesky "americans" huh. ;)

Sort of:-). My mother was American, and I was born and raised in
the United States. My father was German, my wife's Italian, and
I currently live in France (but I've also lived a lot in
Germany). And yes, I do use four languages on an almost daily
basis, so I'm somewhat sensitized to the issue. But I find
that even when working in an English language context, I need
more than just ASCII. And I find that regardless of what I
need, the machines I use don't even offer ASCII as a choice.

I guarantee you that I'll never ever use/need 10646 comments
or symbols.

Fine. If you write a compiler, and you're the only person to
use it, you can do whatever you want. But there's no sense in
talking about it here, since it has no relevance in the real
world.

You're posting in extremism to promote generalism? Good
engineering includes exploiting simplifying assumptions (and
avoiding the hype, on the flip side). (You'd really put
non-ASCII characters in source code comments? Bizarre.)

I have to, because my comments where I work now have to be in
French, and French without accents is incomprehensible. The
need is less frequent in English, but it does occur.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34