Re: Is this String class properly implemented?
On May 2, 12:10 pm, "Tony" <t...@my.net> wrote:
James Kanze wrote:
On Apr 29, 9:18 am, "Tony" <t...@my.net> wrote:
James Kanze wrote:
7-bit ASCII is your friend. OK, not *your* friend maybe,
but mine for sure!
7-bit ASCII is dead, as far as I can tell. Certainly none
of the machines I use use it.
It's an application-specific thing, not a machine-specific
thing.
That's true to a point---an application can even use EBCDIC,
internally, on any of these machines. In practice, however,
anything that leaves the program (files, printer output, screen
output) will be interpreted by other programs, and an
application will only be usable if it conforms to what these
programs expect.
Which isn't necessarily a trivial requirement. When I spoke of
the encodings used on my machines, I was referring very precisely
to those machines, when I'm logged into them, with the
environment I set up. Neither pure ASCII nor EBCDIC is an
option, but there are a lot of other possibilities. Screen
output depends on the font being used (which as far as I know
can't be determined directly by a command line program running
in an xterm), printer output depends on what is installed and
configured on the printer (or in some cases, the spooling
system), and file output depends on the program which later
reads the file---which may differ depending on the program, and
what they do with the data. (A lot of programs in the Unix
world will use $LC_CTYPE to determine the encoding---which means
that if you and I read the same file, using the same program, we
may end up with different results.)
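Concretely, and assuming a POSIX system (nl_langinfo() and
<langinfo.h> are POSIX, not standard C++, so this is only a
sketch of the usual Unix case), picking up that encoding looks
something like this:

    #include <clocale>      // std::setlocale, LC_CTYPE
    #include <langinfo.h>   // nl_langinfo, CODESET (POSIX)
    #include <iostream>

    int
    main()
    {
        // Adopt whatever $LC_ALL, $LC_CTYPE or $LANG specify,
        // rather than the default "C" locale.
        std::setlocale( LC_CTYPE, "" );

        // Ask the C library which codeset that locale implies,
        // e.g. "UTF-8" or "ISO-8859-1".
        std::cout << nl_langinfo( CODESET ) << '\n';
        return 0;
    }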
My (very ancient) Sparcs use ISO
8859-1, my Linux boxes UTF-8, and my Windows machines UTF-16LE.
The reason is simple, of course: 7-bit ASCII (or ISO 8859-1,
for that matter) doesn't suffice for any known language.
Um, how about the C++ programming language!
C++ accepts ISO/IEC 10646 in comments, string and character
literals, and symbol names. It allows the implementation to do
more or less what it wants with the input encoding, as long as
it interprets universal character names correctly. (How a good
implementation should determine the input encoding is still an
open question, IMHO. All of the scanning tools I write use
UTF-8 internally, and I have transcoding filebufs which convert
any of the ISO 8859-n, UTF-16 (BE or LE) or UTF-32 (BE or LE)
into UTF-8. On the other hand, all of my tools depend on the
client code telling them which encoding to use; I have some code
floating around somewhere which supports "intelligent guessing",
but it's not really integrated into the rest.)
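Just to give an idea: the ISO 8859-1 case of such a conversion
is the trivial one, since its code points are exactly
U+0000..U+00FF. The sketch below shows only that byte-level
mapping, not the streambuf plumbing around it (and the function
name is invented for the example):

    #include <string>

    // Convert ISO 8859-1 text to UTF-8: bytes below 0x80 pass
    // through unchanged, the rest become two bytes.
    std::string
    latin1ToUtf8( std::string const& in )
    {
        std::string result;
        result.reserve( in.size() );
        for ( std::string::const_iterator it = in.begin();
                it != in.end(); ++ it ) {
            unsigned char c = static_cast< unsigned char >( *it );
            if ( c < 0x80 ) {
                result += static_cast< char >( c );
            } else {
                result += static_cast< char >( 0xC0 | (c >> 6) );
                result += static_cast< char >( 0x80 | (c & 0x3F) );
            }
        }
        return result;
    }

In a real transcoding filebuf this logic would live in
underflow()/overflow(), and the other encodings (UTF-16, UTF-32,
the remaining 8859-n variants) need surrogate handling or real
mapping tables rather than this simple arithmetic.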
Of course, I'm talking here about real programs, designed to
be used in production environments. If your goal is just a
Sudoku solver, then 7-bit ASCII is fine.
Of course compilers and other software development tools are
just toys. The English alphabet has 26 characters. No more, no
less.
C, C++, Java and Ada all accept the Unicode character set, in
one form or another. (Ada, and maybe Java, limit it to the
BMP, the Basic Multilingual Plane.) I would think that this
is pretty much the case for
any modern programming language.
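To make that concrete for C++ (only a sketch; the exact value
of the wide character is, strictly speaking, implementation
defined, though it is 0xE9 wherever wchar_t holds ISO 10646
values):

    #include <iostream>
    #include <string>

    int
    main()
    {
        // \u00E9 is the universal character name for U+00E9
        // (e with acute accent), so the source file itself stays
        // within the basic source character set.
        wchar_t accented = L'\u00E9';
        std::wstring word = L"caf\u00E9";
        std::cout << static_cast< unsigned long >( accented )
                  << ' ' << word.size() << '\n';
        return 0;
    }

Whether the compiler also accepts the accented character written
directly in the source, rather than as a UCN, is exactly the
input encoding question mentioned above.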
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34