Re: Is Chinese C++ SourceCode restricted to ASCII?

From: Francesco <entuland@gmail.com>
Newsgroups: comp.lang.c++,microsoft.public.vc.mfc
Date: Sat, 5 Sep 2009 15:16:43 -0700 (PDT)
Message-ID: <e91468ab-724c-4af3-b543-e5954d35d176@e8g2000yqo.googlegroups.com>

On 5 Sep, 23:36, James Kanze <james.ka...@gmail.com> wrote:

On Sep 5, 8:47 pm, Francesco <entul...@gmail.com> wrote:

On 5 Sep, 19:25, "Peter Olcott" <NoS...@SeeScreen.com> wrote:

Okay, so C++ identifiers are limited to ASCII

No, they aren't. The C++ standard requires implementations to
support _at least_ a subset of ASCII.

No. It requires implementations to support some encoding for
characters in the basic character set. EBCDIC is just as valid
as ASCII.

With regards to identifiers, the standard *requires* an
implementation to support all Unicode characters classified as
alphanumeric. If the desired character isn't available in the
input encoding, then it can be specified by means of a universal
character name.
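
To illustrate the universal character name route, here is a minimal
sketch. It assumes a compiler that actually accepts universal character
names in identifiers (not every compiler does); the function name sum_to
and the identifier U+8BA1 U+6570 U+5668 ("counter" in Chinese) are made
up for the example and appear nowhere in the standard.

```cpp
// Hypothetical helper, for illustration only: sums the integers 1..limit.
// The loop variable is spelled purely with universal character names,
// so the source file itself stays within the basic character set.
// \u8BA1\u6570\u5668 designates the same identifier as 计数器.
int sum_to(int limit) {
    int total = 0;
    for (int \u8BA1\u6570\u5668 = 1; \u8BA1\u6570\u5668 <= limit; ++\u8BA1\u6570\u5668) {
        total += \u8BA1\u6570\u5668;
    }
    return total;
}
```

Calling sum_to(10) should give 55, exactly as if the loop variable had
been named with plain ASCII letters.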

Beyond that point, implementations are free to let you write the
source code in any charset whatsoever.

This is a snippet from an old version of the standard:
-------
2.1 Phases of translation [lex.phases]
1 The precedence among the syntax rules of translation is specified by
  the following phases.1)

    1 Physical source file characters are mapped, in an
      implementation-defined manner, to the source character set
      (introducing new-line characters for end-of-line indicators) if
      necessary. Trigraph sequences (_lex.trigraph_) are replaced by
      corresponding single-character internal representations. Any
      source file character not in the basic source character set
      (_lex.charset_) is replaced by the universal-character-name that
      designates that character.
-------
The current normative wording may vary, but the essence should
be the same: your implementation is free to let you write
the source code in the charset of your preference.
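
To make the last sentence of that phase 1 wording a bit more concrete,
here is a rough sketch of the replacement step. It is not how any real
compiler does it: it assumes C++11 (char32_t, U"" literals), it assumes
the physical source has already been decoded into code points, and the
names in_basic_source_set and to_ucn_form are invented for this post.

```cpp
#include <string>

// Hypothetical helper: is this code point in the basic source character
// set? That set is space, horizontal tab, vertical tab, form feed,
// new-line, and the printable ASCII characters except '$', '@' and '`'.
bool in_basic_source_set(char32_t c) {
    if (c == U' ' || c == U'\t' || c == U'\v' || c == U'\f' || c == U'\n') {
        return true;
    }
    if (c < 0x21 || c > 0x7E) {
        return false;
    }
    return c != U'$' && c != U'@' && c != U'`';
}

// Hypothetical helper: replaces every code point outside the basic source
// character set with the universal-character-name that designates it
// (\uXXXX for code points up to U+FFFF, \UXXXXXXXX otherwise).
std::string to_ucn_form(const std::u32string& source) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t c : source) {
        if (in_basic_source_set(c)) {
            out += static_cast<char>(c);
            continue;
        }
        const int digits = (c <= 0xFFFF) ? 4 : 8;
        out += (digits == 4) ? "\\u" : "\\U";
        for (int shift = (digits - 1) * 4; shift >= 0; shift -= 4) {
            out += hex[(c >> shift) & 0xF];
        }
    }
    return out;
}
```

With this sketch, a source line containing the character U+8BA1 comes
out with the six characters \u8BA1 in its place, and everything from
the basic source character set passes through unchanged.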

Or the character encoding of its preference:-).

In practice, in this case, Java copied exactly what C++ (and
later C99) required, with the difference that the first Java
compiler actually implemented it, whereas even today, very few
C++ compilers do.

Thank you for refining my reply, James. With my limited knowledge,
I did my best to correct the misrepresentation of C++ that was
being made here.

I'll take the opportunity to add that the standard imposes no limit on
the length of identifiers (individual implementations may set their own).

As an addition, to respond to the original post, here is a valid C++
program with Chinese identifiers and comments:

-------
#include <iostream>
#include <iomanip>
#include <string>

using namespace std;

/* 这个C++程序打印毕达哥拉斯表。*/

int main() {
  cout << "   X |";
  for (int 计数器 = 1; 计数器 <= 10; ++计数器) {
    cout << setw(4) << 计数器;
  }
  cout << endl << "-----+" << string(40, '-') << endl;
  for (int 首先 = 1; 首先 <= 10; ++首先) {
    cout << setw(4) << 首先 << " |";
    for (int 第二 = 1; 第二 <= 10; ++第二) {
      cout << setw(4) << 首先 * 第二;
    }
    cout << endl;
  }
  return 0;
}
-------

Whether or not the above code is displayed correctly on machines
other than mine depends on the installed fonts and the reader's
active encoding.

Whether a C++ compiler for the above encoding exists or not is
irrelevant for the purpose of this post ;-)

Cheers,
Francesco