Re: Are _T() and TEXT() macros equivalent?
"Doug Harrison [MVP]" <dsh@mvps.org> wrote in message
news:edo423t729vfn8im94p09v7kljs04kfgbl@4ax.com...
On Sat, 14 Apr 2007 15:50:53 GMT, "David Ching" <dc@remove-this.dcsoft.com> wrote:
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message
news:Xns991223EABEAMihaiN@207.46.248.16...
To say that it is FUNDAMENTAL to the language that sizeof(char) == 1?
When something is part of the standard, is used (as an implicit assumption) by every single application developed in that language for the last 30 years, and every single such application will break if this changes, then yes, it is fundamental.
Err, no. HelloWorld.cpp does not make use of the fact that sizeof(char) == 1. That's my point. Many modern apps are Unicode native and wouldn't make this assumption either.
Your argument would be better served by listing the sorts of things that
would break rather than holding "Hello, world" up as something that would
not break. I don't know about everyone, but I don't get a lot of mileage
out of "Hello, world".
The assertion was "every single application developed in that language for the last 30 years". Hello World is a part of that. And you must have missed my reply making the point that many programs don't store binary data in char arrays and manipulate char arrays only as strings, so they are not concerned whether a char takes 1 byte or 100.
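For example, code written purely in terms of std::string never looks at allocation sizes at all; a minimal illustration:

#include <string>

// Purely string-level code: nothing here depends on how big a char is.
std::string greet(const std::string& name)
{
    return "Hello, " + name;
}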
It also doesn't help to mock and deny facts, and I've previously stated
several facts in this thread.
It is a fact that the definition of the language equates the terms "byte" and "char", defines sizeof(char) == 1, and measures object size in terms of bytes (chars). It doesn't get any more "fundamental" than this, and like I said several messages ago, "char is the fundamental unit of addressing in C and C++."
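For concreteness, both guarantees are visible in a few lines of ordinary code:

#include <assert.h>
#include <stdio.h>

int main()
{
    /* sizeof yields sizes in units of char, so this holds on every
       conforming implementation by definition */
    assert(sizeof(char) == 1);

    /* and an object's representation is examined through char */
    int x = 42;
    const unsigned char* p = (const unsigned char*) &x;
    for (size_t i = 0; i != sizeof x; ++i)
        printf("%02x ", (unsigned) p[i]);
    printf("\n");
    return 0;
}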
Sorry if you thought I was mocking. If I came across that way, it's because it sounds ludicrous to me that you promote sizeof(char) == 1 to the same level as "all C++ implementations must recognize the keyword 'class'", which is what I would consider "fundamental". But I've found it is a trait of C++ people to easily lose the forest for the trees.
I don't know if you've actually stated it as such, but what you are proposing is to introduce a new type "byte" to liberate "char" from its duties as the quantum of object size and representation.
I proposed that, yes. But what you don't get (even though I've said it
several times to several people) is that the "/unicode" switch can be turned
off or on at will. If it turns out a module makes liberal use of
sizeof(char) == 1, then by all means disable /unicode and don't change a
thing.
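A sketch of what that per-module opt-out could look like, assuming the hypothetical /unicode switch defines a preprocessor macro; the name _UNICODE_CHARS is invented here purely for illustration:

/* _UNICODE_CHARS is a hypothetical macro assumed to be defined by the
   proposed /unicode switch; it is not a real compiler feature. */
#ifdef _UNICODE_CHARS
#error "This module assumes sizeof(char) == 1; build it without /unicode."
#endif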
If you are at all serious about convincing people
who know the language, you should:
1. Go through the standard and list the sections that use char as byte.
2. Collect and present the ways programs make use of (1).
3. Describe the transformations that would be necessary to fix (2) after
changing (1).
4. Describe the difficulty and degree of automation possible to implement
(3).
5. Show why it's worth it to require updating 35+ years of C and 20+ years
of C++ code to conform to the new language.
6. Present your proposal in groups such as comp.std.c++, where you will find many more language experts than you will here.
And because the compiler supports building both with and without /unicode, none of these points is relevant.
And no, I am not at all serious about making it my career to change the
minds of those who hang out at comp.std.c++. I can see we have different
goals and values for our lives.
Myself, I'd prefer the language to have separate byte and char types, but I also know this would not be an easy thing to change. Here, I'll give a brief example using my numbering system above:
2. Problem
#include <string.h>   /* strlen, strcpy */
#include <stdlib.h>   /* malloc */

char* strdup(const char* s)
{
    size_t len = strlen(s);             /* len is a count of chars */
    char* res = (char*) malloc(len+1);  /* but malloc takes a count of bytes */
    return strcpy(res, s);
}
3. Fix
#include <string.h>
#include <stdlib.h>

char* strdup(const char* s)
{
    size_t len = strlen(s);
    /* scale the char count to a byte count explicitly */
    char* res = (char*) malloc((len+1)*sizeof(char));
    return strcpy(res, s);
}
4. To automate this, a program would have to be written that recognizes
that malloc is being used to allocate space for a char array. Moreover, it
would have to recognize this across function calls, translation units, and
even libraries. This program will not be written, so it will be up to
people to do this by hand.
This example is outdated. Here's how you would write strdup with modern C++ and not have to change a thing, because new[] allocates in elements rather than bytes:
#include <string.h>   // strlen, strcpy

char* strdup(const char* s)
{
    size_t len = strlen(s);
    char* res = new char[len+1];  // new[] counts elements, not bytes
    return strcpy(res, s);
}
Another example concerns any function that takes a void* and writes data as bytes:
2. Problem
void write(const void* buf, size_t n)
{
    const unsigned char* p = (const unsigned char*) buf;
    const unsigned char* pEnd = p + n;
    while (p != pEnd)
        write(*p++);   /* calls the single-byte overload (not shown) */
}
3. Fix (partial)
/* "byte" is the proposed new type, distinct from char */
void write(const byte* buf, size_t n)
{
    const byte* p = buf;
    const byte* pEnd = p + n;
    while (p != pEnd)
        write(*p++);
}
4. All calls must cast to byte* instead of relying on the standard conversion to void*. This could be automated. However, if the buffer is a char array and n is its length, that will have to be fixed as in the previous example, and that's not easy to automate. More generally, any call that does not amount to write(&x, sizeof(x)) is problematic. Again, people would have to vet code line by line to make this change.
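For instance, a call along these lines (invented here for illustration) is exactly the hard case:

char msg[64] = "hello";
/* the cast to byte* is mechanical, but n is a count of chars,
   not a count of bytes, and no tool can tell the difference */
write((const byte*) msg, strlen(msg) + 1);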
This function would not be compiled with /unicode, nor would the write() overload that takes a char parameter (not shown).
Note that part (2) of both examples I presented would compile OK under your hypothetical new language, and they would both lead to buffer overruns; strdup, for instance, would malloc len+1 bytes and then strcpy (len+1)*sizeof(char) bytes into them.
Not if /unicode were not specified (as it would not be by default).
No doubt, this is just scratching the surface. It's what I could think of immediately without really trying. To do a reasonably thorough job of part (2), you would need to pose the question to a great number of people (easy) and get them to think about it long and hard and answer you (not as easy). You'd also need to survey millions of lines of code written in every area people use the language. I haven't done that, of course, but I would have to conclude, based on my knowledge of the language and experience using it, that you would be creating a brand new language. I say this because existing code would not be portable to it, and no one would find it worthwhile to update their code to use it, because using char as byte and wchar_t or even TCHAR as "character" works well enough.
Well, let me sum it up once more, and then I am done with this whole topic,
because I have already spent too much time on it.
1. /unicode is purely optional to use, so it doesn't break any existing
code.
2. If the opinion is that "wchar_t and TCHAR as character works well enough," then ease of use simply isn't valued, and there is nothing more to say.
3. This attitude carries far beyond wchar_t and TCHAR, creating monstrosities like STL, Boost, misused pure virtual functions, and other things that incite religious fervor. It's been enlightening to see how people supporting these things actually think.
This whole thread started out as a discussion of how current C++ makes life suboptimal for a C++ programmer on Windows, especially as compared with more modern languages. Judging from the attitude shown here, this will likely continue, and we as Windows programmers will plan accordingly.
-- David