Re: #define and (brackets)

From:

"Igor Tandetnik" <itandetnik@mvps.org>

Newsgroups:

microsoft.public.vc.language

Date:

Sat, 29 Nov 2008 16:40:13 -0500

Message-ID:

<#tUITsmUJHA.4996@TK2MSFTNGP02.phx.gbl>

"Alan Carre" <alan@twilightgames.com> wrote in message
news:%23yHUBwjUJHA.6060@TK2MSFTNGP06.phx.gbl

"Alexander Grigoriev" <alegr@earthlink.net> wrote in message
news:uHhY05iUJHA.5244@TK2MSFTNGP04.phx.gbl...

"Alan Carre" <alan@twilightgames.com> wrote in message news:%>>>
"Igor Tandetnik" <itandetnik@mvps.org> wrote in message

news:%23$3ZwcXUJHA.6092@TK2MSFTNGP04.phx.gbl...

In what way do you believe they are treated differently?

NUM was already 2 tokens:

"-", "10"

Nah. Doubtful.

And yet it is the case.

It was ["-10"], a literal constant according to all experiments

Let's accept this, for the sake of argument. This makes your belief that
the compiler could go from ["-", "-10"] to ["--", "10"] even more
puzzling. It would require the compiler to break up an already
established token. Why do you think it is a reasonable, or desirable,
thing for a compiler to do?
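
A minimal sketch of the point, assuming NUM is defined without brackets as `#define NUM -10` (the definition itself is not quoted in this message, so treat it as an assumption; `negate_num` is a name made up for illustration). Whether -10 is counted as one token or two, the leading minus in -NUM stays a separate token, so the result is a double negation rather than a decrement:

```cpp
#include <cassert>

// Assumption: the macro under discussion, defined without brackets.
#define NUM -10

// -NUM expands to the token sequence [-, -, 10], that is -(-10).
// The two minus signs are never glued into a single -- token; if they
// were, this would read --10, which does not compile (10 is not an lvalue).
static_assert(-NUM == 10, "two unary minuses, not a decrement");

// Hypothetical helper so the same check can also be made at run time.
int negate_num() {
    int result = -NUM;
    assert(result == 10);
    return result;
}
```

Calling negate_num() should return 10.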

If, as you claim, "--" (quoteless) is considered to be ONE token (as
with "-" (quoteless)), then how would the preprocessor interpret this
one:
EXPRESSION: "---" (quoteless)

As ["--", "-"], according to the so-called maximal munch rule:

2.4p3 If the input stream has been parsed into preprocessing tokens up
to a given character, the next preprocessing token is the longest
sequence of characters that could constitute a preprocessing token, even
if that would cause further lexical analysis to fail.
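
The rule can be sketched with a toy tokenizer. This is only an illustration, not how any real preprocessor is implemented: `munch` is a hypothetical helper, and it knows just a handful of the operators from the standard's list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a string into tokens by always taking the
// longest operator spelling that matches at the current position.
std::vector<std::string> munch(const std::string& input) {
    // Longer spellings are listed before their own prefixes.
    static const char* const ops[] = {"->*", "--", "->", "-=", "-",
                                      "&&", "&=", "&"};
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        for (const char* op : ops) {
            const std::string candidate(op);
            if (input.compare(pos, candidate.size(), candidate) == 0) {
                tokens.push_back(candidate);
                pos += candidate.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            // Simplification: any other character is a one-character token.
            tokens.push_back(std::string(1, input[pos]));
            ++pos;
        }
    }
    assert(pos == input.size());  // every character was consumed
    return tokens;
}
```

With this sketch, munch("---") yields ["--", "-"] and munch("-----") yields ["--", "--", "-"]. The tokenizer never looks ahead to see whether the result makes sense as an expression.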

Is it the negation of the decrement? Or the decrement of the
negation, or the negation of the negation of the negation?

The tokenizer is not concerned with whether the sequence of tokens it
produces is a meaningful C++ construct (nor is the preprocessor).
However, I can give you an example where --- appears in a valid program:

int x = 0;
int y = x---1;

The last statement is equivalent to

int y = (x--) - 1;
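
For anyone who wants to check the equivalence, here is a small sketch (the function name is made up for illustration):

```cpp
#include <cassert>

// Hypothetical helper: evaluates x---1 and returns the result.
// x is passed by reference so the caller can observe the decrement.
int post_decrement_then_subtract(int& x) {
    int before = x;
    int y = x---1;            // tokenized as x -- - 1
    assert(y == before - 1);  // the old value of x, minus 1
    assert(x == before - 1);  // x itself was decremented once
    return y;
}
```

Starting from x == 0, both y and x should end up as -1.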

I'll leave it as an exercise for the reader to figure out how the
following two programs work:

int main() {
  int x = 0;
  return ------x; // any even number of dashes.
}

struct S {
  int& operator-() {
    static int x = 0;
    return x;
  }
};

int main() {
  S s;
  return -----s; // any odd number of dashes.
}

Another exercise for the reader: construct a valid C++ program that
contains sequences &&&, &&&& and &&&&& (I believe a sequence of five
ampersands is the longest possible, but would love to learn otherwise).
Not inside a comment or a macro that's never used of course - that would
be cheating.

All of these are possible interpretations of "---" (again quoteless).

But only one is correct.

Certainly you aren't going to claim that "---" is yet another
specially-recognized token, are you?

Certainly not.

So which is it?

["--", "-"]

No, what seems abundantly clear is that the preprocessor isn't doing
any math, it's simply replacing string "tokens" with their
corresponding definitions.

Quite.

Tokens are separated by "delimiters" such
as SPACE (' ') and other accepted delimiters such as the minus sign
('-').

Minus sign is itself a token, not a delimiter between tokens.

Brackets, space, + - slash, comma, basically anything that's
not a "csym" seems to serve as a proper delimiter

Or rather, as a token:

2.12 Operators and punctuators
1 The lexical representation of C++ programs includes a number of
preprocessing tokens which are used in the syntax of the preprocessor or
are converted into tokens for operators and punctuators:
preprocessing-op-or-punc: one of
    { } [ ] # ## ( )
    <: :> <% %> %: %:%: ; : ...
    new delete ? :: . .*
    + - * / % ^ & | ~
    ! = < > += -= *= /= %=
    ^= &= |= << >> >>= <<= == !=
    <= >= && || ++ -- , ->* ->
    and and_eq bitand bitor compl not not_eq
    or or_eq xor xor_eq

Anyway, either the preprocessor knows C++ and does math or it's a
strtok(er)/srep(er).

It's mostly the latter, though the tokenizing process is more
complicated than what's achievable with strtok (and I'm not at all
familiar with srep).

And I seriously doubt it does any "token"
mathematics

I'm not sure what you mean by "token mathematics".

[though will do some basic algebra on numeric constants
such as 10/2 but that's about the extent of it from my experience].

No, the preprocessor won't. The compiler will. Consider:

#define X 10/2
#define STR1(x) #x
#define STR(x) STR1(x)

printf("%s", STR(X));

This prints "10/2", not "5". On the other hand, if you write

int main() { return X; }

the generated assembly will be equivalent to that generated for "return
5;". There will be no division instruction.
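
The two observations can be combined into one self-contained sketch. The macros are the ones from the example; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

#define X 10/2
#define STR1(x) #x
#define STR(x) STR1(x)

// The preprocessor only substitutes tokens, so the stringified
// expansion of X is the text 10/2, not 5.
const char* x_as_text() { return STR(X); }

// The compiler sees the tokens 10 / 2 and evaluates the constant
// expression itself.
static_assert(X == 5, "evaluated by the compiler, not by macro replacement");
int x_as_value() { return X; }

// Hypothetical helper that checks both views at run time.
void check_x() {
    assert(std::strcmp(x_as_text(), "10/2") == 0);
    assert(x_as_value() == 5);
}
```

Whether the generated code contains a division instruction depends on the compiler, but the static_assert only compiles because 10/2 is a constant expression that the compiler evaluates itself.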
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925