Re: multi-line Strings
On 12/10/2012 4:52 PM, Arne Vajhøj wrote:
On 12/10/2012 4:22 PM, Eric Sosman wrote:
On 12/10/2012 3:08 PM, Arne Vajhøj wrote:
[...]
PS: And for those that do not know C#, then C# has "" strings
with \ as escape like Java, but also has @"" string where
\ is not an escape and where line breaks are allowed.
As one of "those," and curious: Can a @"" string have an
embedded " character?
Yes.
A " inside @"" is encoded as "".
Aha! Another FORTRAN legacy! As of FORTRAN IV you could
write 'I''M HERE' instead of 8HI'M HERE, which most people
considered a great advance -- in the late 1960's.
My point, of course, is that there's still an escape mechanism
at work. It's a different mechanism, yes, but it still has the
What You See Ain't What You Get problem this thread has been
complaining about. And here's a funny thing about inventing an
escape mechanism: Even if the special character sequences were
surpassingly uninteresting and spectacularly rare before being
adopted as escapes, their very adoption makes them suddenly
interesting and much more common. You'll find yourself wanting
to write a regex that looks for "" inside a @"..." string, and
you'll get something like
@"@""([^""]*""""")*[^""]*"""
... leaving you pretty much where you started, just with a new
suit of clothes on the Emperor. Also, we still need to produce
"\u0281 is the IPA voiced uvular fricative"
... on input systems that cannot generate the IPA voiced uvular
fricative all by themselves.
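For anyone who wants to see that escape at work, here is a minimal Java sketch. The class and field names are made up for illustration; the only thing it relies on is that the compiler turns the Unicode escape into a single char before the string is formed.

```java
class UnicodeEscapeDemo {
    // Typed with ASCII only; the Unicode escape at the front of the
    // literal becomes one char in the resulting String.
    static final String IPA = "\u0281 is the IPA voiced uvular fricative";

    public static void main(String[] args) {
        // Length of the string, counting the escape as a single char.
        System.out.println(IPA.length());
        // Numeric value of the first char, printed in hex.
        System.out.println(Integer.toHexString(IPA.charAt(0)));
    }
}
```

What you type is six characters; what you get is one. That is the same WYSAWYG trade, made deliberately.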
Source has syntax -- at this level we usually speak of "lexing,"
but a lexer is really just a parser optimized to recognize a simple
syntax. A big job of the lexer is to distinguish metacharacters
from payload characters, and if every character could potentially
appear as payload there has to be some kind of convention to
discriminate the different usages. Those conventions mean that
WYSAWYG will inevitably occur, to a greater extent or a lesser.
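To make the metacharacter-versus-payload job concrete, here is a rough Java sketch of a scanner for the doubled-quote convention described above. It is a hypothetical helper written for this post, not a real C# lexer, and it assumes the literal starts at the beginning of the input.

```java
class VerbatimLexer {
    /**
     * Scans a literal of the form @"..." at the start of src and returns
     * its payload. Inside the literal, "" stands for one quote character
     * and a lone " ends the literal. Illustrative only.
     */
    static String scan(String src) {
        if (!src.startsWith("@\"")) {
            throw new IllegalArgumentException("expected @\" at start");
        }
        StringBuilder payload = new StringBuilder();
        int i = 2; // skip the opening @"
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c != '"') {
                payload.append(c);          // payload, backslash included
                i++;
            } else if (i + 1 < src.length() && src.charAt(i + 1) == '"') {
                payload.append('"');        // doubled quote: one payload quote
                i += 2;
            } else {
                return payload.toString();  // lone quote: end of literal
            }
        }
        throw new IllegalArgumentException("unterminated literal");
    }

    public static void main(String[] args) {
        // The text being scanned is: @"say ""hi"" \n"
        System.out.println(scan("@\"say \"\"hi\"\" \\n\""));
    }
}
```

Note that the Java literal in main needs its own layer of escapes just to spell the input, which is rather the point.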
It's unfortunate that both Java and regex use \ so heavily,
because it leads to a lot of escaping-of-escapes and harms
readability. But why should it be a given that Java's literals
should be different to avoid conflict with regex syntax? Why
not change the regex syntax instead, and use, say, ~ for the
role now taken by \? It might improve regexes to the point
where they're merely unreadable, instead of intolerable. ;-)
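Just to show the idea is cheap to try, here is a sketch of a translator from a ~-escaped dialect to java.util.regex. The tilde method and the ~~ rule for a literal tilde are my own assumptions, not any existing API; only Pattern.compile is real.

```java
import java.util.regex.Pattern;

class TildeRegex {
    /**
     * Hypothetical dialect: ~ plays the role of \, ~~ is a literal tilde,
     * and a backslash in the spec is plain payload.
     */
    static Pattern tilde(String spec) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '\\') {
                regex.append("\\\\");   // payload backslash: escape it for the engine
            } else if (c == '~' && i + 1 < spec.length() && spec.charAt(i + 1) == '~') {
                regex.append('~');      // ~~ means one literal tilde
                i++;
            } else if (c == '~') {
                regex.append('\\');     // ~ becomes the engine's escape character
            } else {
                regex.append(c);
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        // Today's spelling of "digits, dot, digits" next to the tilde spelling.
        Pattern today = Pattern.compile("\\d+\\.\\d+");
        Pattern tilded = tilde("~d+~.~d+");
        System.out.println(today.matcher("3.14").matches());
        System.out.println(tilded.matcher("3.14").matches());
    }
}
```

The ~~ rule is, naturally, one more escape convention, so the Emperor gets yet another suit.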
--
Eric Sosman
esosman@comcast-dot-net.invalid