Re: URL encoding
On 01/18/2011 08:06 PM, Roedy Green wrote:
> On Tue, 18 Jan 2011 08:48:38 +0100, Luuk <Luuk@invalid.lan> wrote,
> quoted or indirectly quoted someone who said :
>> The subject talks about 'URL encoding', not about how you can define
>> some text in a file, which is something completely different.
>
> If there were a general purpose way of encoding forbidden characters
> in a string, there would be nothing special about URLs. Every time the
> problem is encountered, we create yet another variant solution.
> With URLs, we have two.
Three, you forgot the part about needing to pigeonhole Unicode into a
US-ASCII-only protocol (i.e., DNS).
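For the curious, that third encoding can be seen from Java with java.net.IDN. This is only a minimal sketch: the host name below is made up for illustration, and the IdnDemo class and its toAsciiHost helper are my own names, not anything from the thread.

```java
import java.net.IDN;

public class IdnDemo {
    // Converts a Unicode host name to the ASCII-only form that DNS accepts.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        // Hypothetical host name containing a u-umlaut, used only as an example.
        String host = "b\u00fccher.example";
        System.out.println(toAsciiHost(host)); // xn--bcher-kva.example
    }
}
```

Note that this is a different scheme from both percent-encoding and form encoding: the host part gets Punycode, while the path and query get % escapes.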
The problem is that special and safe characters depend on context.
Making `\' a universal escape character is all well and good, but most
languages already use `\' as the escape in string literals, so a pattern
that matches one literal `\' in the target becomes the unholy combination `\\\\'.
Heaven forbid you should actually look for `\\' or deeper (things get
really fun when you have UTF-8 misinterpreted as ISO 8859-1, translated
into UTF-8 and then misinterpreted again as ISO 8859-1, so you get a
doubled version of the goop).
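To make the paragraph above concrete, here is a rough Java sketch of both effects: the four-backslash regex literal, and UTF-8 being misread as ISO 8859-1 twice. The class and method names are mine, and the misreading is simulated with plain String/byte conversions rather than any real broken pipeline.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class EscapeLayersDemo {
    // The regex for one literal backslash is two characters long (\\),
    // and each of those must itself be escaped in Java source: "\\\\".
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean containsBackslash(String s) {
        return BACKSLASH.matcher(s).find();
    }

    // Simulates one round of damage: encode as UTF-8, then decode
    // the resulting bytes as if they were ISO 8859-1.
    static String misreadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(containsBackslash("a\\b")); // true
        System.out.println(containsBackslash("ab"));   // false

        String original = "\u00e9";                    // one character, e-acute
        String once = misreadAsLatin1(original);       // two characters of goop
        String twice = misreadAsLatin1(once);          // four characters of goop
        System.out.println(original.length() + " " + once.length() + " " + twice.length()); // 1 2 4
    }
}
```

Each round of misinterpretation doubles the length of the damaged character, which is the "doubled version of the goop".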
In short: don't try to use one escape character for every layer. Imagine
trying to write out a URL in an XML fragment from a program in a string literal:
String xml_frag = "<a href=\"http://asdf.com/omg\\\\25\"/>";
At a glance, which URL did I link to:
1. asdf.com/omg\\\\25
2. asdf.com/omg\\25
3. asdf.com/omg\25
4. asdf.com/omg25
Now, what does a literal `\' in the URI look like? Using % for the URI
is so much easier on my eyes and my brain.
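Here is a small sketch that peels the layers of the example apart and then shows the % version for comparison. The backslash-as-URL-escape rule is the hypothetical scheme under discussion, so unescapeBackslashes is invented purely to model it; hrefOf is a throwaway helper, not a real XML parser.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PercentVsBackslashDemo {
    static final String XML_FRAG = "<a href=\"http://asdf.com/omg\\\\25\"/>";

    // Extracts the raw href value with plain string operations (illustration only).
    static String hrefOf(String xml) {
        String marker = "href=\"";
        int start = xml.indexOf(marker) + marker.length();
        return xml.substring(start, xml.indexOf('"', start));
    }

    // Hypothetical URL rule where a doubled backslash stands for one literal backslash.
    static String unescapeBackslashes(String s) {
        return s.replace("\\\\", "\\");
    }

    // Builds the same URL with real percent-encoding; the multi-argument
    // URI constructor quotes characters that are illegal in a path.
    static String percentEncoded() throws URISyntaxException {
        return new URI("http", "asdf.com", "/omg\\25", null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String href = hrefOf(XML_FRAG);
        System.out.println(href);                      // http://asdf.com/omg\\25
        System.out.println(unescapeBackslashes(href)); // http://asdf.com/omg\25
        System.out.println(percentEncoded());          // http://asdf.com/omg%5C25
    }
}
```

So the answer to the quiz is option 3, and a literal `\' in a real URI is simply %5C, which survives both the XML attribute and the Java string literal without any doubling.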
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth