Re: Ahhh.. URL wants to get encoded. Does Java wanna?
On Nov 6, 12:06 am, Roedy Green <see_webs...@mindprod.com.invalid>
wrote:
On Tue, 06 Nov 2007 01:24:35 -0500, Wayne <nos...@all4me.invalid>
wrote, quoted or indirectly quoted someone who said :
URI uri = new URI("http", "//www.example.com/you& I 10%? wierd & wierder", null);
System.out.println( uri.toURL() );
The way I read RFC 2396 is that reserved chars:
; / ? : @ & = + $ ,
are not supposed to be escaped. Perhaps Patricia could read the RFC
and tell us what it really means.
The character & is used in URLs and URIs to separate parts of the
query, in which case it should be present as an actual & character.
It can also occur inside query parameter names or values, in which
case it should be present in encoded form, as the string %26. The
example URI Wayne gave uses (unintentionally) ampersands as query
separators, which is why the URI class isn't escaping them; if he
wants to use them as part of the path or part of query parameters or
values he'll have to encode them himself with
URLEncoder.encode(String, String) or similar.
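To make the separator-versus-data distinction concrete, here is a minimal sketch. The helper queryPair and the parameter names "who" and "mood" are made up for illustration and are not from Wayne's code. Note that URLEncoder produces form encoding (spaces become +), which suits query strings but is not intended for path segments.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class AmpersandEncoding {
    // Hypothetical helper: encodes a name and a value separately, then joins
    // them with a literal '='. Any '&' inside either side becomes %26.
    static String queryPair(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The literal '&' between the pairs is a separator;
        // the '&' inside each value is data and gets encoded.
        String query = queryPair("who", "you & I") + "&" + queryPair("mood", "weird & weirder");
        // prints http://www.example.com/search?who=you+%26+I&mood=weird+%26+weirder
        System.out.println("http://www.example.com/search?" + query);
    }
}
```

The point of encoding each piece before joining is that the only raw ampersands left in the result are the ones meant as separators.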
Elsewhere within a URI, ampersands are not reserved and do not
require encoding, except in the scheme part, where they're simply
illegal.
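As a rough illustration of how java.net.URI treats an ampersand in the path, the sketch below uses the five-argument constructor (scheme, authority, path, query, fragment). The helper httpUri and the path "/you & I" are assumptions for the example, not anything from the original code.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AmpersandInPath {
    // Hypothetical helper: passes the path as its own component so the URI
    // class quotes only the characters that are illegal there.
    static URI httpUri(String host, String path) {
        try {
            return new URI("http", host, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = httpUri("www.example.com", "/you & I");
        // The spaces are quoted as %20; the ampersand is left as it is.
        System.out.println(uri);
    }
}
```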
I wish the people who write RFCs would provide examples to illustrate
the true meaning of the lawyerese.
The RFCs tend to be a codification of existing practice, rather than a
prescription. In the case of the URI RFC it's a little vaguer, since
URIs (that are not also URLs) aren't in terribly widespread use and
came about as an attempt to normalize URLs, so the RFC could be seen
as prescriptive rather than informative.
On the whole, this post-hoc RFC process works well: it gives the
people creating prototypes time and freedom to play with ideas and
discard the bad ones without prematurely codifying them in a
standard. It's not perfect, but then, what is?
-Owen