Re: How to slurp/get the content of a URI?
Arne Vajhøj wrote:
> HttpURLConnection and its InputStream fetch bytes from the
> server. No negotiations possible.
I think that's what I'm saying, although I'm no longer sure that
HttpURLConnection doesn't fully support HTTP character sets. It might.
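For what it's worth, here's a minimal sketch of the bytes-only view Arne describes. It assumes Java 9+ (for InputStream.readAllBytes()), and the class and method names are just made up for illustration. Whatever the headers say, getInputStream() hands you octets; turning them into characters is a separate step.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class RawSlurp {
    // Reads the whole body as raw bytes. No character decoding happens here:
    // the stream returns exactly the octets the server (or file) supplied.
    static byte[] fetchBytes(URL url) throws IOException {
        URLConnection c = url.openConnection();
        try (InputStream in = c.getInputStream()) {
            return in.readAllBytes(); // needs Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = fetchBytes(URI.create(args[0]).toURL());
        System.out.println(body.length + " bytes");
    }
}
```

Run it with any URL as the first argument, e.g. an http: or file: URL.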
> There is no default ISO-8859-1 in either HTTP or Java. HTTP is
> always explicit and the Java default is system specific.
For a socket, yes, there is no default encoding. For HTTP, I think that
is not true. ISO-8859-1 is the default for text/* types if no charset is
specified, and it is legal to leave out the charset parameter, in both
the GET and the response. I think, anyway. I could be all wrong about that.
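To see why the choice matters, here's a small sketch (the class name is hypothetical) that decodes the same two bytes with two different charsets. The Java half of Arne's point shows up in the last line: new String(bytes) or new InputStreamReader(in) without a charset argument uses Charset.defaultCharset(), which depends on the platform (UTF-8 by default since Java 18).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes bytes with an explicitly chosen charset.
    static String decode(byte[] body, Charset cs) {
        return new String(body, cs);
    }

    public static void main(String[] args) {
        // U+00E9 (e-acute) encoded as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] body = "\u00e9".getBytes(StandardCharsets.UTF_8);
        // Read as UTF-8: one character, U+00E9.
        System.out.println(decode(body, StandardCharsets.UTF_8).length());      // 1
        // Read as ISO-8859-1: two characters, U+00C3 U+00A9.
        System.out.println(decode(body, StandardCharsets.ISO_8859_1).length()); // 2
        // The charset Java falls back to when you don't name one:
        System.out.println(Charset.defaultCharset());
    }
}
```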
Stefan has a valid question: if you don't know the content type (and
hence the charset) until you've read the header, how do you know what
charset to open the stream with? The answer, I think, is that it's
defined to be ISO-8859-1 by default.
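If that reading of RFC 2616 is right (section 3.7.1 says text/* types received via HTTP default to ISO-8859-1 when no charset parameter is given), picking a charset from the Content-Type header might look something like this. It's a minimal sketch with made-up names, and it doesn't try to cover every corner of the header grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFromContentType {
    // Picks a charset from a Content-Type value such as "text/html; charset=UTF-8".
    // With no charset parameter, text/* falls back to ISO-8859-1 (RFC 2616 3.7.1).
    // Returns null when nothing can be determined (missing header, or a
    // non-text type without a charset parameter).
    static Charset resolve(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = param.substring(eq + 1).trim();
                // Strip optional quotes, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name); // throws if the name is unknown
            }
        }
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.startsWith("text/") ? StandardCharsets.ISO_8859_1 : null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(args.length > 0 ? args[0] : "text/plain"));
    }
}
```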
Let me see if I can dig something up...
Content Negotiation for HTTP:
<http://en.wikipedia.org/wiki/Content_negotiation>
Some info on "Missing Charset" in the RFC:
<http://tools.ietf.org/html/rfc2616>
Search for 8859.
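On the request side, URLConnection does at least let you ask for a charset, even though the server is free to ignore the request. A rough sketch (hypothetical names; the Accept-Charset value is just an example) that sets the header before the connection is made, since setRequestProperty() throws IllegalStateException once connected:

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

public class NegotiateCharset {
    // Opens, but does not yet connect, a connection that asks the server to
    // prefer UTF-8 and fall back to ISO-8859-1. The server may ignore this,
    // so the response's Content-Type still has to be checked afterwards.
    static URLConnection open(String uri) throws IOException {
        URLConnection c = URI.create(uri).toURL().openConnection();
        c.setRequestProperty("Accept-Charset", "utf-8, iso-8859-1;q=0.5");
        return c;
    }

    public static void main(String[] args) throws IOException {
        URLConnection c = open(args[0]);
        // getContentType() connects and reads the response headers.
        System.out.println(c.getContentType());
    }
}
```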
Back to Java: URLConnection looks like it will let you read things like
the content type (the MIME type, plus any charset parameter) before
getting an InputStream to the content:
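URLConnection c = url.openConnection();
// Connects if needed and returns the Content-Type header,
// e.g. "text/html; charset=UTF-8", or null if the server sent none.
String mimeType = c.getContentType();
System.out.println( mimeType );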
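And similarly for getContentEncoding(), with one caveat: that returns the
Content-Encoding header (e.g. "gzip"), not the character set. The charset,
if the server sends one, is the charset parameter of getContentType().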
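The two headers feed two separate steps, which is easy to get backwards. Here's a minimal sketch (hypothetical names; only gzip is handled, and the charset is assumed to have already been worked out from getContentType()): first undo whatever getContentEncoding() reports, then decode the resulting bytes with the charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BodyDecoder {
    // Step 1: undo the Content-Encoding (the value getContentEncoding() returns).
    // Only "gzip" is supported; null or "identity" means the bytes are as-is.
    static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        String coding = contentEncoding == null
                ? "identity" : contentEncoding.trim().toLowerCase(Locale.ROOT);
        switch (coding) {
            case "identity":
                return raw;
            case "gzip":
                return new GZIPInputStream(raw);
            default:
                throw new IOException("unsupported Content-Encoding: " + contentEncoding);
        }
    }

    // Step 2: decode the now-plain bytes with the charset taken from Content-Type.
    static String decode(InputStream raw, String contentEncoding, Charset charset) throws IOException {
        try (InputStream in = unwrap(raw, contentEncoding)) {
            return new String(in.readAllBytes(), charset); // needs Java 9+
        }
    }

    // Offline demo: gzip some UTF-8 text in memory, then decode it back.
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("caf\u00e9".getBytes(StandardCharsets.UTF_8));
        }
        String text = decode(new ByteArrayInputStream(bos.toByteArray()), "gzip", StandardCharsets.UTF_8);
        System.out.println(text.equals("caf\u00e9"));
    }
}
```

With a live connection it would be called roughly as BodyDecoder.decode(c.getInputStream(), c.getContentEncoding(), charset).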
I gotta run. I hope I didn't booger things up too badly replying to
Stefan. Apologies if I did.