Re: UDF-8 Reading for URL - not working
Amith wrote:
My problem is the UTF-8 string which i [sic] read from the URL is considered
as unicode.. i [sic] need it as UTF-8
UTF-8 *is* Unicode! More precisely, it is one of the byte encodings of Unicode.
i [sic] want it to be printed as "ನಮ್ಸ್ಕರಗುರು" and not as
"\u0CA8\u0CAE\u0CCD\u0CB8\u0CCD\u0C95\u0CB0\u0C97\u0CC1\u0CB0\u0CC1"
public class URLReader {
public static void main(String[] args) throws Exception {
URL url = new URL("http://www.google.com/transliterate/indic?tlqt=1&langpair=en|kn&text=namskara%20guru&&tl_app=1");
BufferedReader in = new BufferedReader(
new InputStreamReader(
url.openStream(), "UTF8"));
String inputLine = "";
No need to initialize 'inputLine' to a value you are just going to throw away.
String fullString = "";
while ((inputLine = in.readLine()) != null)
fullString = fullString + new String(inputLine.getBytes(),"UTF-8");
This is silly. Just do what Lothar said and add the String to the String.
I'm also pretty sure this isn't correct anyway. The way you defined the
BufferedReader has already decoded the bytes from UTF-8 on the way in to
'inputLine', and 'getBytes()' with no argument re-encodes the String in the
platform's default charset, which is not necessarily UTF-8. Decoding those
bytes as UTF-8 will mangle any non-ASCII text whenever the two charsets
differ. In any event, using straightforward String concatenation, or as
Lothar suggested, StringBuilder concatenation, should keep encoding issues out
of the way.
Strings in Java are always sequences of UTF-16 code units, whatever encoding
the bytes arrived in.
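A minimal sketch of the point about decoding only once. It is not your program: a ByteArrayInputStream stands in for 'url.openStream()' so it runs offline, and the class and method names (DecodeOnce, readAll, roundTripThroughLatin1) are made up for illustration. ISO-8859-1 is assumed as the "platform default" purely to make the damage from the extra 'getBytes()' step visible.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class DecodeOnce {
    // Reads every line from the stream, decoding the bytes as UTF-8 exactly once.
    static String readAll(InputStream stream) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line); // already a proper String; no re-decoding needed
            }
        }
        return sb.toString();
    }

    // Mimics new String(s.getBytes(), "UTF-8") on a platform whose default
    // charset is ISO-8859-1 (an assumption chosen for this demonstration).
    static String roundTripThroughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String expected =
            "\u0CA8\u0CAE\u0CCD\u0CB8\u0CCD\u0C95\u0CB0\u0C97\u0CC1\u0CB0\u0CC1";
        // Stand-in for the bytes the server would send.
        byte[] wire = expected.getBytes(StandardCharsets.UTF_8);

        String decoded = readAll(new ByteArrayInputStream(wire));
        System.out.println(decoded.equals(expected));                         // true
        System.out.println(roundTripThroughLatin1(decoded).equals(expected)); // false
    }
}
```

The second line prints false because ISO-8859-1 cannot represent the Kannada characters, so 'getBytes' substitutes '?' for each of them before the UTF-8 decode ever runs.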
String string = fullString.substring(fullString.indexOf("[\"") + 2,
fullString.indexOf("\",]"));
System.out.println(string);
This will display the String using the platform's default encoding. If that
encoding cannot represent the Kannada characters, they will come out as '?'.
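If the default encoding is the culprit, one possible workaround is to wrap System.out in a PrintStream with an explicit encoding. The sketch below is only that; the class and helper names (Utf8Print, printAs) are invented for illustration, and it assumes your console or terminal is itself set up to interpret UTF-8 and has a font with Kannada glyphs.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Print {
    // Pushes the text through a PrintStream with the named encoding and
    // returns the bytes that would reach the console.
    static byte[] printAs(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text =
            "\u0CA8\u0CAE\u0CCD\u0CB8\u0CCD\u0C95\u0CB0\u0C97\u0CC1\u0CB0\u0CC1";

        // Explicit UTF-8 instead of the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);

        // 11 chars become 33 bytes in UTF-8, but 11 '?' bytes in US-ASCII.
        System.out.println(printAs(text, "UTF-8").length);    // 33
        System.out.println(printAs(text, "US-ASCII").length); // 11
    }
}
```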
in.close();
This should be in a 'finally' block tightly associated with the input loop.
}
}
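For the 'finally' remark above, here is roughly the shape I mean. It is a sketch with made-up names (CloseInFinally, readAll), and a StringReader stands in for the InputStreamReader over the URL stream so that it runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CloseInFinally {
    // Reads all lines from the source and closes it no matter how the loop ends.
    static String readAll(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        try {
            StringBuilder sb = new StringBuilder();
            String line; // no throwaway initial value needed
            while ((line = in.readLine()) != null) {
                sb.append(line);
            }
            return sb.toString();
        } finally {
            in.close(); // runs whether the loop finishes or throws
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("line one\nline two")));
        // prints "line oneline two" because readLine() strips the line breaks
    }
}
```

On Java 7 or later, a try-with-resources statement gives the same guarantee with less ceremony.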
Do not use TAB characters for indentation of Usenet posts. Use spaces, up to
four per indent level. To get help you might want to keep the code readable.
--
Lew