UTF-8 problems with Windows

From: Michael Jung <miju@golem.phantasia.org>
Newsgroups: comp.lang.java.programmer
Date: Mon, 10 Aug 2009 23:29:04 +0200
Message-ID: <87skfzl5nj.fsf@golem.phantasia.org>

I have the following code fragment in a tiny webserver:

   ...
   os = sock.socket().getOutputStream();
   osr = new PrintWriter(new PrintStream(os, true, "UTF-8"));
   osr.println("HTTP/1.1 200 OK");
   osr.println("Content-Type: text/html; charset=utf-8");
   osr.println();
   osr.println(test());
   ...

   private String test() {
      String ret = null;
      try {
         StringBuffer tmpl = new StringBuffer
            ("<html><head></head><body>H\u00e2n</body></html>");
         ret = tmpl.toString();
      }
      catch (Exception e) {
         e.printStackTrace();
      }
      System.out.println(ret);
      return ret;
   }

On Linux, with Firefox and Opera, there is no problem and
the a with circumflex is displayed correctly.

On Windows XP I can get neither Firefox nor IE to display it correctly.

Firefox shows a U+FFFD replacement square, but when I change from the
(detected) UTF-8 encoding to ISO-8859-1, it displays the character
correctly. But that would be the wrong encoding!?

IE shows an empty rectangle in the main browser window, but when I
look at the page source, everything is shown correctly!?

I have seen the correct output, but don't remember how I got it, so
it's not a matter of missing glyphs.

This is probably not a Java question, as I suspect some Windows magic
is happening here. Maybe it has something to do with the infamous BOM?
(I tried setting "file.encoding" to "UTF-8", for what it's worth. The
cmd prompt output from the System.out.println then shows an o with
circumflex, but I think that's due to the Windows legacy encoding.)
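
One way to narrow this down might be to compare the bytes that the
writer chain from the fragment actually produces with plain UTF-8
output. The sketch below is only an illustration: the class
EncodingCheck and its helper methods are made-up names, and it assumes
that PrintWriter(OutputStream) encodes with the platform default
charset, as its Javadoc describes for current JDKs (other JDK versions
may behave differently), so that the "UTF-8" given to the inner
PrintStream never gets to encode any characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the webserver.
class EncodingCheck {

    // Same chain as in the server fragment: the PrintWriter wraps the
    // PrintStream as a plain OutputStream and does the char-to-byte
    // conversion itself (assumed: with the platform default charset).
    static byte[] viaWrappedPrintStream(String s) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintWriter w = new PrintWriter(new PrintStream(sink, true, "UTF-8"));
            w.print(s);
            w.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Alternative chain: the charset is named on the writer that
    // actually converts characters to bytes.
    static byte[] viaExplicitWriter(String s) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintWriter w = new PrintWriter(new OutputStreamWriter(sink, "UTF-8"));
            w.print(s);
            w.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Formats bytes as space-separated lowercase hex, e.g. "48 c3 a2 6e".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "H\u00e2n";
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("wrapped stream = " + hex(viaWrappedPrintStream(s)));
        System.out.println("explicit UTF-8 = " + hex(viaExplicitWriter(s)));
    }
}
```

For "H\u00e2n" the explicit UTF-8 writer yields 48 c3 a2 6e. If the
"wrapped stream" line shows 48 e2 6e instead, the a with circumflex
left the server as a single ISO-8859-1/Cp1252 byte, which would fit
Firefox displaying it correctly only after switching to ISO-8859-1.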

Michael