Re: Help with utf8
On Tue, 7 Apr 2009, Francois wrote:
I read a file encoded as UTF-8, and it has accented characters displayed
as R??mi (in gvim).
I read and parse the file (File xmlFile is the file handle) using:
InputStreamReader in = new InputStreamReader(
    new FileInputStream(xmlFile), "UTF-8");
filter.parse(new InputSource(new BufferedReader(in)));
When the parsing is done, I output the file with
Writer out = new OutputStreamWriter(
    new FileOutputStream(outfile), "UTF-8");
filter.setContentHandler(new XMLWriter(out));
During the parsing, I substitute the attribute contents using a
HashMap which is read from another file with
I don't understand what you mean by that. Substitute how?
FileInputStream r = new FileInputStream(d);
InputStreamReader is = new InputStreamReader(r);
System.out.println("Zmodif encoding " + is.getEncoding());
BufferedReader reader = new BufferedReader(is);
String line;
while ((line = reader.readLine())!= null){
byte[] conv = line.getBytes("ISO-8859-1");
String u8Line = new String(conv, "UTF8");
...
That looks like a really odd thing to do. What are you trying to achieve
by encoding a string as 8859-1 and then decoding it as UTF-8?
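For what it's worth, here is a minimal sketch of what that round trip does. It assumes (the post doesn't say) that the second file is really UTF-8 and that the platform default encoding is ISO-8859-1, so the InputStreamReader created without a charset decodes each byte as one character. The class name and the sample bytes are hypothetical stand-ins for one line of your file.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical sample: the UTF-8 bytes of "Rémi", standing in for one line of the file.
    static final byte[] SAMPLE = "R\u00e9mi".getBytes(StandardCharsets.UTF_8);

    // Decode with the wrong charset, then repair it the way the posted loop does.
    public static String roundTrip() {
        // Assumption: this is what a default-charset reader would hand back.
        String line = new String(SAMPLE, StandardCharsets.ISO_8859_1);
        // Re-encoding as ISO-8859-1 recovers the original bytes unchanged...
        byte[] conv = line.getBytes(StandardCharsets.ISO_8859_1);
        // ...and decoding those bytes as UTF-8 gives the intended text.
        return new String(conv, StandardCharsets.UTF_8);
    }

    // Decode with the right charset up front; no repair step is needed.
    public static String direct() {
        return new String(SAMPLE, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip().equals(direct()));
    }
}
```

If that is what is going on, the repair only works while the default encoding happens to be ISO-8859-1. Passing "UTF-8" as the second argument to the InputStreamReader, as you already do for the XML file, would let you use each line as it comes from readLine().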
I put u8Line in the HashMap and use it to make the substitutions
}
My problem is that the output file has accented characters like this:
R&#233;mi instead of R??mi
I don't know where it comes from and how to change it ...
That's an XML numeric character escape. &#233; means the Unicode character
with code 233, which is a lowercase e with an acute accent. It's a
perfectly valid thing to find in an XML document; if the purpose of your
XML file is to be read by another program, it will be fine. If you want to
encode it as a normal character, you need to tell the XML encoder to do
that rather than use an escape; I don't know what this XMLWriter class
you're using is, but that's the object which is making that decision.
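To illustrate why the escape is harmless to another program, here is a small sketch using the JDK's own DOM parser (not your XMLWriter, which I can't speak for). The class and method names are made up for the example. It parses the same element once with the numeric escape and once with the literal character and compares the results.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapeDemo {
    // Parse a small XML string and return the text content of its root element.
    public static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = rootText("<name>R&#233;mi</name>"); // numeric character escape
        String literal = rootText("<name>R\u00e9mi</name>"); // literal e-acute
        // A conforming parser resolves the escape, so both strings hold the same characters.
        System.out.println(escaped.equals(literal));
    }
}
```

Any reader that goes through an XML parser sees the same text either way; the difference only matters to a human looking at the raw file.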
tom
--
You have now found yourself trapped in an incomprehensible maze.