Re: Detect XML document encodings with SAX
Sebastian wrote:
> I discovered this post:
> http://www.ibm.com/developerworks/library/x-tipsaxxni/
> and implemented both approaches (SAX and Xerces XNI).
> Unfortunately, for the attached XML file, both methods

Don't do attachments on Usenet.

> output an encoding of UTF-8, while looking at the file

as they should. XML should be encoded in UTF-8 nearly always.
But SAX is a parser, so it doesn't output, it inputs. What are you telling us?

> makes it clear that it is not UTF-8 encoded (all characters,
> including the umlaut and the Euro-sign, take one byte, and the
> declared encoding also is not UTF-8).

http://sscce.org/

> Does anyone have an idea why that is so? And how I could

You used the default encoding in your Writer.

> go about making some XML parser determine the correct encoding?

Your problem is writing the file, no? That has nothing to do with parsing.
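
To illustrate the writing side, here is a minimal sketch, assuming the file is produced through a java.io.Writer. The class and method names are made up for the example, and I have not seen your code, so treat it as a guess at the shape of the fix. The point is to pass the charset to the Writer explicitly and to declare that same charset in the XML declaration, instead of relying on the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not taken from the original poster's code.
final class ExplicitEncodingWriter {

    // Serializes a tiny document. The charset handed to the Writer is the
    // same one named in the XML declaration, so bytes and declaration agree.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>");
            w.write("<doc>" + text + "</doc>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e4\u20ac"; // a-umlaut followed by the Euro sign
        Charset latin9 = Charset.forName("ISO-8859-15");

        // Subtract the size of an empty document to count only the text bytes.
        int utf8Bytes = write(text, StandardCharsets.UTF_8).length
                - write("", StandardCharsets.UTF_8).length;
        int latin9Bytes = write(text, latin9).length - write("", latin9).length;

        System.out.println("UTF-8 text bytes: " + utf8Bytes);
        System.out.println("ISO-8859-15 text bytes: " + latin9Bytes);
    }
}
```

If both characters take one byte each in your file, as you describe, then the bytes were not written as UTF-8, whatever the tool reported afterwards.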
If your problem is with reading the file, then the encoding in the XML declaration
should suffice to guide the parser. But then why do you talk about methods that
"output an encoding"?
However, according to
http://xmlwriter.net/xml_guide/xml_declaration.shtml#Encoding
supported encodings only include UTF-8, UTF-16, ISO-10646-UCS-2,
ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, and EUC-JP,
as you would have learned had you researched your question.
So it looks like you must not accept XML documents with such a non-standard
encoding.
Show us the code, or at least an SSCCE of it.
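
For the reading side, an SSCCE can be as small as the sketch below. It is only an illustration under my own assumptions: the class name and the sample document are invented, and it uses the JDK's default SAX parser with ISO-8859-1, which is on the list above. The parser is given raw bytes and no encoding hint, so the only thing it has to go on is the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class showing a parse driven by the declared encoding.
final class DeclaredEncodingParse {

    // Parses raw bytes and returns all character data found in the document.
    static String textOf(byte[] xmlBytes) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // A byte stream with no setEncoding() call: the parser has to
        // read the encoding from the XML declaration itself.
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>\u00e4</doc>";
        // Encode with the same charset the declaration names.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("round trip ok: " + textOf(bytes).equals("\u00e4"));
    }
}
```

Something of that size, with your actual declaration and your actual bytes, would let the rest of us reproduce what you are seeing.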
--
Lew