Re: Detect XML document encodings with SAX
On 11/21/2012 2:31 PM, Lew wrote:
Sebastian wrote:
I discovered this post:
http://www.ibm.com/developerworks/library/x-tipsaxxni/
and implemented both approaches (SAX and Xerces XNI).
Unfortunately, for the attached XML file, both methods
Don't do attachments on Usenet.
output an encoding of UTF-8, while looking at the file
as they should.
No.
If the XML declaration specifies an encoding other than UTF-8,
then the parser should not report UTF-8.
XML should be encoded in UTF-8 nearly always.
XML allows for other encodings.
And Java XML parsers support it.
So it should always work.
But SAX is a parser, so it doesn't output, it inputs. What are you telling us?
Output usually means System.out.println - that works fine with a parser.
If your problem is with reading the file, then the encoding in the XML declaration
should suffice to guide the parser. But then why do you talk about methods that
"output an encoding"?
Because he wants to know what it is.
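For what it is worth, here is a minimal sketch of how the encoding can be
asked for through plain SAX. It assumes the parser's Locator also implements
org.xml.sax.ext.Locator2 (the parser bundled with the JDK does, as far as I
know), and it reads the value at the first startElement on the assumption
that the XML declaration has been processed by then. The class name
EncodingSniffer and the method detect are made up for illustration. Note that
the parser gets a byte stream; if you hand it a Reader, the bytes are already
decoded and the parser cannot detect anything.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingSniffer {

    /**
     * Hypothetical helper: parses the given bytes and returns the encoding
     * name reported by the parser, or null if the Locator is not a Locator2.
     */
    public static String detect(byte[] xml) throws Exception {
        final String[] result = new String[1];

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;
            private boolean seen;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Keep the locator; the encoding is queried later.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Query only once, at the first (root) element.
                if (!seen && locator instanceof Locator2) {
                    result[0] = ((Locator2) locator).getEncoding();
                }
                seen = true;
            }
        };

        // Pass raw bytes so the parser does the decoding itself.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new ByteArrayInputStream(xml)), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(doc));
    }
}
```

If this reports UTF-8 for a file that declares something else, the first
things to check are whether the file really is being passed as bytes and
at which callback the encoding is being read.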
> However, according to
> http://xmlwriter.net/xml_guide/xml_declaration.shtml#Encoding
> supported encodings only include UTF-8, UTF-16, ISO-10646-UCS-2,
> ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS,
> and EUC-JP,
> as you would have learned had you researched your question.
>
> So it looks like you must not accept XML documents with such a
> non-standard encoding.
Those who have researched would know that the XML spec does not
limit the encodings at all. An XML processor must support UTF-8
and UTF-16, but is free to support others.
Arne