Re: Detect XML document encodings with SAX
On 21.11.2012 20:31, Lew wrote:
Sebastian wrote:
I discovered this post:
http://www.ibm.com/developerworks/library/x-tipsaxxni/
and implemented both approaches (SAX and Xerces XNI).
[snip]
Your problem is writing the file, no? That has nothing to do with parsing.
No, my problem is with parsing the file, specifically parsing it in order
to detect its encoding.
If your problem is with reading the file, then the encoding in the XML declaration
should suffice to guide the parser.
My question is precisely why that does not suffice in this case.
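Lew's point can be checked without a SAX parser at all: the encoding a document declares is just the `encoding` pseudo-attribute of its XML declaration (XML 1.0, productions EncodingDecl and EncName). The following is a minimal sketch, assuming the document begins with an ASCII-compatible declaration (no byte order mark, not UTF-16 or EBCDIC). The class `DeclaredEncoding` and its `read` method are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the encoding pseudo-attribute from an XML declaration. */
public final class DeclaredEncoding {

    // XML 1.0: EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
    //          EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
    // The \s after "<?xml" rejects processing instructions such as <?xml-stylesheet ...?>.
    private static final Pattern ENCODING_DECL = Pattern.compile(
            "^<\\?xml\\s[^>]*?\\bencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    private DeclaredEncoding() {
    }

    /**
     * Returns the declared encoding name exactly as written, or null when the
     * bytes do not start with an XML declaration or the declaration has no
     * encoding. Pass at least the first few hundred bytes of the document.
     */
    public static String read(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives intact.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("?>");
        if (!text.startsWith("<?xml") || end < 0) {
            return null;
        }
        // Only look inside the declaration itself.
        Matcher m = ENCODING_DECL.matcher(text.substring(0, end + 2));
        return m.find() ? m.group(2) : null;
    }
}
```

Comparing `DeclaredEncoding.read(...)` with what a SAX handler reports for the same file gives a quick way to see whether the parser's answer matches the declaration.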
But then why do you talk about methods that
"output an encoding"?
I meant the System.out.println() statements in the code.
[snip]
Show us the code, or at least an SSCCE of it.
I was referring to the code in the IBM developerWorks article that I
linked to. Perhaps I should simply have copied that code into my
original post. So here goes:
import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
import java.io.IOException;

public class SAXEncodingDetector extends DefaultHandler {

    /**
     * Print the encodings of all URLs given on the command line.
     */
    public static void main(String[] args) throws SAXException,
            IOException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        SAXEncodingDetector handler = new SAXEncodingDetector();
        parser.setContentHandler(handler);
        for (int i = 0; i < args.length; i++) {
            try {
                parser.parse(args[i]);
            }
            catch (SAXException ex) {
                // Reached on the deliberate early termination in
                // startDocument() (or on a genuine parse error); print
                // whatever encoding the handler recorded by then.
                System.out.println(handler.encoding);
            }
        }
    }

    private String encoding;
    private Locator2 locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Only a SAX2 extensions Locator2 can report the encoding.
        if (locator instanceof Locator2) {
            this.locator = (Locator2) locator;
        }
        else {
            this.encoding = "unknown";
        }
    }

    @Override
    public void startDocument() throws SAXException {
        if (locator != null) {
            this.encoding = locator.getEncoding();
        }
        // Abort the parse: only the encoding is wanted.
        throw new SAXException("Early termination");
    }
}
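One way to probe the question is to record what `Locator2.getEncoding()` returns at two points: in `startDocument()`, as the article's code does, and in the first `startElement()`, by which time the parser should have read the XML declaration. The sketch below is a hypothetical variant; the class name `DeferredEncodingDetector` and its `detect` method are not from the article. It uses JAXP's `SAXParserFactory` instead of `XMLReaderFactory` (deprecated since Java 9) and assumes the parser supplies a `Locator2`, as the JDK's built-in parser does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical variant: records the reported encoding at two points, then stops. */
public class DeferredEncodingDetector extends DefaultHandler {
    private Locator2 locator;          // stays null if the parser gives no Locator2
    private String atStartDocument;    // what the article's approach would see
    private String atFirstElement;     // what is reported once the root element starts
    private boolean stoppedOnPurpose;  // distinguishes our abort from real errors

    @Override
    public void setDocumentLocator(Locator locator) {
        if (locator instanceof Locator2) {
            this.locator = (Locator2) locator;
        }
    }

    @Override
    public void startDocument() {
        if (locator != null) {
            atStartDocument = locator.getEncoding();
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {
        if (locator != null) {
            atFirstElement = locator.getEncoding();
        }
        stoppedOnPurpose = true;
        // Nothing after the root start tag is needed.
        throw new SAXException("Early termination");
    }

    public String encodingAtStartDocument() { return atStartDocument; }

    public String encodingAtFirstElement() { return atFirstElement; }

    /** Parses just far enough to reach the root element. */
    public static DeferredEncodingDetector detect(InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DeferredEncodingDetector handler = new DeferredEncodingDetector();
        try {
            parser.parse(source, handler);
        } catch (SAXException ex) {
            // Swallow only the exception thrown in startElement(); real errors propagate.
            if (!handler.stoppedOnPurpose) {
                throw ex;
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        DeferredEncodingDetector d =
                detect(new InputSource(new ByteArrayInputStream(doc)));
        System.out.println("startDocument: " + d.encodingAtStartDocument());
        System.out.println("first element: " + d.encodingAtFirstElement());
    }
}
```

Running `main` prints both values side by side. If they differ for a document whose declaration names a non-UTF-8 encoding, that would suggest the `startDocument()` approach reads the encoding before the declaration has been processed. The `stoppedOnPurpose` flag also keeps genuine parse errors from being swallowed, which the article's catch block does not distinguish.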