Re: Loading a simple XHTML transitional document into a org.w3c.dom.Document
Ion Freeman wrote:
I'm just trying to do the simplest thing in the world. Where input
is a java.io.File that contains an transitional XHTML 1.0 file, I do
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance
();
dbf.setNamespaceAware(false);
db = dbf.newDocumentBuilder();
Document doc = db.parse(input);
Unfortunately, this tries to pull the DTD from the W3C, and they
didn't like that. So, they give me a 503 error. I tried the
EntityResolver from http://forums.sun.com/thread.jspa?threadID=5244492,
but that just gives me a MalformedURLException. Either way, my parse
fails.
I'm sure that at least tens of thousands of people have written code
to do this, but I can't find a (working) reference online. I think
most of my XML parsing happened when the W3C would just give the DTDs
out -- I understand that they found that unworkable, but I still need
to parse my document.
How should I be doing this?
Download the DTD and the 3 ENT files to your harddrive and tell
the parse to use those.
See code below.
Arne
=======================================================
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class XhtmlParse {
public static void main(String[] args) throws Exception{
String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0
Transitional//EN\"
\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n<html>\r\n<head>\r\n<title>simple
document</title>\r\n</head>\r\n<body>\r\n<p>a simple
paragraph</p>\r\n</body>\r\n</html>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new DTDHandler());
Document doc = db.parse(new InputSource(new StringReader(xml)));
}
}
class DTDHandler implements EntityResolver {
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if(systemId.equals("http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"))
{
return new InputSource("C:\\xhtml1-transitional.dtd");
} else
if(systemId.equals("http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent")) {
return new InputSource("C:\\xhtml-lat1.ent");
} else
if(systemId.equals("http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent")) {
return new InputSource("C:\\xhtml-symbol.ent");
} else
if(systemId.equals("http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent")) {
return new InputSource("C:\\xhtml-special.ent");
} else {
return null;
}
}
}