Re: Help with XML processing using DOM
Jeff Higgins wrote:
Lew wrote:
Jeff Higgins wrote:
<a>
<b/>
</a>
Element node <a> contains Text node and Element node <b>
I'm not as familiar with DOM as SAX, but isn't the whitespace ignorable?
Well, good question. One I haven't considered.
According to the DocumentBuilderFactory Javadoc for the method
setIgnoringElementContentWhitespace(boolean)
|quote|
Specifies that the parsers created by this factory must eliminate
whitespace in element content (sometimes known loosely as 'ignorable
whitespace') when parsing XML documents (see XML Rec 2.10). Note that
only whitespace which is directly contained within element content
that has an element only content model (see XML Rec 3.2.1) will be
eliminated. Due to reliance on the content model this setting requires
the parser to be in validating mode.
By default the value of this is set to false.
|unquote|
So it looks like yes, provided I've specified an "element only content model"
in my DTD or schema for the particular Element in question.
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class TestDomWhitespace
{
    public static void main(String[] argv)
    {
        Document document = null;
        // Internal DTD subset: <a> and <b> have element-only content models
        final String instance =
            "<?xml version='1.0' standalone='yes'?>" + "\n" +
            "<!DOCTYPE a [" + "\n" +
            "<!ELEMENT a (b , e)>" + "\n" +
            "<!ELEMENT b (c , d*)>" + "\n" +
            "<!ELEMENT c (#PCDATA)>" + "\n" +
            "<!ELEMENT d (#PCDATA)>" + "\n" +
            "<!ELEMENT e ANY>]>" + "\n" +
            "<a>" + "\n" +
            " <b>" + "\n" +
            " <c>foo</c>" + "\n" +
            " <d>foo</d>" + "\n" +
            " <d>foo</d>" + "\n" +
            " </b>" + "\n" +
            " <e></e>" + "\n" +
            "</a>" + "\n";
        System.out.println(instance);
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        // The whitespace setting below only takes effect in validating mode
        factory.setValidating(true);
        // Set to true:  whitespace inside <a> is dropped, output is abe
        // Set to false: whitespace is kept as a Text node, output is a#textb
        factory.setIgnoringElementContentWhitespace(false);
        DocumentBuilder builder;
        try
        {
            builder = factory.newDocumentBuilder();
            document = builder.parse(new InputSource(
                new StringReader(instance)));
        }
        catch (ParserConfigurationException e)
        {
            e.printStackTrace();
            return; // no document to inspect
        }
        catch (SAXException e)
        {
            e.printStackTrace();
            return; // no document to inspect
        }
        catch (IOException e)
        {
            e.printStackTrace();
            return; // no document to inspect
        }
        // Print the root element, its first child, and that child's next sibling
        Node domNode = document.getDocumentElement();
        System.out.print(domNode.getNodeName());
        domNode = domNode.getFirstChild();
        System.out.print(domNode.getNodeName());
        domNode = domNode.getNextSibling();
        System.out.print(domNode.getNodeName());
    }
}
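
If there is no DTD or schema to validate against, the factory setting cannot
help, since the parser has no content model to consult. One possible
workaround, offered only as a rough sketch, is to remove whitespace-only Text
nodes by hand after parsing. The class name StripWhitespaceNodes and its
helper methods are made up for illustration and are not part of any API. Note
the assumption baked in: every whitespace-only Text node is treated as
ignorable, which is wrong for mixed content where such whitespace matters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripWhitespaceNodes
{
    // Hypothetical helper: recursively removes Text nodes that contain
    // only whitespace. Assumes such nodes are never significant.
    public static void stripWhitespace(Node parent)
    {
        Node child = parent.getFirstChild();
        while (child != null)
        {
            // Remember the sibling before the child is possibly removed
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                && child.getNodeValue().trim().isEmpty())
            {
                parent.removeChild(child);
            }
            else if (child.getNodeType() == Node.ELEMENT_NODE)
            {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Hypothetical helper: concatenates the node names of the direct children
    public static String childNames(Node parent)
    {
        StringBuilder names = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling())
        {
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    // Parses a string with a default, non-validating DocumentBuilder
    public static Document parse(String xml) throws Exception
    {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception
    {
        Document document = parse("<a>\n <b/>\n <e/>\n</a>");
        Node root = document.getDocumentElement();
        System.out.println(childNames(root)); // children before stripping
        stripWhitespace(root);
        System.out.println(childNames(root)); // children after stripping
    }
}
```

Where a DTD is available, the validating approach above is probably the
safer choice, because the parser then knows which whitespace really sits in
element-only content.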