Whitespace problems, xml-parsing
Hello, I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns="myns"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="myns staff.xsd">
    <employee hasQuit="false">
        <id>4711</id>
        <name>Linda</name>
        <address>
            <street>Some street 1337</street>
            <city>Boston</city>
        </address>
    </employee>
    <employee hasQuit="false">
        <id>4712</id>
        <name>Michael</name>
        <address>
            <street>Another street 122</street>
            <city>Stockholm</city>
        </address>
    </employee>
</staff>
which is valid according to its schema.
I'm very rusty at Java, and this is the first time I've worked with XML
in any programming language. My problem is that when I parse the file I
get a lot of text nodes containing nothing but whitespace, even though I
thought I had told the parser to ignore such whitespace. The output is:
Are we ignoring element content whitespace? true
text data start
text data end
text data start
text data end
text data start
4711
text data end
text data start
text data end
text data start
Linda
text data end
text data start
text data end
text data start
text data end
text data start
Some street 1337
text data end
text data start
text data end
text data start
Boston
text data end
text data start
text data end
text data start
text data end
text data start
text data end
text data start
text data end
text data start
4712
text data end
text data start
text data end
text data start
Michael
text data end
text data start
text data end
text data start
text data end
text data start
Another street 122
text data end
text data start
text data end
text data start
Stockholm
text data end
text data start
text data end
text data start
text data end
text data start
text data end
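The "empty" text nodes in the output above are not actually empty: they hold the line break and indentation that sit between tags. A minimal sketch to make that visible, assuming a small hypothetical excerpt of the staff document parsed from a string (the class `WhitespaceDump` and its methods are made up for illustration), escapes `\n` and `\t` before printing each text node:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceDump {
    // Hypothetical excerpt of staff.xml, indented like the original file
    static final String XML =
        "<address xmlns=\"myns\">\n"
        + "    <street>Some street 1337</street>\n"
        + "    <city>Boston</city>\n"
        + "</address>";

    // Collect every text node below 'node', with newlines and tabs escaped
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(node.getNodeValue().replace("\n", "\\n").replace("\t", "\\t"));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Parse without validation and describe all text nodes under the root
    static List<String> describeTextNodes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Brackets show where each text node starts and ends
        for (String s : describeTextNodes(XML)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

For this excerpt it prints `[\n    ]`, `[Some street 1337]`, `[\n    ]`, `[Boston]` and `[\n]`, one per line: the whitespace-only nodes are exactly the indentation between tags.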
And my code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.XMLConstants;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

public class DOM_Demo {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setIgnoringElementContentWhitespace(true);
            factory.setNamespaceAware(true);
            // Validate against staff.xsd while parsing
            factory.setSchema(loadSchema("staff.xsd"));
            System.out.println("Are we ignoring element content whitespace? "
                + factory.isIgnoringElementContentWhitespace());
            DocumentBuilder document_builder =
                factory.newDocumentBuilder();
            Document doc1 = document_builder.parse("staff.xml");
            // In this file the document's first child is the <staff> root element
            traverse(doc1.getFirstChild());
        }
        catch (Throwable t) {
            System.out.println("Exception caught: " +
                t.getLocalizedMessage());
        }
    }

    private static Schema loadSchema(String schemaFile) throws Throwable {
        SchemaFactory schema_factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return schema_factory.newSchema(new java.io.File(schemaFile));
    }

    // Print every text node below current_node, recursing into elements
    private static void traverse(Node current_node) {
        if (current_node.getNodeType() == Node.TEXT_NODE) {
            Text text = (Text)current_node;
            // True only for whitespace the parser flagged as element content whitespace
            if (!text.isElementContentWhitespace()) {
                System.out.println("text data start");
                System.out.println(text.getData());
                System.out.println("text data end");
            }
            else {
                System.out.println("element content whitespace");
            }
        }
        else if (current_node.getNodeType() == Node.ELEMENT_NODE) {
            NodeList children = current_node.getChildNodes();
            for (int i = 0; i < children.getLength(); ++i) {
                traverse(children.item(i));
            }
        }
    }
}
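For comparison: the Javadoc for `DocumentBuilderFactory.setIgnoringElementContentWhitespace` says only whitespace directly inside element-only content is removed and that, because this relies on the content model, the setting requires the parser to be in validating mode. A minimal sketch of that mode, assuming an illustrative internal DTD in place of staff.xsd (the DTD, the sample document and the class name `DtdWhitespaceDemo` are all made up here), turns on `setValidating(true)` so the content model comes from the DTD:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class DtdWhitespaceDemo {
    // Illustrative document with an internal DTD (an assumption, not the real staff.xsd)
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE staff [\n"
        + "  <!ELEMENT staff (employee*)>\n"
        + "  <!ELEMENT employee (id, name, address)>\n"
        + "  <!ATTLIST employee hasQuit (true|false) #REQUIRED>\n"
        + "  <!ELEMENT id (#PCDATA)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT address (street, city)>\n"
        + "  <!ELEMENT street (#PCDATA)>\n"
        + "  <!ELEMENT city (#PCDATA)>\n"
        + "]>\n"
        + "<staff>\n"
        + "  <employee hasQuit=\"false\">\n"
        + "    <id>4711</id>\n"
        + "    <name>Linda</name>\n"
        + "    <address>\n"
        + "      <street>Some street 1337</street>\n"
        + "      <city>Boston</city>\n"
        + "    </address>\n"
        + "  </employee>\n"
        + "</staff>\n";

    // Parse with DTD validation plus element content whitespace removal
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the DTD supplies the element-only content model
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                // Ignore warnings, but fail loudly on validation or well-formedness errors
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    // Collect the data of every text node below 'node'
    static void collectText(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(node.getNodeValue());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    static List<String> textNodes(String xml) {
        List<String> out = new ArrayList<>();
        collectText(parse(xml).getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) {
        // With DTD validation, only the data values should remain
        System.out.println(textNodes(XML));
    }
}
```

Here only the text inside `id`, `name`, `street` and `city` survives. Whether the same removal happens when validating with `setSchema(...)` instead of a DTD evidently depends on the parser; the output in the question suggests it did not in that setup.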
Sorry for the long post, but I wanted to include all the details. I want
to get rid of all text data that doesn't reside in the elements that are
supposed to contain it (id, name, street, city). I hope it's clear what
I mean.
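One workaround that doesn't depend on the validator at all, sketched here under the assumption that whitespace-only text is never meaningful in this document, is to delete whitespace-only text nodes from the DOM after parsing. The names `StripWhitespace`, `stripWhitespaceNodes` and `parseAndStrip` are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class StripWhitespace {
    // Remove every text node below 'node' that contains only whitespace
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes(); // live list
        // Walk backwards so removals don't shift the indices still to visit
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    // Parse a string and strip whitespace-only text nodes under the root
    static Document parseAndStrip(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            stripWhitespaceNodes(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employee hasQuit=\"false\">\n"
            + "  <id>4711</id>\n"
            + "  <name>Linda</name>\n"
            + "</employee>";
        Node root = parseAndStrip(xml).getDocumentElement();
        System.out.println(root.getChildNodes().getLength()); // 2: <id> and <name>
        System.out.println(root.getTextContent());            // 4711Linda
    }
}
```

In `DOM_Demo`, calling `stripWhitespaceNodes(doc1.getDocumentElement())` before `traverse(...)` would apply the same idea. Note that it also empties leaf elements whose only content is whitespace, such as `<name> </name>`, which may or may not be wanted.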
Thanks for reading and thanks for any replies!
- WP