SAX Parser for XML File with DTD

Q

How to add a DTD to an XML file to remove whitespace-only text content during SAX parsing?

✍: FYIcenter

A

When you use a SAX parser to process a pretty-printed XML file, the characters() handler method is also called for the extra whitespace (newlines and indentation) between elements.

For example, look at the following pretty-printed XML file, User.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (c) 2017 FYIcenter.com -->
<User>
    <ID>101</ID>
    <BirthDate>1970-01-01+00:01</BirthDate>
    <Name>Frank Y. Ivy</Name>
    <Sex>  Male</Sex>
</User>

If you run the example program, SaxXmlParser.java, from the previous tutorial with User.xml, you will see that characters() is also called for the whitespace between elements:

>\fyicenter\jdk-1.8.0\bin\java SaxXmlParser User.xml

Parser class: com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl

.User(
    )
..ID(101)(
    )
..BirthDate(1970-01-01+00:01)(
    )
..Name(Frank Y. Ivy)(
    )
..Sex(  Male)(
)
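To see where those extra "(...)" groups come from, here is a minimal sketch of a SAX handler in the same spirit. It is an assumption, not the SaxXmlParser.java source from the previous tutorial: the class name SaxTrace, its method traceOf(), and its trace format are hypothetical, and it parses the content of User.xml from a string instead of a file. Every characters() call is wrapped in parentheses, so whitespace-only chunks become visible.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical trace handler: approximates the tutorial's output style,
// it is NOT the actual SaxXmlParser.java program.
public class SaxTrace extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append('.');              // one dot per nesting level
        }
        out.append(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Wrap every text chunk in (...), so whitespace-only chunks show up too
        out.append('(').append(ch, start, length).append(')');
    }

    // Parses the given XML text and returns the recorded event trace
    public static String traceOf(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxTrace handler = new SaxTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    // Same element content as User.xml, without a DTD
    static final String USER_XML =
        "<User>\n"
        + "    <ID>101</ID>\n"
        + "    <BirthDate>1970-01-01+00:01</BirthDate>\n"
        + "    <Name>Frank Y. Ivy</Name>\n"
        + "    <Sex>  Male</Sex>\n"
        + "</User>\n";

    public static void main(String[] args) throws Exception {
        System.out.println(traceOf(USER_XML));
    }
}
```

Without a DTD the parser has no way of knowing that User contains only child elements, so it must treat the newlines and indentation as ordinary text and pass them to characters().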

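One way to fix the problem is to add a DTD to the XML file, saved here as UserDTD.xml. The DTD declares that User contains only the child elements ID, BirthDate, Name and Sex, so the parser can tell that the whitespace between those elements is not data. It then reports that whitespace through the ignorableWhitespace() handler method instead of characters():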

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (c) 2017 FYIcenter.com -->
<!DOCTYPE User [
   <!ELEMENT User (ID, BirthDate, Name, Sex)>
   <!ELEMENT ID (#PCDATA)>
   <!ELEMENT BirthDate (#PCDATA)>
   <!ELEMENT Name (#PCDATA)>
   <!ELEMENT Sex (#PCDATA)>
]>

<User>
    <ID>101</ID>
    <BirthDate>1970-01-01+00:01</BirthDate>
    <Name>Frank Y. Ivy</Name>
    <Sex>  Male</Sex>
</User>

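If you run the same SaxXmlParser.java program with UserDTD.xml, you will no longer see the whitespace-only text content. The leading spaces in "  Male" are still reported, because they belong to the #PCDATA content of Sex rather than to the whitespace between elements: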

>\fyicenter\jdk-1.8.0\bin\java SaxXmlParser UserDTD.xml

Parser class: com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl

.User
..ID(101)
..BirthDate(1970-01-01+00:01)
..Name(Frank Y. Ivy)
..Sex(  Male)
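The DTD does not delete the whitespace from the document; it lets the parser classify it. The following minimal sketch checks this, assuming the JDK's built-in SAX parser with its default (non-validating) settings, which still reads the internal DTD subset. The class name IgnorableWhitespaceCheck and its fields are hypothetical; it counts whitespace-only characters() calls and ignorableWhitespace() calls while parsing the content of UserDTD.xml from a string.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker: counts where whitespace between elements is reported
public class IgnorableWhitespaceCheck extends DefaultHandler {
    int whitespaceOnlyCharacters = 0; // characters() calls containing only whitespace
    int ignorableWhitespaceCalls = 0; // ignorableWhitespace() calls

    @Override
    public void characters(char[] ch, int start, int length) {
        // trim() strips all chars <= U+0020, a superset of XML whitespace; fine here
        if (new String(ch, start, length).trim().isEmpty()) {
            whitespaceOnlyCharacters++;
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableWhitespaceCalls++;
    }

    static IgnorableWhitespaceCheck parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // non-validating by default
        SAXParser parser = factory.newSAXParser();
        IgnorableWhitespaceCheck handler = new IgnorableWhitespaceCheck();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    // Same content as UserDTD.xml (XML declaration and comment omitted)
    static final String USER_DTD_XML =
        "<!DOCTYPE User [\n"
        + "   <!ELEMENT User (ID, BirthDate, Name, Sex)>\n"
        + "   <!ELEMENT ID (#PCDATA)>\n"
        + "   <!ELEMENT BirthDate (#PCDATA)>\n"
        + "   <!ELEMENT Name (#PCDATA)>\n"
        + "   <!ELEMENT Sex (#PCDATA)>\n"
        + "]>\n"
        + "<User>\n"
        + "    <ID>101</ID>\n"
        + "    <BirthDate>1970-01-01+00:01</BirthDate>\n"
        + "    <Name>Frank Y. Ivy</Name>\n"
        + "    <Sex>  Male</Sex>\n"
        + "</User>\n";

    public static void main(String[] args) throws Exception {
        IgnorableWhitespaceCheck h = parse(USER_DTD_XML);
        System.out.println("whitespace-only characters() calls: " + h.whitespaceOnlyCharacters);
        System.out.println("ignorableWhitespace() calls: " + h.ignorableWhitespaceCalls);
    }
}
```

With the JDK's built-in parser this should report no whitespace-only characters() calls and several ignorableWhitespace() calls, which matches the SaxXmlParser output above. A handler that does not override ignorableWhitespace() simply inherits the empty DefaultHandler implementation, which is why the whitespace seems to disappear.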

2017-12-09