Parsing XML with SAX and DOM chunking (part 2)

In part 1 we introduced a way of chunking XML. In a recent project I was tasked with creating a large XML file, and I used the chunking facility described there to process one particular element.

The following JUnit test case illustrates how to use the SaxElementToObject class. The test case has been stripped back to using an XMLReader to read a sample PAIN.008 file, which is available to download from the ISO 20022 site. We have highlighted the important lines and offer a brief description after the code sample.

The test case is dependent on a class package generated with the JAXB tooling. The schema to run through the JAXB tooling is available here. A discussion of JAXB is beyond the scope of this post.

The JAXB reference implementation that ships with JRE 7 is 2.2.4. In our testing we found an issue with the marshalling process in this implementation, which was resolved by switching to 2.2.6. The root cause of the issue was related to the elementFormDefault="qualified" setting of the pain.008.001.03 schema.

  • Line 24
    Creates the instance of SaxElementToObject with a parameterised type of iso20022.GroupHeader55. The name of the element we are looking for is "GrpHdr".
  • Line 26
    Creates an anonymous handler, again with a parameterised type of iso20022.GroupHeader55. In this implementation we simply hold on to a reference to the GroupHeader55 instance in a member variable.
  • Line 38
    This is how the XMLFilter implementation, i.e. SaxElementToObject, is associated with an XMLReader object.
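
The three steps described above (create the filter for a named element, give it a handler, attach it to an XMLReader) can be sketched roughly as follows. This is not the SaxElementToObject class itself: ElementCapturingFilter and ElementChunkDemo are hypothetical names, and the sketch hands the handler a DOM Element instead of a JAXB-unmarshalled iso20022.GroupHeader55 so that it runs on a stock JDK without the generated classes. The XML in main is a tiny hand-written fragment, not the real sample file, and using setParent to attach the reader is our assumption about how the association is made.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class ElementChunkDemo {

    // Hypothetical stand-in for SaxElementToObject: rebuilds one element as a DOM chunk.
    // Assumes a namespace-aware parent reader, so that localName is populated.
    static class ElementCapturingFilter extends XMLFilterImpl {
        private final String targetLocalName;
        private final Consumer<Element> handler;
        private Document doc;  // non-null only while a chunk is being captured
        private Node current;  // node that incoming SAX events are appended to

        ElementCapturingFilter(String targetLocalName, Consumer<Element> handler) {
            this.targetLocalName = targetLocalName;
            this.handler = handler;
        }

        private static String emptyToNull(String s) {
            return (s == null || s.isEmpty()) ? null : s;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (current == null && targetLocalName.equals(localName)) {
                try {
                    doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
                } catch (ParserConfigurationException e) {
                    throw new SAXException(e);
                }
                current = doc; // start a fresh chunk rooted at this element
            }
            if (current != null) {
                Element element = doc.createElementNS(emptyToNull(uri), qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    element.setAttributeNS(emptyToNull(atts.getURI(i)), atts.getQName(i), atts.getValue(i));
                }
                current.appendChild(element);
                current = element;
            }
            super.startElement(uri, localName, qName, atts); // pass the event down the pipeline
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (current != null) {
                current.appendChild(doc.createTextNode(new String(ch, start, length)));
            }
            super.characters(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (current != null) {
                Node parent = current.getParentNode();
                if (parent.getNodeType() == Node.DOCUMENT_NODE) {
                    handler.accept((Element) current); // chunk complete, hand it over
                    current = null;
                    doc = null;
                } else {
                    current = parent;
                }
            }
            super.endElement(uri, localName, qName);
        }
    }

    // Parses the XML and returns the text of <MsgId> inside the captured <GrpHdr>.
    static String groupHeaderMsgId(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            Element[] captured = new Element[1]; // plays the role of the member variable
            ElementCapturingFilter filter =
                    new ElementCapturingFilter("GrpHdr", e -> captured[0] = e);
            filter.setParent(reader); // associate the filter with the XMLReader
            filter.parse(new InputSource(new StringReader(xml)));
            return captured[0].getElementsByTagNameNS("*", "MsgId").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Document xmlns=\"urn:iso:std:iso:20022:tech:xsd:pain.008.001.03\">"
                + "<CstmrDrctDbtInitn><GrpHdr><MsgId>MSG-1</MsgId><NbOfTxs>2</NbOfTxs></GrpHdr>"
                + "</CstmrDrctDbtInitn></Document>";
        System.out.println(groupHeaderMsgId(xml)); // prints MSG-1
    }
}
```

Because the filter forwards every event to super, it can sit in the middle of a longer chain of filters without hiding the captured element from whatever comes after it. Only the subtree under the target element is ever held in memory, which is the point of chunking a large file.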

The example above has deliberately been kept simple. In our project we used this component as part of a pipeline for converting a Standard18 file to a PAIN.008 file. SaxElementToObject was used to extract the iso20022.GroupHeader55 object, which we then reconciled with the UTL1 record in the Standard18 file. This conversion pipeline will be covered in a future post.

Posted in Java, Sax, Xml