Sunday, 17 March 2024

CFML: solving a CFML problem with Java. Kinda.

G'day:

The other day on the CMFL Slack channel, Nick Petrie asked:

Anyone know of a native or plugin-based solution to pretty-formatting XML for display on the page? I'd like to output the XML nested this:
I'll use a simplified example> we wanna render this:
<aaa><bbb ccc="ddd"><eee/></bbb></aaa>
Like this:
<aaa>
    <bbb ccc="ddd">
        <eee/>
    </bbb>
</aaa

There's no native CFML way of doing this, but I figured "this is a solved problem: there'll be an easy way to do it in Java". I googled java prettier xml, and the first match was on the Baeldung website (Pretty-Print XML in Java) which is a site I trust to have good answers. But I checked that and a few others, and they all seem to be solving the problem the same way. So I decided to run with that.

The task at hand is to convert this to CFML (still using the Java libs to do the work, I mean, just runnable in CFML):

public static String prettyPrintByTransformer(String xmlString, int indent, boolean ignoreDeclaration) {

    try {
        InputSource src = new InputSource(new StringReader(xmlString));
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src);

        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, ignoreDeclaration ? "yes" : "no");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        Writer out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xmlString, e);
    }
}

To be very clear, this is not my code, it's from Pretty-Print XML in Java.

And also to be clear: I'm not gonna be doing anything unique or difficult or insightful or anything like that in this article. All I'm gonna do is to show how easy it is to convert native Java code to native CFML code, and this is a topical use case. It's gonna be a function with some object creation and some variable assigments. I'm gonna go line-by-line through that function above, and CFMLerise it. I just made that word up.

I'm gonna be using trycf.com to write and run this code, and the aim is to have a version that runs in both CF and Lucee.

First, let's run with the same function signature:

unformattedXml = '<aaa><bbb ccc="ddd"><eee/></bbb></aaa>' 

public string function prettyPrintByTransformer(required string xmlString, numeric indent=4, boolean ignoreDeclaration=true) {

}

prettyPrintByTransformer(unformattedXml)

In each step I'll be using that same unformattedXml input value, and just running the function to ensure it's got no compilation errors and each new statement I add "works" (in that it doesn't have runtime errors). The function won't do anything useful until it's done.

In this first step note that I've given the latter two parameters sensible defaults. This is a change from the Java version.

First things first: I don't like how they've put that largely pointless try/catch in the Java code. To me that sort of error-handling should be in the calling code if needed, not embedded in the implementation. If the implementation errors-out: let it. The actual exception will be more useful than swallowing the real exception and throwing a contrived one.

I'll include each statement I'm converting as a comment above the CFML version. This is only so I can draw your focus to it. I'd never include comments of this nature in my actual code.

public string function prettyPrintByTransformer(required string xmlString, numeric indent=4, boolean ignoreDeclaration=true) {
    // InputSource src = new InputSource(new StringReader(xmlString));
    var xmlAsStringReader = createObject("java", "java.io.StringReader").init(xmlString) // new StringReader(xmlString)
    var src = createObject("java", "org.xml.sax.InputSource").init(xmlAsStringReader)
}

I could have done this without the intermediary variable, but CFML is quite cumbersome making Java object proxies, so I think the code is clearer spread over two statements.

    // Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src);
    var document = createObject("java", "javax.xml.parsers.DocumentBuilderFactory").newInstance().newDocumentBuilder().parse(src)
    // TransformerFactory transformerFactory = TransformerFactory.newInstance();
    var transformerFactory = createObject("java", "javax.xml.transform.TransformerFactory").newInstance()

Now: this statement did not work for me:

transformerFactory.setAttribute("indent-number", indent);

The CFML equivalent is exactly the same as it happens, but I was just getting "Not supported: indent-number" (CF) or "TransformerFactory does not recognise attribute 'indent-number'." (Lucee). I presume it's a library version difference, but I was fairly limited in my troubleshooting as I was running this on trycf.com. Although I did verify I was getting the same thing on my local CF2023 container too. I googled a bit and found a work around, but I'll come back to this a bit further down.

    // Transformer transformer = transformerFactory.newTransformer();
    var transformer = transformerFactory.newTransformer()
    /*
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, ignoreDeclaration ? "yes" : "no");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    */
    var outputKeys = createObject("java", "javax.xml.transform.OutputKeys")
    transformer.setOutputProperty(outputKeys.ENCODING, "UTF-8")
    transformer.setOutputProperty(outputKeys.OMIT_XML_DECLARATION, ignoreDeclaration ? "yes" : "no")
    transformer.setOutputProperty(outputKeys.INDENT, "yes")
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", indent)

Once again, I need an intermediary variable here for the outputKeys class proxy. Just to save repetition in this case.

Also note the last line there is extra: it's the solution for the bit that wasn't working earlier. Easy.

    // Writer out = new StringWriter();
    var out = createObject("java", "java.io.StringWriter").init()
    // transformer.transform(new DOMSource(document), new StreamResult(out));
    var documentAsDomSource = createObject("java", "javax.xml.transform.dom.DOMSource").init(document)
    var outAsStreamResult = createObject("java", "javax.xml.transform.stream.StreamResult").init(out)
    transformer.transform(documentAsDomSource, outAsStreamResult)

More intermediary variables here.

    // return out.toString();
    return out.toString()

And that's it. I've not got much commentary in all that lot, because it's all so straight forward [shrug].

The end result in one piece is thus:

public string function prettyPrintByTransformer(required string xmlString, numeric indent=4, boolean ignoreDeclaration=true) {
    var xmlAsStringReader = createObject("java", "java.io.StringReader").init(xmlString)
    var src = createObject("java", "org.xml.sax.InputSource").init(xmlAsStringReader)
    
    var document = createObject("java", "javax.xml.parsers.DocumentBuilderFactory").newInstance().newDocumentBuilder().parse(src)
    
    var transformerFactory = createObject("java", "javax.xml.transform.TransformerFactory").newInstance()
    var transformer = transformerFactory.newTransformer()
    
    var outputKeys = createObject("java", "javax.xml.transform.OutputKeys")
    transformer.setOutputProperty(outputKeys.ENCODING, "UTF-8")
    transformer.setOutputProperty(outputKeys.OMIT_XML_DECLARATION, ignoreDeclaration ? "yes" : "no");
    transformer.setOutputProperty(outputKeys.INDENT, "yes")
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", indent)
    
    var out = createObject("java", "java.io.StringWriter").init()
    
    var documentAsDomSource = createObject("java", "javax.xml.transform.dom.DOMSource").init(document)
    var outAsStreamResult = createObject("java", "javax.xml.transform.stream.StreamResult").init(out)
    transformer.transform(documentAsDomSource, outAsStreamResult)

    return out.toString()
}

And a coupla test runs:

formattedXml = prettyPrintByTransformer(unformattedXml)
writeOutput("<pre>#encodeForHtml(formattedXml)#</pre>")
<aaa>
    <bbb ccc="ddd">
        <eee/>
    </bbb>
</aaa>
formattedXml = prettyPrintByTransformer(unformattedXml, 8, false)
writeOutput("<pre>#encodeForHtml(formattedXml)#</pre>")
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aaa>
        <bbb ccc="ddd">
                <eee/>
        </bbb>
</aaa>

The output above is from Lucee. On CF we get this:

<?xml version="1.0" encoding="UTF-8"?><aaa>
        <bbb ccc="ddd">
                <eee/>
        </bbb>
</aaa>

Note this is nothing to do with the CFML code, it'll be some Java library variation on the Lucee and CF set-ups on trycf.com.

Right, so all this lot shows is that there's no reason to shy away from implementing a CFML version of some Java code you might find that solves a problem. So in turn, if you have some "algorithmic" issue that you wanna solve in CFML, don't shy away from googling for Java solutions, and checking how easy/hard it is to convert.

A runnable version of this code is @ trycf.com.

Righto.

--
Adam