Monday 20 January 2014

Expectations management: what does xmlSearch() actually search through?

G'day:
Here's an intriguing one that cropped up on the Railo Google Group the other day. Consider this code:

<!--- xmlSearchFirstExample.cfm --->
<cfxml variable="x">
<root>
    <parent>
        parent1
        <child>child1</child>
        <child>
            child2
            <grandchild>grandchild1</grandchild>
        </child>
        <child>
            child3
            <grandchild>grandchild2</grandchild>
        </child>
    </parent>
    <parent>
        parent2
        <child>
            child4
            <grandchild>grandchild3</grandchild>
        </child>
        <child>
            child5
        </child>
        <child>
            child6
            <grandchild>grandchild4</grandchild>
        </child>
    </parent>
</root>
</cfxml>
<cfscript>
    secondParent = xmlSearch(x, "//parent[2]")[1];
    writeDump(secondParent);

    childrenOfSecondParent = xmlSearch(secondParent, "//child");
    writeDump(childrenOfSecondParent);
</cfscript>

So we're grabbing the second parent node, and then from within that, getting its child nodes. What's wrong with it?


Well, here's the output:
xml element
    XmlName: parent
    XmlText: parent2
    XmlChildren:
        xml element
            XmlName: child
            XmlText: child4
            XmlChildren:
                xml element
                    XmlName: grandchild
                    XmlText: grandchild3
        xml element
            XmlName: child
            XmlText: child5
        xml element
            XmlName: child
            XmlText: child6
            XmlChildren:
                xml element
                    XmlName: grandchild
                    XmlText: grandchild4

array
    1: xml element
        XmlName: child
        XmlText: child1
    2: xml element
        XmlName: child
        XmlText: child2
    3: xml element
        XmlName: child
        XmlText: child3
    4: xml element
        XmlName: child
        XmlText: child4
    5: xml element
        XmlName: child
        XmlText: child5
    6: xml element
        XmlName: child
        XmlText: child6

(I've deleted some of the bumpf ColdFusion outputs there, for the sake of brevity.)

The first dump is A-OK: just the second parent. But the second dump... lists all the children in the whole document, not just the children within the XML node we asked xmlSearch() to search: secondParent.
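This doesn't appear to be peculiar to CFML: it's how XPath 1.0 location paths work. Below is a minimal sketch using Java's standard javax.xml.xpath API as a stand-in (an assumption: it's not necessarily what ColdFusion or Railo use internally). It evaluates "//child" with the second parent element as the context node and still gets all six children back; likewise "/root/parent" gets both parents. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is the article's, minus the indentation
class AbsolutePathDemo {
    static final String XML = "<root>"
        + "<parent>parent1"
        + "<child>child1</child>"
        + "<child>child2<grandchild>grandchild1</grandchild></child>"
        + "<child>child3<grandchild>grandchild2</grandchild></child>"
        + "</parent>"
        + "<parent>parent2"
        + "<child>child4<grandchild>grandchild3</grandchild></child>"
        + "<child>child5</child>"
        + "<child>child6<grandchild>grandchild4</grandchild></child>"
        + "</parent>"
        + "</root>";

    // Evaluates expr with the second <parent> element as the context node and
    // returns each matched node's own (trimmed) text, eg "child4"
    static List<String> searchFromSecondParent(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node secondParent = (Node) xpath.evaluate("//parent[2]", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(expr, secondParent, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(ownText(nodes.item(i)));
            }
            return texts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Joins only the node's direct text children, so descendants' text is ignored
    static String ownText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
                sb.append(kids.item(i).getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Both paths start with "/", so they're resolved from the document root
        System.out.println(searchFromSecondParent("//child"));
        // prints [child1, child2, child3, child4, child5, child6]
        System.out.println(searchFromSecondParent("/root/parent"));
        // prints [parent1, parent2]
    }
}
```

The context node only matters to relative paths; an absolute path throws it away and starts again at the root of the document that node belongs to.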

The thread on the Railo Group pointed me at an "explanation" for this, which is in a Stack Overflow question / answer: "Running XPath on child node":

/foo will select based off of the root node, ignoring the context that you are evaluating the xpath against. foo (without the slash) is what you want; that selects based off of the current node.
Hmmm. Well that sounds dumb. But let's test:

childrenViaRelativePath = xmlSearch(secondParent, "child");
writeDump(var=childrenViaRelativePath);

Note how I've dispensed with the "//" in the xpath string here, and am just specifying the name of the nodes I want. This makes it a relative path, and the result is much better:

array
    1: xml element
        XmlName: child
        XmlText: child4
    2: xml element
        XmlName: child
        XmlText: child5
    3: xml element
        XmlName: child
        XmlText: child6

So, OK: probably good to know that.
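As a sketch under the same assumption (Java's javax.xml.xpath standing in for whatever xmlSearch() uses; the class name is hypothetical), a relative path resolves against the context node: "child" is shorthand for "child::child", so from the second parent it only reaches child4 to child6, and "child/grandchild" only reaches the grandchildren below those.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is the article's, minus the indentation
class RelativePathDemo {
    static final String XML = "<root>"
        + "<parent>parent1"
        + "<child>child1</child>"
        + "<child>child2<grandchild>grandchild1</grandchild></child>"
        + "<child>child3<grandchild>grandchild2</grandchild></child>"
        + "</parent>"
        + "<parent>parent2"
        + "<child>child4<grandchild>grandchild3</grandchild></child>"
        + "<child>child5</child>"
        + "<child>child6<grandchild>grandchild4</grandchild></child>"
        + "</parent>"
        + "</root>";

    // Evaluates expr with the second <parent> element as the context node and
    // returns each matched node's own (trimmed) text
    static List<String> searchFromSecondParent(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node secondParent = (Node) xpath.evaluate("//parent[2]", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(expr, secondParent, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(ownText(nodes.item(i)));
            }
            return texts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Joins only the node's direct text children, so descendants' text is ignored
    static String ownText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
                sb.append(kids.item(i).getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // No leading "/", so each step starts from the second parent itself
        System.out.println(searchFromSecondParent("child"));
        // prints [child4, child5, child6]
        System.out.println(searchFromSecondParent("child/grandchild"));
        // prints [grandchild3, grandchild4]
    }
}
```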

However, I think this is a bit rubbish. Irrespective of how some Java class might affect this, if I pass an XML node into xmlSearch(), and xmlSearch() accepts it as an argument, then xmlSearch() shouldn't go "oh, you mean the whole XML doc this node is within" (because, no, I clearly didn't mean that); it should just do what it's been bloody told: search in that node!

What do you think? Can you think of a reason why it makes sense for it to behave the way it does? I ask because Micha from the Railo team has said that they only implemented it this way to maintain sideways compatibility with ColdFusion, not because it specifically seemed the right way to go.

Update (18/2/2014):

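For the sake of completeness here... I reflected more on this and did some more reading, and ultimately I now agree with the existing behaviour. (In XPath terms, a path starting with "/" is an absolute location path: it's evaluated from the root of the document containing the context node, not from the context node itself.) Still: it'd be nice if it was documented properly.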

Oh: one last thing. What if one did want to do a "contextless" xmlSearch() (ie: one starting with "//"... I dunno the "official" terminology for this) within that one node? Obviously "//child" didn't work. Well, one can contextualise the contextless xpath thus:

grandChildrenUsingRelativeReference = xmlSearch(secondParent, ".//grandchild");
writeDump(var=grandChildrenUsingRelativeReference);

This starts from the current node ".", then applies the rest of the query to that. So it only gets the grandchild nodes from within the current node (which is the second parent node, in this case). Output:

array
    1: xml element
        XmlName: grandchild
        XmlText: grandchild3
    2: xml element
        XmlName: grandchild
        XmlText: grandchild4

Cool.
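A sketch under the same assumption again (Java's javax.xml.xpath standing in for xmlSearch(); the class name is hypothetical). In XPath 1.0's abbreviated syntax "." means "self::node()" and "//" means "/descendant-or-self::node()/", so ".//grandchild" expands to "self::node()/descendant-or-self::node()/child::grandchild". From the second parent that picks out the same nodes as "descendant::grandchild", whereas "//grandchild" still returns all four grandchildren in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is the article's, minus the indentation
class ContextualisedDescendantDemo {
    static final String XML = "<root>"
        + "<parent>parent1"
        + "<child>child1</child>"
        + "<child>child2<grandchild>grandchild1</grandchild></child>"
        + "<child>child3<grandchild>grandchild2</grandchild></child>"
        + "</parent>"
        + "<parent>parent2"
        + "<child>child4<grandchild>grandchild3</grandchild></child>"
        + "<child>child5</child>"
        + "<child>child6<grandchild>grandchild4</grandchild></child>"
        + "</parent>"
        + "</root>";

    // Evaluates expr with the second <parent> element as the context node and
    // returns each matched node's own (trimmed) text
    static List<String> searchFromSecondParent(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node secondParent = (Node) xpath.evaluate("//parent[2]", doc, XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(expr, secondParent, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(ownText(nodes.item(i)));
            }
            return texts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Joins only the node's direct text children, so descendants' text is ignored
    static String ownText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
                sb.append(kids.item(i).getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "." anchors the descendant search at the context node
        System.out.println(searchFromSecondParent(".//grandchild"));
        // prints [grandchild3, grandchild4]
        // Unabbreviated axis; same nodes for this document
        System.out.println(searchFromSecondParent("descendant::grandchild"));
        // prints [grandchild3, grandchild4]
        // Without the ".", the search starts from the document root again
        System.out.println(searchFromSecondParent("//grandchild"));
        // prints [grandchild1, grandchild2, grandchild3, grandchild4]
    }
}
```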

Righto.

--
Adam