Wednesday, 9 April 2014

Ye Olde XMLSearche Bugge

G'day:
Henry mentioned this last night:

That sounded like something I could get my teeth into, so had a look at it this morning.



The gist of it is that xmlSearch() is "unreliable" when it's called simultaneously more than once on the same XML doc; for example if the doc is in the application scope, or it's being hit with <cfthread> or something.

There's a bunch of stuff written about it, but I could not find a complete, stand-alone repro case for it anywhere. So I've written one. It was easy enough.

Firstly I build some XML to work with:

<cfparam name="URL.seed" type="integer" default="0">
<cfset id = 0>
<cfoutput>
<cfxml variable="x">
    <aaa id="#++id#">
    <cfloop index="i" from="1" to="#URL.seed#">
        <bbb id="#++id#">
            <cfloop index="i" from="1" to="#URL.seed\2#">
                <ccc id="#++id#">
                    <cfloop index="i" from="1" to="#URL.seed\4#">
                        <ddd id="#++id#" />
                    </cfloop>
                </ccc>
            </cfloop>
        </bbb>
    </cfloop>
    </aaa>
</cfxml>
</cfoutput>
<cffile action="write" file="#expandPath('./text.xml')#" output="#x#">

I run this with a seed param of 20, which gives me 1000-odd nodes.

Next I have some code that searches for a specific ID in the XML, and call that from within some threaded code:

param name="URL.threads" type="integer" default=1;

request.x = xmlParse(expandPath('./test.xml'));
nodes = xmlSearch(request.x, "//ddd");
maxId = nodes[arrayLen(nodes)].xmlAttributes["id"];

request.results = [];
threads = [];
for (i=1; i <=URL.threads; i++){
    threadName = "t#i#";
    arrayAppend(threads, threadName);
    thread name="#threadName#" action="run" max="#maxId#" {
        for (i=1; i <=attributes.max; i++){
            try {
                xpath = '//*[@id="#i#"]';
                result = xmlSearch(request.x, xpath);
                arrayAppend(request.results, {
                    thread    =thread.name,
                    index    = i,
                    resultID= result[1].xmlAttributes["id"]
                });
            }
            catch (any e){
                writeLog(file="xmlSearchBug", text="#thread.name# #i# #xpath# #e.type# #e.message# #e.detail#");
                abort;
            }
        }
    }
}
thread action="join" name="#arrayToList(threads)#";
writeOutput("Processing complete<br>");
writeOutput("Validating&hellip;<br>");

if (arrayLen(request.results) != URL.threads * maxId){
    writeOutput("Processing did not complete<br>");
}


for (result in request.results){
    if (result.index != result.resultId){
        writeDump(result);
        break;
    }
}
writeOutput("Validation complete<br>");

This code basically just pounds the XML looking for each ID, and "logs" where it found it. This runs A-OK if I run it with just a single thread. But if I run it using multiple threads (and I mean even "2") it quickly craps out with:

"Information","cfthread-1","04/09/14","13:43:22",,"T1 1 //*[@id=""1""] Expression Unable to process the result of the XMLSearch for ''. ColdFusion is unable to process the result of the XPath search. You may have an undefined variable in the xpath expression."
"Information","cfthread-0","04/09/14","13:43:22",,"T2 2 //*[@id=""2""] Expression Unable to process the result of the XMLSearch for ''. ColdFusion is unable to process the result of the XPath search. You may have an undefined variable in the xpath expression."

Note that CF is trying to xmlSearch() for "", whereas I had just told it to search for '//*[@id="1"]'.

In the application.log file we have this:

"Error","cfthread-1","04/09/14","13:43:22",,"T1: null"
"Error","cfthread-0","04/09/14","13:43:22",,"T2: null"

So... yeah, there's something amiss there. There's no such problem on Railo, FWIW.

Henry raised a bug for this: 3739102, but it seems to have vanished. If/when it comes back, I'll add this repro case.

Anyway, that's all I have to say on that. Time to east some lunch.

--
Adam