Tuesday 12 January 2016

PHP & CFML: xpath with empty name spaces

G'day:
My mate who sits next to me at work, Amar, was trying to extract some info from an XML document, and we stumbled over the xpath syntax when there was a namespace defined, but no prefix was given. I'm completely unused to using xpath in PHP (I've had to query something once, I think), but had done a fair bit back in my CFML days.

Here's the XML in question (well: it's not the same XML, but it's equivalent):

<Response xmlns="http://example.com/ns/">
   <user>
      <dateOfBirth>1947-01-08</dateOfBirth>
      <firstName>Ziggy</firstName>
      <lastName>Stardust</lastName>
      <gender>?</gender>
   </user>
</Response>

See we've got a namespace declaration but no bloody prefix defined. Grumble.

On CF9 the namespace could be kinda ignored, by giving each node name an empty prefix (just a bare colon):

raw = '<Response xmlns="http://example.com/ns/">
   <user>
      <dateOfBirth>1947-01-08</dateOfBirth>
      <email>sailor@example.com</email>
      <firstName>Ziggy</firstName>
      <lastName>Stardust</lastName>
      <gender>?</gender>
   </user>
</Response>';
xml = xmlParse(raw);

usingEmptyNamespace = xmlSearch(xml, "/:Response/:user/:lastName");
writeDump(usingEmptyNamespace);

Via cflive.net this yields an array containing the lastName element ("Stardust").

Cool. However at some point (it might have been CF10, but I don't know) this approach stopped working, because ColdFusion changed its XML parsing engine, and apparently an empty prefix like that isn't legal.

The solution I had discovered (via googling and Stack Overflow) was to use the local-name() xpath function:

usingLocalName = xml.search("/*[local-name()='Response']/*[local-name()='user']/*[local-name()='firstName']");
writeDump(usingLocalName);

And this yields the following (I'm using ColdFusion 2016's CLI now, hence the format change):

>cf xpath.cfm
array

1) [xml element]
        XmlName:        firstName
        XmlNsPrefix:
        XmlNsURI:       http://example.com/ns/
        XmlText:        Ziggy
        XmlComment:
        XmlAttributes:  [struct]
        XmlChildren:

>

Now I switch to PHP, and have to make this lot work. Firstly, the empty-prefix version simply didn't work:

<?php

$raw = '<Response xmlns="http://example.com/ns/">
   <user>
      <dateOfBirth>1947-01-08</dateOfBirth>
      <firstName>Ziggy</firstName>
      <lastName>Stardust</lastName>
      <gender>?</gender>
   </user>
</Response>
';

$xml = new SimpleXMLElement($raw);

$usingEmptyNamespace = $xml->xpath("/:Response/:user/:gender");
var_dump($usingEmptyNamespace);

>php xpath.php
PHP Warning:  SimpleXMLElement::xpath(): Invalid expression in xpath.php on line 15

Warning: SimpleXMLElement::xpath(): Invalid expression in xpath.php on line 15
bool(false)

>

Using the local-name() approach worked fine in PHP:

$usingLocalName = $xml->xpath("/*[local-name()='Response']/*[local-name()='user']/*[local-name()='firstName']");
var_dump($usingLocalName);

>php xpath.php
array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (1) {
    [0]=>
    string(5) "Ziggy"
  }
}

>
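One caveat with local-name(): it ignores the namespace entirely, so it would just as happily match a firstName element from some other namespace. XPath 1.0 also has a namespace-uri() function, and the two can be combined in the predicate. Here's a rough sketch of that in PHP; the nsStep() helper is something I've made up for illustration, and the URI is the one from the example XML above.

```php
<?php
// Sketch: match each step on BOTH the local name and the namespace URI,
// so an element that merely shares a local name (but lives in another
// namespace) is not picked up. nsStep() is a hypothetical helper.

$raw = '<Response xmlns="http://example.com/ns/">
   <user>
      <dateOfBirth>1947-01-08</dateOfBirth>
      <firstName>Ziggy</firstName>
      <lastName>Stardust</lastName>
      <gender>?</gender>
   </user>
</Response>';

$xml = new SimpleXMLElement($raw);
$ns  = 'http://example.com/ns/';

// Builds one location step, e.g. *[local-name()='user' and namespace-uri()='...']
function nsStep(string $localName, string $nsUri): string
{
    return sprintf("*[local-name()='%s' and namespace-uri()='%s']", $localName, $nsUri);
}

$path = '/' . implode('/', [
    nsStep('Response', $ns),
    nsStep('user', $ns),
    nsStep('firstName', $ns),
]);

$usingLocalNameAndUri = $xml->xpath($path);
var_dump($usingLocalNameAndUri);

// The same kind of query with the wrong URI matches nothing (an empty array)
$wrongNamespace = $xml->xpath('/' . nsStep('Response', 'http://example.com/other/'));
var_dump($wrongNamespace);
```

It's wordier still, but it's stricter than local-name() on its own.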

However I was certain PHP would do things better than that, and after some reading I found it has a way of working around the missing prefix: the registerXPathNamespace() method:

$xml->registerXPathNamespace('db', 'http://example.com/ns/');
$usingRegisteredNamespace = $xml->xpath("/db:Response/db:user/db:lastName");
var_dump($usingRegisteredNamespace);

This lets me bind a prefix of my own choosing ('db') to the namespace URI, which makes the xpath string legit. Nice one!

>php xpath.php
array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (1) {
    [0]=>
    string(8) "Stardust"
  }
}

>
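A variation on registerXPathNamespace(), assuming one doesn't want to hard-code the namespace URI: SimpleXMLElement::getDocNamespaces() returns the namespaces declared on the root element, keyed by prefix, with the default (unprefixed) one under an empty-string key. This is a sketch rather than anything battle-tested, and the 'd' prefix is arbitrary.

```php
<?php
// Sketch: look the default namespace up from the document itself, rather
// than hard-coding the URI, then bind it to a prefix of our own choosing.

$raw = '<Response xmlns="http://example.com/ns/">
   <user>
      <dateOfBirth>1947-01-08</dateOfBirth>
      <firstName>Ziggy</firstName>
      <lastName>Stardust</lastName>
      <gender>?</gender>
   </user>
</Response>';

$xml = new SimpleXMLElement($raw);

// Namespaces declared on the root element, keyed by prefix.
// The default (unprefixed) namespace comes back under the key ''.
$namespaces = $xml->getDocNamespaces();
$defaultNs  = $namespaces[''] ?? null;

if ($defaultNs === null) {
    // No default namespace: the plain, unprefixed path works as-is
    $result = $xml->xpath('/Response/user/lastName');
} else {
    // 'd' is arbitrary; it just has to match the prefix used in the query
    $xml->registerXPathNamespace('d', $defaultNs);
    $result = $xml->xpath('/d:Response/d:user/d:lastName');
}

var_dump($result);
```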

One last thing I tried on a whim in CFML, which worked: it seems one can specify a wildcard namespace prefix:

usingWildcardNamespace = xml.search("/*:Response/*:user/*:firstName");
writeDump(usingWildcardNamespace);

>cf xpath.cfm
array

1) [xml element]
        XmlName:        firstName
        XmlNsPrefix:
        XmlNsURI:       http://example.com/ns/
        XmlText:        Ziggy
        XmlComment:
        XmlAttributes:  [struct]
        XmlChildren:

>

This did not work in PHP. I dunno enough about XML engines, parsing and searching to comment on the whys and wherefores of what one ought to expect here, but it's good to know about registerXPathNamespace(), and also good to know about the wildcard syntax in CFML.
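For what it's worth, the *:firstName form is XPath 2.0 syntax, whereas PHP's XML extensions sit on top of libxml2, which only implements XPath 1.0; that is probably why PHP rejects it. If one really wanted to write paths the CFML way, a small rewrite into local-name() predicates would do the trick. The wildcardPrefixToLocalName() function below is hypothetical (it's not part of PHP), and its regex only copes with simple element names.

```php
<?php
// Hypothetical helper, NOT part of PHP: rewrites XPath 2.0-style "*:name"
// steps into XPath 1.0 "*[local-name()='name']" predicates, which libxml2
// (and so SimpleXMLElement::xpath()) understands. Only simple names are handled.
function wildcardPrefixToLocalName(string $path): string
{
    return preg_replace('/\*:([A-Za-z_][\w.\-]*)/', '*[local-name()=\'$1\']', $path);
}

$raw = '<Response xmlns="http://example.com/ns/">
   <user>
      <dateOfBirth>1947-01-08</dateOfBirth>
      <firstName>Ziggy</firstName>
      <lastName>Stardust</lastName>
      <gender>?</gender>
   </user>
</Response>';

$xml = new SimpleXMLElement($raw);

$path = wildcardPrefixToLocalName('/*:Response/*:user/*:firstName');
echo $path, PHP_EOL;
// prints /*[local-name()='Response']/*[local-name()='user']/*[local-name()='firstName']

$usingRewrittenWildcard = $xml->xpath($path);
var_dump($usingRewrittenWildcard);
```

The rewritten path is exactly the local-name() version from earlier, so it finds Ziggy just the same.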

Not very incisive or groundbreaking stuff, but it's just what I had to work out today.

Righto.

--
Adam