Saturday 7 February 2015

PHP: objects as arrays, and can CFML benefit from the concept?

G'day:
We had a situation at work the other day in which we needed a collection class, I think it was Properties (that's in the real estate sense of "property", not the OO one: I work for an online travel company, and this was part of a search result). This is distinct from a single Property which represents a single hotel / hostel / whatever.

Using CFML we would have used a record set to represent this, and in PHP we'd generally represent this as an array (of individual Property objects). However the collection of Properties needed other behaviour as well: storing the minimum price per night, highest rating percentage etc. So we're definitely wanting an object of some description to keep the array of properties and other... erm... properties (in the OO sense now) defined together in the one place.

A traditional approach might be this (pseudo code):

class Properties {
    Property[] properties;
    Decimal lowestPrice;
    Decimal highestRating;

    // and some accessors etc
}

And that would have been fine. However, rather than having to pluck the properties array out of the object before doing anything with it, I rather liked the notion of just using the object itself as an array. C# has the idea of Indexers, and I was pretty sure PHP had at least an Iterable interface (it turns out to be called Iterator), and perhaps some other bits and pieces that might work as well.

Indeed it does, and I was able to come up with an object which one can treat like an array. Almost.


// Collection.php

class Collection implements Iterator, Countable, ArrayAccess
{
    protected $collection;
    protected $index = 0; // start at the first element, even before rewind() is called

    public function __construct($array){
        $this->collection = $array;
    }

    public function getCollection() {
        return $this->collection;
    }

    public function current() {
        return $this->collection[$this->index];
    }

    public function key() {
        return $this->index;
    }

    public function next() {
        ++$this->index;
    }

    public function rewind() {
        $this->index = 0;
    }

    public function valid() {
        return array_key_exists($this->index, $this->collection);
    }

    public function count() {
        return count($this->collection);
    }

    public function offsetExists($index) {
        return array_key_exists($index, $this->collection);
    }

    public function offsetGet($index) {
        return $this->collection[$index];
    }

    public function offsetSet($index, $value) {
        echo "offsetSet() called with [$index], [$value]<br>";
        if (is_null($index)) {
            $this->collection[] = $value;
        }else{
            $this->collection[$index] = $value;
        }
    }

    public function offsetUnset($index) {
        unset($this->collection[$index]);
    }
}

Everything from current() down is the implementation that the Iterator, Countable and ArrayAccess interfaces require: current(), key(), next(), rewind() and valid() for Iterator; count() for Countable; and offsetExists(), offsetGet(), offsetSet() and offsetUnset() for ArrayAccess. These methods allow me to traverse the object, as one would in a foreach() loop; "count" the elements in the object (eg: using count()); and read, set and unset elements in the object's array using array notation.

Having implemented these interfaces, I no longer need to extract the array from the object to use it: I can access it via the object itself.

Here I populate a test collection with Maori numbers 1-4:

$numbers = ['tahi', 'rua', 'toru', 'wha'];
$numbersCollection = new Collection($numbers);
printf('Collection after initialisation: %s<br>', json_encode($numbersCollection->getCollection()));
echo '<hr>';

Output:
Collection after initialisation: ["tahi","rua","toru","wha"]

Then loop over the object using a normal foreach() loop:

foreach ($numbersCollection as $number) {
   echo "$number ";
}
echo '<hr>';

Output:
tahi rua toru wha

I can also set elements in the array via either a direct index reference:

$numbersCollection[4] = 'rima';
printf('Collection after setting [4]: %s<br>', json_encode($numbersCollection->getCollection()));

Output:

Collection after setting [4]: ["tahi","rua","toru","wha","rima"]

or even via the shorthand append syntax:

$numbersCollection[] = 'ono';
printf('Collection after setting []: %s<br>', json_encode($numbersCollection->getCollection()));

Output:

Collection after setting []: ["tahi","rua","toru","wha","rima","ono"]

All that's reasonably good!

However from here, the behaviour leaves a bit to be desired. Firstly here's an example of where PHP's array implementation demonstrates itself as being a bit wanting. I tested removing an element:

unset($numbersCollection[2]);
printf('Collection after unset(): %s<br>', json_encode($numbersCollection->getCollection()));

The intent here is to remove the third element: toru. Here's what we get:

Collection after unset(): {"0":"tahi","1":"rua","3":"wha","4":"rima","5":"ono"}

OK, that ain't good. It's cleared the value and removed the element, but it's not reindexed the "array", so there's a gap left in the indexes. As a result the JSON encoder has gone "array? that's not an array, sunshine... erm, I better make it an object" (notice the curly braces, not the square ones, and the explicitly stated keys).

Initially I thought this was a glitch in my implementation, so I tested on a native array:

unset($numbers[2]);
printf('Array after unset(): %s<br>', json_encode($numbers));

Output:

Array after unset(): {"0":"tahi","1":"rua","3":"wha"}

That's pants. Amateurs. This is an artifact of PHP not understanding that there's a difference between an indexed array (indexed via sequential numbers) and an associative array (indexed on arbitrary key), which is a fairly fundamental implementation shortfall.
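
For what it's worth, the gap can be closed by hand. This is only a minimal sketch of two standard-library options, not something the Collection class above does: array_values() returns a copy with fresh sequential keys, and array_splice() removes and renumbers in one step. The variable names $reindexed and $spliced are just for illustration.

```php
<?php
$numbers = ['tahi', 'rua', 'toru', 'wha'];

// unset() removes the element but leaves the keys as 0, 1, 3.
unset($numbers[2]);

// array_values() returns a copy with fresh sequential keys: 0, 1, 2.
$reindexed = array_values($numbers);
echo json_encode($reindexed), PHP_EOL; // ["tahi","rua","wha"]

// array_splice() removes one element at offset 2 and renumbers in one step.
$spliced = ['tahi', 'rua', 'toru', 'wha'];
array_splice($spliced, 2, 1);
echo json_encode($spliced), PHP_EOL; // ["tahi","rua","wha"]
```

If list semantics are what's wanted, the same array_values() call could presumably go in the collection's offsetUnset() straight after the unset(), but that is a design choice rather than anything the interface requires.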

And it turns out PHP's implementation of the ArrayAccess interface is a bit hamstrung too. Well the interface implementation itself is OK, but then PHP's internal usage of the interface is... lacking. Here I use a native array function on my collection:

$uppercaseNumbers = array_map(function($number){
    return strtoupper($number);
}, $numbersCollection);
printf('Collection used in array_map(): %s<br>', json_encode($uppercaseNumbers));

And this yields:

Warning: array_map(): Argument #2 should be an array in Collection.php on line 101
Collection used in array_map(): null


Because... whilst PHP offers the interface, it doesn't itself actually expect the interface in its array functions:

array array_map ( callable $callback , array $array1 [, array $... ] )

Note that it expects an array for the second (and subsequent) arguments. Not an ArrayAccess. If PHP is going to offer an interface to enable objects to have array behaviour... it really ought to have gone the whole hog, implemented an ArrayImplementation interface, and then expected one of those in all its array functions.
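
The workaround is to hand the array functions a real array. With the Collection class above that could just be getCollection(); more generally, iterator_to_array() will copy any Traversable out into a native array. A rough sketch, using the built-in ArrayIterator as a stand-in for the Collection class (that substitution is an assumption made purely to keep the example self-contained):

```php
<?php
// ArrayIterator is a built-in stand-in here: like the Collection class,
// it implements Iterator, Countable and ArrayAccess.
$numbersCollection = new ArrayIterator(['tahi', 'rua', 'toru', 'wha']);

// Copy the elements out into a native array, then hand that to array_map().
$uppercaseNumbers = array_map('strtoupper', iterator_to_array($numbersCollection));

echo json_encode($uppercaseNumbers), PHP_EOL; // ["TAHI","RUA","TORU","WHA"]
```

This works, but it rather proves the point: the object has to be turned back into an array before the array functions will touch it.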

This is the story of PHP: they seem to have the ideas, but they don't seem to know how to do a thorough job of planning and implementing the ideas.



If it was implemented thoroughly... could this general idea be useful for CFML? I'm thinking of one unified interface that describes what native CFML array functionality would need in order to operate on a collection object. What would be needed?

// Array interface
interface {

    // access
    any function get(required numeric id);
    void function set(required numeric id, required any value);
    void function remove(required numeric id);
    numeric function size();
    
    // traversal
    void function reset();
    void function next();
    any function current();

}

That should provide all the hooks native CFML would need to be able to use an object as an array (eg: using native array access syntax, functions, methods, and general language constructs such as for() loops).

I think this would be a pretty handy addition to CFML? I think CFML has always offered reasonable OO opportunities for CFML code, but it itself is pretty lacking in the OO department when it comes to interacting with that code. This is an example of this: its array functionality only works with native arrays. It should work with any object that has the behaviour of an array. Something CFML has in common with PHP then. Another thing.


One thing my colleague/mate Brian pointed out (this was in the context of the PHP code, not this CFML idea) is that if we're using a dynamic language, should we need to implement actual interfaces? There's a good case for duck typing here. Provided a class implements all the requisite methods to be able to be treated as an array, should the runtime not simply allow an object to be used in place of an array?

EG: for an object to be used in a foreach() loop, it needs to provide a next() and get() method (maybe a getKey() and getValue() method). Provided the object has those methods, it should be usable in that situation. Even more so in the CFML situation: the class doesn't need to have the methods specified... provided the method is there at runtime, that should be fine (for my PHP readers, in CFML one can inject new methods into an object at runtime). This is true language dynamism at work. This is where CFML (and PHP) should be positioning themselves: get rid of the ceremony, and just crack on with the business of being a modern dynamic language.
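
To make the duck-typing idea concrete, here is a rough userland sketch in PHP. DuckCollection and duckEach() are hypothetical names invented for this illustration, and the method names it looks for are borrowed from PHP's Iterator interface rather than the next() / get() pair suggested above. The class declares no interface at all; the helper only checks that the methods exist at runtime.

```php
<?php
// Hypothetical class: has the traversal methods but implements no interface.
class DuckCollection
{
    private $items;
    private $index = 0;

    public function __construct(array $items) { $this->items = array_values($items); }
    public function rewind() { $this->index = 0; }
    public function valid() { return $this->index < count($this->items); }
    public function current() { return $this->items[$this->index]; }
    public function key() { return $this->index; }
    public function next() { ++$this->index; }
}

// Hypothetical helper: checks for the methods at runtime, not for an interface.
function duckEach($object, callable $callback)
{
    foreach (['rewind', 'valid', 'current', 'key', 'next'] as $method) {
        if (!method_exists($object, $method)) {
            throw new InvalidArgumentException("Object has no $method() method");
        }
    }
    for ($object->rewind(); $object->valid(); $object->next()) {
        $callback($object->current(), $object->key());
    }
}

$seen = [];
duckEach(new DuckCollection(['tahi', 'rua', 'toru']), function ($value, $key) use (&$seen) {
    $seen[] = "$key:$value";
});
echo implode(' ', $seen), PHP_EOL; // 0:tahi 1:rua 2:toru
```

The native foreach() won't do this check for us, which is the whole complaint, but the sketch shows how little the runtime would actually need to look for.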

For PHP, they need to get their act together and understand the difference between an indexed array and an associative array. However this is a fundamental language problem that is unlikely to be addressed. For CFML... there's an opportunity here to make the language a bit more cooler.

Thoughts?

--
Adam