Wednesday 17 September 2014

PHP: references

G'day:
Yeah, it's been a busy day today. But I'm quite pleased with the PHP stuff I've learned. This is another quick one (OK, I'll stipulate now: a lot of these PHP ones will be short, so I'll stop saying that), and harkens back to something I talked about re CFML back in 2012: "Complex data-types in CF, and how they're not copied by reference".

PHP has a concrete and code-controllable notion of references; not just something that happens under the hood like in CFML.

I'll not go into what references are. You can read the PHP docs regarding references: "What References Are", or the language-neutral Wikipedia page: "Reference (computer science)".

CFML

Let's start with CFML though. In CFML we don't have any control over whether a piece of data is passed by value or passed by reference (I use that term incorrectly, but for expedience. Read the blog article I link to above for an explanation). Simple types and arrays are passed by value in CFML (Railo passes arrays by reference, that said), and other complex types are passed by reference.

Here's some example code which demonstrates CFML behaviour:

// references.cfm

original = "tahi,rua,toru";
copy = original;

original &= ",wha";
copy &= ",rima";

writeOutput("original: #original#<br>copy: #copy#<br>");
writeOutput("<hr>");

original = {"red"="whero", "orange"="karaka", "yellow"="kowhai"};
reference = original;

original["green"] = "kakariki";
original["blue"] = "kikorangi";

writeDump(original);
writeDump(reference);
writeOutput("<hr>");

reference = "and now for something completely different";
writeDump([original,reference]);
writeOutput("<hr>");

This outputs:

original: tahi,rua,toru,wha
copy: tahi,rua,toru,rima

struct
bluekikorangi
greenkakariki
orangekaraka
redwhero
yellowkowhai
struct
bluekikorangi
greenkakariki
orangekaraka
redwhero
yellowkowhai

array
1
struct
bluekikorangi
greenkakariki
orangekaraka
redwhero
yellowkowhai
2and now for something completely different


Here the assignment operator copies the value of the string original, and makes a duplicate of it as copy. They are two different strings, so appending to one does not impact the other.

On the other hand when dealing with complex objects like a structure, the assignment operator creates a new variable which has a new reference pointing to the same underlying data as the original. So it's not a "copy by reference" situation, but it's a similar "copy of reference" situation (I dunno the official term for this approach). The ramification is that adjustments made via the original are reflected in the reference to it, and vice versa. However if one gives the reference a new value completely, it does not change the original, because it overwrites that variable's reference with a new one to the new data.

If that's not clear, that other blog article I link to above goes through it thoroughly.

PHP

PHP has the ability to specify how assignments are made. One can do this:

$reference = &$original;

The ampersand tells PHP to bind the new variable to the same reference as the original. The reference is the same, there's just a new variable pointing to that reference. In CFML's case there's a new variable and a new reference; it's just that the new reference points to the same data as the original reference.

Here's some PHP code showing references in action. It's an approximation of the CFML code above.

// references.php

require "../../debug/dBug.php";

$original = "tahi,rua,toru";
$copy       = $original;
$reference = &$original;

$original .= ",wha";
$reference .= ",rima";
$copy .= ",ono";

echo "original: $original<br>reference: $reference<br>copy: $copy<br>";
echo "<hr>";

$original = ["red"=>"whero", "orange"=>"karaka", "yellow"=>"kowhai"];
$reference = &$original;
$copy = $original;

$original["green"] = "kakariki";
$original["blue"] = "kikorangi";
$copy["indigo"] = "poropango";

new dBug($original);
new dBug($reference);
new dBug($copy);
echo "<hr>";

$reference = "and now for something completely different";
echo "original: $original<br>reference: $reference<br>";
new dBug($copy);
echo "<hr>";

And the output:

original: tahi,rua,toru,wha,rima
reference: tahi,rua,toru,wha,rima
copy: tahi,rua,toru,ono

$original (array)
redwhero
orangekaraka
yellowkowhai
greenkakariki
bluekikorangi
$reference (array)
redwhero
orangekaraka
yellowkowhai
greenkakariki
bluekikorangi
$copy (array)
redwhero
orangekaraka
yellowkowhai
indigoporopango

original: and now for something completely different
reference: and now for something completely different

$copy (array)
redwhero
orangekaraka
yellowkowhai
indigoporopango


There's a few things to note:

  • the $reference string points to exactly the same string in memory as the $original one. So change one: it's reflected in the other.
  • however the $copy string is completely different: changes to it apply only to it, and not the $original.
  • PHP will copy an array (bear in mind PHP doesn't have structs, it's just got "associative arrays"... that's a story for another day though) by value if you don't tell it otherwise, as demonstrated by the $copy.
  • Making a reference to an array now seems to behave like CFML does.
  • However if one does not do it with a reference, it makes a new copy, so changes to the copy are specific to the copy.
  • Until one gives that reference an entirely new value... the new value goes into memory pointed to by both the $original and $reference references, so original takes that new value. This demonstrates the difference between proper "assign by reference" and that which CFML does.
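
To make that last distinction a bit more concrete, here's a minimal sketch contrasting a PHP object with a PHP reference. Objects in PHP are assigned by handle, which is much the same "copy of reference" behaviour CFML has with structs, whereas the ampersand gives a true reference. The stdClass objects and the variable names here are my own, purely for illustration.

```php
<?php
// Object handles: assignment copies the handle, not the object
// (much like CFML does with structs)
$original = new stdClass();
$original->red = "whero";
$handleCopy = $original;              // a second handle to the same object

$original->green = "kakariki";        // change made via one handle...
$viaHandle = $handleCopy->green;      // ...is visible via the other

// Giving the copy a whole new value only re-points the copy
$handleCopy = "and now for something completely different";
$stillObject = is_object($original);  // $original is untouched

// A true reference: both names are bound to the same variable
$first = new stdClass();
$alias = &$first;
$alias = "and now for something completely different";
$replaced = is_string($first);        // $first was overwritten too

var_dump($viaHandle, $stillObject, $replaced);
```

So the object case behaves like the CFML struct example further up, and the ampersand case behaves like the $reference examples in references.php.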

So why would one want to do this? One does it mostly with function arguments, eg:

function increment(&$x)
{
     $x++;
}

$i = 0;
echo "\$i before: $i<br>";
increment($i);
echo "\$i after: $i<br>";

This outputs:
$i before: 0
$i after: 1

Here increment() takes a reference to $i as its $x argument value, so when $x is incremented... so is $i. This is a pretty dumb use case for references though. They come into their own when a function needs to work on the caller's own data rather than on a copy of it, especially with large arrays or strings. Bear in mind PHP already defers copying a value passed to a function until the function actually modifies it (copy-on-write), and objects are passed as handles rather than copied, so a reference is not automatically a performance win. The saving comes when the function would otherwise modify its own copy and hand it back. One just needs to be mindful that when a function takes a reference, the value referenced could be changed by the function. So one needs to tread with caution.
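
Here's a slightly less contrived sketch of the same idea using an array, showing why one needs to be mindful. The function names (addColourByReference() and addColourByValue()) are hypothetical ones I've made up for this example. One takes its array by reference and changes the caller's data, and the other takes it by value and leaves the caller's data alone.

```php
<?php
// Hypothetical helper names, for illustration only.

// Takes the array by reference: the caller's array is modified.
function addColourByReference(array &$colours, $english, $maori)
{
    $colours[$english] = $maori;
}

// Takes the array by value: only the function's local copy is modified,
// so the updated array has to be returned to be of any use.
function addColourByValue(array $colours, $english, $maori)
{
    $colours[$english] = $maori;
    return $colours;
}

$colours = ["red" => "whero"];

// By value: $colours itself is unchanged by this call
$returned = addColourByValue($colours, "orange", "karaka");
$countAfterByValue = count($colours);

// By reference: $colours gains the new key
addColourByReference($colours, "yellow", "kowhai");
$countAfterByReference = count($colours);

echo "count after by-value call: $countAfterByValue<br>";
echo "count after by-reference call: $countAfterByReference<br>";
```

Nothing at the call site tells you which of the two behaviours you're getting: the ampersand lives in the function signature only. That's the bit to tread carefully with.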

I think this would be a good feature for CFML, and I have added an enhancement request which is being considered for uptake: 3638235. I had not raised it for Railo, but I have now: RAILO-3209.

--
Adam