Sunday, 28 September 2014

PHP: looking at some interesting "unexpected" behaviour with references

G'day:
A week or so ago I was fascinated by this Stack Overflow question: "How to changing value in $array2 without referring $array1?". It offers this code:

// baseline.php

$array1 = array(1,20);
$x = &$array1[1];
$array2 = $array1;
$array2[1] = 22;
print_r($array1[1]); // Output is 22

And this result:

22

In PHP, the = operator makes a copy of the value being assigned, so one (and when I say "one", I mean "the person asking the question, and myself as well") might be surprised to see the answer is "22", rather than the expected "20". Surely $array1[1] is discrete from $array2[1], and $array1[1] should not be impacted by the change to $array2[1]. $x is not a macguffin in this: remove that line, and things behave "as expected". How is making $x - a reference to $array1[1] - somehow intertwined with $array2??
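To make the "as expected" case concrete, here is a minimal sketch of the same code with the $x line removed. The file name is just a label I've made up for illustration; everything else is the snippet from the question.

```php
<?php
// baselineWithoutReference.php (hypothetical file name)

$array1 = array(1, 20);
// No reference is taken to $array1[1] this time.
$array2 = $array1;   // plain assignment: $array2 behaves as an independent copy
$array2[1] = 22;     // only $array2 is changed
print_r($array1[1]); // Output is 20
```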


That initial question was followed-up by another one: "Assign by reference bug", and that has a good answer which explains theoretically what's going on. The key part is this:

This is explained over at the PHP manual (even if you have to spend more time than you should have to in order to find it), specifically over at http://php.net/manual/en/language.types.array.php#104064

The "shared" data stays shared, with the initial assignment just acting as an alias. It's not until you start manipulating the arrays with independent operations like ...[] = ... that the intepreter starts to treat them as divergent lists, and even then the shared data stays shared so you can have two arrays with a shared first n elements but divergent subsequent data.
And the relevant extract from the docs says:

please note that when arrays are copied, the "reference status" of their members is preserved (http://www.php.net/manual/en/language.references.whatdo.php).
And on that page:

In other words, the reference behavior of arrays is defined in an element-by-element basis; the reference behavior of individual elements is dissociated from the reference status of the array container.
OK, cool, I believe it. However I wanted to see it in action.
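Before reaching for any tooling, a small sketch already shows the element-by-element rule from the docs. The variable names are my own, picked for illustration: only the element that has had a reference taken to it stays shared after the array is copied.

```php
<?php
// elementByElement.php (hypothetical file name)

$original = array("a", "b");
$ref = &$original[1];   // element 1 becomes a reference; element 0 is untouched

$copy = $original;      // the "reference status" of element 1 is preserved

$copy[0] = "A";         // ordinary element: only $copy changes
$copy[1] = "B";         // reference element: $original[1] and $ref change too

var_dump($original);    // element 0 is still "a", element 1 is now "B"
var_dump($copy);        // element 0 is "A", element 1 is "B"
var_dump($ref);         // "B"
```

So the two arrays are neither fully shared nor fully separate: it depends on which element one is looking at.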

My first hurdle here was getting a lucid/helpful answer to "PHP determine if a variable is a reference". Most answers I spotted initially were "no, can't be done. Why even would you want to?" (but not in a "let's see if there's another approach" sense, but in a "I'm getting defensive about PHP" sense).

However that's not strictly true, as it turns out. I switched my googling to "PHP reference count", and the first link got me pointed in the direction of XDebug, which enables one to inspect reference counts and stuff like that.
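For what it's worth, there's also a well-worn userland trick for the narrower question "are these two variables bound to the same reference?": write a unique sentinel through one and see whether it shows up in the other. isSameReference() is a hypothetical helper name of mine, not a PHP built-in, and this is only a sketch. It says nothing about reference counts, and passing an array element to it by reference will itself turn that element into a reference, so I've stuck to plain variables here.

```php
<?php
// isSameReference.php (hypothetical file name and helper)

function isSameReference(&$a, &$b)
{
    $saved = $a;            // remember the current value
    $a = new stdClass();    // sentinel: a brand-new object nothing else can hold
    $same = ($b === $a);    // true only if $b sees the sentinel written via $a
    $a = $saved;            // restore the original value
    return $same;
}

$x = "rua";
$y = &$x;      // $y is bound to the same value as $x
$z = "rua";    // equal value, but an independent variable

var_dump(isSameReference($x, $y)); // bool(true)
var_dump(isSameReference($x, $z)); // bool(false)
```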

Here's a reworked version of the code above, with some debug:

// baselineWithDebug.php

echo "<hr><h3><code>numbers</code> created</h3>";
$numbers = array("tahi", "rua", "toru");
xdebug_debug_zval('numbers');


echo "<hr><h3><code>refToSecondElement</code> created</h3>";
$refToSecondElement = &$numbers[1];
xdebug_debug_zval('numbers');
xdebug_debug_zval('refToSecondElement');


echo "<hr><h3><code>copyOfNumbers</code> created</h3>";
$copyOfNumbers = $numbers;
xdebug_debug_zval('numbers');
xdebug_debug_zval('refToSecondElement');
xdebug_debug_zval('copyOfNumbers');


echo "<hr><h3><code>copyOfNumbers[1]</code> changed</h3>";
$copyOfNumbers[1] = "two";
xdebug_debug_zval('numbers');
xdebug_debug_zval('refToSecondElement');
xdebug_debug_zval('copyOfNumbers');


echo "<hr><h3><code>copyOfNumbers[2]</code> changed</h3>";
$copyOfNumbers[2] = "three";
xdebug_debug_zval('numbers');
xdebug_debug_zval('refToSecondElement');
xdebug_debug_zval('copyOfNumbers');

The key elements here are:

  • there's an initial array, $numbers;
  • I make a reference to one element of it as $refToSecondElement;
  • I copy $numbers as $copyOfNumbers;
  • I change the value of the second element of $copyOfNumbers;
  • and also the third element of same.
  • Along the way I output some debug regarding each variable.
The output is fascinating:


numbers created

numbers:
(refcount=1, is_ref=0),
array (size=3)
  0 => (refcount=1, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=1, is_ref=0),string 'rua' (length=3)
  2 => (refcount=1, is_ref=0),string 'toru' (length=4)

refToSecondElement created

numbers:
(refcount=1, is_ref=0),
array (size=3)
  0 => (refcount=1, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=2, is_ref=1),string 'rua' (length=3)
  2 => (refcount=1, is_ref=0),string 'toru' (length=4)
refToSecondElement:
(refcount=2, is_ref=1),string 'rua' (length=3)

copyOfNumbers created

numbers:
(refcount=2, is_ref=0),
array (size=3)
  0 => (refcount=1, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=2, is_ref=1),string 'rua' (length=3)
  2 => (refcount=1, is_ref=0),string 'toru' (length=4)
refToSecondElement:
(refcount=2, is_ref=1),string 'rua' (length=3)
copyOfNumbers:
(refcount=2, is_ref=0),
array (size=3)
  0 => (refcount=1, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=2, is_ref=1),string 'rua' (length=3)
  2 => (refcount=1, is_ref=0),string 'toru' (length=4)

copyOfNumbers[1] changed

numbers:
(refcount=1, is_ref=0),
array (size=3)
  0 => (refcount=2, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=3, is_ref=1),string 'two' (length=3)
  2 => (refcount=2, is_ref=0),string 'toru' (length=4)
refToSecondElement:
(refcount=3, is_ref=1),string 'two' (length=3)
copyOfNumbers:
(refcount=1, is_ref=0),
array (size=3)
  0 => (refcount=2, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=3, is_ref=1),string 'two' (length=3)
  2 => (refcount=2, is_ref=0),string 'toru' (length=4)

copyOfNumbers[2] changed

numbers:
(refcount=1, is_ref=0),
array (size=3)
  0 => (refcount=2, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=3, is_ref=1),string 'two' (length=3)
  2 => (refcount=1, is_ref=0),string 'toru' (length=4)
refToSecondElement:
(refcount=3, is_ref=1),string 'two' (length=3)
copyOfNumbers:
(refcount=1, is_ref=0),
array (size=3)
  0 => (refcount=2, is_ref=0),string 'tahi' (length=4)
  1 => (refcount=3, is_ref=1),string 'two' (length=3)
  2 => (refcount=1, is_ref=0),string 'three' (length=5)


Observations:

  • Each variable and array element has a refcount and an is_ref.
  • When $numbers is first created, its refcount is 1 (itself), and its is_ref is false. Fair enough.
  • When $refToSecondElement is made, $numbers' refcount and is_ref don't change, but note that its second element now has a refcount of two (itself and $refToSecondElement), and it now states is_ref=1. So not only is $refToSecondElement a reference, so is $numbers[1].
  • When $copyOfNumbers is made, initially it's just a reference. Note the refcount on both reflects this, but the is_ref does not. I guess this is because how PHP handles value copying internally is distinct from actively creating references with &. Not sure.
  • Also note that the refcount on $refToSecondElement (and $numbers[1] and $copyOfNumbers[1]) still reflects two references. As $copyOfNumbers is itself a reference at this point, $numbers[1] and $copyOfNumbers[1] are exactly the same thing. Not a new reference.
  • Now $copyOfNumbers[1] changes, so PHP has to actually make $copyOfNumbers a copy now, not just a reference back to $numbers. This is an example of copy-on-write, and is a performance optimisation. If all one is doing is reading a copied value, then leaving it as a reference is fine. It's not until the copied data changes that it needs to take on a life of its own. And this is borne out by the fact that $numbers and $copyOfNumbers now have a refcount of one apiece.
  • On the other hand, the refcount on $numbers[1], $refToSecondElement and $copyOfNumbers[1] is now three, because $copyOfNumbers[1] is now counted as a reference in its own right, alongside $numbers[1].
  • Also note that - as one would expect - the value change to $copyOfNumbers[1] is reflected in $refToSecondElement too. And indeed back to the reference in $numbers[1] too. This latter bit is the thing we didn't initially expect, but it kinda makes sense now.
  • Lastly we change $copyOfNumbers[2], and we see the refcount for it and $numbers[2] decrement, because again we are seeing copy-on-write: PHP has ceased using a reference to $numbers[2] for $copyOfNumbers[2], and it's now its own value.
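The same story can be checked without XDebug by looking only at values. This sketch reuses the variables from baselineWithDebug.php; the final write through $refToSecondElement (and the "deux" value) is my own addition, there to show that all three names really do lead to one shared value, which is what the refcount of three suggests.

```php
<?php
// baselineValuesOnly.php (hypothetical file name)

$numbers = array("tahi", "rua", "toru");
$refToSecondElement = &$numbers[1];
$copyOfNumbers = $numbers;

$copyOfNumbers[1] = "two";    // shared reference: visible through all three names
$copyOfNumbers[2] = "three";  // ordinary element: only the copy changes

$refToSecondElement = "deux"; // write through the third name for the same value

echo implode(", ", $numbers), PHP_EOL;        // tahi, deux, toru
echo implode(", ", $copyOfNumbers), PHP_EOL;  // tahi, deux, three
echo $refToSecondElement, PHP_EOL;            // deux
```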

XDebug came in handy here for demonstrating what's going on: it let me see how the values / references change as values are changed.
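As for the original question (changing $array2 without touching $array1), one way that follows from all of the above is to build the second array element by element, so that values rather than references are assigned. copyWithoutReferences() is a hypothetical helper of mine, and the sketch is shallow: nested arrays would need the same treatment applied recursively.

```php
<?php
// copyWithoutReferences.php (hypothetical file name and helper)

function copyWithoutReferences(array $source)
{
    $copy = array();
    foreach ($source as $key => $value) {
        // $value is a by-value loop variable, so this assigns a plain value
        // into $copy rather than carrying the element's reference across.
        $copy[$key] = $value;
    }
    return $copy;
}

$array1 = array(1, 20);
$x = &$array1[1];
$array2 = copyWithoutReferences($array1);
$array2[1] = 22;
print_r($array1[1]); // Output is 20
```

$x and $array1[1] are still tied to each other after this; it's only $array2 that has been cut loose.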

I feel just that slight bit less ignorant about how PHP works now. Nice one.

--
Adam