Wednesday 20 May 2015

Random PHP (7) bits: improvements to generators

G'day:
I was gonna have a look at output buffering in PHP today, but my attention has been hijacked by PHP 7. So instead I'm gonna have a look at some enhancements they've made to generators in 7. I'm actually not at work today, so there's a chance I'll be able to write both articles anyhow. We'll see.

Right, so generators. I had a look at these in PHP before, briefly:


And had a look at emulating them with CFML:


Before I start... the code I'm showing here I am running on a build dated May 19 2015 03:49:29 (that's from phpinfo()). I had an older version of 7 - dated March - on my other PC and this code did not run on it. Obviously this is all pre-release stuff, so the versions are a bit of a moving target. If this code doesn't run for you: check your version.

So anyway - just quickly - here's an example:

// basic.php
function getNumber(){
    foreach(['tahi', 'rua', 'toru', 'wha'] as $i){
        yield $i;
    }
}

$numbers = getNumber();

echo $numbers->current() . '<br>';
$numbers->next();

echo $numbers->current() . '<br>';
$numbers->next();

echo $numbers->current() . '<br>';
$numbers->next();

echo $numbers->current() . '<br>';
$numbers->next();

echo $numbers->current() . '<br>';
$numbers->next();

echo '<hr>';
foreach(getNumber() as $number){
    echo $number . '<br>';
}

Notes:
  • A generator is a function which hands back values via a yield statement, not a return statement. The conceit is that the next time a value is asked for, processing resumes from the statement after the yield, rather than the function being called again from the top. In this case, processing continues with the next iteration of the loop.
  • Generators implement the Iterator interface, providing methods for traversing the generated sequence.
  • Here I demonstrate two usages:
      • using current() and next() to iterate through the generated sequence;
      • and also using a standard foreach() loop.

In both cases, the results are the same:

tahi
rua
toru
wha


tahi
rua
toru
wha
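
One detail in basic.php: the fifth current() call happens after the generator is exhausted, so it returns null and just echoes an empty line. Here's a minimal sketch of the manual approach guarded with valid() and showing key() as well. It assumes the same getNumber() generator as above; the $seen array is just something I've added for illustration.

```php
<?php
// valid.php
// Same generator as in basic.php
function getNumber(){
    foreach(['tahi', 'rua', 'toru', 'wha'] as $i){
        yield $i;
    }
}

$numbers = getNumber();
$seen = []; // illustration only: collects "key:value" pairs

// valid() returns false once the generator has run to completion
while ($numbers->valid()){
    $seen[] = $numbers->key() . ':' . $numbers->current();
    $numbers->next();
}

echo implode('<br>', $seen); // 0:tahi, 1:rua, 2:toru, 3:wha (separated by <br>)
```

With no explicit keys in the yield statements, the generator numbers them from zero, much like an indexed array.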


So that's generators in a nutshell. My example here is a contrived one... it's worth looking at those earlier articles I link to to get a better idea of how they work and what they can be used for. They come into their own when dealing with infinite sequences, as the next element of the sequence doesn't need to exist until it's asked for.
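
To make the infinite-sequence point concrete, here's a quick sketch. getEvens() is a hypothetical generator I've made up for this, not something from the earlier articles. The while(true) loop never finishes, but that doesn't matter as each value is only produced when the calling code asks for it.

```php
<?php
// infinite.php
// Hypothetical example: an endless sequence of even numbers
function getEvens(){
    $n = 0;
    while (true){
        yield $n;
        $n += 2;
    }
}

$firstFive = [];
foreach (getEvens() as $even){
    // the calling code decides when to stop, not the generator
    if (count($firstFive) === 5){
        break;
    }
    $firstFive[] = $even;
}

echo implode(', ', $firstFive); // 0, 2, 4, 6, 8
```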

PHP 7 has added a new feature to generators: "Generator Delegation". In a nutshell (ed: lazy writing there... two nutshells in consecutive paras? Hmmm...) as well as being able to yield a value, one can also yield from another generator, at which point that new generator is delegated to, and it provides the yielded values.

Let's look at a very contrived example of this in action:

// delegated.php
function getWords(){
    foreach (['tahi','rua', 'toru', 'wha', 'rima', 'ono', 'whitu', 'waru', 'iwa', 'tekau'] as $number){
        yield $number;
    }
    yield from getColours();
}

function getColours(){
    foreach (['whero','karaka', 'kowhai', 'kakariki', 'kikorangi', 'poropango', 'papura'] as $colour){
        yield $colour;
    }
}

foreach(getWords() as $word){
    echo "$word<br>";
}

This outputs:

tahi
rua
toru
wha
rima
ono
whitu
waru
iwa
tekau
whero
karaka
kowhai
kakariki
kikorangi
poropango
papura


So... what's going on?
  • the first ten iterations over getWords() each yield a number, in turn;
  • once that loop is finished we delegate the generation to the getColours() generator, using yield from;
  • but from the calling code's perspective, getWords() is still the one yielding the words.
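
Two related details worth a sketch: yield from accepts a plain array as well as a generator, and the keys coming out of the delegate are passed through untouched. The cut-down getWords() and getColours() below are trimmed versions for illustration only. The repeated keys don't matter in a foreach loop, but they will bite if the generator is passed to iterator_to_array() with its default of preserving keys.

```php
<?php
// delegatedKeys.php
function getWords(){
    yield from ['tahi', 'rua']; // an array works too: keys 0 and 1
    yield from getColours();    // the delegate's keys start at 0 again
}

function getColours(){
    yield 'whero';
    yield 'karaka';
}

$pairs = [];
foreach (getWords() as $key => $word){
    $pairs[] = "$key=$word";
}
echo implode(', ', $pairs); // 0=tahi, 1=rua, 0=whero, 1=karaka

// With keys preserved (the default), later values overwrite earlier ones
$clobbered = iterator_to_array(getWords());   // ['whero', 'karaka']

// Passing false discards the keys, so all four values survive
$all = iterator_to_array(getWords(), false);  // ['tahi', 'rua', 'whero', 'karaka']
```
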
What's a practical application of this? Well it's perhaps not the sort of thing one would use every day, but one thing I thought of when trying to come up with examples is provisioning data from varying sources, but keeping this hidden from the calling code.

In the example I'll post below, there are three locations records could come from: preloaded ones, ones in online storage (say a DB), and ones in archive. Let's say most requests for records only ever need the first n records, so n records are preloaded. The first generator returns those. If the code starts asking for more records than that, record-generation is delegated to the generator which gets the online (but not preloaded) records. However the non-preloaded records are divided between ones that are online - in the live DB - and ones that are nearline: still in the DB, but in the archive. Perhaps they need to be unpacked or something, but fetching those is slower. So only once the online records are exhausted is processing delegated to getting the archived records. The calling code doesn't need to worry itself about this: it just knows how to ask for records. It should not care where and how they come from.

// records.php

const QUARTER_SEC = 250000;
const HALF_SEC = 500000;

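// default to 10 records if no count is passed on the URL
$recordCount = isset($_GET['count']) ? (int) $_GET['count'] : 10;

$range = range(1, $recordCount);

$records = getRecords();

$start = microtime(true);

foreach($range as $i){
    // only advance after the first record: nothing is skipped, and nothing extra is fetched
    if ($i > 1){
        $records->next();
    }
    $record = $records->current();
    $elapsed = round((microtime(true) - $start) * 1000);
    echo "$record fetched @ {$elapsed}ms <br>";
}


function getRecords(){
    // ten preloaded records, numbered 1-10
    $preloadedRecords = array_map(function($i){ return "Preloaded record #$i";}, range(1,10));
    foreach ($preloadedRecords as $record){
        yield $record;
    }
    // preloaded records exhausted: hand over to the online records
    yield from loadOnlineRecords();
}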

function loadOnlineRecords(){
    $onlineRecordCount = 10;
    foreach(range(1,$onlineRecordCount) as $recordIndex){
        $recordValue = "Online record #$recordIndex";
        usleep(QUARTER_SEC);
        yield $recordValue;
    }
    yield from loadArchivedRecords();
}

function loadArchivedRecords(){
    $archivedRecordCount = 10;
    foreach(range(1,$archivedRecordCount) as $recordIndex){
        $recordValue = "Archive record #$recordIndex";
        usleep(HALF_SEC);
        yield $recordValue;
    }
    // the archive is the last source, so there is nothing further to delegate to
}

Here I've slowed processing down for the online records by 250ms each, and for the archive ones by 500ms each, just to emulate the situation I'm talking about. The output - when asking for 30 separate records - is:

Preloaded record #1 fetched @ 0ms
Preloaded record #2 fetched @ 0ms
Preloaded record #3 fetched @ 0ms
Preloaded record #4 fetched @ 0ms
Preloaded record #5 fetched @ 0ms
Preloaded record #6 fetched @ 0ms
Preloaded record #7 fetched @ 0ms
Preloaded record #8 fetched @ 0ms
Preloaded record #9 fetched @ 0ms
Preloaded record #10 fetched @ 0ms
Online record #1 fetched @ 251ms
Online record #2 fetched @ 501ms
Online record #3 fetched @ 751ms
Online record #4 fetched @ 1001ms
Online record #5 fetched @ 1251ms
Online record #6 fetched @ 1501ms
Online record #7 fetched @ 1751ms
Online record #8 fetched @ 2001ms
Online record #9 fetched @ 2251ms
Online record #10 fetched @ 2501ms
Archive record #1 fetched @ 3001ms
Archive record #2 fetched @ 3501ms
Archive record #3 fetched @ 4001ms
Archive record #4 fetched @ 4501ms
Archive record #5 fetched @ 5001ms
Archive record #6 fetched @ 5502ms
Archive record #7 fetched @ 6004ms
Archive record #8 fetched @ 6504ms
Archive record #9 fetched @ 7004ms
Archive record #10 fetched @ 7504ms


The conceit here is that given each record is yielded separately, for the common use-cases (where ten or fewer records are requested), total processing overhead is nil. It's only an incremental delay once one gets into the "online" records, and then a larger incremental delay for the "nearline" ones. But all this is still hidden from the calling code, as all it's doing is asking for records.

Another situation I thought of is similar to this, and echoes back to some of my earlier generator examples. Say a prime number generator might generally only be used for the first hundred or so primes. So the generator could pre-resolve those first 100 primes, and just yield those. But when that pool is exhausted, it could then yield from another generator which actually calculates the primes. This is obviously a variation on the same theme as above. There are probably better usage scenarios than this, but the whole concept is quite new to me, so that's where my understanding is at.
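
Here's a rough sketch of that primes idea, with a pool of five pre-resolved primes rather than 100 to keep it short. All the names here (getPrimes(), calculatePrimesAfter(), isPrime()) are ones I've made up for the example, and the trial-division check is only there to make it runnable; it's not a recommendation for how to find primes.

```php
<?php
// primes.php
// Hypothetical: yields a pre-resolved pool first, then delegates to a calculating generator
function getPrimes(){
    $known = [2, 3, 5, 7, 11];
    yield from $known;
    $last = $known[count($known) - 1];
    yield from calculatePrimesAfter($last);
}

// Hypothetical: an endless generator of primes greater than $last
function calculatePrimesAfter($last){
    for ($candidate = $last + 1; ; $candidate++){
        if (isPrime($candidate)){
            yield $candidate;
        }
    }
}

// Naive trial division, for illustration only
function isPrime($n){
    if ($n < 2){
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++){
        if ($n % $d === 0){
            return false;
        }
    }
    return true;
}

$primes = getPrimes();
$firstEight = [];
for ($i = 0; $i < 8; $i++){
    $firstEight[] = $primes->current();
    $primes->next();
}

echo implode(', ', $firstEight); // 2, 3, 5, 7, 11, 13, 17, 19
```

The calling code can't tell that the first five came from the pool and the last three were calculated.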

Not a bad feature. I wish CFML would add this sort of stuff to the language rather than crap like <cfclient>. Languages should focus on language features, not implementation crap. I'm glad Lucee has not fallen into that mire. But I digress.

BTW, ECMAScript 6 has this stuff in it too, via function* and yield*. I knocked together the equivalent of the first example in JS (and this works fine on Chrome if you just run it in the console):

// delegated.js

function* getWords(){
    // declare the loop variable, otherwise it leaks out as an implicit global
    for (const number of ['tahi','rua', 'toru', 'wha', 'rima', 'ono', 'whitu', 'waru', 'iwa', 'tekau']){
        yield number;
    }
    yield* getColours();
}

function* getColours(){
    for (const colour of ['whero','karaka', 'kowhai', 'kakariki', 'kikorangi', 'poropango', 'papura']){
        yield colour;
    }
}

for (const item of getWords()){
    console.log(item);
}

That's about it. There's another new feature on generators: the ability to return a value as well as yield one. I've not come up with a good example of wanting to do that yet, but will have a look once I go.
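
For what it's worth, here's a minimal sketch of just the mechanics of that, rather than a compelling use case. The generators are cut-down, hypothetical versions of the ones above. The returned value can be read with getReturn() once the generator has finished, and when delegating, the yield from expression itself evaluates to the delegate's return value.

```php
<?php
// returned.php
// Hypothetical: yields each colour, then returns how many it yielded
function getColours(){
    $count = 0;
    foreach (['whero', 'karaka', 'kowhai'] as $colour){
        yield $colour;
        $count++;
    }
    return $count;
}

function getWords(){
    // yield from evaluates to the return value of the generator delegated to
    $colourCount = yield from getColours();
    yield "($colourCount colours)";
}

$output = [];
foreach (getWords() as $word){
    $output[] = $word;
}
echo implode(', ', $output) . '<br>'; // whero, karaka, kowhai, (3 colours)

// getReturn() is only usable once the generator has run to completion
$colours = getColours();
foreach ($colours as $colour){
    // just exhaust it
}
$returned = $colours->getReturn();
echo $returned; // 3
```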

Righto.

--
Adam