Thursday, 17 November 2016

PHP: decorating async GuzzleHttp calls

G'day:
A while back I looked at the decorator pattern in the context of some stuff we were doing:

That's all surprisingly straight forward and handy stuff.

We're revisiting some of this for a new implementation of our HTTP adapter library, which had everything baked into it before, and the decorating has had me scratching my head a bit as we're implementing an adapter for GuzzleHttp, and it does its requests asynchronously, and we want to leverage that.

Just to recap, previously we'd have this sort of thing:

namespace me\adamcameron\testApp;

use GuzzleHttp\Client;

class GuzzleAdapter {

    private $client;
    private $endPoint;

    public function __construct($endPoint){
        $this->endPoint = $endPoint;
        $this->client = new Client();
    }

    public function get($id){
        $response = $this->client->requestAsync(
            "get",
            $this->endPoint . $id
        );

        return $response;
    }

}

And say a logging decorator for that:

namespace me\adamcameron\testApp;

class LoggedGuzzleAdapter {

    private $adapter;
    private $logger;

    public function __construct($adapter, $logger) {
        $this->adapter = $adapter;
        $this->logger = $logger;
    }

    public function get($id){
        $this->logger->logMessage(sprintf("Requesting for %s", $id));

        $response = $this->adapter->get($id);

        $body = $response->getBody();
        $this->logger->logMessage(sprintf("Response for %s: %s", $id, $body));
        
        return $response;
    }

}


And we'd init our adapter thus:

$endPoint  = "http://cf2016.local:8516/cfml/misc/guzzleTestEndpoints/getById.cfm?id=";

$guzzleAdapter = new GuzzleAdapter($endPoint);
$logger = new LoggingService();
$adapter = new LoggedGuzzleAdapter($guzzleAdapter, $logger);


So the underlying GuzzleAdapter handles the Guzzle stuff, the LoggedGuzzleAdapter handles just the logging stuff, but defers to its GuzzleAdapter to do its part of the job, and keeps all the moving parts and bits of functionality sensibly separated.  And it's pretty simple. And as detailed in those earlier articles, we can keep layering decorators around an adapter to add caching or what-have-you in a similar way. Easy. Nice.

However this only works cos the call to Guzzle actually returns the result on the spot. And this is cos we were using it synchronously: we make a call to it, it blocks until it gets the answer back from the target and gives us the answer.

Now that we're using async calls, Guzzle doesn't give us the answer, it just gives us a Promise which will eventually resolve to be the answer. This is in theory good cos it means the calling code can make a bunch of HTTP calls, and not wait around for each of them to resolve in series: Guzzle will actually make them all in parallel. I have a look at this in article "PHP: async requests using Guzzle and request pools".

If we go back to our decorator we can see the problem:

public function get($id){
    $this->logger->logMessage(sprintf("Requesting for %s", $id));

    $response$promisedResponse = $this->adapter->get($id);

    $body = $response$promisedResponse->getBody();
    $this->logger->logMessage(sprintf("Response for %s: %s", $id, $body));
    
    return $response$promisedResponse;
}

At the point at which our decorator needs the body... we don't have it yet. All we have is a promise that at some point we'll have a body (or we'll have response object via the resolved promise, anyhow; and the response object will have the body).

To get the body we first need to wait for the promise to resolve... which is a blocking operation and kinda defeats the purpose of using the async approach in the first place. IE: we could do this sort of thing:

public function get($id){
    $this->logger->logMessage(sprintf("Requesting for %s", $id));

    $promisedResponse = $this->adapter->get($id);
    $response = $promisedResponse->wait();
    
    $body = $response->getBody();
    $this->logger->logMessage(sprintf("Response for %s: %s", $id, $body));
    
    return $response;
}

But doing the wait immediately defeats the purpose of making the call async in the first place. We want the wait call to... err... wait... until the calling code says "right I need to actually use that data now".

Our initial attempt to sort this out was to analyse the issue as being one of "well we don't have the data we need until we call wait, so then we need to do the logging then... which means we need to intercept the wait call... which means we need to return our own decorated version of the response from the call to the adapter with its own wait and have a LoggedResponse, and return that from the code above..." But... and I hope my colleague doesn't mind me saying... when I saw this code in code review I kinda went "there must be a better, more semantic way of doing this". I won't repeat the code as I cannot use our actual work code in my blog, and I am not in the office right now and can't remember the detail of the implementation anyhow. But it's not ideal, so I don't want to share it anyhow.

I've been off sick for the last coupla days, which gives me a lot of time to deliberate and mess around with stuff and google a lot more than I allow myself time for when I'm in the office. With a wee bit of "standing back and having another think about it", the solution became clear. We don't need to explicitly override the wait method before we call it... the promise builds that capability in! At the time we're making the call, we get to tell the Promise what happens when it gets resolved. So I've knocked together this proof of concept:

public function get($id){
    $this->logger->logMessage(sprintf("(%s) Requesting for %s", $this->thisFile, $id));

    $response = $this->adapter->get($id);

    $response->then(function($response) use ($id){
        $body = $response->getBody();
        $this->logger->logMessage(sprintf("(%s) Response for %s: %s", $this->thisFile, $id, $body));
        $body->rewind();
    });

    return $response;
}

This was a bit of a head-slap moment - in that it took me a while to work out what to do - the whole thing about Promises is that one's able to chain functionality onto their lifecycle. if we wanna run some code once the Promise resolves, we just slack a then handler onto it.

This is a good solution IMO as it keeps all the logging decoration in one place, doesn't require any blocking of the async part of the operation, and is still very clear what's going on, even if the inner workings of Promises are a bit opaque (which TBH, they still are for me. Every time I work with 'em I have to re-read the docs. Which in Guzzle's case are pretty bloody awful, if I'm to be unkindly frank).

I chucked all this into a test rig to demonstrate to myself it was sound.

First up I contrived an endpoint in CFML for the PHP code to hit. I wanted to do this outside of PHP so it didn't in any way interfere with the PHP code running. It's simple:

<cfscript>
cfcontent(type="application/json");

writeLog(file="testApp", text="[ID: #URL.id#] request received");

sleep(5000);

writeOutput(serializeJson({"id"=URL.id, "retrieved"=now().dateTimeFormat("HH:nn:ss.lll")}));
writeLog(file="testApp", text="[ID: #URL.id#] response returned");
</cfscript>


This just takes an ID parameter and waits 5sec then returns it in a JSON packet. I wait 5sec to make the async-ness of the PHP calls easier to see. I'm also logging some stuff here to compare when the requests got to CF compared to when they were sent by PHP.

On the PHP end of things, I've got this:

use \me\adamcameron\testApp\GuzzleAdapter;
use \me\adamcameron\testApp\LoggedGuzzleAdapter;
use \me\adamcameron\testApp\LoggingService;

require_once __DIR__ . "/../vendor/autoload.php";

$endPoint = "http://cf2016.local:8516/cfml/misc/guzzleTestEndpoints/getById.cfm?id=";

$guzzleAdapter = new GuzzleAdapter($endPoint);
$logger = new LoggingService();
$adapter = new LoggedGuzzleAdapter($guzzleAdapter, $logger);

$ids = ["001", "002", "003", "004"];

$thisFile = basename(__FILE__);

$logger->logMessage(sprintf("(%s) Making requests...", $thisFile));

$responses = [];
foreach ($ids as $id){
    $logger->logMessage(sprintf("(%s) Requesting for %s", $thisFile, $id));
    $responses[] = $adapter->get($id);
}
$logger->logMessage(sprintf("(%s) Requests made", $thisFile));

$logger->logMessage(sprintf("(%s) Getting bodies from requests...", $thisFile));
foreach ($responses as $response){
    $logger->logMessage(sprintf("(%s) before calling wait()", $thisFile));
    $body = $response->wait()->getBody()->getContents();
    $logger->logMessage(sprintf("(%s) Response Body: %s", $thisFile, $body));
}
$logger->logMessage("Done");

This does the following:
  • creates the GuzzleAdapter;
  • uses a LoggedGuzzleAdapter to decorate it with some logging (the LoggingService just wraps some Monolog stuff);
  • loops over four IDs;
  • and requests their bumpf;
  • we then loop over the responses we've accumulated;
  • wait for them to resolve (calling wait blocks until they are);
  • and outputs the result;
  • all the way along, writing to the log at key points.

The actual LoggedGuzzleAdapter I'm using here is:

class LoggedGuzzleAdapter {

    private $adapter;
    private $logger;
    private $thisFile;

    public function __construct($adapter, $logger) {
        $this->adapter = $adapter;
        $this->logger = $logger;
        $this->thisFile = basename(__FILE__);
    }

    public function get($id){
        $this->logger->logMessage(sprintf("(%s) Requesting for %s", $this->thisFile, $id));

        $response = $this->adapter->get($id);

        $response->then(function($response) use ($id){
            $body = $response->getBody();
            $this->logger->logMessage(sprintf("(%s) Response for %s: %s", $this->thisFile, $id, $body));
            $body->rewind();
        });

        return $response;
    }

}


And the underlying GuzzleAdapter:

use GuzzleHttp\Client;

class GuzzleAdapter {

    private $client;
    private $endPoint;

    public function __construct($endPoint){
        $this->endPoint = $endPoint;
        $this->client = new Client();
    }

    public function get($id){
        $response = $this->client->requestAsync(
            "get",
            $this->endPoint . $id,
            ["http_errors" => true]
        );

        return $response;
    }

    public function getHandlerStack()
    {
        return $this->client->getConfig('handler');
    }

}


The things to note here are:

  • I'm logging the file name the log entry was made from. This is just to make it clearer in the logs which code is doing what for a given entry.
  • Just note the complete separation of concerns between the two files. One does logging. One does... "Guzzling". And they each focus on the job at hand, and that's it.

Here's the result. This is what PHP logged. I've trimmed-out some repetition & clutter, but have not changed any context or ordering of what got logged:

[14:15:53](makeRequests.php) Making requests...
[14:15:53](LoggedGuzzleAdapter.php) Requesting for 001
[14:15:53](LoggedGuzzleAdapter.php) Requesting for 002
[14:15:53](LoggedGuzzleAdapter.php) Requesting for 003
[14:15:53](LoggedGuzzleAdapter.php) Requesting for 004
[14:15:53](makeRequests.php) Requests made
[14:15:53](makeRequests.php) Getting bodies from requests...
[14:15:53](makeRequests.php) before calling wait()
[14:15:58](LoggedGuzzleAdapter.php) Response for 003{"retrieved":"14:15:58.450","id":"003"}
[14:15:58](LoggedGuzzleAdapter.php) Response for 002{"retrieved":"14:15:58.450","id":"002"}
[14:15:58](LoggedGuzzleAdapter.php) Response for 004{"retrieved":"14:15:58.450","id":"004"}
[14:15:58](LoggedGuzzleAdapter.php) Response for 001{"retrieved":"14:15:58.450","id":"001"}
[14:15:58](makeRequests.php) Response Body {"retrieved":"14:15:58.450","id":"001"}
[14:15:58](makeRequests.php) before calling wait()
[14:15:58](makeRequests.php) Response Body {"retrieved":"14:15:58.450","id":"002"}
[14:15:58](makeRequests.php) before calling wait()
[14:15:58](makeRequests.php) Response Body {"retrieved":"14:15:58.450","id":"003"}
[14:15:58](makeRequests.php) before calling wait()
[14:15:58](makeRequests.php) Response Body {"retrieved":"14:15:58.450","id":"004"}

Things to note:

  • first and foremost: the PHP code is indeed making the HTTP requests asynchronously. Remember the CFML code pauses for 5sec for each response, which means if we do four requests that's a total of 20sec paused time. But the whole process for all four requests takes only 5sec (CF is intrinsically multi-threaded, so from its side of things, it'll process all four requests simultaneously).
  • PHP makes all four requests within the same first second of execution.
  • After 5sec, the responses start to come in.
  • The first wait call bears the brunt of this 5sec wait. Bear in mind this is a blocking operation so PHP is now waiting for the 5sec-long response from CF to come back. But after that, all the responses start to come in before their equivalent wait calls, so those promises are resolved before the code even calls wait for them.
  • most importantly: the correct logging is taking place at the correct time. So the code actually works as intended.
One interesting thing here. it doesn't seem like the act of making the request actually triggers the request. It's the act of waiting that causes the requests to be sent (albeit it: all of them).

I stuck a ten second wait in my PHP code, between making the requests and asking about their responses:

$logger->logMessage(sprintf("(%s) Requests made", $thisFile));

sleep(10);

$logger->logMessage(sprintf("(%s) Getting bodies from requests...", $thisFile));

And here's the log for that run:

[14:47:58] (makeRequests.php) Making requests...
[14:47:58] (LoggedGuzzleAdapter.php) Requesting for 001
[14:47:58] (LoggedGuzzleAdapter.php) Requesting for 002
[14:47:58] (LoggedGuzzleAdapter.php) Requesting for 003
[14:47:58] (LoggedGuzzleAdapter.php) Requesting for 004
[14:47:58] (makeRequests.php) Requests made
[14:48:08] (makeRequests.php) Getting bodies from requests...
[14:48:08] (makeRequests.php) before calling wait()
[14:48:13] (LoggedGuzzleAdapter.php) Response for 004: {"retrieved":"14:48:13.704","id":"004"}
[14:48:13] (LoggedGuzzleAdapter.php) Response for 001: {"retrieved":"14:48:13.719","id":"001"}
[14:48:13] (LoggedGuzzleAdapter.php) Response for 003: {"retrieved":"14:48:13.719","id":"003"}
[14:48:13] (LoggedGuzzleAdapter.php) Response for 002: {"retrieved":"14:48:13.704","id":"002"}
[14:48:13] (makeRequests.php) Response Bodies: {"retrieved":"14:48:13.719","id":"001"}
[14:48:13] (makeRequests.php) before calling wait()
[14:48:13] (makeRequests.php) Response Bodies: {"retrieved":"14:48:13.704","id":"002"}
[14:48:13] (makeRequests.php) before calling wait()
[14:48:13] (makeRequests.php) Response Bodies: {"retrieved":"14:48:13.719","id":"003"}
[14:48:13] (makeRequests.php) before calling wait()
[14:48:13] (makeRequests.php) Response Bodies: {"retrieved":"14:48:13.704","id":"004"}
[14:48:13] Done

Notes:

  • the requests are all made @ 58.
  • you can see the 10sec sleep here. If the requests had actually been made, then they'd've all come back in that 10sec window.
  • but we can tell from the values coming back from ColdFusion that the requests weren't actually even made until 08 (as the 5sec delay is indicated in 13 in the returned timestamp
  • so it's the first call to wait that actually triggers the requests to be sent.

I'm not too sure I like that. Guzzle should crack on with it, as soon as it's asked, surely? Not wait around for me to ask "are we there yet?" Still... I don't think this will adversely impact what we're doing very often. In reality we're making a single call and then waiting for it immediately, almost all the time (this is to the extent I have actually questioned why we are doing this work, and why we're arsing about with Guzzle when we could just be using curl. But... oh well).

So that's the end of my most recent adventure with Guzzle and Promises and async calls and the like. I'm pleased with the resulting code. Not so pleased with how Guzzle works.

I've another two articles that cover the next challenge I faced with this stuff still to come. I've got the code for the second one written... but the topic for the third one only came to mind whilst writing this. So perhaps tomorrow I'll document my adventures of howTF to coerce Guzzle into raising the exceptions I choose for it to raise when requests come back with 400-500 responses, not its generic ones; and after that, how to implement a caching decorator for all this stuff. I've been off work sick for yesterday and today, but will be back in the office tomorrow. So I suspect the second article won't be out until Saturday now. I've had enough of being awake today, and am heading back to bed now. it is 3pm, after all ;-)

Righto.

--
Adam