Tuesday 13 January 2015

PHP: trying to get async HTTP requests working via GuzzleHttp

G'day:
Last week I started looking at a few frameworks on PHP: "PHP: messing around with Silex, Pimple & GuzzleHttp". In that article I set up a basic Silex-driven site, using Pimple for managing dependency injection, and GuzzleHttp for handling HTTP requests to a stub REST API I had knocked together in CFML.

That was - for all intents and purposes - using synchronous calls in GuzzleHttp. It was making them asynchronously, but then immediately blocking until they completed. One of my requirements is to use asynchronous calls, so the HTTP requests don't block my app. Well: as little as possible, anyhow. If I have three five-second HTTP requests to make, I'd rather fire and forget them for as long as possible until I need the data, at which point I'll wait for them to complete. TBH, for my actual purposes REST is the wrong answer, especially if we need to actually worry about its latency, but we seem to be stuck with that. We'd be in the position to install the app on our PHP servers themselves and integrate with them more tightly (it's our own app we're communicating with via REST), but we're not able to do this for reasons I cannot fathom. C'est la vie.

Anyway, I've built on my app from before - well the general structure of it - expanding it out to be a "blog"... or at least a remote REST API which serves up an article, some reference links and some comments for a given article number.



The output is along these lines:

ID: 2
Date: January, 10 2015 00:00:00
Title: Another article
Body: A different article


References


Comments

  • ID: 2
    Date: January, 10 2015 00:00:00
    Author: Fleur
    Body: First comment on the second article
  • ID: 3
    Date: January, 11 2015 00:00:00
    Author: Tori
    Body: Second comment on the second article

(Fleur and Tori are my two eldest nieces)

Contrivedly, there are three REST API calls in that: one for the article, one for the references, one for the comments. In the real world one would not do that, I know.

The code for the article REST end point is:

// Article.cfc
component rest=true restPath="article" {

    remote struct function getById(required numeric id restargsource="path") httpmethod="get" restpath="{id}" produces="application/json" {
        var article = entityLoad("Article", id, true);
        sleep(5000);
        return {
            id        = article.id,
            date    = article.date,
            title    = article.title,
            body    = article.body
        };
    }

    // [...]
}

Note that I have a 5sec pause in there. The code for references and comments is much the same - both with their own 5sec pause too. I'll not repeat their code here.

On the PHP end of things, I've got an article service which talks to this:

<?php
// Article.php

namespace dac\guzzledemo\services;

class Article {

    protected $articleFactory;
    protected $guzzleClient;
    protected $loggerService;

    function __construct($articleFactory, $guzzleClient, $loggerService){
        $this->articleFactory    = $articleFactory;
        $this->guzzleClient        = $guzzleClient;
        $this->loggerService    = $loggerService;
    }

    function getArticle($id, &$article){
        $loggerService = $this->loggerService;
        $loggerService->getElapsed("services/Article getArticle(): start");

        $response = $this->guzzleClient->get(
            'http://cf11.local:8511/rest/blog/article/' . $id,
            ["future"=>true]
        );
        $loggerService->getElapsed("services/Article getArticle(): After get()");

        $response->then(function($response) use (&$article, $loggerService) {
            $loggerService->getElapsed("services/Article getArticle(): top of then()");
            $articleAsArray = $response->json();

            $articleFactory = $this->articleFactory;
            $article = $articleFactory(
                $articleAsArray["ID"],
                $articleAsArray["DATE"],
                $articleAsArray["TITLE"],
                $articleAsArray["BODY"]
            );

            return $response;
        });
        $loggerService->getElapsed("services/Article getArticle(): bottom");
        return $response;
    }


}

There's a bunch of boilerplate and logging in there which isn't so interesting. I've focused on the main "doing" bit of the code.

To make the HTTP request asynchronous, I pass the future config parameter as true. I did this last time too, but I stopped and waited for it immediately:

$response = $this->guzzleClient->get('http://cf11.local:8511/rest/api/person/' . $id,["future"=>true]);
$response->wait();

This time I'm using a promise to create an event handler, and just letting Guzzle go about its merry business whilst I crack on with the rest of the code.

I actually meant to tidy that code up a bit and not have the event handler inline like that, but as a separate (then testable!) function. Oops. Still: the theory is the same... the first callback passed to then() is used as a success/completion handler. I could pass a second one for error handling, but I'm not really testing that side of things here, so didn't bother.
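For what it's worth, here's roughly what I had in mind by pulling the handler's logic out into its own function. This is only a sketch: buildArticle() and $stubFactory are names I've made up for illustration, and the stub factory just returns an array rather than a real article object. The callback passed to then() would shrink to a one-liner that calls this with $response->json().

```php
<?php
// Hypothetical extraction of the completion handler's body into a plain function.
// $articleFactory is assumed to be any callable taking (id, date, title, body).
function buildArticle(callable $articleFactory, array $articleAsArray) {
    return $articleFactory(
        $articleAsArray["ID"],
        $articleAsArray["DATE"],
        $articleAsArray["TITLE"],
        $articleAsArray["BODY"]
    );
}

// Stand-in factory for this sketch only: it returns a plain array.
$stubFactory = function ($id, $date, $title, $body) {
    return ["id" => $id, "date" => $date, "title" => $title, "body" => $body];
};

// Sample data shaped like the JSON the CFML end point returns (upper-case keys).
$built = buildArticle($stubFactory, [
    "ID"    => 2,
    "DATE"  => "January, 10 2015 00:00:00",
    "TITLE" => "Another article",
    "BODY"  => "A different article"
]);
```

The point is that the mapping from the decoded JSON to an article can then be tested without any HTTP request being involved.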

I'll come back to this code, but first some context, with the controller:

<?php
// Article.php

namespace dac\guzzledemo\controllers;

class Article {

    protected $twig;
    protected $articleService;
    protected $referenceService;
    protected $commentService;
    protected $loggerService;

    function __construct($twig, $articleService, $referenceService, $commentService, $loggerService){
        $this->twig                = $twig;
        $this->articleService    = $articleService;
        $this->referenceService    = $referenceService;
        $this->commentService    = $commentService;
        $this->loggerService    = $loggerService;
    }

    function getArticle($id){
        $this->loggerService->getElapsed("controllers/article getArticle(): start");
        $article = [];
        $articleResponse = $this->articleService->getArticle($id, $article);
        $this->loggerService->getElapsed("controllers/article getArticle(): after getArticles()");

        $references = [];
        $referencesResponse = $this->referenceService->getReferencesForArticle($id, $references);        
        $this->loggerService->getElapsed("controllers/article getArticle(): after getReferencesForArticle()");

        $comments = [];
        $commentsResponse = $this->commentService->getCommentsForArticle($id, $comments);        
        $this->loggerService->getElapsed("controllers/article getArticle(): after getCommentsForArticle()");


        $this->loggerService->getElapsed("controllers/article getArticle(): before sleep()");
        sleep(6);
        $this->loggerService->getElapsed("controllers/article getArticle(): after sleep()");

        $this->loggerService->getElapsed("controllers/article getArticle(): before first wait()");
        $articleResponse->wait();
        $this->loggerService->getElapsed("controllers/article getArticle(): after first wait()");
        $this->loggerService->getElapsed("controllers/article getArticle(): before second wait()");
        $referencesResponse->wait();
        $this->loggerService->getElapsed("controllers/article getArticle(): after second wait()");
        $this->loggerService->getElapsed("controllers/article getArticle(): before third wait()");
        $commentsResponse->wait();
        $this->loggerService->getElapsed("controllers/article getArticle(): after third wait()");

        return $this->twig->render('article.html.twig', array(
            'article' => $article,
            'references' => $references,
            'comments' => $comments
        ));
    }

}

The routing is configured to call getArticle() when one browses to /index.php/article/n (where n is an article number. In my case: 2).

Again there's a bunch of boilerplate & logging here which I won't go into.


  • We make calls to the article, reference and comment services, each of which kicks off an async request to get its data.
  • On the CFML end of things (the REST end points) each of those calls sleeps for five seconds (as per above).
  • We pause PHP processing for six seconds. If the calls really are being made asynchronously then that should be enough time for all three to complete. Each HTTP request should take five seconds plus a few milliseconds of overhead, but they should all be running simultaneously, so all three should be done within the six seconds.
  • I then wait for each request to finish. Which, by now, they should all be.
  • Having waited, the data is ready, so pass it to Twig for output.

And to demonstrate this in action, here's the log:

controllers/article getArticle(): start: 0
services/Article getArticle(): start: 0
services/Article getArticle(): After get(): 0
services/Article getArticle(): bottom: 0
controllers/article getArticle(): after getArticles(): 0
services/Reference getReferencesForArticle(): start: 0
services/Reference getReferencesForArticle(): After get(): 0
services/Reference getReferencesForArticle(): bottom: 0
controllers/article getArticle(): after getReferencesForArticle(): 0
services/Comment getCommentsForArticle(): start: 0
services/Comment getCommentsForArticle(): After get(): 0
services/Comment getCommentsForArticle(): bottom: 0
controllers/article getArticle(): after getCommentsForArticle(): 0
controllers/article getArticle(): before sleep(): 0
controllers/article getArticle(): after sleep(): 6
controllers/article getArticle(): before first wait(): 6
services/Article getArticle(): top of then(): 6
services/Reference getReferencesForArticle(): top of then(): 6
services/Comment getCommentsForArticle(): top of then(): 6
controllers/article getArticle(): after first wait(): 6
controllers/article getArticle(): before second wait(): 6
controllers/article getArticle(): after second wait(): 6
controllers/article getArticle(): before third wait(): 6
controllers/article getArticle(): after third wait(): 6

The key bit here is that the total execution time is about 6sec, meaning all three 5sec requests do indeed run asynchronously, and simultaneously. Nice one!
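To spell out the arithmetic behind that, here's a back-of-the-envelope sketch. It ignores the few milliseconds of overhead on each request, and the variable names are just mine for the purposes of the illustration.

```php
<?php
// Durations in seconds, taken from the sleep() calls in the CFML and the controller.
$requestSeconds  = [5, 5, 5]; // article, references, comments
$controllerSleep = 6;

// If each request blocked in turn, and then the controller slept:
$sequentialTotal = array_sum($requestSeconds) + $controllerSleep;

// If all three requests are in flight whilst the controller sleeps:
$concurrentTotal = max($controllerSleep, max($requestSeconds));

echo "Sequential: {$sequentialTotal}sec\n"; // Sequential: 21sec
echo "Concurrent: {$concurrentTotal}sec\n"; // Concurrent: 6sec
```

The log shows the second figure, which is what one would expect only if the requests overlap with each other and with the controller's own sleep.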

One thing that initially tripped me up was that I had been creating a new GuzzleClient object for each service's dependency:

// Dependencies.php

// [...]

$app["services.guzzle.client"] = function() {
    return new Client();
};

// [...]

This isn't correct, as the internals of Guzzle need to use the same RingPHP instance to be able to do the async stuff. This wasn't immediately clear to me from the docs, so was confusing at first. I needed to make it a singleton:

$app["services.guzzle.client"] = $app->share(function() {
    return new Client();
});

Easy.
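The difference between the two definitions boils down to "run the closure every time" versus "run it once and remember the result". Here's a plain-PHP sketch of that idea. Note that shareOnce() is a hypothetical helper I've written for illustration, not Pimple's actual implementation, and I'm using stdClass as a stand-in for the Guzzle client.

```php
<?php
// Hypothetical helper (NOT Pimple's code): wraps a factory so it only ever runs once.
function shareOnce(callable $factory) {
    $instance = null;
    return function () use ($factory, &$instance) {
        if ($instance === null) {
            $instance = $factory(); // first call: build and remember the object
        }
        return $instance;           // later calls: hand back the same object
    };
}

// stdClass stands in for the real client here.
$factory = function () {
    return new stdClass();
};
$shared = shareOnce($factory);

$freshA  = $factory();
$freshB  = $factory();
$sharedA = $shared();
$sharedB = $shared();

var_dump($freshA === $freshB);   // bool(false): a new object each time
var_dump($sharedA === $sharedB); // bool(true): the same object each time
```

With the un-shared version, each service ends up with its own client; with the shared version they all get the same one.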

OK, well that's all I had to say. I continue to be impressed with Guzzle and Pimple. I like the future / promise approach they take with this stuff, as it's all just "familiar".

I'm not sure about my implementation needing to declare the $article / $references / $comments arrays in the controller then pass references to them into the services and into the completion handlers, but I couldn't think of a slicker way to do it off the top of my head. If anyone has a better idea: lemme know.
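In case the by-reference mechanism isn't clear, here it is stripped of all the Guzzle bits. This is a sketch only: fetchLater() is a made-up stand-in for a service method, and the function it returns simulates the response arriving at some later point.

```php
<?php
// Hypothetical stand-in for a service method. It sets up a completion handler
// which writes to the caller's variable, and returns a function that simulates
// the response arriving later.
function fetchLater($id, &$result) {
    $handler = function (array $data) use (&$result) {
        $result = $data; // writes through to the variable in the calling code
    };
    return function () use ($handler, $id) {
        $handler(["ID" => $id, "TITLE" => "Another article"]);
    };
}

$article  = [];
$complete = fetchLater(2, $article);

$before = $article; // still an empty array: nothing has "arrived" yet
$complete();        // simulate the request completing
$after  = $article; // now populated by the handler
```

Both the function argument and the closure's use clause need the &; drop either one and the handler ends up populating a copy instead of the controller's variable.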

I've got another experiment to do with Guzzle: using Request Pools. That should be interesting.

Righto.

--
Adam