Wednesday, 14 January 2015

PHP: async requests using GuzzleHttp and request pools

G'day:
This'll round out my investigations into GuzzleHttp for the time being. Yesterday I looked at "PHP: trying to get async HTTP requests working via GuzzleHttp", and got that working with only a coupla hitches (all from me not RTFMing). That approached things by making individual async calls. What I'm doing today is to create an array of requests, and use GuzzleHttp's request pools to make them all simultaneously. It's taken about four hours to get the code together in a way I think makes sense. I had a coupla delays due to not paying attention to my code, then being bemused as to why it didn't work. But on the whole it was pretty smooth and predictable. And it just works.

I'm continuing on the same "blog" application I was using yesterday. To remind you, the resultant page I am delivering is thus:

ID: 2
Date: January, 10 2015 00:00:00
Title: Another article
Body: A different article


References

Comments

  • ID: 2
    Date: January, 10 2015 00:00:00
    Author: Fleur
    Body: First comment on the second article
  • ID: 3
    Date: January, 11 2015 00:00:00
    Author: Tori
    Body: Second comment on the second article


Contrivedly, there are three REST API calls in that: one for the article, one for the references, one for the comments. In the real world one would not do that, I know.

(and, yes, I copied and pasted that from yesterday's article).

Today I've got a new method in my controller:

<?php
// Article.php

namespace dac\guzzledemo\controllers;

class Article {

    // [...]

    function getArticleViaPool($id){
        $article = [];
        $references = [];
        $comments = [];

        $this->loggerService->logTaskTime("==============================", function() use ($id, &$article, &$references, &$comments) {
            $logSource = "controllers/article getArticleViaPool()";

            $requests = [];

            $this->loggerService->logTaskTime("$logSource: create requests", function() use (&$requests, $id, &$article, &$references, &$comments) {
                $requests[] = $this->articleService->getArticleRequest($id, $article);
                $requests[] = $this->referenceService->getReferencesRequest($id, $references);
                $requests[] = $this->commentService->getCommentsRequest($id, $comments);
            });

            // constructing the Pool only queues the requests: nothing is sent yet
            $pool = null;
            $this->loggerService->logTaskTime("$logSource: run pooled requests", function() use (&$pool, $requests) {
                $pool = new \GuzzleHttp\Pool($this->guzzleClient, $requests);
            });

            // wait() sends the requests and blocks until all of them have completed
            $this->loggerService->logTaskTime("$logSource: wait for pooled requests", function() use ($pool) {
                $pool->wait();
            });
        });
        return $this->twig->render('article.html.twig', [
            'article' => $article,
            'references' => $references,
            'comments' => $comments
        ]);
    }
        
}

It looks like a lot of code, but if we focus on just the "doing" bits (and not the logging stuff, which is most of it), then there's not much to it:
  • use my three services to make a request object for each of the Article itself, its References and its Comments;
  • stick those in a Pool;
  • wait for the Pool to finish grabbing the results;
  • pass the three resultant arrays into my view, same as yesterday.


I could have written the controller thus:

function getArticleViaPool($id){
    $article = [];
    $references = [];
    $comments = [];

    $pool = new \GuzzleHttp\Pool($this->guzzleClient, [
        $this->articleService->getArticleRequest($id, $article),
        $this->referenceService->getReferencesRequest($id, $references),
        $this->commentService->getCommentsRequest($id, $comments)
    ]);
    $pool->wait();

    return $this->twig->render('article.html.twig', [
        'article' => $article,
        'references' => $references,
        'comments' => $comments
    ]);
}

That's pretty simple.

Most of the work - although there's still not much - is done in the services. Here's the relevant code in the Article service:

<?php
// Article.php

namespace dac\guzzledemo\services;

class Article {

    protected $articleFactory;
    protected $guzzleClient;
    protected $loggerService;
    protected $articleEndPoint;

    function __construct($articleFactory, $guzzleClient, $loggerService, $articleEndPoint){
        $this->articleFactory    = $articleFactory;
        $this->guzzleClient        = $guzzleClient;
        $this->loggerService    = $loggerService;
        $this->articleEndPoint    = $articleEndPoint;
    }

    // [...]

    function getArticleRequest($id, &$article){
        $loggerService = $this->loggerService;
        $request = $this->guzzleClient->createRequest("get", $this->articleEndPoint . $id);
        $request->getEmitter()->on(
            "complete",
            function($e) use (&$article, $loggerService){
                $loggerService->logTaskTime("services/Article createArticle()", function() use ($e, &$article) {
                    $this->createArticle($e->getResponse()->json(), $article);
                });
            }
        );
        return $request;
    }

    private function createArticle($articleAsArray, &$article){
        $articleFactory = $this->articleFactory;
        $article = $articleFactory(
            $articleAsArray["ID"],
            $articleAsArray["DATE"],
            $articleAsArray["TITLE"],
            $articleAsArray["BODY"]
        );
    }

}

You might recall that yesterday I had that createArticle() code inline in the callback, which I admitted wasn't great: it's untestable like that. Today I've factored it out. The callback now does a minimum of inline work: it just closes over the variables it needs, then calls the separate method to do the work.

You'll also notice the URLs aren't hard-coded any more. That was a bit lazy of me. I've moved stuff like that into the dependency injection config:

<?php
// Dependencies.php

// [...]

class Dependencies {

    static function configure($app){

        $app["parameters.articleEndPoint"]        = "article/";
        // [...]
        $app["parameters.baseRestUrl"]            = "http://cf11.local:8511/rest/blog/";

        // [...]

        $app["services.article"] = $app->share(function($app) {
            return new services\Article(
                $app["factories.article"],
                $app["services.guzzle.client"],
                $app["services.logger"],
                $app["parameters.articleEndPoint"]
            );
        });        

        // [...]

        $app["services.guzzle.client"] = $app->share(function($app) {
            return new Client([
                "base_url" => $app["parameters.baseRestUrl"]
            ]);
        });
    }

}

So my hard-coded values are all config now. I also found out the Client object can be initialised with the base_url it'll be using, and it'll automatically prepend that onto any relative URLs I use in my Request objects. Much tidier.

Anyhow, back to the service. Instead of performing the actual request like I was yesterday, I'm now simply creating the request object. I'm also binding a handler to its "complete" event, which will create my native data for me once the request is run and completes:

$request = $this->guzzleClient->createRequest("get", $this->articleEndPoint . $id);
$request->getEmitter()->on(
    "complete",
    function($e) use (&$article){
        $this->createArticle($e->getResponse()->json(), $article);
    }
);
return $request;

Oh, and I return the request to the controller so it can then pool it.
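The bit that makes this work is the by-reference capture: the handler is defined now, runs later, and still writes into the caller's variable. Here's a minimal sketch of just that mechanism. StubRequest, onComplete() and complete() are made-up stand-ins for this example (they are not GuzzleHttp's API), and complete() plays the part of the pool finishing its work:

```php
<?php
// Hypothetical stand-in for a request object: it only stores "complete" handlers.
// This is NOT GuzzleHttp's API, just enough to show the by-reference mechanics.
class StubRequest {
    private $handlers = [];
    private $payload;

    function __construct(array $payload){
        $this->payload = $payload;
    }

    // register a callback to run when the "request" completes
    function onComplete(callable $handler){
        $this->handlers[] = $handler;
    }

    // simulate the response arriving: hand the payload to each handler
    function complete(){
        foreach ($this->handlers as $handler) {
            $handler($this->payload);
        }
    }
}

function getArticleRequest($id, &$article){
    $request = new StubRequest(["ID" => $id, "TITLE" => "Another article"]);
    $request->onComplete(function($data) use (&$article){
        // runs later; writes through the reference to the caller's variable
        $article = $data;
    });
    return $request;
}

$article = [];
$request = getArticleRequest(2, $article);
$before = $article;     // still empty: the handler has not run yet
$request->complete();   // stands in for $pool->wait()
$after = $article;      // now populated by the handler
```

Without the & in both the function signature and the use clause, the handler would fill in its own copy and the caller's $article would stay empty.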

And that's really it!

Oh, you'll notice perhaps I've been using a new logging function today:

public function logTaskTime($message, $job){
    error_log(sprintf("BEFORE: %s: %f", $message, microtime(true) - $this->start));
    $job();
    error_log(sprintf("AFTER: %s: %f", $message, microtime(true) - $this->start));
}

There are two differences from yesterday's version: this one logs elapsed time with microsecond precision instead of seconds, and it removes some duplication I had in my code. Yesterday I was logging an entry before and after each section of the code. I've written this version so I wrap code in a function expression, and logTaskTime() runs the wrapped code between the BEFORE and AFTER log entries. It just tidies things up a bit, and shows a good(?) use of inline function expressions.
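For completeness, here's a rough self-contained sketch of how that method might sit in a class and get called. The class name TimingLogger, the constructor that sets the start time, and the $lines buffer are all assumptions for the sake of the example; only the BEFORE/AFTER wrapping mirrors the method above:

```php
<?php
// Hypothetical wrapper class: only the logTaskTime() behaviour comes from the
// article; the class name, constructor and $lines buffer are assumptions.
class TimingLogger {
    private $start;
    public $lines = [];

    function __construct(){
        $this->start = microtime(true);
    }

    public function logTaskTime($message, callable $job){
        $this->log("BEFORE", $message);
        $job();
        $this->log("AFTER", $message);
    }

    private function log($stage, $message){
        $line = sprintf("%s: %s: %f", $stage, $message, microtime(true) - $this->start);
        $this->lines[] = $line;   // kept so the sketch can be inspected
        error_log($line);
    }
}

$logger = new TimingLogger();
$total = 0;
$logger->logTaskTime("sum 1..100", function() use (&$total) {
    // the wrapped job hands its result back via a by-reference capture
    $total = array_sum(range(1, 100));
});
```

Because the job is just a function expression, anything it needs to hand back has to come out via a by-reference capture like $total here, which is why the controller's closures are littered with & captures.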

FWIW, the log for a typical run is:

BEFORE: ==============================: 0.002000
BEFORE: controllers/article getArticleViaPool(): create requests: 0.002000
AFTER: controllers/article getArticleViaPool(): create requests: 0.021001
BEFORE: controllers/article getArticleViaPool(): run pooled requests: 0.021001
AFTER: controllers/article getArticleViaPool(): run pooled requests: 0.029002
BEFORE: controllers/article getArticleViaPool(): wait for pooled requests: 0.029002
BEFORE: services/Article createArticle(): 1.122064
AFTER: services/Article createArticle(): 1.125064
BEFORE: services/Reference createReferences(): 1.131065
AFTER: services/Reference createReferences(): 1.132065
BEFORE: services/Comment createComments(): 1.150066
AFTER: services/Comment createComments(): 1.151066
AFTER: controllers/article getArticleViaPool(): wait for pooled requests: 1.151066
AFTER: ==============================: 1.151066


I've pegged back the actual REST code to only delay for a second (yesterday was five seconds). All three requests complete in a little over one second in total, whereas run one after the other they'd need at least three, so GuzzleHttp is clearly doing 'em simultaneously.
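If you want the actual figure for the wait phase rather than eyeballing it, subtract the BEFORE timestamp from the AFTER one. A quick sketch using the two relevant entries copied from the log above (the elapsed() helper is just something I've made up for this example):

```php
<?php
// Two entries copied from the log above
$log = [
    "BEFORE: controllers/article getArticleViaPool(): wait for pooled requests: 0.029002",
    "AFTER: controllers/article getArticleViaPool(): wait for pooled requests: 1.151066",
];

// hypothetical helper: the elapsed-seconds figure is whatever follows the last ": "
function elapsed($line){
    return (float) substr($line, strrpos($line, ": ") + 2);
}

$waitSeconds = elapsed($log[1]) - elapsed($log[0]);
printf("wait phase: %.6f seconds\n", $waitSeconds);
```

That comes to about 1.12 seconds spent waiting on all three one-second requests.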

Right. So that's that then. I s'pose my next article better look at some Clojure, given I started that on the w/end...

--
Adam