Sunday 19 February 2023

PHP: looking at spatie/async some more

G'day:

I'm getting back to this spatie/async library today (see "PHP: looking at spatie/async" for the first part of this).

Previously I have just been passing a callback to the add function when adding a task to the pool:

$pool->add(function () use ($connection, $i, $startTime) {
    $result = $connection->executeQuery("CALL sleep_and_return(?)", [2]);
    return sprintf(
        "%d:%d:%d",
        $i,
        $result->fetchOne(),
        microtime(true) - $startTime
    );
});

This requires the task to be tightly coupled to the pool-handling, which is probably not what one wants to do. Instead, one can put the task logic into a Task class:

namespace adamcameron\php8\Task;

use adamcameron\php8\tests\Integration\Fixtures\Database as DB;
use Doctrine\DBAL\Connection;
use Spatie\Async\Task;

class SlowDbCallTask extends Task
{
    readonly private Connection $connection;

    public function __construct(
        readonly private int $i,
        readonly private float $startTime
    ) {
    }

    public function configure()
    {
        $this->connection = DB::getDbalConnection();
    }

    public function run()
    {
        $result = $this->connection->executeQuery("CALL sleep_and_return(?)", [2]);
        return sprintf(
            "%d:%d:%d",
            $this->i,
            $result->fetchOne(),
            microtime(true) - $this->startTime
        );
    }
}

So that's a bit nicer, especially when you see how the calling code in the test looks now:

$pool->add(new SlowDbCallTask($i, $startTime));

One thing that took me a while to straighten out in my head was the way the configure method works / needs to be used. Initially, my implementation of the SlowDbCallTask class was a bit literal, and my constructor was thus:

public function __construct(
    readonly private Connection $connection,
    readonly private int $i,
    readonly private float $startTime
) {
}

This errored-out when I ran the test:

Exception: Serialization of 'Closure' is not allowed
/var/www/vendor/spatie/async/src/Runtime/ParentRuntime.php:87
/var/www/vendor/spatie/async/src/Runtime/ParentRuntime.php:69
/var/www/vendor/spatie/async/src/Pool.php:140
/var/www/tests/Functional/SpatieAsync/TaskTest.php:20

I've run into this before: the object being used as the task handler needs to be serialised to get it into the PHP process that's running it, and not everything can be serialised. Less than ideal, but so be it. Then it occurred to me that I was being a div anyhow: I don't want to pass the Task's DB connection into it - there's no reason to - I can just initialise it in the Task's constructor, eg:

public function __construct(
    readonly private int $i,
    readonly private float $startTime
) {
    $this->connection = DB::getDbalConnection();
}

But same error, and same problem. When I handle this in the constructor, it's all being done in the calling code, so the same serialisation issue exists. I have to concede the penny did not drop without some googling, and I found this helpful comment on an issue on GitHub, which led me to this other comment:

It's important to remember that parallel processes are really separate processes. By using closures and a smart bootstrap, we're simulating the parent process as best as possible. There are however cases which won't work with that simple approach: not everything is serialisable.

That's where a Task comes in. Tasks allow more control over the bootstrap in the child process.

Start by taking a look at what the README says about tasks: https://github.com/spatie/async#working-with-tasks

The less objects and application things are passed to the child process, the better. […]

Whilst it doesn't directly say it, this made me click that that config method is for (I was previously going "not sure why I need that"); the docs - linked to above - are of the sort that state what thing are, but not why they are they way they are, so don't directly explain this either. Basically if there's anything the Task will need that won't be serialisable: sling it in the config method. Makes complete sense to me now.

If one looks at the base Task class, we can see how it works:

namespace Spatie\Async;

abstract class Task
{
    abstract public function configure();

    abstract public function run();

    public function __invoke()
    {
        $this->configure();

        return $this->run();
    }
}

There is a simplified way of implementing tasks too (not sure why this is needed: once one knows what's going on, it's not like the other approach is "complicated".

To prove I'm actually TDDing all this still, here's the test for this one:

/** @TestDox It supports a simplified version of a task */
public function testSimpleTask()
{
    $pool = Pool::create();

    $pool->add(new SimpleTask());

    $results = $pool->wait();

    $this->assertCount(1, $results);
    $this->assertEquals("G'day world from an async call", $results[0]);
}

And the implementation simply needs to be this:

class SimpleTask
{
    public function __invoke()
    {
        return "G'day world from an async call";
    }
}

(It's a bit mindless, fine. It's late).

As per the docs: provided one has an __invoke method: that's it.


Speaking of TDD: in the first part of this - despite not starting with the test code listing - I did TDD it. Although I took it as kind of a refactoring, so I duplicated the test for the inline-task-via-callback version, and changed the code to instantiate my SlowDbCallTask. I did not directly test the configure and run implementations (with their own tests I mean), because that literally is implementation detail. The feature here is "the slow DB call [blah blah, whatever the feature actually is]", and whether it's done by inline callback or a Task class is neither here nor there. And that test does actually exercise all the code in the task class anyhow, so: job done.

Interestingly, the test coverage report disagrees with me:

This is because the PHP process running that code is not the same one running the tests, I guess. Hrm. Maybe I should have separate tests. But: I am not of the disposition that I need to chase 100% LOC coverage … but (again) I discuss why ensuring 100% is something to aim for anyway in an article I wrote ages ago: "Yeah, you do want 100% test coverage" (which should then be measured against this other article on the topic: "Test coverage: it's not about lines of code"). I think in this case - as the class is so simple - I'd put a @codeCoverageIgnore annotation on it. If the task class ended up having a bunch of moving parts in it that became trickier to test via its calling context, then I might consider testing those discretely. "It depends" [etc].

OK. That's the end of that section of the docs (I have written more about it than there was actual documentation that said. Ha. So I'm gonna finish up here. I think I have another article or so to write on this stuff yet. Let's seem.

Righto.

--
Adam