Wednesday 4 January 2017

PHP: looking at the Chain of Responsibility pattern

G'day:
We're working on a proof of concept for some stuff at work at the moment, and I've been put on a code review of the first round of the code. Part of the code is dealing with conditionally getting some data from cache, or if it ain't there, getting it from the DB instead: a fairly common trope. Normally I'd fall back on the decorator pattern for this, indeed I've written about it in the past ("Using a decorator pattern to reduce inappropriate code complexity"). I didn't see this in play in this first tranche of code, so suggested its use. The authoring dev suggested they felt that the decorator pattern was not a good fit for this, so they were using the Chain of Responsibility pattern instead. Another of our devs - when prodded - tended to agree.

I gave this some thought, read-up on the Chain of Responsibility pattern (and some more on the Decorator Pattern), and wasn't quite seeing it.

Just to back up a bit... if yer like me and kinda knew about the Chain of Responsibility pattern, but wasn't intimately familiar with it: read that Wikipedia link above, and do some googling of yer own. But in short it's characterised by having a task that needs doing, and having a sequence (a chain) of agents that possibly can (or possibly cannot) resolve all or part of the task. The process is that the task is passed to the chain of agents, and each in turn decides for itself whether it can fulfil the task in whole or in part. If an agent can resolve the whole task it does so: returning the result. If a given agent cannot fulfil the whole task (either it cannot fulfil it at all, or can only do part of it), it does its work, then passes responsibility on to the next task in the chain. Each agent knows two things: how to do its part of the job, and that it can pass work on to the "next" agent. It does not (and should not!) know what the next agent does, nor should it rely on what the next agent does. It simply does its job and returns it or passes on incomplete work to the next agent in the chain. I can't be arsed with a diagram: go google one.

Now: for two thirds of our challenge, I think the Chain of Responsibility pattern is a good fit: for the "Is it in the cache? Yes [/No. Is it in the DB? Yes/No]" bit. No qualms there. However it sticks in my craw once we get to the step after getting from the DB: putting it in the cache for next time. That step does not belong in the chain as it's not performing the same sort of task as the other two: getting the data. Second to that it's hitting the cache before and after the DB step, which makes it more seem like a decoration to me (of sorts... I might get back to the "of sorts..." bit later). Even if one decides "get from cache" is one agent's job, and "get from DB, oh and put it in the cache" is another agent's job, that latter agent is doing a mishmash of "unrelated" work, and seems to be a bit tightly-coupled to me, as well as having a wodge of code (that's a technical term) doing two things, rather than two wodges of code doing one thing each.

That said, I don't like being so emphatic about "that's wrong" without being happy about my understanding of things, so figured I should try to make it work in a way that sits well with me in a proof of concept. That and it seemed like good material for a blog article.

In my proof of concept I am fetching some Person data, for example:

$person = $personService->getById(1);

The PersonService defers to a "RetrievalHandler" for this:

class PersonService {

    private $retrievalHandler;

    public function __construct(PersonRetrievalHandler $retrievalHandler) {
        $this->retrievalHandler = $retrievalHandler;
    }

    public function getById($id) : Person {
        $record = $this->retrievalHandler->getById($id);

        if (is_array($record)){
            $person = new Person($record['firstName'], $record['lastName']);

            return $person;
        }
        return new Person();
    }
}


This just finds and returns a Person object, implement thus:

class Person {

    public $firstName;
    public $lastName;

    public function __construct($firstName=null, $lastName=null) {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
}

As noted, the service uses a the RetrievalHandler to retreive stuff. A handler is implementation of this:

abstract class PersonRetrievalHandler {

    protected $nextHandler;
    protected $logger;

    public function __construct() {
        $this->nextHandler = new class {
            function getById(){
                return null;
            }
        };
    }

    public abstract function getById(int $id) : array;

    public function setNextHandler(PersonRetrievalHandler $nextHandler) {
        $this->nextHandler = $nextHandler;
    }
}

All it's required to do here is know how to use getById, and also it can setNextHandler. I'll come back to the code in the constructor later. Don't worry about that for now.

I've got two implementations of this. One for getting the thing from a caching service:

class CachedPersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function getById(int $id) : array {
        if ($this->cacheService->exists($id)) {
            return $cachedPerson = $this->cacheService->get($id);
        }

        return $this->nextHandler->getById($id);
    }
}


And one for getting it from a database repository:

class DatabasePersonHandler extends PersonRetrievalHandler {

    private $repository;

    public function __construct(PersonRepository $repository) {
        parent::__construct();
        $this->repository = $repository;
    }

    public function getById($id) : array {
        $record = $this->repository->getById($id);
        if (count($record)) {
            return $record;
        }

        return $this->nextHandler->getById($id);
    }
}


Note this is very woolly and not production-sound code. It's just to demonstrate the concept.

Important to note in both these situations is the agents don't themselves do any of the work; they just try to get the work done by something else, and return it if done, or pass the task to the next agent if the current agent itself couldn't fulfil the requirement.

I've created a stub caching service and repository:

class CacheService {

    private $cache = [];
    private $logger;

    public function __construct(LoggingService $logger)
    {
        $this->logger = $logger;
    }

    public function get($key) {
        $this->logger->info("Cache hit for $key");
        return $this->cache[$key];
    }

    public function exists(string $key) : bool {
        $exists = array_key_exists($key, $this->cache);

        $this->logger->info(sprintf("Cache %s for %d", $exists ? 'hit' : 'miss', $key));

        return $exists;
    }

    public function put(string $key, $value) {
        $this->cache[$key] = $value;
    }
}


class PersonRepository {

    private $data = [
        [],
        ['firstName' => 'Tui', 'lastName' => 'Tahi'],
        ['firstName' => 'Rewi', 'lastName' => 'Rua'],
        ['firstName' => 'Tama', 'lastName' => 'Toru'],
        ['firstName' => 'Whina', 'lastName' => 'Wha']
    ];
    private $logger;

    public function __construct(LoggingService $logger)
    {
        $this->logger = $logger;
    }

    public function getById($id) {
        if (array_key_exists($id, $this->data)){
            $this->logger->info("Database hit for $id");
            return $this->data[$id];
        }
        $this->logger->info("Database miss for $id");
        return [];
    }
}


These also log activity in a very naïve way:

class LoggingService {
    public function info(string $message) {
        echo $message . PHP_EOL;
    }
}


All of this is stitched together in a factory, as there's a lot of moving parts:

class PersonServiceFactory {

    public static function getPersonService()
    {
        $loggingService = new LoggingService();
        $cacheService = new CacheService($loggingService);
        $personRepository = new PersonRepository($loggingService);

        $cachedHandler = new CachedPersonHandler($cacheService);
        $databaseHandler = new DatabasePersonHandler($personRepository);

        $cachedHandler->setNextHandler($databaseHandler);

        $personService = new PersonService($cachedHandler);

        return $personService;
    }

}


Note how the only knowledge each agent has of another is what a given agent's next handler might be. Speaking of next handlers. let's go back to that code in the PersonRetrievalHandler's constructor:

public function __construct() {
    $this->nextHandler = new class {
        function getById(){
            return null;
        }
    };
}


You'll remember each getById implementation will either fulfil the task and return the result, or it'll defer to the next agent in the chain. This is just a mechanism to make this "work" even if the last agent in the chain calls return $this->nextHandler->getById($id). The base class will take the call and just return null. I was semi-pleased to be able to use a class literal here (or anonymous class as PHP wants to call them). So this means I do not need to set the next handler on the $databaseHandler: it'll just use the inbuilt one if it cannot fulfil the requirement itself.

Right so we can now run this to see what happens. Remember the expectation is that the requested data will be fetched from the cache as a priority, but if not found: get it from the DB. Note that I am not doing anything here about putting the data into the cache if it wasn't there in the first place. I'll get to that. Anyway, here's some tests:

$personService = PersonServiceFactory::getPersonService();
$person = $personService->getById(1);
var_dump($person);
$person = $personService->getById(1);
var_dump($person);
$person = $personService->getById(4);
var_dump($person);
$person = $personService->getById(5);
var_dump($person);

Here we're:
  • getting the same person twice to start with: ostensibly this'd demonstrate a cache-miss then a cache-hit;
  • getting a different person from the first (back to a cache miss);
  • getting a person who is neither in the cache nor the DB.

Results:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTest.php
Cache miss for 1
Database hit for 1
object(me\adamcameron\cor\model\Person)#9 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 1
Database hit for 1
object(me\adamcameron\cor\model\Person)#10 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 4
Database hit for 4
object(me\adamcameron\cor\model\Person)#9 (2) {
  ["firstName"]=>
  string(5) "Whina"
  ["lastName"]=>
  string(3) "Wha"
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#10 (2) {
  ["firstName"]=>  NULL
  ["lastName"]=>  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

That's all good:
  • the cache never finds anything (although it does look),
  • so passes the job on to the DB. The DB will return the data if found,
  • otherwise it'll fall back to the default agent which just returns null.


So far: so good. But why did I omit the "putting it into the cache" step. Well... using this model, it simply won't work. The rule is that if an agent can resolve the work, then it returns it. Otherwise it defers to the next agent. I figured I could implement a "pass-thru" agent which just caches the result found by the DB agent: job done. But if the DB agent finds the record - read: "it can fulfil the task" - then it doesn't pass the task on to the next agent. It considers the task fulfilled, so just returns its result. So the only time the "put it in cache" agent will be called is if the DB agent doesn't find anything. Precisely the wrong time to cache something. That aside, the agent only passes on the inputs (the $id) to the next agent, so there's no mechanism here for an agent to receive the result from the previous agent even if the result was there to pass.

What we'd need to do here is just use the chain of responsibility to get the data, and perhaps leave it to the PersonService to cache the result it receives. However this is a bit unbalanced in my view, as caching considerations of the same data are being handled at not only two different places in the code, but two different "levels" of the task at hand. That's a bit clumsy in my book.

Next I thought I could adjust the DB handler to unconditionally pass its result on to the next handler, rather than break out of the chain once done. But that's a bit crap as it makes the DB handler behave differently than the other handlers in the chain, which also implies it knows more about what's going on than it should: ie that there's a reason to always pass the result on, rather than return it if it's done the work. Not cool.

Update:

My colleague (a different one) suggested a third open: roll the cache-put into the CachedPersonHandler:

class CachedPersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function getById(int $id) : array {
        if ($this->cacheService->exists($id)) {
            return $cachedPerson = $this->cacheService->get($id);
        }

        $person = $this->nextHandler->getById($id);
        $this->cacheService->put($id, $person);
        return $person;
    }
}

Well: indeed. Now it's basically a decorator: you've decorated a DB call with some caching. Contrast it with this from my earlier article on the Decorator Pattern:

class CachedUserRepository implements UserRepositoryInterface {

    private $repository;
    private $cacheService;

    public function __construct(UserRepositoryInterface $repository, CacheServiceInterface $cacheService) {
        $this->repository = $repository;
        $this->cacheService = $cacheService;
    }

    public function getById($id) {
        if ($this->cacheService->isCached($id)) {
            return $this->cacheService->get($id);
        }
        $object = $this->repository->getById($id);

        $this->cacheService->put($id, $object);
        return $object;
    }

}

You've arrived at the same conclusion I started with ;-)

I did try one other tactic. Instead of passing just the inputs to the agents, I had a look at passing the inputs and the outputs through the chain, and also always passing on to the next agent, irrespective of whether a given handler arrived at the result. This was very proof-of-concept-y. It's possibly a legit approach if the Chain of Responsibility was one in that a given agent was only supposed to fulfil part of the solution, relying on all links in the chain to arrive at the complete result. This is not what I'm doing here, but decided to leverage this approach to get the caching working "properly".


Intermission

Go have a pee or get some more popcorn or something.


Now I have an interface for a Handler:

interface Handler {

    public function perform($request, $response);

}


This just reflects a more generic solution than getById. More importantly note that unlike getById, perform takes a request and a response. For our examples the "request" would be the ID. And the response would be any result a given agent was able to come up with.

This is implemented in an abstract way by PersonRetrievalHandler:

abstract class PersonRetrievalHandler implements Handler {

    protected $nextHandler;
    protected $logger;

    public function __construct() {
        $this->nextHandler = $this->getPassThroughHandler();
    }

    public function setNextHandler(PersonRetrievalHandler $nextHandler) {
        $this->nextHandler = $nextHandler;
    }

    private function getPassThroughHandler(){
        return new class  {
            function perform($_, $response){
                return $response;
            }
        };
    }
}

It's still down to the individual handlers to implement the perform method though.

I've also refactored that class literal out into a helper method; and its perform method now returns whatever response it was passed, rather than returning null.

The perform method in CachedPersonHandler is thus:

public function perform($request, $response=null) {
    $id = $request;
    if ($this->cacheService->exists($id)) {
        $response = $this->cacheService->get($id);
    }

    return $this->nextHandler->perform($request, $response);
}


Note that it doesn't return the result itself, it just passes it on to the next agent. Otherwise it's doing the same thing.

And the DatabasePersonHandler has also been slightly refactored:

public function perform($request, $response=null) {
    if (is_null($response)){
        $id = $request;
        $response = $this->repository->getById($id);
    }

    return $this->nextHandler->perform($request, $response);
}


It infers that the cache didn't find anything by it not receiving anything in the response, and only if so does it hit the DB. Otherwise it just passes everything through to the next agent.

Now we have a third handler, CacheablePersonHandler:

class CacheablePersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function perform($request, $response=null) {
        if (!is_null($response)) {
            $id = $request;
            $this->cacheService->put($id, $response);
        }

        return $this->nextHandler->perform($request, $response);
    }
}


This one makes no pretence about finding data, it just takes any received response and caches it, keyed on the request. And passes the response on. At this point remember the default handler will just return whatever response it receives.

We wire all this together in the factory again:

public static function getPersonService()
{
    $loggingService = new LoggingService();
    $cacheService = new CacheService($loggingService);
    $personRepository = new PersonRepository($loggingService);

    $cachedHandler = new CachedPersonHandler($cacheService);
    $databaseHandler = new DatabasePersonHandler($personRepository);
    $cacheableHandler = new CacheablePersonHandler($cacheService);

    $databaseHandler->setNextHandler($cacheableHandler);
    $cachedHandler->setNextHandler($databaseHandler);

    $personService = new PersonService($cachedHandler);

    return $personService;
}


This differs only in that it also creates the CacheablePersonHandler too, and stick it at the end of the chain.

Here's our revised test:

$personService = PersonServiceFactory::getPersonService();

foreach ([1,1,4,5,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


(it's the same, just without the tautological repeated duplication of the same stuff again).

And the result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTest.php
Cache miss for 1
Database hit for 1
Cache put for 1
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache hit for 1
Cache put for 1
object(me\adamcameron\cor\model\Person)#11 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 4
Database hit for 4
Cache put for 4
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(5) "Whina"
  ["lastName"]=>
  string(3) "Wha"
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#11 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

So it "works", in that DB hits are now going into the cache.

But it also re-caches cache hits too :-(

Sigh. I could work around this by doing an exists call in the latter handler too:

public function perform($request, $response=null) {
    if (!is_null($response)) {
        $id = $request;
        if (!$this->cacheService->exists($id)) {
            $this->cacheService->put($id, $response);
        }
    }

    return $this->nextHandler->perform($request, $response);
}


But it all seems a bit clumsy to me. And when something seems clumsy to me, I start wondering if this square peg is supposed to be going in this round hole. And my conclusion is generally: "no".

The problem here goes back to the "cache the data" part of this chain doesn't belong in the chain. It's not a congruent task compared to the data-fetch tasks. The "responsibility" in the "Chain of Responsibility" is that the responsibility is "fulfilling the task". The caching agent makes no attempt to fulfil the requirement of the chain, so it doesn't belong in the chain to me.

So where does it belong? Well it belongs as near as possible to when we know the database was hit, and we have the result. So we could put it in the database agent. But then we have this imbalance that the cache-look-up is part of the chain, and the cache-put is within part of the chain. I don't like that asymmetry. Also it means the DB agent is doing two things: getting data and caching it, which seems like it's doing too much to me: it needs to know how to do two different things. We could go down the route of having like a CachedPersonRepository, and that at least removes those two responsibilities from the agent, but it does just move them: now the repo needs to know about the DB and the caching.

What I'd end up doing is using a caching decorator around a DB-only repository. That's symmetrical, and all concerns are separated.

Hmmm.

Second Intermission

Go find a suitable blade so you can open a vein and end it all instead of having to read on… (and on…)


Oh, as a footnote to this another requirement I wanted to prove is that the chain links behaved "sensibly" if called without their adjacent agents. IE: the DatabasePersonAgent agent didn't need the CachedPersonAgent, nor did it need the CacheablePersonAgent.

Here's some examples:

Just the database handler:

$loggingService = new LoggingService();
$personRepository = new PersonRepository($loggingService);

$databaseHandler = new DatabasePersonHandler($personRepository);

$personService = new PersonService($databaseHandler);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustDatabaseHandler.php
Database hit for 1
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Database miss for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

Yup: it's getting data from the DB, and doesn't require the other agents.

Now one with just the CachedPersonHandler:

$loggingService = new LoggingService();
$cachingService = new CacheService($loggingService);

$cachedHandler = new CachedPersonHandler($cachingService);

$personService = new PersonService($cachedHandler);

$cachingService->put(5, ['firstName'=>'Ria', 'lastName'=>'Rima']);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}

Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustCachedHandler.php
Cache put for 5
Cache miss for 1
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
Cache hit for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(3) "Ria"
  ["lastName"]=>
  string(4) "Rima"
}

C:\src\php\php.local\src\chainOfResponsibility\public>

Here I'm priming the cache manually, just to make sure it'll find the cached item OK. And it does.

For the sake of completeness I also test with just the cache-put agent. It won't do anything useless (a warning flag in itself), but it should "work":

$loggingService = new LoggingService();
$cachingService = new CacheService($loggingService);

$cacheableHandler = new CacheablePersonHandler($cachingService);

$personService = new PersonService($cacheableHandler);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustCacheableHandler.php
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

This is all unhelpful to the stated goal, but it's doing what it's told and is self-contained. "Win".

In conclusion I think this has helped me get a handle on the Chain of Responsibility pattern, which is good. I still want to have a look at chains wherein the agents achieve partial fulfilment, and possibly some recursion as well. That'd be cool.

But I really don't think this is the right solution to the job at hand at work.

I'll leave you be now. Oh... as per usual: all the code is on Github, including the working application. The first iteration of the work is tagged separately.

Righto.

--
Adam