Monday 18 July 2016

REST & nouns & verbs: analysing the right problem

G'day:
I've recently changed teams at work, and have shifted from "safe" code bases that I've either been working on daily for over a year, or new applications that I've had a hand in from the ground up. Now I'm doing maintenance and adding features to existing code-bases written by different teams for different requirements, developed to different coding guidelines. So that's all a bit of a rude awakening that I am still settling into. I'm definitely out of my comfort-zone, and having to ask different questions than I usually might. That said: it's a good move and I kinda like that sort of thing, so cool!

For better or for worse, we use a lot of REST web services even for our internal processes. I personally believe this is architecturally questionable, but as no-one thought to ask me the question, I didn't get to partake in that particular discussion. To be fair to the situation, it was a decision made before I was part of the department; and - equally - I'm not entirely convinced of my position that it's a daft idea. Just mostly convinced ;-) Anyway, as the over-used truism declares: we are where we are.

Anyway, I find myself doing some maintenance work on one of our web services. This particular service handles the back-end processing for our web applications which make image file uploads. As part of the file upload we need to do a few things:


  • upload the file and stick it in the appropriate place;
  • store some user-provided data about the file (name, description, etc);
  • create a few different resizings of the image for various purposes;
  • contrive some metadata on all the file variations based on [stuff].
  • distribute all the files across our CDN.
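
Just to make that pipeline concrete, the service-tier orchestration amounts to something like the sketch below. To be clear: all the class and method names here are made up for the purposes of this article; the real code isn't structured like this.

```php
// Illustrative sketch only: hypothetical names, not our actual implementation
class FileUploadOrchestrator
{
    private $fileRepository;
    private $imageService;
    private $metadataService;
    private $cdnRepository;

    public function __construct($fileRepository, $imageService, $metadataService, $cdnRepository)
    {
        $this->fileRepository = $fileRepository;
        $this->imageService = $imageService;
        $this->metadataService = $metadataService;
        $this->cdnRepository = $cdnRepository;
    }

    public function handleUpload($uploadedFile, array $userData)
    {
        // the fast bit: store the master file and the user-provided data
        $file = $this->fileRepository->create($uploadedFile, $userData);

        // the slow bit: resize, contrive metadata, distribute to the CDN
        $versions = $this->imageService->createResizedVariants($file);
        foreach ($versions as $version) {
            $version->setMetadata($this->metadataService->extractFor($version));
        }
        $this->cdnRepository->distribute(array_merge([$file], $versions));

        return $file;
    }
}
```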


The first two of those steps are fast and can be done in real time. The file operations (resizing and distribution) take some time, and obviously the metadata extraction cannot be performed until the files actually exist.

We have these split into two processes: one that always occurs immediately, which uploads the master file and stores it and its user-entered data; then a second process which handles all the slow stuff. This is just so we can provide fast feedback to the UI. Note that all this is still an atomic operation: both processes need to run.

On the first iteration of the implementation of this, the two processes were fired off by the UI. Basically the upload form was submitted and the main request would handle the upload and the data-write, and then a second request was fired off by AJAX. This is not a robust solution as it isn't atomic: it's possible for the first call to be made but the second one not to be, for some reason. From an architectural point of view it's just not up to the view layer to make this call either; the versioning and metadata processing is an adjunct to the main file-handling process on the server: it's more the second half of the same process than a separate process in its own right. Realistically the two are only separated at all because of performance considerations. From the perspective of the consumer of the web service, it should be one call to the API.

This was driven home to us by a second web app needing to do the same operations with the same web service, and the bods on that team knew about the first call but not the second.

So my task was to remediate this.

The solution we adopted was to have the front-end web app only call the one process, POSTing to http://api.example.com/file, passing along the file-upload form information as the POST body: this is what the UI-driven process was already doing. I simply augmented that to put an async job into our queuing system. The queue job receives the ID of the created file, and then it simply makes a second REST call, also POSTing to another end point: http://api.example.com/process. All good.
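
In Silex terms the controller end of that looks roughly like the following. Bear in mind the service and queue names ($app['file.service'], $app['queue'] and its push() method) are inventions for the sake of the example, not what we actually have:

```php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\JsonResponse;

// POST /file: the fast, synchronous half of the operation
$app->post('/file', function (Request $request) use ($app) {
    // store the master file plus the user-entered form data
    $file = $app['file.service']->createFromUpload(
        $request->files->get('file'),
        $request->request->all()
    );

    // queue the slow half; the worker will then POST the new file's ID to /process
    $app['queue']->push('processUploadedFile', ['fileId' => $file->getId()]);

    return new JsonResponse(['id' => $file->getId()], 201);
});
```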

A first observation here is that "process" really isn't a great name for a REST end point. An end point should have meaning unto itself, and "process" is meaningless unless one knows it's tightly coupled to an initial end point relating to a /file: "Ah: so it's processing the file". One shouldn't need this sort of context when it comes to a web service end point. Sadly this end point is now out in the wild. Well: here "the wild" is just internal apps, but we can't simply change the URL to be something better without scheduling updates to those apps too. I'll be back-logging work to get that sorted out.

Part of my work here was to boyscout-rule the controllers of the "process" end point to move business logic out of the controller (yeah, don't ask), and into a service tier, which called a repository tier etc.

Some of the business logic I moved was the decision as to which request parameter to use for the ID. Apparently we have some legacy code somewhere which passes the file ID as another argument name - I do not have the code in front of me right now so I can't remember what it is - so there was some logic to determine whether to use the legacy argument or the new one. Arguably this is perhaps still the job of the controller - it's dealing with stuff from the request object, which I can sell to myself as being controller logic - but we're pretty stringent that we have as little logic in the controller as possible. And as I already needed a service method to do "Other Stuff", I pushed this back into a helper in the service too: something like extractDistributionCriteria(array $allCriteria). I don't want to be passing the entire request object into the service, so I extracted the POST params and passed those, using Silex's $request->request->all() method, which returns the POST params as an array. Other than that, I was just shuffling logic around, so everything should be grand.
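
The shape of that refactoring is roughly as follows. As I said, I can't remember the legacy argument name, so I'm using someLegacyArg as a stand-in for it here; the service name is also just for illustration:

```php
class FileProcessingService
{
    public function extractDistributionCriteria(array $allCriteria)
    {
        // prefer the current argument name; fall back to the legacy one
        if (array_key_exists('id', $allCriteria)) {
            return ['id' => $allCriteria['id']];
        }
        if (array_key_exists('someLegacyArg', $allCriteria)) { // placeholder for the real legacy name
            return ['id' => $allCriteria['someLegacyArg']];
        }
        throw new InvalidArgumentException('No file ID present in the request criteria');
    }
}

// and in the controller: pass the service just the POST params, not the whole request object
$criteria = $app['fileProcessing.service']
    ->extractDistributionCriteria($request->request->all());
```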



When I ran the functional tests I started getting an error: the ID was not showing up. Indeed I found it was never making its way into the extractDistributionCriteria method. I checked back in the controller, and... lo... there were no POST params at all. Huh? I racked my brain entirely too long before I paid attention to the URL that was being POSTed: http://api.example.com/process/?id=1234 (or as mentioned, possibly http://api.example.com/process/?someLegacyArg=1234). Hang on... were we doing a GET? No... we were doing a POST. And fair enough - this was creating records. So shouldn't we be passing the arg as a POST param then? Well: as my boss pointed out... it's not part of the payload being created, it's identifying which file is being processed, so from a REST perspective it is part of the address of the resource, so it belongs in the URL. We did kinda think it ought to be embedded in the URL path, eg: http://api.example.com/process/1234/, but that was by the by.

I revised my code to use GET params instead: $request->query->all(), and the test started to pass and I was happy. Until I ran all the functional tests and another one started failing. Sigh. WtaF? Groan. Sometimes we were passing the ID on the URL; sometimes we were passing it in the body. Seriously: this is pretty poor form IMO. Either it's part of the address or it's part of the payload... it can't be both / either. I also learned a lesson about Silex here. The original code I was remodelling used a Silex helper method: $this->request->get('id'), and this method will get the param from any of three places: a named route segment (eg: /process/{id}), a GET param, or a POST param. I think get() is an unfortunate name for this method, as it's a bit close to the significance of "GET" as an HTTP method in this context. And indeed I ass-u-me-d that that is all it did. It's good to learn stuff.
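
To illustrate what bit me here, this is where Symfony's HttpFoundation (which Silex sits on top of) pulls values from; as far as I can make out, get() falls through the route attributes, then the query string, then the POST body:

```php
use Symfony\Component\HttpFoundation\Request;

// simulate POSTing to /process/?id=1234 with an empty body
$request = Request::create('http://api.example.com/process/?id=1234', 'POST');

var_dump($request->request->all()); // []               ... POST body params: nothing there
var_dump($request->query->all());   // ['id' => '1234'] ... URL query-string params
var_dump($request->get('id'));      // '1234'           ... route attribute, then GET, then POST
```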

I fixed the code and now it's working as far as I can tell.

But this whole "doing a POST with the only data being in the URL" did not sit well with me. And the boss was suddenly not convinced either. We focused on the verb being used, and POST seemed right: this thing was creating records, it just contrived all the values for the records based on the address. But in that it created records it could not be a GET (and HEAD, OPTIONS, DELETE didn't make sense). One crease in this was that the method was idempotent though. It always created the same records, for every call. An "upsert" if you will. POSTs usually are not idempotent. I wondered about PUT, but that didn't seem right either. For one thing we weren't putting anything anywhere. We then concluded that it was kinda outside the remit of my work as the thing was already being used as a POST anyhow, so we needed to continue servicing it as a POST. Fair enough, but I was not done thinking about it.

That was on Friday afternoon, and I did not really seriously revisit it until this morning, when I had an archimedean "eureka" moment (and whilst not in the bath as Archimedes was according to the legend, I was indeed in the shower at the time. I'll leave you to linger on that imagery for a bit... not too bloody long though, blimey).

We were looking at the wrong part of the request. It's not the method that was the problem, it was the URL: "process". As I mentioned above it's a bad name as it means nothing, and - this is where I went a bit wrong - it relies on being tightly coupled to the /file call for it to make sense. But it's nothing to do with the file, either. Not in an addressing sense, anyhow. Remember what /process does? It:


  • create a few different resizings of the image for various purposes;
  • contrive some metadata on all the file variations based on [stuff].
  • distribute all the files across our CDN.


This is why the thing takes the file ID as a parameter: but it's not the ID of the thing this operation acts on, it's a foreign key (to mix metaphors slightly) to a related entity. We're not monkeying with the file here, we're monkeying with a different entity: file versions. So what we're really doing here is this:

POST to /file
This takes the body of the POST and creates the file.

POST to /fileversion
This takes the body of the POST (which is just a file ID), and creates - I guess - a FileVersions collection? So maybe the end point should be plural, too: /fileversions? Either way, what it should not be is "/process".
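
Purely as an illustration of where my head is at now - none of this exists - the routing would end up looking something more like this, with nouns in the URLs and the verbs left to the HTTP methods:

```php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\JsonResponse;

// POST /file: creates the file from the uploaded form data
$app->post('/file', function (Request $request) use ($app) {
    $file = $app['file.service']->createFromUpload(
        $request->files->get('file'),
        $request->request->all()
    );
    return new JsonResponse(['id' => $file->getId()], 201);
});

// POST /fileversions: creates the collection of versions for an existing file;
// the file ID is the payload the new resource is built from, not its address
$app->post('/fileversions', function (Request $request) use ($app) {
    $versions = $app['fileVersion.service']->createForFile($request->request->get('fileId'));
    return new JsonResponse(['count' => count($versions)], 201);
});
```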

I've kinda glossed over the file-distribution part of this process, but it's occurred to me that that is just a vagary of how the files are stored, so is really a job for the repository layer, not the business logic layer, so is largely irrelevant to this exercise (except I think we have to refactor some code at some point).

I guess the thing we need to remember here is that an end point should identify the type of thing being processed: so a "noun" sort of thing: a file, a user, a file version (or a file version collection), etc. And the action being taken on the thing is imparted by the HTTP method. This has a rough analogy in class / method naming I guess? We would not have a service class called "process" (well: in this case we actually do, but that's a story for another day), we'd have a class called "FileVersion" and a method "create"; or maybe a class "File" and a method "createFileVersions" or something.
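
Or, to force the analogy into code (invented names again, and - as I say - our real class actually is called something "process"-ish, which is its own problem):

```php
// the noun is the class; the verb comes from the method (or, in REST, the HTTP method)
class FileVersionService        // not "ProcessService"
{
    public function create($fileId)
    {
        // resize, contrive metadata, distribute: the stuff /process currently does
    }
}
```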

And the thing I need to take away from this is I should not have gone "hmmm... 'process'... that's not a cool name for an end point", but then basically ignored that and gone to look at something else. I need to actually pay attention to what I'm looking at, and think things through.

As for remedial action... I think we have some fundamental flaws in the architecture of this web service, and this bleeds through into its API. As the web service itself is fairly young, I think realistically we need to start thinking about retiring the current API and planning a new one before the thing is used too ubiquitously. But that's a decision for people higher up the foodchain than me.

Righto.

--
Adam