Thursday 28 July 2016

PHP: I finally get around to seeing how PHP interacts with a DB

G'day:
This is "interesting" in that I've been a PHP dev for closing in on two years now, but as yet I've not had a requirement to connect to a DB and... like... query stuff. Crazy! This is all down to everything we consume coming from web services rather than a DB. We had other teams to do the DB stuff behind the web services. And now I find myself actually on one of those teams. So I better learn how to do my job.

A word of warning on this one. It's gonna be one of those "just me pootling about testing how stuff works", and not really offering much insight on things given this is my first "go" at any of this stuff. It's more a (delayed) stream of consciousness of my experimentation. Delayed cos I wrote the code last night, but am only getting a chance to write it up now.

OK, so first things first, one needs to enable a DB driver... and they're not enabled by default. This is in php.ini:

;extension=php_bz2.dll
extension=php_curl.dll
;extension=php_fileinfo.dll
;extension=php_gd2.dll
;extension=php_gettext.dll
;extension=php_gmp.dll
extension=php_intl.dll
;extension=php_imap.dll
;extension=php_interbase.dll
;extension=php_ldap.dll
extension=php_mbstring.dll
;extension=php_exif.dll      ; Must be after mbstring as it depends on it
;extension=php_mysqli.dll
;extension=php_oci8_12c.dll  ; Use with Oracle Database 12c Instant Client
extension=php_openssl.dll
;extension=php_pdo_firebird.dll
;extension=php_pdo_mysql.dll ; uncomment this
;extension=php_pdo_oci.dll
;extension=php_pdo_odbc.dll
;extension=php_pdo_pgsql.dll
;extension=php_pdo_sqlite.dll
;extension=php_pgsql.dll
;extension=php_shmop.dll



The second thing I needed to do was to give php.ini a fully-qualified path to my extensions directory:

; Directory in which the loadable extensions (modules) reside.
; http://php.net/extension-dir
; extension_dir = "./"
; On windows:
;extension_dir = "ext"
extension_dir = "C:\apps\php\7\ext"


Sometimes I need to do this on a PHP install on Windows; sometimes not. And it doesn't even seem to be uniform on a given machine! Last night I needed to give this machine the full path to the ext dir; I reversed this this evening whilst writing this, and it didn't give me problems just being "ext".

Anyway, if you get this error:

Fatal error: Uncaught PDOException: could not find driver

Then it's the driver not being enabled, or PHP not being able to find the extension.

Right. So some code.

First things first, to connect to a DB one needs a connection to do so. I've wrapped this up in a class for my purposes, to get it out of the way of the rest of my code:

class ConnectionFactory {

    private static $host = 'localhost';
    private static $port = '3306';
    private static $dbName = 'scratch';

    public static function createConnection() {
        $connectionString = sprintf('mysql:host=%s;port=%s;dbname=%s', self::$host, self::$port, self::$dbName);
        $dbConnection = new \PDO($connectionString, Credentials::$login, Credentials::$password);

        return $dbConnection;
    }
}

And to get my connection:

$dbConnection = ConnectionFactory::createConnection();

Oh yeah, my credentials are hidden away in another class still:

class Credentials {

    public static $login = 'scratch';
    public static $password = 'scratch';
}

I def don't want to be sharing that secret information around the place.

So the connection could easily be created with one line of code, inline:

$dbConnection = new \PDO("mysql:host=localhost;port=3306;dbname=scratch", "scratch", "scratch");

Phew. One way or another we got there.

OK, so to run a query from there is pretty easy:

$numbers = $dbConnection->query('SELECT * FROM numbers');

And from there I can do stuff with my numbers:

foreach ($numbers as $row) {
    printf('ID: %s: English: %s, Maori: %s%s', $row['id'], $row['en'], $row['mi'], PHP_EOL);
}

This outputs:

ID: 1: English: one, Maori: tahi
ID: 2: English: two, Maori: rua
ID: 3: English: three, Maori: toru
ID: 4: English: four, Maori: wha
ID: 5: English: five, Maori: rima
ID: 6: English: six, Maori: ono
ID: 7: English: seven, Maori: whitu
ID: 8: English: eight, Maori: waru
ID: 9: English: nine, Maori: iwa
ID: 10: English: ten, Maori: tekau

It doesn't really get much easier than that.

And I'm sorry to my old CFMLer readers who still hang on to <cfquery> being a thing... no. Creating a connection and just running a query in two statements is easier than horsing around with admin config and tags and shit like that. And the code is nicer (a rare occasion of me saying something complimentary about PHP's "mise-en-scène").

That's a very overly simplistic example though. What about doing some filtering via a parameter in a WHERE clause?

The query method seems to just be very quick way of passing a static string to the DB and having it processed and returned. That's about it. To pass params, we need to do slightly more work. We need to prepare a statement and execute it:

$preparedStatement = $dbConnection->prepare('SELECT * FROM numbers WHERE id <= :upperThreshold');
$preparedStatement->execute(['upperThreshold' => $argv[1]]);

$numbers = $preparedStatement->fetchAll();

(note that one doesn't need to fetchall: one can just get the result row-at-a-time with fetch, but that's something for another day).

And this runs thus, when I give it a "4" (using the same "view" code as I did last time):


>php C:\src\preparedStatement.php 4
ID: 1: English: one, Maori: tahi
ID: 2: English: two, Maori: rua
ID: 3: English: three, Maori: toru
ID: 4: English: four, Maori: wha

Process finished with exit code 0

That was easy.

Aside:

I thought passing data values in the SQL statement was a failing of CFML developers. I mean this sort of thing:

<cfquery>
    SELECT *
    FROM table
    WHERE someColumn = '#dataValue#'
</cfquery>

Unfortunately I am seeing this shit in PHP code too. The devs are saying "but we sanitise the value first so there's no SQLi risk". Completely missing the point that the SQL statement and its parameters are supposed to be kept separate so the statement can be compiled.

Sigh.


As well as just passing the parameter values to the execute call, one can also bind values or variables to a parameter. Binding a value isn't that interesting:

$preparedStatement = $dbConnection->prepare('SELECT * FROM numbers WHERE id <= :upperThreshold');
$preparedStatement->bindValue(':upperThreshold', $upperThreshold, PDO::PARAM_INT);
$preparedStatement->execute();

The added bonus here is one can enforce a type on the value being bound too.

Binding a variable is more interesting. Check this out:
$upperThreshold = 1;

$dbConnection = ConnectionFactory::createConnection();
$preparedStatement = $dbConnection->prepare('SELECT * FROM numbers WHERE id <= :upperThreshold');
$preparedStatement->bindParam(':upperThreshold', $upperThreshold, PDO::PARAM_INT);

$upperThreshold = 10;

$preparedStatement->execute();
$numbers = $preparedStatement->fetchAll();

Here I'm binding the variable $upperThreshold to the param. But it's binding a reference so by the time we run this query, the bound value is 10, not the 1 that it was when the bind was first made.

It occurs to me now (as I type) that I've not tested binding a not-yet-declared variable. Let's try a variation:
$preparedStatement = $dbConnection->prepare('SELECT * FROM numbers WHERE id <= :upperThreshold');
$preparedStatement->bindParam(':upperThreshold', $upperThreshold, PDO::PARAM_INT);

$upperThreshold = 4;

$preparedStatement->execute();

Here I bind the $upperThreshold variable before I even declare it. And this all works fine!

>php.exe C:\src\preparedStatementBindParamWithNoVariable.php
ID: 1: English: one, Maori: tahi
ID: 2: English: two, Maori: rua
ID: 3: English: three, Maori: toru
ID: 4: English: four, Maori: wha

Process finished with exit code 0

So that's quite cool. Kinda.

The next thing I looked at was a simple transaction proof of concept:

$paramArray = ['id' => $argv[1]];

$dbConnection = ConnectionFactory::createConnection();

$dbConnection->beginTransaction();
$deleteStatement = $dbConnection->prepare('DELETE FROM numbers WHERE id=:id');
$deleteStatement->execute($paramArray);

$selectStatement = $dbConnection->prepare('SELECT * FROM numbers WHERE id <= (:id + 1)');
$selectStatement->execute($paramArray);
$numbers = $selectStatement->fetchAll();

echo 'In transaction:' . PHP_EOL;
include __DIR__ . '/../view/numbers.php';

$dbConnection->rollBack();

$selectStatement->execute($paramArray);
$numbers = $selectStatement->fetchAll();

echo 'After rollback:' . PHP_EOL;
include __DIR__ . '/../view/numbers.php';

Here we do this:

  • start a transaction;
  • delete a record;
  • run a query which shows the deletion having taken place;
  • rollback the transation;
  • perform the same query again, and the deletion has been rolled back.

Easy. I'd rather see some capability to explicitly hold onto the transaction until it was released. I don't think the act of rolling back should intrinsically mark the resolution of the transaction? One should be able to perform actions and rollback as often as one wants in one single transaction, shouldn't one? It's not something I've ever had to try, that said.

Oh, and the output:


>php.exe C:\src\transaction.php 4
In transaction:
ID: 1: English: one, Maori: tahi
ID: 2: English: two, Maori: rua
ID: 3: English: three, Maori: toru
ID: 5: English: five, Maori: rima
After rollback:
ID: 1: English: one, Maori: tahi
ID: 2: English: two, Maori: rua
ID: 3: English: three, Maori: toru
ID: 4: English: four, Maori: wha
ID: 5: English: five, Maori: rima

Process finished with exit code 0

Excellent.

Lastly I had to remind myself how to create a stored proc in MySQL, and I managed a pretty stupid one, but one which works for my example:

delimiter $$

CREATE DEFINER=`root`@`localhost` PROCEDURE `getThingsById`(id INT)
BEGIN
  SELECT * FROM numbers n WHERE n.id = id;
  SELECT * FROM colours c WHERE c.id = id;
  SELECT * FROM days d WHERE d.id = id;
END$$

This returns three record sets, each containing the number, colour or day of the week of the passed-in ID. It's nonsense, I know.

To call a proc, one still uses a prepared statement:

$id = $argv[1];

$dbConnection = ConnectionFactory::createConnection();

$preparedStatement = $dbConnection->prepare('call getThingsById(:id)');
$preparedStatement->execute(['id' => $id]);

$resultSet = $preparedStatement->fetchAll();
include __DIR__ . '/../view/general.php';

$preparedStatement->nextRowset();

$resultSet = $preparedStatement->fetchAll();
include __DIR__ . '/../view/general.php';

$preparedStatement->nextRowset();

$resultSet = $preparedStatement->fetchAll();
include __DIR__ . '/../view/general.php';

general.php:

foreach ($resultSet as $row) {
    printf('ID: %s: English: %s, Maori: %s%s', $row['id'], $row['en'], $row['mi'], PHP_EOL);
}


This is all pretty familiar, except the thing to note is the nextRowset call, which moves between the recordsets returned by the proc call.

Oh yeah... I keep forgetting... the output:


>php.exe C:\src\php\doctrine.local\scripts\callProc.php 3
ID: 3: English: three, Maori: toru
ID: 3: English: yellow, Maori: kowhai
ID: 3: English: Wednesday, Maori: Rāapa


That's all I've looked at so far. All in all it all seems uncharacteristically sensibtle for PHP, I must say.

And now I've been sitting in this hotel bar typing for the best (?) part of four hours. Time to focus on finishing this beer and going to bed I think.

Righto.

--
Adam

Wednesday 27 July 2016

PHP / Silex / Dependency injection: should I use a reference to a method?

G'day:
This is another example of where I want to put a question out there, but it needs more space that Twitter will allow. I guess I'll put it on Stack Overflow too.

We use Silex and Pimple's DI container. Generally in our service providers we expose references to objects, eg:

$app['service.something'] = $app->share(function ($app) {
    return new SomethingService();
});

Then usage of that is predictable:

$result = $app['service.something']::someMethod(1, 2, 3);

But here's the thing. As with the case above, the method is actually a static one. So it seems "odd" to be calling it on an instance of the SomethingService, rather than on the class. As coincidence would have it, this is the only public method in SomethingService too.

So we've done this sort of thing in our service provider:

$app['something'] = $app->protect(function ($arg1, $arg2, $etc) {
    return SomeClass::someMethod($arg1, $arg2, $etc);
});

Usage:

$result = $app['something'](1, 2, 3);

Now this works OK, but I have a coupla hesitations:
  • the name $app['something'] is a bit noun-y. As it's a method it ought to be verb-y IMO, eg: $app['doTheThing']. I guess that's small fish.
  • I'm just not sure if it's particularly "semantic" to be exposing just methods like that. All usage of Pimple I've seen has been to expose dependent objects.
I think we'll run with what we've got, but I kinda want to get more thoughts on if this is the right approach. And if so (or if not) why not. There might be some gotchas here I'm not thinking of, or something.

Cheers.

--
Adam

Monday 25 July 2016

PHP: re-call the same method for the same result, or use an intermediary variable?

G'day:
I could not fit this in a Twitter message, so I'll clutter up this joint instead. And, hey, it could use some content, I know.

I recently came across some PHP code (this happens to me a lot these day), which was along these lines:

$someObj->someMethod()['someKey']->doSomething();
$someObj->someMethod()['someOtherKey']->doSomethingElse();
$someObj->someMethod()['anotherKey']->performAnotherAction();
$someObj->someMethod()['lastKey']->doLastThing();

Obviously I've changed the names to protect the innocentterloper, but this is pretty much what's going on in the code. On four consecutive statements the same method is being called four times, returning the same result, which is then used for further processing.

If this was my code, I'd see no point in repeating myself all over again, instead just ask the question once, and reuse the answer:

$descriptiveName = $someObj->someMethod();
$descriptiveName['someKey']->doSomething();
$descriptiveName['someOtherKey']->doSomethingElse();
$descriptiveName['anotherKey']->performAnotherAction();
$descriptiveName['lastKey']->doLastThing();

To me calling a method is work, and I'd like to avoid work if possible. Just as my colleagues.

However this is not the only occurrence of the former approach I've seen, so perhaps I am missing something? Is there some benefit to desirable behaviour from calling the same method for the same answer repeatedly?

Note I'm not so fussed about performance in situations like this, although I would have expect calling a method multiple times would have an inconsequential performance hit compared to an intermediary variable. Is this not the case? But like I said... whether or not it might be is not really a consideration here. It just seems like doing work for the sake of it?

Thoughts?

For the sake of conversation and broadening everyone's knowledge... in your own language of choice if it's not PHP: what would your answer to this be?

--
Adam

Monday 18 July 2016

REST & nouns & verbs: analysing the right problem

G'day:
I've recently changed teams at work, and have shifted from "safe" code bases that I've either been working on daily for over a year, or new applications that I've had a hand in from the ground up. Now I'm doing maintenance and adding features to existing code-bases written by different teams for different requirements, developed to different coding guidelines. So that's all a bit of a rude awakening that I am still settling into. I'm definitely out of my comfort-zone, and having to ask different questions than I usually might. That said: it's a good move and I kinda like that sort of thing, so cool!

For better of for worse, we use a lot of REST web services even for our internal processes. I personally believe this is architecturally questionable, but as no-one thought to ask me the question, I didn't get to partake in that particular discussion. To be fair to the situation, it was a decision made before I was part of the department; and - equally - I'm not entirely convinced of my position that it's a daft idea. Just mostly convinced ;-) Anyway, as the over-used truism declares: we are where we are.

Anyway, I find myself doing some maintenance work on one of our web services. This particular service handles the back-end processing for our web applications which make image file uploads. As part of the file upload we need to do a few things:


  • upload the file and stick it in the appropriate place;
  • store some user-provided data about the file (name, description, etc);
  • create a few different resizings of the image for various purposes;
  • contrive some metadata on all the file variations based on [stuff].
  • distribute all the files across our CDN.


The first two of those steps are fast and can be done in real time. The file operations (resizing and distribution) take some time, and obviously the metadata extraction cannot be peformed until the files actually exist.

We have these split into two processes: one that always occurs immediately that uploads the master file and stores it and its user-enter data; then a second process which handles all the slow stuff.  This is just so we can provide fast feedback to the UI. Note that all this is still an atomic operation: both processes need to run.

On the first iteration of the implementation of this the two processes were fired off by the UI. Basically the upload form was submitted and the main request would handle the upload and the data-write, and then a second request was fired off by AJAX. This is not a robust solution as it isn't atomic. It's possible that the first call can be made but the second one doesn't for some reason. From an architectural point of view it's just not up to the view layer to make this call either; the versioning and metadata processing is an adjunct to the main file-handling process on the server: it's more the second half of the same process, not a separate process in its own right. Realistically the two are only separated at all because of performance considerations. From the perspective of the consumer of the web service: it should be one call to the API.

This was driven home to us by a second web app needing to do the same operations with the same web service, and the bods on that team knew about the first call but not the second.

So my task was to remediate this.

The solution we adopted was to have the front-end web app only call the one process, POSTing to http://api.example.com/file, passing along the file-upload form information as the POST body: this is what the UI-driven process was already doing. I simply augmented that to put an async job into our queuing system. The queue job receives the ID of the created file, and then it simply makes a second REST call, also POSTing to another end point: http://api.example.com/process. All good.

A first observation here is that "process" really isn't a great name for a REST end point. An end point should have meaning unto itself, and "process" is meaningless unless one knows it's tightly coupled to an initial end point relating to a /file: "Ah: so it's processing the file". One shouldn't need this sort of context when it comes to a web service end point. Sadly this end point is now out in the wild. Well: here "the wild" is just internal apps, but we can't simply change the URL to be something better without scheduling updates to those apps too. I'll be back-logging work to get that sorted out.

Part of my work here was to boyscout-rule the controllers of the "process" end point to move business logic out of the controller (yeah, don't ask), and into a service tier, which called a repository tier etc.

Some of the business logic I moved was to extract the decision as to which request parameter to use for the ID. Apparently we have some legacy code somewhere which passes the file ID as another argument name - I do not have the code in front of me right now so I can't remember what it is - so there was some logic to determine whether to use the legacy argument or the new one. Arguably this is perhaps still the job of the controller - it's dealing with stuff from the request object, which I can sell to myself as being controlller logic - but we're pretty stringent that we have as little logic in the controller as possible. And as I already needed a service method to do "Other Stuff", so I pushed this back into helper in the service too: something like extractDistributionCriteria(array $allCriteria). I don't want to be passing the entire request object into the service, so I extracted the POST params and passed those, using Silex's $request->request->all() method, and that returns the POST params as an array. Other than that, I was just shuffling logic around, so everything should be grand.