Friday 1 January 2016

PHP: initially puzzling but then obvious challenge with class files

G'day:
This is a "real world" situation we had in the office a few weeks ago. I started writing it up about a month ago, but then I got my hands on Fallout 4, and that has pretty much been what my spare time has been occupied with since then. But I decided to eschew it for an afternoon and get this article out of the way.

Initially I thought this was quite puzzling and unpredictable behaviour, but as I was rounding out the example code this morning I had the "[slap forehead] oh duh!" moment and I was reminded that despite having been a PHP dev for a year now, there's still a bunch of stuff that doesn't come naturally to me. I'll take some solace that it also didn't seem immediately obvious to some of my colleagues too, all of whom have been dedicated PHP devs for longer than I have.

We are currently needing to tread outside out safe world of the new website we're building (all new code, almost a pleasure to work with), and integrate with some of of the company's older code bases. This is a challenge to us for a few reasons:
  • we didn't write the code, one of the earlier teams did (there's no overlap between our team and those who wrote this code);
  • the code is old, so predates some stuff we would take for granted like PSR-4 autoloading and that sort of thing;
  • also cos it's quite old, concepts like "code quality" and various "preferred practices" weren't so mature as they are these days.
So... in short... it's all a bit of a ball-ache. Still: code is code, and we're paid to work with this stuff, so we gotta do it. And I hasten to add that the code under discussion here is all our new code. Well: it's a repro case based on problems we had with new code we're writing now. But the issue we fell foul of was exacerbated by the "uncertainty" around working with this unfamilar codebase we're integrating with.

The current assignment was to write some reporting scripts for the new system, which are not part of the app itself, so cannot leverage the new app's own code. They need to hook directly into the DB tier of the old system, and will be run in the context of a cron job rather than in the context of our app. What's worse all the cron stuff is managed by a different team, so the devs cannot even run the scripts they write in the context they'll be used, but instead need to basically write a "unit test" which calls the script, faking the calling environment. It's best you don't ask why this is the situation we find ourselves in, as there's really no good reason for it. There are reasons, but they're not good ones.

Unfortunately the old system has no concept of using Composer and PSR-4 (or even PSR-0!) autoloading, and their file-naming "standard" is incompatible with either of those standards so we could not bolt it into what we were doing. Basically we need to manually load any file we need to use, using the old require_once approach to things. And it's this that undid us. Well: as I said, we undid ourselves, but having to operate in unfamiliar old skool territory framed our failure.

Right, so we had a situation where we had a sub class which extended a base class, and the base class loaded some dependencies we needed from the old system. The base and sub class are our own code, the dependencies are classes from the old system. Whilst we can use Composer and PSR-4 to load our own code, we need to manually require the dependencies. Bleah. But so be it. Here's a repro of our problematic code:



// appConfig.php

require_once __DIR__ . '/../vendor/autoload.php';

defined('PROJECT_ROOT') || define('PROJECT_ROOT', realpath(dirname(__FILE__) . '/..'));
defined('DEPENDENCY_DIR') || define('DEPENDENCY_DIR', PROJECT_ROOT.'/other');

// BaseClass.php

namespace dac;

require_once __DIR__ . '/appConfig.php';

class BaseClass {

    public function __construct(){
    }

}


// OriginalSubclass.php
namespace dac;

require_once DEPENDENCY_DIR . '/Dependency.php';

class OriginalSubClass extends BaseClass {

    public function __construct(){
        require_once DEPENDENCY_DIR . '/Dependency.php';
        $dependency = new \other\Dependency();
        var_dump($dependency);
    }

}


// Dependency.php

namespace other;

class Dependency {

    public function __construct(){
    }

}


// tests/originalSubClass.php

require_once __DIR__ . '/../../vendor/autoload.php';

use \dac\OriginalSubClass;

$originalSubClass = new OriginalSubClass();

var_dump($originalSubClass);

The gist of this is:
  • We have an appConfig file which sets constant representing the path to where the dependency files are homed;
  • We require this file in the BaseClass file;
  • And the SubClass file uses the constant path to actually require one of the dependencies it needs;
  • And the SubClass subsequently uses an instance of Dependency;

When we ran this code via our unit test rig, it ran fine. However if we ran it stand-alone, we got this:

cli>php originalSubClass.php
PHP Notice: Use of undefined constant DEPENDENCY_DIR - assumed 'DEPENDENCY_DIR' in OriginalSubClass.php on line 8

So... the two puzzling things here twofold:
  • Why does this code break?
  • ... Except when called via PHPUnit?

Hopefully it's already obvious to any seasoned PHP devs out there. It was not obvious to me until I started finalising the code for this article, this morning. Note we fixed the issue back when it happened, but I still wasn't entirely sure why the code was breaking before hand.

Given the code works at all (in the PHPUnit case), obviously the code - in some context - is OK. But it seemed when being run via the cron the OO was breaking somehow: SubClass was somehow being executed without some of its inheritance back to BaseClass being realised. Weird.

Troubleshooting this, I put some telemetry in, and the output demonstrated that the SubClass code was running before the BaseClass code was:

cli>php originalSubClass.php
originalSubClass.php executing
autoload.php executed
OriginalSubClass USEd
before OriginalSubClass instance created
OriginalSubClass.php outside class definition
PHP Notice: Use of undefined constant DEPENDENCY_DIR - assumed 'DEPENDENCY_DIR' in OriginalSubClass.php on line 8
(If you look at the code on GitHub - as linked to in the file names in each code example, above - you'll see where the telemetry lines were inserted)

As part of testing this change, instead of using PHPUnit to run all all the tests, I just ran the test for just this one script. And, lo, I got the error. I ran the whole PHPUnit test run... no error. My reaction was "what the actual... erm... heck". The reason for the PHPUnit disparity occurred to me fairly quickly: PHPUnit loads all the files it's gonna need to use before it runs its tests, so irrespective of that particular code not loading BaseClass when it needed to, BaseClass.php was being loaded anyhow, as part of its own tests. Weirdly this still did not make the penny drop for me, but I concocted a fix anyhow. We slipped the require statement down into the contructor of SubClass:

// WorkingSubClass.php

namespace dac;

echo __FILE__ . ' outside class definition' . PHP_EOL;

class WorkingSubClass extends BaseClass {

    public function __construct(){
        echo __FILE__ . ' constructor executed' . PHP_EOL;

        require_once DEPENDENCY_DIR . '/Dependency.php';
        $dependency = new \other\Dependency();
        var_dump($dependency);
    }

}


And the test code:
// tests/workingSubClass.php

echo __FILE__ . ' executing' . PHP_EOL;

require_once __DIR__ . '/../../vendor/autoload.php';
echo 'autoload.php executed' . PHP_EOL;

use \dac\WorkingSubClass;
echo 'WorkingSubClass USEd' . PHP_EOL;

echo 'before WorkingSubClass instance created' . PHP_EOL;
$workingSubClass = new WorkingSubClass();
echo 'after WorkingSubClass instance created' . PHP_EOL;

var_dump($workingSubClass);
This works nicely:
cli>workingSubClass.php workingSubClass.php executing autoload.php executed WorkingSubClass USEd before WorkingSubClass instance created WorkingSubClass.php outside class definition BaseClass.php outside class definition appConfig.php loaded WorkingSubClass.php constructor executed Dependency.php outside class definition Dependency.php constructor executed object(other\Dependency)#3 (0) { } after WorkingSubClass instance created object(dac\WorkingSubClass)#2 (0) { }

Even after this, I still didn't get really what was going on. It seemed clear that the PHP parser was parsing the SubClass.php file before it needed to be executed to create the object, and it wasn't until an actual instance needed to be created that the inheritance part of things "worked" so that the constant set in BaseClass was actually created, and available to SubClass. I just shrugged and put it down to one of those "weird PHP things" (which, realistically, is entirely reasonable, given what PHP is like).

Then - this morning - I remembered something. There's no such thing as "a class file" in PHP (as opposed to a script file). BaseClass.php and SubClass.php might be named as if they are class files, but in PHP everything is just a script file. Script files just happen to be able to contain class definitions along with any other PHP code one might want to have in there. But not all the code in a file which contains a class definition is related to the class. It's just "some code in the file". Now when one requires a file, then the code in it parses and executes. So the constant declaration will run, whereas before it wasn't running because we were confusing OO inheritance with PHP code execution.

However in our code, we are using Composer to require our "class" files transparently for us. But we lost sight of the fact that Composer is not loading the classes for us, it's just loading the files that contain the classes in them. So the simple act of requiring SubClass.php does not make PHP go "well there's a class in this file somewhere, and it extends another class, so best we look at what that class file needs before we get under way, and everything'll be lovely". That's not a reasonable supposition on our part.

Let's look at the code again, in the sequence it loads and executes.

// tests/originalSubClass.php
require_once __DIR__ . '/../../vendor/autoload.php'; // just ignore this, it's an autogenerated file anyhow

use \dac\OriginalSubClass;
$originalSubClass = new OriginalSubClass();


    // OriginalSubClass.php
    namespace dac;

    require_once DEPENDENCY_DIR . '/Dependency.php';
Here I've tried to show the code as it's executed, rather than how it appears in discrete filed. Seen in this light... it's bloody obvious why the code breaks: we are indeed using DEPENDENCY_DIR before it's been declared. This also makes the fix we used understandable. Let's have a look at what code that executes. In the example below I have "inlined" various files so what statements get executed in what order is more clear. I've indented each new file, and included a comment reflecting its name:

// tests/workingSubClass.php
require_once __DIR__ . '/../../vendor/autoload.php';

use \dac\WorkingSubClass;
$workingSubClass = new WorkingSubClass();


    // WorkingSubClass.php
    namespace dac;
    class WorkingSubClass extends BaseClass {

    
        // BaseClass.php
        namespace dac;
        require_once __DIR__ . '/appConfig.php';


            // appConfig.php
            echo __FILE__ . ' loaded' . PHP_EOL;
            require_once __DIR__ . '/../vendor/autoload.php';
            defined('PROJECT_ROOT') || define('PROJECT_ROOT', realpath(dirname(__FILE__) . '/..'));
            defined('DEPENDENCY_DIR') || define('DEPENDENCY_DIR', PROJECT_ROOT.'/other');

            
        class BaseClass {

            public function __construct(){
            }

        }


        public function __construct(){
            require_once DEPENDENCY_DIR . '/Dependency.php';

            
                // Dependency.php
                namespace other;

                class Dependency {
                    public function __construct(){
                    }

                }


            $dependency = new \other\Dependency();
            var_dump($dependency);
        }

    }


var_dump($workingSubClass);

(Hopefuly that's readable and demonstrates what I mean)

I guess the bottom line is here we've been spoiled by the work Composer does for us, and we've started thinking in terms of "classes" rather than "files". From a programming perspective, this is a good thing, but we had also not forget about the nuts and bolts of how PHP actually works.

One last gripe about PHP here though. This:

PHP Notice: Use of undefined constant DEPENDENCY_DIR - assumed 'DEPENDENCY_DIR' in OriginalSubClass.php on line 8
What frickin' moron decided that this is a WARNING situation, and that it's ever legitmate to assume (and remember what everyone knows about what happens when we "ass-u-me" anything) that when the code clearly says SOMETHING_THAT_IS_DEFINITELY_NOT_A_STRING, that the code "probably meant" "SOMETHING_THAT_ACTUALLY_IS_A_STRING"? When is that ever going to be something that's the correct supposition to make? And what's a frickin' "notice"? Is the code wrong or not? If it's not wrong: STFU and get on with it. If it is wrong: break! How is this - in 2016 - still a mysterious concept to PHP? Note that the code continued to try to run after this "notice", and only fell flat when it became abundantly clear that PHP's ass-u-me-ption (sorry) - which it knew was sketchy as it decided to tell us - proved to be invalid and the code collapsed in a heap. If the code is frickin' invalid, then say so, throw an exception, and stop. PHP 7 was supposed to fix this shite, but it still just doesn't know how code is supposed to work. FFS.

Well that was cathartic.

So there you go. Proof I still do computer programming type stuff, even if I've been quiet on that front recently.

Righto.

--
Adam