Showing posts with label PHP. Show all posts
Showing posts with label PHP. Show all posts

Wednesday 9 December 2020

PHPUnit: get code coverage reporting working on PHP 8 / PHPUnit 9.5

 G'day:

I'm still beavering away getting a PHP dev environment working via Docker. As I mentioned in the previous article, I'm writing an article about how I'm doing that as I go, but I've had a few issues today that have made me pause that.

My current issue has been getting PHPUnit working properly. This has nothing to do with the Docker environment (as far as I can tell), it's just... me and PHPUnit not playing nicely together. The solution to my issue was not immediately apparent to me... it's taken me about three hours to sort this out. So I figured it might be worthwhile writing down.

I have a barebones "G'day World" website running in a coupla Docker containers (one for Nginx, one for PHP). That's all working fine, I can get "G'day world" being delivered to the browser via both HTML and PHP requests. Before I start writing any really-really code, I decided I better get PHPUnit working. TDD 'n' all that. I will admit I felt dirty yesterday when I wrote the gdayWorld.html and gdayWorld.php and ran them without tests; I just browsed to them and went "seems legit". Not the way I like to operate.

I sat down to write some tests. My gdayWorld.php is... basic:

<?php

$message = "G'day World";
echo $message;

I don't have any framework or anything installed as yet, so that's as good as it gets. That is in the web root. To test that I'd be needing to test that by curling it and checking the response, then I remembered Symfony has a WebTestCase for this sort of thing, then I decided I could not be arsed looking into that right now. All I need is a test to run to check PHPUnit is running. So... erm... here we go:

    /** @coversNothing */
    public function testNothing()
    {
        $this->assertTrue(false);
    }

This is enough to test PHPUnit. I'll write a better test once I know everything is working.

root@5fdd4abf5d20:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.

F                                 1 / 1 (100%)

Time: 00:00.166, Memory: 6.00 MB

There was 1 failure:

1) adamCameron\fullStackExercise\test\functional\public\WebServerTest::testNothing
Failed asserting that false is true.

/usr/share/fullstackExercise/test/functional/public/WebServerTest.php:12

FAILURES!
Tests: 1, Assertions: 1, Failures: 1.
root@5fdd4abf5d20:/usr/share/fullstackExercise#

Perfect. Now I can make it pass ;-)

$this->assertTrue(true);
root@5fdd4abf5d20:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.

.                                 1 / 1 (100%)

Time: 00:00.127, Memory: 6.00 MB

OK (1 test, 1 assertion)
root@5fdd4abf5d20:/usr/share/fullstackExercise#

So far, so good. My next step whilst I'm configuring stuff on PHPUnit, I decided to get the code coverage working too. I just added the code-coverage reporting bit to phpunit.xml and re-ran the tests:

root@eb28587ca66f:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.

Warning: No code coverage driver available

.                                 1 / 1 (100%)

Time: 00:00.210, Memory: 6.00 MB

OK (1 test, 1 assertion)
root@eb28587ca66f:/usr/share/fullstackExercise#

Doh! Forgot about that. I need to install Xdebug.

This was way more horsing around than it could have been, but this is down to the vagaries of the Docker set-up I was using. Once I decided to change direction it was a one-liner in me Dockerfile, and all good.

root@37d06fcf9209:/usr/share/fullstackExercise# php -v
PHP 8.0.0 (cli) (built: Dec 1 2020 03:33:03) ( NTS )
Copyright (c) The PHP Group
Zend Engine v4.0.0-dev, Copyright (c) Zend Technologies
with Xdebug v3.0.1, Copyright (c) 2002-2020, by Derick Rethans
root@37d06fcf9209:/usr/share/fullstackExercise#

Run the tests again...

root@37d06fcf9209:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.

Warning: XDEBUG_MODE=coverage or xdebug.mode=coverage has to be set

.                                 1 / 1 (100%)

Time: 00:00.220, Memory: 6.00 MB

OK (1 test, 1 assertion)
root@37d06fcf9209:/usr/share/fullstackExercise#

OK. Fine. Makes sense I s'pose. I added in the ini setting into my phpunit.xml:

<php>
  <ini name="xdebug.mode" value="coverage" />
</php>

No difference. I figured the ini setting is probably changed too late in the piece, as the xdebug.mode will need to be set before PHP starts running any code. And perhaps the environment variable approach would work such that the actual environment variable was set before hand. As it happens it ain't - all that happens is the value is added to the $_ENV array, which is hardly the same thing. From the docs:



However this step is still a relevant one cos we see a behavioural change here:

root@37d06fcf9209:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.
Segmentation fault
root@37d06fcf9209:/usr/share/fullstackExercise#

Splat!

I'm gonna spare you a coupla hours of looking in to how to troubleshoot segmentation faults here. Nothing I read helped, and there's nothing useful relating to getting them with PHPUnit, other than a bunch of people getting these segmentation faults.

I started to wonder if it was an issue with Xdebug and PHP8, given PHP8 is so new and sometimes in the last I've noted that Xdebug has needed updating when there's a point release of PHP. But looking into this (at Xdebug: Supported Versions and Compatibility), the current version of Xdebug has been around for a while, and they say it's good to go with PHP8:

I was now thinking about versions of stuff, so I decided to rollback PHP to 7.4, and PHPUnit to 8.5. I realise I should have only done one at a time, but it did not occur to me when doing this that when troubleshooting things, only change one thing per test iteration.

root@41f51888adff:/usr/share/fullstackExercise# php -v
PHP 7.4.13 (cli) (built: Dec 1 2020 04:25:48) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Xdebug v3.0.1, Copyright (c) 2002-2020, by Derick Rethans
root@41f51888adff:/usr/share/fullstackExercise# vendor/bin/phpunit --version
PHPUnit 8.5.13 by Sebastian Bergmann and contributors.

root@41f51888adff:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 8.5.13 by Sebastian Bergmann and contributors.

.Code coverage needs to be enabled in php.ini by setting 'xdebug.mode' to 'coverage'
root@41f51888adff:/usr/share/fullstackExercise#

This verifies we are back to PHP 7.4 and PHPUnit 8.5. And we don't have the segmentation fault any more, but... now PHP is ignoring both of the settings in phpunit.xml. I have this in place:

<php>
  <env name="XDEBUG_MODE" value="coverage"/>
  <ini name="xdebug.mode" value="coverage"/>
</php>

At this point I read up on the <ini> and <env> settings in phpunit.xml, and confirmed both of them are set too late for Xdebug to read them anyhow. Quite odd the way PHPUNit 9.5 reacts to the XDEBUG_MODE being set in that case. Anyway, I need to set them before PHP is run, so I set a really-really environment variable in my Dockerfile:

ENV XDEBUG_MODE=coverage

After rebuilding my containers:

root@6053a26dc3eb:/usr/share/fullstackExercise# echo $XDEBUG_MODE
coverage
root@6053a26dc3eb:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 8.5.13 by Sebastian Bergmann and contributors.

..                                 2 / 2 (100%)

Time: 1.01 seconds, Memory: 6.00 MB

OK (2 tests, 2 assertions)

Generating code coverage report in HTML format ... done [596 ms]
root@6053a26dc3eb:/usr/share/fullstackExercise#

Hurrah! Let's have a look at the report:

This is correct: I'm not testing that code in /public yet.

I also quickly tried to use the xdebug.mode ini file setting. I dropped a /usr/local/etc/php/conf.d/phpunit-code-coverage-xdebug.ini with just xdebug.mode=coverage in it into my container's file system:

root@f3cd4a0715a5:/usr/share/fullstackExercise# cat /usr/local/etc/php/conf.d/phpunit-code-coverage-xdebug.ini
xdebug.mode=coverage
root@f3cd4a0715a5:/usr/share/fullstackExercise# php --ini
Configuration File (php.ini) Path: /usr/local/etc/php
Loaded Configuration File: (none)
Scan for additional .ini files in: /usr/local/etc/php/conf.d
Additional .ini files parsed: /usr/local/etc/php/conf.d/docker-php-ext-pdo_mysql.ini,
/usr/local/etc/php/conf.d/docker-php-ext-sodium.ini,
/usr/local/etc/php/conf.d/docker-php-ext-xdebug.ini,
/usr/local/etc/php/conf.d/phpunit-code-coverage-xdebug.ini

root@f3cd4a0715a5:/usr/share/fullstackExercise# php -i | grep xdebug.mode
xdebug.mode => coverage => coverage
root@f3cd4a0715a5:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 8.5.13 by Sebastian Bergmann and contributors.

..                                 2 / 2 (100%)

Time: 1.07 seconds, Memory: 6.00 MB

OK (2 tests, 2 assertions)

Generating code coverage report in HTML format ... done [869 ms]
root@f3cd4a0715a5:/usr/share/fullstackExercise#

Perfect.

My supposition here is that the issue all along was that I needed to set one of either the ini file option, or the environment variable option before PHPUnit runs. PHPUnit 8 correctly sez "OI!" if I don't; whereas PHPUnit 9 doesn't check correctly, and instead messes things up. Let's roll forward to PHP8 and PHPUnit 9.5 again and see:

root@f7d29f40ad62:/usr/share/fullstackExercise# php -v
PHP 8.0.0 (cli) (built: Dec 1 2020 03:33:03) ( NTS )
Copyright (c) The PHP Group
Zend Engine v4.0.0-dev, Copyright (c) Zend Technologies
with Xdebug v3.0.1, Copyright (c) 2002-2020, by Derick Rethans
root@f7d29f40ad62:/usr/share/fullstackExercise# vendor/bin/phpunit --version
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.

root@f7d29f40ad62:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.
..                                 2 / 2 (100%)

Time: 00:01.535, Memory: 10.00 MB

OK (2 tests, 2 assertions)

Generating code coverage report in HTML format ... done [00:00.605]
root@f7d29f40ad62:/usr/share/fullstackExercise#

Hoo-frickin-ray!!

So that supposition was correct then.

I guess I could have saved myself a bunch of time here by just setting the .ini setting or the environment variable in the correct environment right from the outset. Then again had PHPUnit 9 not changed its warnings, I would have known that straight away: PHPUnit 8 was pretty clear on this. The problem and the solution here were pretty mundane, but I learned a lot about how to do things with Docker (which will all go into the other article I'm writing at the same time as this one), some stuff about Xdebug, some stuff about PHP and some stuff about PHPUnit. So it was a worthwhile afternoon of scratching my head. I'm also pleased I've sat here and done 8hrs of "work" for the second day running. Hopefully this will stick and I'll work my way out of my unemployed malaise before long. We'll see.

PHPUnit 9.5 on PHP 7.4

As I said earlier, I should not have changed two things at once when testing this, as it makes it harder to draw accurate conclusions. I omitted to say I did then go back and test PHPUnit 9.5 on PHP 7.4, so as to check whether the issue was due to PHPUnit 9.5, or whether it was some combination of that and PHP version. Here are the test results when running PHPUnit 9.5 on PHP 7.4 with XDEBUG_MODE set in phpunit.xml and not in the environment. xdebug.mode is not set anywhere:

root@81c63ae50626:/usr/share/fullstackExercise# php -v
PHP 7.4.13 (cli) (built: Dec 1 2020 04:25:48) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Xdebug v3.0.1, Copyright (c) 2002-2020, by Derick Rethans
root@81c63ae50626:/usr/share/fullstackExercise# vendor/bin/phpunit --version
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.
root@81c63ae50626:/usr/share/fullstackExercise# vendor/bin/phpunit
PHPUnit 9.5.0 by Sebastian Bergmann and contributors.
Segmentation fault
root@81c63ae50626:/usr/share/fullstackExercise#

To me this demonstrates the issue is with PHPUnit 9.5 and nothing else.

Anyway... now for a beer.

Righto.


--
Adam

Friday 20 December 2019

PHPUnit: trying to be clever with data providers and failing

G'day:
Yesterday we had a seemingly strange situation with a test one of my colleagues had written. It was behaving as if PHPUnit's exactly matcher was not working. The developer concerned was in the process of concluding that that is exactly what the problem is: PHPUnit. I suggested we pause and back-up a bit: given thousands of people uses PHPUnit and the exactly matcher every day (including ourselves) and have not reported this issue, whereas the code we are running it with was freshly written and not used by anyone yet... almost certainly the problem lay with our code.

Before I get on with what the problem was, this bears repeating:

If you are doing something new and it's going wrong, the problem will almost certainly not be related to any established third-party tools you are using. It'll be down to how you're using them.

I find it's a really helpful mindset when troubleshooting problems to just own the issue from the outset, rather than looking around for something else to blame. And if you've looked and looked and can't see the problem with your own code, this probably just means you haven't spotted the problem yet. Absence of evidence is not evidence of absence.

Anyway, the issue at hand. Here's a pared-down repro of the situation. The code is meaningless as I've removed all business context from it, but we have this:

class MyService
{
    private $dependency;

    function __construct(MyDependency $dependency){
        $this->dependency = $dependency;
    }

    function myMethod() {
        return $this->dependency->someMethod();
    }
}


And MyDependency is this:

class MyDependency
{
    function someMethod(){
        return "ORIGINAL VALUE";
    }
}

myMethod is new, so the dev had written a test for this:

class MyServiceTest extends TestCase
{

    /** @dataProvider provideTestCasesForMyMethodTests */
    public function testMyMethodViaDataProvider($dependency)
    {
        $dependency
            ->expects($this->once())
            ->method('someMethod')
            ->willReturn("MOCKED VALUE");

        $service = new MyService($dependency);

        $result = $service->myMethod();

        $this->assertEquals("MOCKED VALUE", $result);
    }

    public function provideTestCasesForMyMethodTests(){
        return [
            'it should call someMethod once' => [
                'dependency' => $this->createMock(MyDependency::class)
            ]
        ];
    }
}

I hasten to add that it's completely daft to have the data provider in this example. However in the real-world example there were valid test variations, and given the method being tested was a factory method, part of the test was to check that the relevant method had been called on the relevant object being returned by the factory method. The data provider was legit.

This test passes:

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php
PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 77 ms, Memory: 4.00 MB

OK (1 test, 1 assertion)
PS C:\src\php\general>

However when we were trouble shooting something, we ended up with this assertion:

$dependency
    ->expects($this->exactly(10))
    ->method('someMethod')
    ->willReturn("MOCKED VALUE");


And the test still passed:

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php
PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 77 ms, Memory: 4.00 MB

OK (1 test, 1 assertion)
PS C:\src\php\general>

Whut the?

When we were looking at this I asked my colleague to simplify their tests a bit. The IRL version was testing this case via a data provider and test that was testing a bunch of things in a method that had some complexity issues, and I wanted to pare the situation back to just one simple test, testing this one thing. We ended up with this test:

public function testMyMethodViaInlineMock()
{
    $dependency = $this->createMock(MyDependency::class);
    $dependency
        ->expects($this->once())
        ->method('someMethod')
        ->willReturn("MOCKED VALUE");

    $service = new MyService($dependency);

    $result = $service->myMethod();

    $this->assertEquals("MOCKED VALUE", $result);
}


The difference here being we are not using a data provider any more, the mocking of the dependency is in the test. And this passes:

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php                 PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 61 ms, Memory: 4.00 MB

OK (1 test, 2 assertions)
PS C:\src\php\general>                                                                                                                              

Good, fine, but now we tested out the ->expects($this->exactly(10)) issue again:

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php                 PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

F                                                                   1 / 1 (100%)

Time: 102 ms, Memory: 4.00 MB

There was 1 failure:

1) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyMethodViaInlineMock
Expectation failed for method name is "someMethod" when invoked 10 time(s).
Method was expected to be called 10 times, actually called 1 times.

FAILURES!
Tests: 1, Assertions: 2, Failures: 1.
PS C:\src\php\general>                                                                                                                                                    

This is more like it. But... erm... why?

I'd had "unexpected" behaviour previously when referring to $this inside a data provider method, because the data provider method is not called on the same instance of the test class that the tests are run with (why? Buggered if I know, but it is), and had my suspicions that this could be contributing to it. This is when we just shrugged and left the test as it was - working now - and I undertook to investigate some more later. The code I've been sharing here is the start of my investigation. I was at this point none the wiser though.

On a whim I tried another test variation, to see if there was any change in behaviour across the threshold of the count (so checking "off by one" situations). Here I have a new method that calls the same dependency method five times:
function myOtherMethod(){
     $this->dependency->someOtherMethod();
     $this->dependency->someOtherMethod();
     $this->dependency->someOtherMethod();
     $this->dependency->someOtherMethod();
    return  $this->dependency->someOtherMethod();
}
And a test for it that just checks what happens when we expect it to be called 3-7 times:

/** @dataProvider provideTestCasesForMyOtherMethodTests */
public function testMyOtherMethodCallsDependencyMethodEnoughTimes($dependency, $expectedCount) {
    $dependency
        ->expects($this->exactly($expectedCount))
        ->method('someOtherMethod')
        ->willReturn("MOCKED VALUE");

    $service = new MyService($dependency);

    $result = $service->myOtherMethod();

    $this->assertEquals("MOCKED VALUE", $result);
}

public function provideTestCasesForMyOtherMethodTests(){
    return [
        'it expects myOtherMethod to be called 3 times' => [
            'dependency' => $this->createMock(MyDependency::class),
            'expectedCount' => 3
        ],
        'it expects myOtherMethod to be called 4 times' => [
            'dependency' => $this->createMock(MyDependency::class),
            'expectedCount' => 4
        ],
        'it expects myOtherMethod to be called 5 times' => [
            'dependency' => $this->createMock(MyDependency::class),
            'expectedCount' => 5
        ],
        'it expects myOtherMethod to be called 6 times' => [
            'dependency' => $this->createMock(MyDependency::class),
            'expectedCount' => 6
        ],
        'it expects myOtherMethod to be called 7 times' => [
            'dependency' => $this->createMock(MyDependency::class),
            'expectedCount' => 7
        ]
    ];
}


All things being equal, I was expecting no failures here, given what we'd seen before. But... erm...

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php --filter=testMyOtherMethodCallsDependencyMethodEnoughTimes                                   PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

FF...                                                               5 / 5 (100%)

Time: 64 ms, Memory: 4.00 MB

There were 2 failures:

1) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyOtherMethodCallsDependencyMethodEnoughTimes with data set "it expects myOtherMethod to be called 3 times" (Mock_MyDependency_6a23ba62 Object (...), 3)
me\adamcameron\general\phpunit\expectsWithDataProviderIssue\MyDependency::someOtherMethod() was not expected to be called more than 3 times.

C:\src\php\general\src\phpunit\expectsWithDataProviderIssue\MyService.php:21
C:\src\php\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php:59

2) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyOtherMethodCallsDependencyMethodEnoughTimes with data set "it expects myOtherMethod to be called 4 times" (Mock_MyDependency_6a23ba62 Object (...), 4)
me\adamcameron\general\phpunit\expectsWithDataProviderIssue\MyDependency::someOtherMethod() was not expected to be called more than 4 times.

C:\src\php\general\src\phpunit\expectsWithDataProviderIssue\MyService.php:22
C:\src\php\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php:59

FAILURES!
Tests: 5, Assertions: 3, Failures: 2.
PS C:\src\php\general>                   


OK so this is a bit perplexing. When the expected call count is less than the actual call count we get it failing correcty. But when it's more... no failure. Now I'm flummoxed. For good measure I also have a test not creating the dependency in the data provider, but instead inline in the test:

/** @dataProvider provideTestCasesForMyOtherMethodTests */
public function testMyOtherMethodCallsDependencyMethodEnoughTimesWithDependencyCreatedInTest($_, $expectedCount) {
    $dependency = $this->createMock(MyDependency::class);
    $dependency
        ->expects($this->exactly($expectedCount))
        ->method('someOtherMethod')
        ->willReturn("MOCKED VALUE");

    $service = new MyService($dependency);

    $result = $service->myOtherMethod();

    $this->assertEquals("MOCKED VALUE", $result);
}


And this one behaves correctly:

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php --filter=testMyOtherMethodCallsDependencyMethodEnoughTimesWithDependencyCreatedInTest                                                   PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

FF.FF                                                               5 / 5 (100%)

Time: 67 ms, Memory: 4.00 MB

There were 4 failures:

1) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyOtherMethodCallsDependencyMethodEnoughTimesWithDependencyCreatedInTest with data set "it expects myOtherMethod to be called 3 times" (Mock_MyDependency_2707ab91 Object (...), 3)
me\adamcameron\general\phpunit\expectsWithDataProviderIssue\MyDependency::someOtherMethod() was not expected to be called more than 3 times.

C:\src\php\general\src\phpunit\expectsWithDataProviderIssue\MyService.php:21
C:\src\php\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php:99

2) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyOtherMethodCallsDependencyMethodEnoughTimesWithDependencyCreatedInTest with data set "it expects myOtherMethod to be called 4 times" (Mock_MyDependency_2707ab91 Object (...), 4)
me\adamcameron\general\phpunit\expectsWithDataProviderIssue\MyDependency::someOtherMethod() was not expected to be called more than 4 times.

C:\src\php\general\src\phpunit\expectsWithDataProviderIssue\MyService.php:22
C:\src\php\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php:99

3) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyOtherMethodCallsDependencyMethodEnoughTimesWithDependencyCreatedInTest with data set "it expects myOtherMethod to be called 6 times" (Mock_MyDependency_2707ab91 Object (...), 6)
Expectation failed for method name is "someOtherMethod" when invoked 6 time(s).
Method was expected to be called 6 times, actually called 5 times.

4) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyOtherMethodCallsDependencyMethodEnoughTimesWithDependencyCreatedInTest with data set "it expects myOtherMethod to be called 7 times" (Mock_MyDependency_2707ab91 Object (...), 7)
Expectation failed for method name is "someOtherMethod" when invoked 7 time(s).
Method was expected to be called 7 times, actually called 5 times.

FAILURES!
Tests: 5, Assertions: 6, Failures: 4.
PS C:\src\php\general>                                                                                                                                                                                                                                               

I started diving back through the PHPUnit code, and found the point where my two test variations began to demonstrate different behaviour. I edited InvocationHandler::verify to spit out some telemetry:

public function verify(): void
{
    printf(PHP_EOL . "InvocationHandler::verify called with %d matchers" . PHP_EOL, count($this->matchers));

    foreach ($this->matchers as $matcher) {
        $matcher->verify();
    }



And here's what I get running the two tests:

PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php --filter=testMyMethodViaDataProvider                                                                                                    PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)
InvocationHandler::verify called with 0 matchers


Time: 60 ms, Memory: 4.00 MB

OK (1 test, 1 assertion)
PS C:\src\php\general> vendor\bin\phpunit .\test\phpunit\expectsWithDataProviderIssue\MyServiceTest.php --filter=testMyMethodViaInlineMock                                                                                                      PHPUnit 8.5.0 by Sebastian Bergmann and contributors.

F                                                                   1 / 1 (100%)
InvocationHandler::verify called with 1 matchers


Time: 62 ms, Memory: 4.00 MB

There was 1 failure:

1) me\adamcameron\general\test\phpunit\expectsWithDataProviderIssue\MyServiceTest::testMyMethodViaInlineMock
Expectation failed for method name is "someMethod" when invoked 10 time(s).
Method was expected to be called 10 times, actually called 1 times.

FAILURES!
Tests: 1, Assertions: 2, Failures: 1.
PS C:\src\php\general>                             

Well. So when we run our tests using the mock created in the test code... we get our matcher "loaded" (for lack of a better term); when we run our test using the mock created in the data provider: it's nowhere to be seen. Interesting.

Looking further up the call stack, we find that this verify method is called from TestCase::verifyMockObects:
private function verifyMockObjects(): void
{
    foreach ($this->mockObjects as $mockObject) {
        if ($mockObject->__phpunit_hasMatchers()) {
            $this->numAssertions++;
        }

        $mockObject->__phpunit_verify(
            $this->shouldInvocationMockerBeReset($mockObject)
        );
    }


(I dunno what's going on with the method name there, but I did verify the call stack... PHPUnit is doing some special magic there).

verifyMockObjects in turn is called from runBare in the same class, and that is run from TestResult::run.

The key thing here is verifyMockObjects being run in the context of the TestCase object, and it looks at $this->mockObjects. Where does $this->mockObjects come from? When we create the mock:

public function getMock(): MockObject
{
    $object = $this->generator->getMock(
        $this->type,
        !$this->emptyMethodsArray ? $this->methods : null,
        $this->constructorArgs,
        $this->mockClassName,
        $this->originalConstructor,
        $this->originalClone,
        $this->autoload,
        $this->cloneArguments,
        $this->callOriginalMethods,
        $this->proxyTarget,
        $this->allowMockingUnknownTypes,
        $this->returnValueGeneration
    );

    $this->testCase->registerMockObject($object);

    return $object;
}

// ...

public function registerMockObject(MockObject $mockObject): void
{
    $this->mockObjects[] = $mockObject;
}

(remember that our createMock call is just a wrapper for a getMockBuilder()...->getMock() call).

So what's the bottom line here? Well my initial wariness about creating the mock in the data provider method has legs... when a mock is created it's registered in the TestCase object it's created within... and the one that the data provider method is running in is not the same TestCase as the one the test is run in. So... none of the mock's constraints are checked when the test is run. That's mostly it.

But what's the story with how that last test still failed on the variants with only 3/5 and 4/5 calls made? That check is done when the mocked method is actually called. So given this:

function myMethod() {
    return $this->dependency->someMethod();
}

When the mocked someMethod is called, a check is made then as to whether it's been called too many times for the expectation. So that's one within the mock itself at the time it's executed, hence it still "works". It's just the check that the exact number of call times was hit that is done at test level, not mock level if that makes sense.

This all just goes to show that one should just use data provider methods for providing data values for test variations, not to treat them like some sort of looping mechanism that the test is just a callback for. That's not the intent, and when one tries to be tricky... one is quickly found out to not be as clever as one hopes... ;-)

That's an awful lot of carry on for what ended up to be an easy / simple explanation. I'm mostly writing this up in case anyone else gets the same issue, and also to show how I went about troubleshooting and investigating it. I'll be putting this in front of my team shortly...

Righto.

--
Adam

Monday 2 July 2018

PHP: Expectation management with Symfony Validator validation

G'day:
I was checking out of of our web services the other day, replicating a bug I was fixing, and found the need to guess at what the body content needed to be for a POST I was doing. Our validation is solid, so I could just start with an empty object, and the validation would tell me what I needed to pass. Or, yes, I could RTFM but I couldn't be arsed.

Anyhow, this tactic worked for the first few properties: the validation came back with "you need this property"... "nah, it needs to be an integer"... "that's too low", etc. Cool. But then I passed an empty string for one of the properties, and instead of getting a 400 back with a hint as to what the constraints needed to be; I instead got a 500 server error. F***'s sake.

{
    "errors": [
        "Internal Server Error"
    ]
}

This is good and bad. Bad it was a 500 for some reason; good that there's no detail bleeding out. This was part of what I was fixing.

Looking at the logs I was getting this:

{"errors": ["Expected argument of type \"array or Traversable and ArrayAccess\", \"string\" given"]}

Indeed I was passing in a string:

{
  "firstName" : "Zachary",
  "lastName" : "Cameron Lynch",
  "address" : ""
}

(not the actual data in question, but it's work code so I cannot share it). Right so address is supposed to be an object, eg:

{
    "firstName" : "Zachary",
    "lastName" : "Cameron Lynch",
    "address" : {
        "street" : "123 Some Street",
        "city" : "Our City"
    }
}

That sort of thing.

But anyhow, why's the valiation failing? Well not failing - I'd expect / want that - but why is it erroring? I assumed we were doing something wrong: not validating it was an object, and just charging off and using it. I looked into it, and our code seemed legit, so start trying to make a solid repro case so I can ask people out in the world about it.

I know nothing about Symfony's Validator component. I have code-reviewed other people using it, so am vaguely aware of how it works, but I've not really used it. First things first then... the docs.

I concocted this test case from the introductory docs:

use PHPUnit\Framework\TestCase;
use Symfony\Component\Validator\Constraints\Length;
use Symfony\Component\Validator\Constraints\NotBlank;
use Symfony\Component\Validator\Validation;

class StringValidatorTest extends TestCase
{
    public function testExampleFromDocumentation()
    {
        $validator = Validation::createValidator();
        $violations = $validator->validate(
            'Bernhard',
            [
                new Length(['min' => 10]),
                new NotBlank(),
            ]
        );

        $this->assertCount(1, $violations);
        $this->assertEquals(
            'This value is too short. It should have 10 characters or more.',
            $violations[0]->getMessage()
        );
    }
}

And this worked as expected:

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 218 ms, Memory: 4.00MB

OK (1 test, 2 assertions)

Also this is pretty straight fwd. Create a validator, pass a value and some constraints to a validate, and... job done.

From there I needed to test the behaviour of passing the wrong data type when expected. I decided expecting an integer and then not giving it one would be the easiest test here:

First a class to wrap-up the validator with the constraints in question:

namespace me\adamcameron\general\validation\collectionIssue;

use Symfony\Component\Validator\Constraints\Type;
use Symfony\Component\Validator\Validation;

class IntegerValidator
{

    private $validator;
    private $constraints;

    public function __construct()
    {
        $this->validator = Validation::createValidator();
        $this->setConstraints();
    }

    private function setConstraints()
    {
        $this->constraints = [
            new Type(['type' => 'integer'])
        ];
    }

    public function validate($value)
    {
        return $this->validator->validate($value, $this->constraints);
    }
}

Nothing surprising there. I have a constraints collection which is simply "gotta be an integer". And a method that takes a value and validates it against those constraints.

And then some tests:

<?php

namespace me\adamcameron\general\test\validation\collectionIssue;

use me\adamcameron\general\validation\collectionIssue\IntegerValidator;
use PHPUnit\Framework\TestCase;

class IntegerValidatorTest extends TestCase
{
    use ViolationAssertions;

    private $validator;

    protected function setUp()
    {
        $this->validator = new IntegerValidator();
    }

    /** @dataProvider provideCasesForValidateTests */
    public function testValidate($validValue)
    {
        $violations = $this->validator->validate($validValue);
        $this->assertHasNoViolations($violations);
    }

    public function provideCasesForValidateTests()
    {
        return [
            'has value' => ['value' => 42],
            'is null' => ['value' => null]
        ];
    }

    /** @dataProvider provideCasesForViolationsTests */
    public function testValidateReturnsViolationsWhenConstraintsBroken($invalidValue)
    {
        $violations = $this->validator->validate($invalidValue);
        $this->assertHasViolations($violations);
    }

    public function provideCasesForViolationsTests()
    {
        return [
            'string' => ['value' => 'forty-two'],
            'integer string' => ['value' => '42'],
            'float' => ['value' => 4.2],
            'integer as float' => ['value' => 42.0],
            'array' => ['value' => [42]]
        ];
    }
}

So I have two tests: valid values; invalid values. I expect no violations from the valid values, and some violations from the invalid values (I don't care what they are, just that there's a violation when the value ain't legit):

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

.......                                                             7 / 7 (100%)

Time: 204 ms, Memory: 4.00MB

OK (7 tests, 14 assertions)

(Oh, I whipped a coupla custom assertions out into a trait):

namespace me\adamcameron\general\test\validation\collectionIssue;

use Symfony\Component\Validator\ConstraintViolationList;

trait ViolationAssertions
{
    public function assertHasNoViolations($violations)
    {
        $this->assertInstanceOf(ConstraintViolationList::class, $violations);
        $this->assertEmpty($violations);
    }

    public function assertHasViolations($violations)
    {
        $this->assertInstanceOf(ConstraintViolationList::class, $violations);
        $this->assertNotEmpty($violations);
    }
}


Bottom line, as you can see from the test results, one can pass an invalid value to a validator, and it won't error; it'll just return you yer violation. Cool.

So what's the go with a collection then? Why are we getting that error?

I concocted a similar CollectionValidator:

namespace me\adamcameron\general\validation\collectionIssue;

use Symfony\Component\Validator\Constraints\Collection;
use Symfony\Component\Validator\Constraints\Type;
use Symfony\Component\Validator\Validation;

class CollectionValidator
{

    private $validator;
    private $constraints;

    public function __construct()
    {
        $this->validator = Validation::createValidator();
        $this->setConstraints();
    }

    private function setConstraints()
    {
        $this->constraints = [
            new Collection([
                'fields' => [
                    'field1' => [new Type(['type'=>'integer'])]
                ],
                'allowMissingFields' => true,
                'allowExtraFields' => true
            ])
        ];
    }

    public function validate($value = null)
    {
        return $this->validator->validate($value, $this->constraints);
    }
}

This time I'm defining a "collection" (read: "array" in this case) which could have a property field1, and if it does; it needs to be an integer. It could have other properties too: don't care. Or indeed field1 could be missing: don't care. But if it's there... it needs to be an integer.

And we test this. I'll break the class down this time so I can discuss:

namespace me\adamcameron\general\test\validation\collectionIssue;

use me\adamcameron\general\validation\collectionIssue\CollectionValidator;
use PHPUnit\Framework\TestCase;

class CollectionValidatorTest extends TestCase
{
    use ViolationAssertions;

    private $validator;

    protected function setUp()
    {
        $this->validator = new CollectionValidator();
    }

    /** @dataProvider provideCasesForValidateEmptyCollectionTests */
    public function testValidateWithValidButEmptyCollectionTypeIsOk($validValue)
    {
        $violations = $this->validator->validate($validValue);
        $this->assertHasNoViolations($violations);
    }

    public function provideCasesForValidateEmptyCollectionTests()
    {
        return [
            'array' => ['value' => []],
            'iterator' => ['value' => new \ArrayIterator()]
        ];
    }
}


First just a baseline test with some empty (but valid) values. These pass:

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

..                                                                  2 / 2 (100%)

Time: 187 ms, Memory: 4.00MB

OK (2 tests, 4 assertions)


Next I test a fully-legit collection with a valid integer value for the field1 value.

public function testValidateWithIntegerField1ValueShouldPass()
{
    $violations = $this->validator->validate(['field1'=>42]);
    $this->assertHasNoViolations($violations);
}

All good:

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 211 ms, Memory: 4.00MB

OK (1 test, 2 assertions)

And now with the correct field1 property, but without an integer. This should have violations

public function testValidateWithNonIntegerField1ValueShouldHaveViolation()
{
    $violations = $this->validator->validate(['field1'=>'NOT_AN_INTEGER']);
    $this->assertHasViolations($violations);
    $this->assertSame('This value should be of type integer.', $violations[0]->getMessage());
}

Results:

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 191 ms, Memory: 4.00MB

OK (1 test, 3 assertions)


And now one without field1, but with another property:

public function testValidateWithOnlyOtherFieldsShouldPass()
{
    $violations = $this->validator->validate(['anotherField'=>'another value']);
    $this->assertHasNoViolations($violations);
}


PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 191 ms, Memory: 4.00MB

OK (1 test, 2 assertions)


And for good measure I chuck it a null and see what happens:

public function testValidateWithNullIsApparentlyOK()
{
    $violations = $this->validator->validate(null);
    $this->assertHasNoViolations($violations);
}


Not sure this is legit IMO - a null is not a collection as per my constraint definition. But... well... not what I care about right now. It passes:

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 259 ms, Memory: 4.00MB

OK (1 test, 2 assertions)


OK so at this point I reckon I have done my constraints properly as my tests all pass.

And now the test that replicates our problem:

public function testValidateWithNonTraversableShouldCauseViolation()
{
    $violations = $this->validator->validate('a string');
    $this->assertHasViolations($violations);
}


Note that here I am not passing a collection value at all. I am just passing a string. So this should have a violation, right? Nuh-uh:

PHPUnit 7.2.6 by Sebastian Bergmann and contributors.

E                                                                   1 / 1 (100%)

Time: 194 ms, Memory: 4.00MB

There was 1 error:

1) me\adamcameron\general\test\validation\collectionIssue\CollectionValidatorTest::testValidateWithNonTraversableShouldCauseViolation
Symfony\Component\Validator\Exception\UnexpectedTypeException: Expected argument of type "array or Traversable and ArrayAccess", "string" given

C:\src\php\general\vendor\symfony\validator\Constraints\CollectionValidator.php:37
C:\src\php\general\vendor\symfony\validator\Validator\RecursiveContextualValidator.php:829
C:\src\php\general\vendor\symfony\validator\Validator\RecursiveContextualValidator.php:675
C:\src\php\general\vendor\symfony\validator\Validator\RecursiveContextualValidator.php:118
C:\src\php\general\vendor\symfony\validator\Validator\RecursiveValidator.php:100
C:\src\php\general\src\validation\collectionIssue\CollectionValidator.php:36
C:\src\php\general\test\validation\collectionIssue\CollectionValidatorTest.php:63

ERRORS!
Tests: 1, Assertions: 0, Errors: 1.


So let me get this straight. I am passing validation code a value that ought not validate. And what I get is not a "it dun't validate mate" violation, the validation lib just breaks.

Grumble.

Now this just seems to me to be a testing-101-fail on the part of the Validator library. One must always be able to give the validator any value, and the result should just be "yeah all good", or "nup. And here's why". However as the Symfony mob usually do such solid work, I am kinda wondering if I'm messing something up, and just not seeing it?

What do you think? Even if you don't know PHP or Symfony validation, what would your expectations be here? Or can yous ee the glaring error that I can't see from having looked at this too long?

I will presently ask a question on this on Stack Overflow (a heavily-abridged version of this), but wanted to get some more friendly eyes on it first.

Update

This was bugging my colleague so he went in and checked the underlying Symfony code - something I should have done - and the CollectionValidator is very specifically throwing this exception very early in the piece:

public function validate($value, Constraint $constraint)
{
    if (!$constraint instanceof Collection) {
        throw new UnexpectedTypeException($constraint, __NAMESPACE__.'\Collection');
    }
    if (null === $value) {
        return;
    }
    if (!is_array($value) && !($value instanceof \Traversable && $value instanceof \ArrayAccess)) {
        throw new UnexpectedTypeException($value, 'array or Traversable and ArrayAccess');
    }


That strikes me as very daft, and contrary to the intend of a validator lib. It's been there since the very first commit of that code, as far as I can tell (back in 2010). It's clearly their intent, but the intent is misguided IMO.

Another update

This has just bitten us on the butt again, and my mate Levi came across this very article when trying to troubleshoot it. Ha. However I didn't post an answer. I know we worked around it somehow... just trying to find the code now.

We found a coupla issues on the Symfony bug tracker:
But seems the former one was closed in favour of the latter one, and nothing's been done about the latter one. Sigh.

I'll post a follow-up article and cross link to here once we've worked around it.

Cheers.

--
Adam

Monday 7 May 2018

In defence of public properties

G'day:
Recently I was re-writing one of our cron job scripts. It's the sort of thing that runs at 2am, reads some stuff from the DB, analyses it, and emails some crap to interested parties. We usually try to avoid complete rewrites, but this code was written... erm... [cough insert some period of time ago when it was apparently OK to write really shit code with no code quality checks and no tests]... and it's unmaintainable, so we're biting the bullet and rewriting coherently whilst adding some business logic changes into it.

The details for the email - to, from, subject, data to build the body from - come from three different sources in total, before sending to the mailer service to be queued-up for sending. So I knocked together a simple class to aggregate that information together:

class SpecialInterestingReportEmail {

    public $from;
    public $to;
    public $subject;
    public $body;

    public static $bodyTemplate = "specialInterestingReportEmail.html.twig";
    
    function __construct($from, $to, $subject, $body) {
        $this->from = $from;
        $this->to = $to;
        $this->subject = $subject;
        $this->body = $body;
    }
}

I then have a factory method in an EmailMessageService (not entirely sure that's the best name for it, but [shrug]) which chucks all the bits and pieces together, and returns an email object:

function createSpecialInterestingReportEmail($special, $interesting) {
    $body = $this-twigService->render(
        SpecialInterestingReportEmail::bodyTemplate,
        [
            "special" => $special,
            "interesting" => $interesting
        ]
    )
    return new SpecialInterestingReportEmail(
        $this->addressesForReport->from,
        $this->addressesForReport->to,
        $this->reportSubject,
        $body
    );
}

I don't think this design is perfect, but in the context of what we're doing it's OK.

Notice:
All interaction I mention below with my colleagues is paraphrased, embellished, or outright changed to make my point. I am not quoting anyone, and take this as a work of fiction. It's also drawn from previous similar conversations I've had on this topic.

One of my code reviewers was horrified:

Code Reviewer: "You can't have public properties! That's bad OOP!"
Me: "Why?"
CR: "You're breaking encapsulation. You need to do it like this [proceeds to tutor me on how to write accessor methods, because, like, I needed a tutorial on design anti-patterns]".
Me: "Why? What are we gaining by making those properties private, just to force us to write a bunch of boilerplate code to then access them? And how is that a good change to make to this code that now exists and is fulfilling the requirement? This is code review... we've already done the planning for this. This class is just a single container to aggregate some data into a conceptual 'email message' object. It doesn't need any behaviour - it doesn't need any methods - it's just for moving data between services".
CR: "But you can't break encapsulation!"
Me: "Why not? Other than parroting dogma read from some OOP 101 book, why is this a problem? And how's it solved by accessor methods?"
CR: "Well someone could directly change the values to be wrong!"
Me: "Well... they probably shouldn't do that then. That'd be dumb. They could equally put the wrong values straight into the constructor too. It'd still be just as wrong. Look... we don't need to validate the data - that seems to be your concern here? - as it's just a script reading known-good values from the DB, and sending them to our email queue. The code's right there. There's no scope for the data to accidentally be wrongified. And if it was... the tests would pick it up anyhow".
CR: "But what if other code starts using this code?"
Me: "What... yer saying 'what it some other application starts using this cronjob as some sort of library?' Why would that happen? This is not a public API. It makes no pretence of being a public API. If anyone started using this as an API, they deserve everything they get".
CR: "But they might".
ME: "OK, let's say some of this code is gonna be needed somewhere else, and we need to make it into a library. At that point in time, we'd extract the relevant code, consider stuff like encapsulation, data hiding, providing a public interface etc. But that would be different code from this lot".
CR: "But you should write all code like it will be used that way".
Me: "Again: why? This code is not going to be used this way. It just isn't. And anyhow, what yer suggesting is YAGNI / coding-for-an-unknown-future anyhow. We don't gain anything for the purposes of the current requirement chucking pointless boilerplate code into these classes. That's not an improvement".

And it went on.

One problem I encounter fairly frequently - both at work and in the wild - is people who will read about some concept, and thenceforth That Concept Is Law. Someone has Said It, therefore it applies to every single situation thereafter. They don't ever seem to bother trying to reason why the "law" was stated in the first place, what about it makes it a good rule, and when perhaps it's not necessary.

I look at these things and think "will this benefit my current task?". Generally it does because these things don't acquire acceptance without scrutiny and sanity-checking. But sometimes, however, it doesn't make sense to follow the dogma.

In this situation: it does not help.

In a different situation, if I was writing a separate API which handled the Email object creation, and other apps were gonna use it, I'd've possibly tightened-up the property access. But only possibly. My position on such things is to be liberal with what one permits to be done with code. If all my theoretical accessor method was gonna achieve was to return a private value... I'd really consider just leaving it public instead, and allow direct access. Why not?

There's a risk that later I might need to better control access to those properties, but... I'd deal with that at the time: these things can be handled as/when. It's even easy enough to have transitionary code from direct-access to accessor-access using __get and __set. I know these are frowned-upon, but in transitionary situations: they're fine. So one could seamlessly patch the API for code already consuming it via direct access with that transitionary approach, and then in the next (or some subsequent ~) version, make the breaking change to require using accessors, eg:

v1.0 - direct access only.
v1.1 - add in transitionary code using __get and __set. Advise that direct access is deprecated and will be removed in the next release. Also add in accessor methods.
v2.0 - remove direct access.

It doesn't even need to be v2.0. Just "some later version". But for the sake of making sure the transitionary code is temporary, better to do sooner rather than later. The thing is that software versioning is there for a reason, so it's OK to only introduce necessary coding overhead when it's needed.

Another thing that occurred to me when I was thinking about this. Consider this code:

$email = [
    "from" => "from@example.com",
    "to" => "to@example.com",
    "subject" => "Example subject",
    "body" => "Example body"
];

$fromAddress = $email["from"];

Perfectly fine. So how come this code is not fine:

$email = new Email(
    "from@example.com",
    "to@example.com",
    "Example subject",
    "Example body"
);

$fromAddress = $email->from;

Why's it OK to access an array directly, but it's not - apparently - OK to give the collection of email bits a name (Email), and otherwise use it the same way?

I can't think of one.

Rules are great. Rules apply sensibly 95% of the time. But when one reads about a rule... don't just stop there... understand the rule. The rules are not there simply for the sake of enabling one to not then think about things. Always think about things.

Righto.

--
Adam

PS: completely happy to be schooled as to how I am completely wrong here. This is half the reason I wrote this.

Saturday 5 May 2018

PHP: perpetual misuse of array type declarations

G'day:
This has been irking me enough recently to cause me to dust off this old blog.

For a while PHP has had limited argument type checking on function arguments, and this was improved in PHP 7. The docs cover it - Function arguments › Type declarations - so I'll not repeat them here. But we can have this code:

function takesArray(array $a){
    return $a;
}

var_dump(takesArray(["tahi", "rua", "toru", "wha"]));

Output:

array(4) {
  [0]=>
  string(4) "tahi"
  [1]=>
  string(3) "rua"
  [2]=>
  string(4) "toru"
  [3]=>
  string(3) "wha"
}

And if we try to pass something other than an array:

<?php
function takesArray(array $a){
    return $a;
}

try {
    $a = takesArray("NOT_AN_ARRAY");
} catch (\Throwable $t) {
    echo $t;
}

We get this:

TypeError:
Argument 1 passed to takesArray() must be of the type array, string given,
called in [...][...] on line 7 and defined in [...][...]:2
Stack trace:
#0 [...][...](7): takesArray('NOT_AN_ARRAY')
#1 {main}

Fair enough. No complaint there.

What I have an issue with is that now I see a lot of devs slapping "array" as an parameter type in every method signature that takes an array. What's wrong with that you probably ask? Well... in most situations they're doing this, it's not simply an array that is the requirement. It's generally one of these two situations:

function getFirstName(array $name){
    return $name['firstName'];
}

$firstName = getFirstName(["lastName" => "Cameron", "firstName" => "Adam"]);

echo $firstName;

OK, this is a really contrived example, but it's simple and demonstrates the point. Here the function takes an array, and returns the firstName from the array. But note that the requirement of the argument is not "it's an array", as stated. It very specifically needs to be "an associative array that has a key firstName". "array" is not correct here. It's not at all helpful. If one is needing to do this sort of thing, then create a class Name, and specify that. Because that's what you actually want here:

class Name {
    public $lastName;
    public $firstName;
    
    function __construct($lastName, $firstName) {
        $this->lastName = $lastName;
        $this->firstName = $firstName;
    }
}

function getFirstName(Name $name){
    return $name->firstName;
}

The second bogus usage of array as a type declaration is with indexed arrays. Consider this:

function filterOutEarlierDates(array $dates, int $year){
    $cutOff = new \DateTime("$year-01-01");

    $filtered = array_filter($dates, function (\DateTimeInterface $date) use ($cutOff) {
        $diff = $date->diff($cutOff);
        return $diff->invert === 1;
    });
    
    return array_values($filtered);
}

$dobs = [
    new \DateTime("1968-12-20"),
    new \DateTime("1970-02-17"),
    new \DateTime("2011-03-24"),
    new \DateTime("2016-08-17")
];
$cutOff = 2011;

$youngins = filterOutEarlierDates($dobs, $cutOff);

var_dump($youngins);


Here filterOutEarlierDates claims it takes an array. Well it does. Again this lacks the necessary precision for the requirement. It doesn't take an array. It takes - specifically - an array of DateTimeInterface objects. If you give it anything else than that, the code breaks. As type checking is specifically to guard against that, specifying array is simply neither correct or even helpful here.

Specifying array as an argument type check does have a place. When what the function needs actually is an array. And just an array. Any old array.

function array_some(array $array, callable $callback) {
    foreach ($array as $key => $value) {
        $result = $callback($value, $key);
        if ($result) {
            return true;
        }
    }
    return false;
}

$colours = ["whero", "karaka", "kowhai", "kakariki", "kikorangi", "poropango", "papura"];

$containsSixLetterWords = array_some($colours, function ($colour) {
   return strlen($colour) === 6; 
});

var_dump($containsSixLetterWords);

$numbers = [
    "one" => "tahi",
    "two" => "rua",
    "three" => "toru",
    "four" => "wha",
    "five" => "rima",
    "six" => "ono",
    "seven" => "whitu",
    "eight" => "ware",
    "nine" => "iwa",
    "ten" => "tekau"
];

$hasShortKeys = array_some($numbers, function ($number, $key) {
   return strlen($key) <= 3; 
});

var_dump($hasShortKeys);

Here's a PHP equivalent of JS's some method. It takes an array (any array) and a callback, and applies the callback to each array entry until the callback returns true, then it exits. This function correctly claims the first argument value needs to be an array. Without any other sort of qualification: it just needs to be an array.

Actually there's a similar issue with my type checking of the $callback parameter there. It can't be any callable: it needs to be a callable that takes a coupla arguments and returns a boolean. In C# I guess we'd use a Delegate for that, but there's no such construct in PHP. So I should probably settle for giving the argument a clear name ($callback probably isn't good enough here), and dispense with the type check.

But even then I question the merits of type-checking array here. What if instead of just an array, we had a collection object that implements \Iterator, eg:

class ColourCollection implements \Iterator {
    private $position = 0;
    private $colours;  

    public function __construct($colours) {
        $this->position = 0;
        $this->colours = $colours;
    }

    public function rewind() {
        $this->position = 0;
    }

    public function current() {
        return $this->colours[$this->position];
    }

    public function key() {
        return $this->position;
    }

    public function next() {
        ++$this->position;
    }

    public function valid() {
        return isset($this->colours[$this->position]);
    }
}

$rainbow = new ColourCollection($colours);

If we run this variation of the code, we get:

Fatal error:
Uncaught TypeError: Argument 1 passed to array_some() must be of the type array, object given,
 called in [...][...] on line 48 and defined in [...][...]:2
Stack trace:
#0 [...][...](48): array_some(Object(ColourCollection), Object(Closure))
#1 {main}
  thrown in [...][...] on line 2

If we take the type check out: the code works. So how is the array type check really helping us here? It's not.

To be honest I think I'm being a bit edge-case-y with that last example. I don't see a lot of objects that implement the iterator class: we just stick with arrays instead. But it is a legit consideration I guess.

I would use array as a type check in that array_some situation - where the argument actually just needs to be an array, with no special other qualities about it. But this is very rare. In other cases where the argument value needs to be "some specific type or array", then I think it's dead wrong to have an array type check there.

Righto.

--
Adam

Tuesday 28 November 2017

That array_map quandary implemented in other languages

G'day:
A coupla days ago I bleated about array_map [having] a dumb implementation. I had what I thought was an obvious application for array_map in PHP, but it couldn't really accommodate me due to array_map not exposing the array's keys to the callback, and then messing up the keys in the mapped array if one passes array_map more than one array to process.

I needed to remap this:

[
    "2008-11-08" => "Jacinda",
    "1990-10-27" => "Bill",
    "2014-09-20" => "James",
    "1979-05-24" => "Winston"
]

To this:

array(4) {
  '2008-11-08' =>
  class IndexedPerson#3 (2) {
    public $date =>
    string(10) "2008-11-08"
    public $name =>
    string(7) "Jacinda"
  }
  '1990-10-27' =>
  class IndexedPerson#4 (2) {
    public $date =>
    string(10) "1990-10-27"
    public $name =>
    string(4) "Bill"
  }
  '2014-09-20' =>
  class IndexedPerson#5 (2) {
    public $date =>
    string(10) "2014-09-20"
    public $name =>
    string(5) "James"
  }
  '1979-05-24' =>
  class IndexedPerson#6 (2) {
    public $date =>
    string(10) "1979-05-24"
    public $name =>
    string(7) "Winston"
  }
}

Note how the remapped object also contains the original key value. That was the sticking point. Go read the article for more detail and more whining.

OK so my expectations of PHP's array higher order functions are based  on  my experience with JS's and CFML's equivalents. Both of which receive the key as well as the value in all callbacks. I decided to see how other languages achieve the same end, and I'll pop the codee in here for shits 'n' giggles.


CFML

Given most of my history is as a CFML dev, that one was easy.

peopleData = ["2008-11-08" = "Jacinda", "1990-10-27" = "Bill", "2014-09-20" = "James", "1979-05-24" = "Winston"]

people = peopleData.map((date, name) => new IndexedPerson(date, name))

people.each((date, person) => echo("#date# => #person#<br>"))

Oh, this presupposes the IndexedPerson component. Due to a shortcoming of how CFML works, components must be declared in a file of their own:

component {

    function init(date, name) {
        this.date = date
        this.name = name
    }

    string function _toString() {
        return "{date:#this.date#; name: #this.name#}"
    }
}


But the key bit is the mapping operation:

people = peopleData.map((date, name) => new IndexedPerson(date, name))

Couldn't be simpler (NB: this is Lucee's CFML implementation, not ColdFusion's which does not yet support arrow functions).

The output is:


2008-11-08 => {date:2008-11-08; name: Jacinda}
1990-10-27 => {date:1990-10-27; name: Bill}
2014-09-20 => {date:2014-09-20; name: James}
1979-05-24 => {date:1979-05-24; name: Winston}

Also note that CFML doesn't have associative arrays, it has structs, so the keys are not ordered. This does not matter here. (Thanks to Zac for correcting me here: CFML does have ordered structs these days).


JS

The next language I turned to was JS as that's the I'm next most familiar with. One thing that hadn't occurred to me is that whilst JS's Array implementation has a map method, we need to use an object here as the keys are values not indexes. And whilst I knew Objects didn't have a map method, I didn't know what the equivalent might be.

Well it turns out that there's no real option to use a map here, so I needed to do a reduce on the object's entries, Still: it's pretty terse and obvious:

class IndexedPerson {
    constructor(date, name) {
        this.date = date
        this.name = name
    }
}

let peopleData = {"2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"}

let people = Object.entries(peopleData).reduce(function (people, personData) {
    people.set(personData[0], new IndexedPerson(personData[0], personData[1]))
    return people
}, new Map())

console.log(people)

This returns what we want:

Map {
  '2008-11-08' => IndexedPerson { date: '2008-11-08', name: 'Jacinda' },
  '1990-10-27' => IndexedPerson { date: '1990-10-27', name: 'Bill' },
  '2014-09-20' => IndexedPerson { date: '2014-09-20', name: 'James' },
  '1979-05-24' => IndexedPerson { date: '1979-05-24', name: 'Winston' } }

TBH I think this is a misuse of an object to contain basically an associative array / struct, but so be it. It's the closest analogy to the PHP requirement. I was able to at least return it as a Map, which I think is better. I tried to have the incoming personData as a map, but the Map prototype's equivalent of entries() used above is unhelpful in that it returns an Iterator, and the prototype for Iterator is a bit spartan.

I think it's slightly clumsy I need to access the entries value via array notation instead of some sort of name, but this is minor.

As with all my code, I welcome people showing me how I should actually be doing this. Post a comment. I'm looking at you Ryan Guill ;-)

Java

Next up was Java. Holy fuck what a morass of boilterplate nonsense I needed to perform this simple operation in Java. Deep breath...

import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class IndexedPerson {
    String date;
    String name;
    
    public IndexedPerson(String date, String name) {
        this.date = date;
        this.name = name;
    }
    
    public String toString(){
        return String.format("{date: %s, name: %s}", this.date, this.name);
    }
}

class Collect {

    public static void main(String[] args) {

        HashMap<String,String> peopleData = loadData();

        HashMap<String, IndexedPerson> people = mapToPeople(peopleData);
            
        dumpIdents(people);
    }
    
    private static HashMap<String,String> loadData(){
        HashMap<String,String> peopleData = new HashMap<String,String>();
        
        peopleData.put("2008-11-08", "Jacinda");
        peopleData.put("1990-10-27", "Bill");
        peopleData.put("2014-09-20", "James");
        peopleData.put("1979-05-24", "Winston");
        
        return peopleData;
    }
    
    private static HashMap<String,IndexedPerson> mapToPeople(HashMap<String,String> peopleData) {
        HashMap<String, IndexedPerson> people = (HashMap<String, IndexedPerson>) peopleData.entrySet().stream()
            .collect(Collectors.toMap(
                e -> e.getKey(),
                e -> new IndexedPerson(e.getKey(), e.getValue())
            ));
            
        return people;
    }
    
    private static void dumpIdents(HashMap<String,IndexedPerson> people) {
        for (Map.Entry<String, IndexedPerson> entry : people.entrySet()) {
            System.out.println(String.format("%s => %s", entry.getKey(), entry.getValue()));
        }
    }
    
}

Result:
1979-05-24 => {date: 1979-05-24, name: Winston}
2014-09-20 => {date: 2014-09-20, name: James}
1990-10-27 => {date: 1990-10-27, name: Bill}
2008-11-08 => {date: 2008-11-08, name: Jacinda}

Most of that lot seems to be just messing around telling Java what types everything are. Bleah.

The interesting bit - my grasp of which is tenuous - is the Collectors.toMap. I have to admit I derived that from reading various Stack Overflow articles. But I got it working, and I know the general approach now, so that's good.

Too much code for such a simple thing though, eh?


Groovy

Groovy is my antidote to Java. Groovy makes this shit easy:

class IndexedPerson {
    String date
    String name

    IndexedPerson(String date, String name) {
        this.date = date;
        this.name = name;
    }

    String toString(){
        String.format("date: %s, name: %s", this.date, this.name)
    }
}

peopleData = ["2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"]

people = peopleData.collectEntries {date, name -> [date, new IndexedPerson(date, name)]}

people.each {date, person -> println String.format("%s => {%s}", date, person)}

Bear in mind that most of that is getting the class defined, and the output. The bit that does the mapping is just the one line in the middle. That's more like it.

Again, I don't know much about Groovy… I had to RTFM to find out how to do the collectEntries bit, but it was easy to find and easy to understand.

I really wish I had a job doing Groovy.

Oh yeah, for the sake of completeness, the output was thus:

2008-11-08 => {date: 2008-11-08, name: Jacinda}
1990-10-27 => {date: 1990-10-27, name: Bill}
2014-09-20 => {date: 2014-09-20, name: James}
1979-05-24 => {date: 1979-05-24, name: Winston}


Ruby

Ruby's version was pretty simple too as it turns out. No surprise there as Ruby's all about higher order functions and applying blocks to collections and stuff like that.

class IndexedPerson

    def initialize(date, name)
        @date = date
        @name = name
    end

    def inspect
        "{date:#{@date}; name: #{@name}}\n"
    end
end

peopleData = {"2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"}

people = peopleData.merge(peopleData) do |date, name|
    IndexedPerson.new(date, name)
end

puts people

Predictable output:

{"2008-11-08"=>{date:2008-11-08; name: Jacinda}
, "1990-10-27"=>{date:1990-10-27; name: Bill}
, "2014-09-20"=>{date:2014-09-20; name: James}
, "1979-05-24"=>{date:1979-05-24; name: Winston}
}

I wasn't too sure about all that block nonsense when I first started looking at Ruby, but I quite like it now. It's easy to read.


Python

My Python skills don't extend much beyond printing G'day World on the screen, but it was surprisingly easy to google-up how to do this. And I finally got to see what Python folk are on about with this "comprehensions" stuff, which I think is quite cool.

class IndexedPerson:
    def __init__(self, date, name):
        self.date = date
        self.name = name

    def __repr__(self):
        return "{{date: {date}, name: {name}}}".format(date=self.date, name=self.name)

people_data = {"2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"}

people = {date: IndexedPerson(date, name) for (date, name) in people_data.items()}

print("\n".join(['%s => %s' % (date, person) for (date, person) in people.items()]))


And now that I am all about Clean Code, I kinda get the "whitespace as indentation" thing too. It's clear enough if yer code is clean in the first place.

The output of this is identical to the Groovy one.

Only one more then I'll stop.

Clojure

I can only barely do G'day World in Clojure, so this took me a while to work out. I also find the Clojure docs to be pretty impentrable. I'm sure they're great if one already knows what one is doing, but I found them pretty inaccessible from the perspective of a n00b. It's like if the PHP docs were solely the user-added stuff at the bottom of each docs page. Most blog articles I saw about Clojure were pretty much just direct regurgitation of the docs, without much value-add, if I'm to be honest.

(defrecord IndexedPerson [date name])

(def people-data (array-map "2008-11-08" "Jacinda" "1990-10-27" "Bill" "2014-09-20" "James" "1979-05-24" "Winston"))

(def people
  (reduce-kv
    (fn [people date name] (conj people (array-map date (IndexedPerson. date name))))
    (array-map)
    people-data))

(print people)

The other thing with Clojure for me is that the code is so alien-looking to me that I can't work out how to indent stuff to make the code clearer. All the examples I've seen don't seem very clear, and the indentation doesn't help either, I think. I guess with more practise it would come to me.

It seems pretty powerful though, cos there's mot much code there to achieve the desired end-goal.

Output for this one:

{2008-11-08 #user.IndexedPerson{:date 2008-11-08, :name Jacinda},
1990-10-27 #user.IndexedPerson{:date 1990-10-27, :name Bill},
2014-09-20 #user.IndexedPerson{:date 2014-09-20, :name James},
1979-05-24 #user.IndexedPerson{:date 1979-05-24, :name Winston}}


Summary

This was actually a very interesting exercise for me, and I learned stuff about all the languages concerned. Even PHP and CFML.

I twitterised a comment regarding how pleasing I found each solution:


This was before I did the Clojure one, and I'd slot that in afer CFML and before JS, making the list:
  1. Python
  2. Ruby
  3. Groovy
  4. CFML
  5. Clojure
  6. JS
  7. PHP
  8. Java

Python's code looks nice and it was easy to find out what to do. Same with Ruby, just not quite so much. And, really same with Groovy. I could order those three any way. I think Python tips the scales slightly with the comprehensions.

CFML came out suprisingly well in this, as it's a bloody easy exercise to achieve with it.

Clojure's fine, just a pain in the arse to understand what's going on, and the code looks a mess to me. But it does a lot in little space.

JS was disappointing because it wasn't nearly so easy as I expected it to be.

PHP is a mess.

And - fuck me - Java. Jesus.

My occasional reader Barry O'Sullivan volunteered some input the other day:


Hopefully he's still up for this, and I'll add it to the list so we can have a look at that code too.

Like I said before, if you know a better or more interesting way to do this in any of the languages above, or any other languages, make a comment and post a link to a Gist (just don't put the code inline in the comment please; it will not render at all well).

I might have another one of these exercises to do soon with another puzzle a friend of mine had to recently endure in a job-interview-related coding test. We'll see.

Righto.

--
Adam