Saturday 11 October 2014

PHP: Inspired by Duncan: palindromes via generators

G'day:
My mate Dunc is making a habit of being my muse @ the moment. I guess it's cos we're both learning PHP at the same time, and he's blogging about it as well. This morning he has a new article following his foray into translating his CFML (and Python occasionally) knowledge into PHP: "Project Euler: problem 36 (PHP)".

The Project Euler exercises are just a series of algorithmic problems which one can work through. I'm not signed up to the site, but I guess one can also submit answers or something. I'm just basically following along as Duncan does the hard yards.

Problem 36 is as follows:

The decimal number, 585 = 10010010012 (binary), is palindromic in both bases.

Find the sum of all numbers, less than one million, which are palindromic in base 10 and base 2.

(Please note that the palindromic number, in either base, may not include leading zeros.)

Duncan's taken a fairly brute force approach: looping to 1000000 and cherry-picking the results that are a) decimal palindromes, and then also b) binary ones. He outputs them, and tallies them up. Hopefully he doesn't mind me reproducing his code:

<?php
$sum = 0;

function isPalindrome($x)
{
    return $x == strrev($x);
}

for ($i = 1;$i <= 999999; $i++) {
    if (isPalindrome($i) && isPalindrome(decbin($i))) {
        echo $i . " " . decbin($i) . "<br>";
        $sum += $i;
    }
}

echo $sum;

This solution is short, and it works. So that's a tick in the box.

However I have a real problem with loops that actually don't do anything for most of their iterations. For example here, there's only 19 of the iterations we're interested in: the rest we exclude. So 99.99% - literally! (to two decimal places anyhow) - of that loop is a waste of time.

We had a more egregious example of this at work the other day. We had an array of potential matches, and we looped over them to find which one we wanted, then exited. That is not an exercise for a loop. If one is doing a loop, then I feel the code in the loop ought to apply to say 80% (80/20 rule picked arbitrarily here) of the iterations. For the work exercise the answer was to apply a filter to the array to get the one we wanted, then just use it.

Anyway, I had a think about the exercise at hand. We don't need to check all million numbers: we already know that the only numbers that match are palindromic ones. And we know how a palindrome is formed, so we can automate this. I also thought this is a dead sitter for a generator, so I decided to write a palindrome generator.

My approach to creating palindromes is as follows:

  • count from zero
  • for each number, two palindromes can be inferred:
    • simply reflect the number itself, eg: 123 => 123321
    • similar, but pivot around the last digit, eg: 123 => 12321
  • but those two options will not be adjacent to each other in the sequence (eg: 12421 and many others will come between 12321 and 123321), so whilst the shorter pivoted one could be the next in the sequence, the reflected ones likely won't be, so we store those ones in an overflow array
  • we check to see if the current pivoted palindrome is actually greater than any in the overflow array, and if it is, we drain the overflow of any palindromes that are less than the pivoted one
  • then we use the pivoted one
Here's a desk check of that logic to demonstrate what I mean. I'm going to start from 10, as it's clearer sooner. In this example the columns are current digit: pivoted palindrome; overflow array:

10: 101; 1001
11: 111; 1001, 1111
12: 121; 1001, 1111, 1221
13: 131; 1001, 1111, 1221, 1331
14: 141; 1001, 1111, 1221, 1331, 1441
15: 151; 1001, 1111, 1221, 1331, 1441, 1551
16: 161; 1001, 1111, 1221, 1331, 1441, 1551, 1661
17: 171; 1001, 1111, 1221, 1331, 1441, 1551, 1661, 1771
18: 181; 1001, 1111, 1221, 1331, 1441, 1551, 1661, 1771, 1881
19: 191; 1001, 1111, 1221, 1331, 1441, 1551, 1661, 1771, 1881, 1991
[...]
99: 999: [lots], 9889, 9999
100: 10001 which is greater than everything in the overflow, so we need to drain those first
101: 10101; 101101
[etc]


So here's the code for the generator:

// Palindrome.class.php

class Palindrome {
    public static function createPalindromeSequence($max=-1){
        $n = 0;
        $reserveStack = [];
        while (true){
            $shortPalindrome = $n . substr(strrev($n),1);

            while (count($reserveStack) && $shortPalindrome > $reserveStack[0]){
                yield array_shift($reserveStack);
            }
            if ($max != -1 && $shortPalindrome > $max) return;
            yield $shortPalindrome;
            $n++;
            $reserveStack[] = $n . strrev($n);
        }
    }

}

Now we just use that as the source of the loop, and away we go:

// euler36.php
$sum = 0;
foreach(Palindrome::createPalindromeSequence(1000000) as $x){
    $asBinary = decbin($x);
    if (Palindrome::isPalindrome($asBinary)){    
        echo $x . " " . $asBinary . "<br>";
        $sum += $x;
    }
}
echo $sum . "<hr>";

BTW, isPalindrome() is just this:

public static function isPalindrome($x){
    return $x == strrev($x);
}

And that's fine, that works:

0 0
1 1
3 11
5 101
7 111
9 1001
33 100001
99 1100011
313 100111001
585 1001001001
717 1011001101
7447 1110100010111
9009 10001100110001
15351 11101111110111
32223 111110111011111
39993 1001110000111001
53235 1100111111110011
53835 1101001001001011
73737 10010000000001001
585585 10001110111101110001
872187


But I'm not happy with it. The question here is to take the series of palindromes, and derive a sum of them (based on a secondary rule). To me this is a reduction process (or a filter and a sum, but I chose the reduction approach). I like to avoid generic loops where there's a more appropriate kind of iterative process to use. So I refactored this code to use array_reduce():

// euler36_array_reduce.php
$sum = array_reduce($decPalindromes, function($reduction, $current){
    $asBinary = decbin($current);
    if (Palindrome::isPalindrome($asBinary)){    
        echo $current . " " . $asBinary . "<br>";
        $reduction += $current;
    }
    return $reduction;
}, 0);

I think that logic is more clear.

But... performance? Dunc's code is far more simple, and that often results in better performance. I don't have much of an idea how generators perform, and I also suspect array_reduce() is not the fastest approach to things. I added a test harness around each example and had a loop at the timing:

$start = microtime(true);


// code here


$end = microtime(true);

printf("Execution time: %dms", ($end - $start) * 1000);


Typical performance is as follows:

Duncan's code: 871ms
My code, first version: 17ms
My code with array_reduce(): 19ms

(I did a lot of runs, and they were always like that or thereabouts).

OK, so I think Duncan's being penalised for his brute force approach here. Running that condition a million times adds up. I think if the situation was such that there was more like 999981 valid items in the iteration, his approach would make sense. But given there's only utlimately 19... it's too much work to be doing to get the result.

Also it's cool there's not much overhead in using array_reduce(), so there's no reason to not use that approach.

Oh, one footnote: Duncan excludes 0 from his version, and I don't: zero is as much a palindrome as digits one, two three, etc are. I think he is being too literal about the "leading zeros" in the instructions. I read that as an indicator that something like "020" is not a valid answer. The 0 in zero is not a "leading zero", IMO. It does not impact the bottom line though.

And another one. I'm not that happy with the logic in my palindrome generator: it's needing to back-up far too many entries in the "overflow" for my liking. I think there must be a better approach here. Lemme know if you come up with a better way!

Is it odd that I found that quite a fun exercise for a Saturday morning, all before I had a coffee, or even got out of bed?

Righto.

--
Adam