Sunday, 23 November 2014

Weekend code puzzle: Dave's answer (Python)

G'day:
I'm continuing to look at each person's submissions for the code puzzle ("Something for the weekend? A wee code puzzle (in CFML, PHP, anything really...)").

Dave's done a Python version. Like Chris just before him, Dave got his answer in before I varied the rules slightly, so his answer just finds the first longest subseries within the threshold from within the series; it does not check same-lengthed subseries for which has the highest within-threshold total. "Within" three times in a sentence. Sorry about that.



I'm also gonna do Dave a slight disservice for the sake of making a point. The initial version of Dave's code - the version I was testing on Fri evening when I started planning this article - had a logic error in it. Later on Friday or perhaps Sat, Dave submitted a new version of the code which was addressing another issue, but it inadvertantly (?) also fixed the logic issue I found. I'm gonna look at the initial version first, and then look at the updated version.

Here's the code (first version):

# daveWhite.py
def subSeries(series,threshold):
    currentTotal = 0
    counter = 0
    tempSeries = []
    longestSeries = []
    while len(series) > 1:
        currentTotal += series[counter]
        if currentTotal > threshold:
            if len(tempSeries) > len(longestSeries):
                longestSeries = tempSeries
            series.pop(0)
            tempSeries = []
            counter = 0
            currentTotal = 0
        else:
            tempSeries.append(series[counter])
            counter += 1
        if counter >= len(series):
            break
    return longestSeries

The only Python I had done in the past was to test some lunacy I'd perceived in PHP. I wasn't completely sure it was lunacy (spoiler: it was), so tested the logic with some other languages ("PHP: include paths are relative to the current working directory"). This is really the first time I've even looked at Python code. I've always had reservations about the "whitespace is meaningful" thing Python does, and I can't say I'm really warming to it. Having played around with it a bit now, it's not "just completely bloody stupid" like I thought I would find it, but I'm still... a bit "meh" about the whole idea. I'd rather it had an "end" or a "}" at the end of logic blocks.

Anyway - from this small snippet of code anyhow - it's reasonably easy to follow, which is something I guess. I did find it odd that to append to a list (Python it seems calls arrays "lists". OK) one uses an .append() (the Python docs seem to be a bit shit: they are class-centric, not method-centric, so you'll just need to scroll down to .append()) method like one might expect. But to get the length of a list, one uses the len() headless function. Weird. I do like how one can pop() from either end (or anywhere within) of the arraylist though: pop(0) removes and returns the first element of the list. This makes more sense than having pop() for removing from the high-end, and shift() to remove from the low end (ie: like in JS). Nothing else in that code stands out to me (Python-wise, I mean).

Dave had a test to prove exactly the code puzzle requirement, which wasn't enough for me, so I did some RTFM and worked out how to write a series of tests to run on the function. I could not be arsed looking into what unit testing framework Python might offer... and I was using dodgy airport wifi anyhow so my googling / installing options were limited, so just hand rolled some stuff.

Here's an approach to running some tests which takes a list of dictionaries (read "struct"), and each dictionary specifies the series to test on, the threshold to use, the expected subseries to return and optionally an additional test to run using an inline lambda function:

# tests.py

def getTests():
    return [{
        'series':[100,300,100,50,50,50,50,50,500,200,100],
        'threshold':500,
        'expected':[100,50,50,50,50,50],
        'message':"Should return an array",
        'test': lambda subseries, threshold: sum(subseries) < threshold
    },{
        'series':[600,100,100,100,600],
        'threshold':500,
        'expected':[100,100,100],
        'message':"A multi-element series should have been found",
        'test': lambda subseries, threshold: len(subseries) > 1
    },{
        'series':[600,100,100,100,600,100,100,100,100,600],
        'threshold':500,
        'expected':[100,100,100,100],
        'message':"A subsequent larger multi-element series should have been found"
    }
    # a number of tests have been removed as they make for boring reading
    ]

This just demonstrates that lists and dictionaries - at least at a superficial syntactical level - are fairly familiar. The key names in a dictionary need to be quoted, and the value is associated via a : not an = (although CFML now supports : as well, obviously).

The "interesting" thing is that Python has lambda expressions, which are like very very cut down function expressions from CFML. They can only have an expression in them, they can contain no statements. This seems really primitive compared to CFML. Still: for my purposes here, that's OK.

The full list of tests numbers 15 (the link in the filename goes to the file on Github, with all of them). They're the same as with the PHP and CFML examples, with a coupla additions I made when testing Dave's code.

I run these tests thus:

# getSubseriesTest.py

import json

from tests import getTests

from daveWhite import subSeries as getSubseries

tests = getTests()


for test in tests:
    result = getSubseries(test['series'], test['threshold'])
    print(test['message'])
    if result == test['expected']:
        print("expected OK")
    else:
        print("*** expected MISMATCH. Expected: " + json.dumps(test['expected']) + "; result: " + json.dumps(result))

    if 'test' in test:
        if test['test'](result, test['threshold']):
            print("test OK")
        else:
            print("*** test FAILED")
    print("\n================================\n\n")


There's little of note here. I'm not bothering with assertions, I just output the results for each test. I am guessing the ".dumps()" method is short for "dump string", and I think it's sloppy method naming: it's doing a serialisation, not a "dump", really. And if they do mean "string", they should say it. "s"? Not clean code. It sounds like the plural of the act of taking a poo, if you ask me.

Here's the result:

Should return an array
expected OK
test OK
================================
A multi-element series should have been found
expected OK
test OK
================================
A subsequent larger multi-element series should have been found
expected OK
================================
A longer adjacent subseries should be found
expected OK
================================
Should work with an empty series
expected OK
================================
Should work when threshold is lower than all items
expected OK
================================
Should work when threshold is higher than every item
*** expected MISMATCH. Expected: [90]; result: [50]
================================
Should work when threshold is higher than total of all items
*** expected MISMATCH. Expected: [50, 60, 70, 80, 90]; result: []
================================
Should work when the subseries is equal to the threshold and is at the beginning of the series
expected OK
================================
works when the subseries is equal to the threshold and at the end of the series
*** expected MISMATCH. Expected: [100, 100, 100, 100, 100, 100]; result: [150, 150, 100, 100, 100]
================================
Should work when the subseries is the first element
expected OK
================================
Should work when the subseries is the last element
*** expected MISMATCH. Expected: [100]; result: []
================================
Should work as per quiz requirements
expected OK
================================
Should return subseries with highest tally when more than one have the same length (last series is higher)
*** expected MISMATCH. Expected: [100, 60, 60, 60, 60, 60]; result: [100, 50, 50, 50, 50, 50]
================================
Should return subseries with highest tally when more than one have the same length (first series is higher)
expected OK
================================

There are a couple of failures here which are legit: Dave's code was never written with the rule regarding returning the longest subseries with the highest total value within the threshold, he's just returning based on the longest match. So these are fine.

However there are three tests which also don't pass:

Should work when threshold is higher than total of all items
*** expected MISMATCH. Expected: [50, 60, 70, 80, 90]; result: []
================================
works when the subseries is equal to the threshold and at the end of the series
*** expected MISMATCH. Expected: [100, 100, 100, 100, 100, 100]; result: [150, 150, 100, 100, 100]
================================
Should work when the subseries is the last element
*** expected MISMATCH. Expected: [100]; result: []
================================

Now I'm gonna ignore the latter two tests as I added them in reaction to Dave's updated code, initially the only additional failure I had was when expecting [50,60,70,80,90] I was only getting [].

Superficially - ie, relying on eyeballing, not actual testing - Dave's logic seems sound. But it clearly wasn't so I decided to desk check his logic. Basically I worked through the logic in my head, recording what various variables would hold at given points in the program execution. This is what I got (follow it through yourself as well):

counter: 0
series: [50,60,70,80,90]
currentTotal: 50
tempSeries: [50]
counter: 1
====================
currentTotal: 110
tempSeries: [50,60]
counter: 2
====================
currentTotal: 180
tempSeries: [50,60,70]
counter: 3
====================
currentTotal: 260
tempSeries: [50,60,70,80]
counter: 4
====================
currentTotal: 350
tempSeries: [50,60,70,80,90]
counter: 5
break


So he's built the potential subseries up correctly but then we get to the exit situation:

if counter >= len(series):
    break

At which point we return longestSeries. However at no point yet have we actually set a longestSeries! Huh? Because longestSeries only gets set in here:

if currentTotal > threshold:
    if len(tempSeries) > len(longestSeries):
        longestSeries = tempSeries

Most of the time this is OK: we know we're at the end of a potential subseries when the total would go over the threshold. But this is missing one rule: or if we get to the end of the series completely. Which is the case here. Oops. Another good demonstration of where boundary testing is important, and why I added those other two tests in. Boundary testing are not just "off by one" situations, but also at the limits of data sets and stuff like that. Remember that.

Anyway, Dave's subsequently updated his logic to cover situations where the series only has one element, and in the process he's fixed this issue too:

# daveWhiteUpdated.py

def subSeries(series,threshold):
    currentTotal = 0
    counter = 0
    tempSeries = []
    longestSeries = []
    while len(series) > 0:
        currentTotal += series[counter]
        if currentTotal > threshold:
            if len(tempSeries) > len(longestSeries):
                longestSeries = tempSeries
            series.pop(0)
            tempSeries = []
            counter = 0
            currentTotal = 0
        else:
            tempSeries.append(series[counter])
            counter += 1
        if counter >= len(series):
            if len(tempSeries) > len(longestSeries):
                longestSeries = tempSeries
            break
    return longestSeries

This now passes all the tests other than the ones where the total of the subseries is relevant too.

OK, so that's some Python. I've done my own solution in Python as well as I wanted to look at lambdas some more, but I'll do that in a separate article.

Next up I need to wade through Adam Presley's solution, written in Go. I'm looking forward to checking-out this one, as I have never even seen any Go code, let alone even done "G'day World" in it.

Righto.

--
Adam