Tuesday, 16 August 2016

Looking at the code from that puzzle last week

G'day:
So last Fri I asked this:

[...] here's a code puzzle.

Rules:
  • You have an array of strings, eg: ['a', 'at', 'cat', 'scat', 'catch'].
  • Return the first value that matches a regex pattern, eg: '.+at' would match cat, scat, catch; but we want cat returned.
  • Do not use any looping statements (eg: do/for/while etc).
  • Bear in mind functions are not statements ;-)
  • The array could be very long, the strings could be very long, and the pattern could be complex. But the desired value could be early in the array.
  • Use any language.

I didn't have much of a reason for asking. I had to do something similar at work, except the match criteria were more complicated, and I tried a few ways and didn't like any of them. When processing a "plural" data structure - and array or a struct or some other sort of collection - I like to avoid generic looping statements if I can, as I don't think they're terribly declarative. If one needs to somehow transform a collection into something else, I prefer to use collection-iteration methods or functions like map, reduce, filter etc. I initially did this sort of thing (JavaScript):

var words = ["a", "at", "cat", "scat", "catch"];

var match = words.reduce(function(match, word){
    if (match !== undefined) return match;
    if (word.match(/.+at/)){
        return word;
    }
}, undefined);

console.log(match);

That's all good in theory. If one is taking a collection and the data processing of it returns a single (different) value, then reduce makes sense. The problem is that reduce iterates over the entire collection whether one needs to or not, so after I've found "cat" I'm just wasting effort looking at "scat" and "catch". Now I'm not one to worry about wasting cycles and that sort of micro-optimisation, but it still didn't sit well with me.

So next I considered using the "early exit" iteration method, some. I can stop that when I've finished. The problem is that some returns a boolean. And I needed a cat. But I could solve that with some closure:

var match;
words.some(function(word){
    if (word.match(/.+at/)){
        return match = word;
    }
});
console.log(match);

Note that return statement contains an assignment, and that evaluates to true, thus exiting the collection iteration (some ends as soon as the callback returns true).

That's all well and good, except I actually needed this done in PHP and it doesn't have a some function.

In the end I just used a foreach loop and was done with it:

$words = ["a", "at", "cat", "scat", "catch"];

$match = null;
foreach($words as $word) {
    if (preg_match("/.+at/", $word) === 1) {
        $match = $word;
        break;
    }
}

echo $match;

Still: all this got me thinking it was an interesting exercise. Well: kinda. So posted the puzzle to see what people came up with.

I added the constraint of no looping statements to push people towards using something more interesting. It was an artificial constraint.

Anyway, enough about me. Let's have a look at other people's code.

Isaiah

First up was Isaiah with a Ruby answer. This is short and sweet:

puts %w(a at cat scat catch).detect{|w| w =~ /.+at/ }

(I added the puts so I could check it output something)

detect is exactly what I needed for this. As per the detect docs:

Passes each entry in enum to block. Returns the first for which block is not false.
So like a combination of some and reduce, really. Cool. This answer is gonna be hard to beat (NB: I am only looking at the answers for the first time now!).

Brad

Next was Brad with a CFML answer:

echo( ['a', 'at', 'cat', 'scat', 'catch'].filter( function( i ) { return reFind( '.+at', i ) } ).first() )

This one only runs on Lucee. This slight revision works on CF2016 as well:

words = ['a', 'at', 'cat', 'scat', 'catch'];
writeOutput( words.filter( function( i ) { return reFind( '.+at', i ); } )[1] );

Brad's answer falls into the same trap my reduce version did: it keeps iterating after it could stop.

I do like how terse Brad's answer is here. Although it's borderline (borderline) unreadable. It demonstrates CFML can do some good stuff though.

Jesse

Jesse's JavaScript solution is next, once again going for the terseness prize:

console.log(['a', 'at', 'cat', 'scat', 'catch'].find((e)=>/.+at/.test(e)));

(again... the console.log is my addition).

I didn't know about the find method I have to admit, so that's cool. This is the equivalent of Isaiah's Ruby answer.

Bonus points for using an arrow function there!

Tyler

Tyler has done another JavaScript solution, this time using filter:

function getFirstArrayRegexMatch(array, regex){
  return (
    array.filter(function(element){
      if(element.match(regex)){ return element; }
    })[0]
  );
}

var a = ["a","at","cat","scat","catch"]
,   r = /.+at/;

document.write(getFirstArrayRegexMatch(a, r));

Bonus point for putting it in a function.

Ryan

Ryan's given me two JavaScript answers: one short-hand, one long-hand with a bunch of testing.

var firstMatch = (input, pattern) => input.find(item => item.match(pattern));

console.log(firstMatch(['a', 'at', 'cat', 'scat', 'catch'], '.+at'));

Similar to Jesse's answer.

function f (input, pattern) {
    var reverse = str => str.split('').reverse().join('');
    var reversedIndex = (l, i) => l - (i + 1);
    
    var j = '\n' + input.join('\n') + '\n',
        i = j.search(pattern);

    if (i == -1) return undefined;

    var end = j.indexOf('\n', i);
    var start = reversedIndex(j.length, reverse(j).indexOf('\n', reversedIndex(j.length, end) + 1));
    return j.substr(start + 1, end - start - 1);
}

console.log(f(['a', 'at', 'cat', 'scat', 'catch'], '.+at'));
console.log(f(['a', 'at', 'cat', 'scat', 'catch'], '.+cat'));
console.log(f(['a', 'at', 'cat', 'scat', 'catch'], '.t'));
console.log(f(['a', 'at', 'cat', 'scat', 'catch'], 'a'));
console.log(f(['a', 'at', 'cat', 'scat', 'catch'], '.+xat'));

This shows that Ryan paid attention to the requirements. I specifically said the answer might be late in the array, and he's catering for that here.

Plus tests! Including unhappy path tests!

Ryan knows how to get bonus points from me, eh?

Choop

Choop's used CFML. It's interesting how most of my readership is still made up of CFMLers, but most of them here have chosen not to use it. Well: a bit interesting.

/* lucee or adobe CF 11+  */
mystrings = ['a', 'at', 'cat', 'scat', 'catch'];
mytest = '.+at';
function firstMatch( strings, regex ) {
    var test = arguments.regex;
    var passing = arguments.strings.filter( function ( check ) {
        var matches = REMatch( test, check );
        return matches.Len() > 0;
    });
    return passing[ 1 ];
}
WriteOutput( firstMatch( mystrings, mytest ) );

Choop's eschewed terseness in favour of writing good clear code.

Mingo

Next up is Mingo's answer. I clearly spoke too soon before as he's stuck with CFML too. He's taken the reduce route, similar to my approach:

  arrayOfStrings = [ 'a', 'at', 'cat', 'scat', 'catch' ];
  regex = '.+at';
  writeOutput( arrayOfStrings.reduce( function( result='', item ){ return len( result ) ? result : item.reFind( regex ) ? item : ''; } ) );

Adam Tuttle made a knowing comment against this one:

I thought about nested ternaries too but figured Adam would chastise me for it. ;)
And yer bloody right! ;-)

I'm all good for a single ternary expression, but as soon as there's more than one... I think everyone's getting confused pretty quickly?

Adam

And indeed Adam's own JavaScript entry is up next. Props for using ES2015 constructs. I especially like how it can now do interpolated strings. Like CFML has been doing for about 20yrs.

"use strict"

function firstMatch ( pattern, data ) {
  return data.reduce( ( prev, curr ) => {
      if ( prev.length > 0 ) { return prev }
      if ( pattern.test( curr ) ) { return curr }
      return prev
  }, '')
}

const pattern = /.+at/
let data = ['a', 'at', 'cat', 'scat', 'catch']

console.log( `the answer is: ${firstMatch( pattern, data )}` )

His logic is very similar to my own reduce example too. (Oh, I needed to add "use strict" to that code to get it to run on my node install. Not sure why... it might be old).

As I am writing this up, I have noticed two entries stuck in my moderation queue. Sorry about that.

Eric

Here's Eric's CFML answer:

    function first( arr, predicate, defaultValue ) {
        if ( arrayIsEmpty( arr ) ) {
            if ( ! isNull( defaultValue ) ) {
                return defaultValue;
            }

            throw( "Cannot return the result because the array is either empty or no value matched the predicate with no default value provided." );
        }

        if ( isNull( predicate ) ) {
            return arr[ 1 ];
        } else {
            arguments.arr = arr.filter( predicate );
            structDelete( arguments, "predicate" );
            return first( argumentCollection = arguments );
        }
    }
    
    answer = first( ['a', 'at', 'cat', 'scat', 'catch'], function( str ) {
        return reFind( '.+at', str );
    } );
    
    writeOutput( answer );

Good to see some validation going on in there. I also really like how his own function takes a callback to do the check for what constitutes a match. Bloody nice that.

Quan Tran

Coincidentally the most interesting answer is the last one (it would not have been last had it not got stuck in the moderation queue, that said). Here's Quan Tran's recursive CFML answer:

// http://blog.adamcameron.me/2016/08/code-quiz.html
function regexSearchArray(regexString,arrStrings){
    var localArrStrings = arrStrings;
    
    if (not arrayLen(localArrStrings))
        return;
    else if (refind(regexString, localArrStrings[1]))
        return localArrStrings[1];
    else{
        ArrayDeleteAt(localArrStrings,1);
        return regexSearchArray(regexString,localArrStrings);
    }
}
arrStrings = ['a', 'at', 'cat', 'scat', 'catch'];
writeoutput(regexSearchArray('.+at',arrStrings));

That is a very inventive way of getting around my "no loop statements" rule. Very impressed.



That's quite a few entries for a wee code puzzle on this blog (the readership of which has pretty much died since I moved from CFML, even allowing for the distinct lack of content recently). So thanks for that.

I think all these answers had merit and discussion points and had something interesting about them. I like how terse Jesse and others managed to get their answers. I liked how Isaiah used Ruby instead of the usual suspects (for this blog, anyhow... not much Ruby going on around here). I especially like how Ryan provided more tests.

But the winner is Quan Tran with his recursive solution. It might not be the most performant, but it's def the most interesting.

Cheers all. I have a few other dumb-arse quiz questions I might continue to ask. We'll see.

Righto.

--
Adam