Wednesday 22 March 2017

Code puzzle: find any match from an array of regex patterns

G'day:
I'm being slightly cheeky as I'm hijacking someone else's question / puzzle from the CFML Slack channel that we were discussing last night. Don't worry, that'll be the last mention of CFML in this article, and indeed the question itself was asking for a JS solution. But it's a generic code puzzle.

Let's say one has a string, and an array of regex patterns one wants to check for in said string. A number of solutions were offered, using a number of techniques. And it even branched away from JS into Clojure (thanks Sean).

How would you go about doing it?

The input data is:

patterns = ["Savvy", "Smart", "Sweet"]
testForPositive = "Super Savvy"
testForNegative = "Super Svelt"

And in these cases we want true and false to be returned, respectively.

Note that the patterns there need to be considered regular expressions, not simple strings. The example data is just using simple strings for the sake of clarity.

Initially I was gonna do something like this:

function matchesOne(str, patterns) {
    var onePattern = new RegExp(patterns.join("|"));
    return str.search(onePattern) > -1;
}

Here I'm joining all the patterns into one big pattern, and searching for that. I was slightly hesitant about this as it's loading the complexity into the regex operation, which I felt perhaps wasn't ideal. Also it's not very clever: if any of the individual patterns had a | in them: this'd break. Scratch that.

Ryan Guill seeded the idea of approaching it from the other direction. Using the array as an array, and using one of JS's array higher-order functions to do the trick. We saw examples using find and filter, but I reckoned some was the best option here:

function matchesOne(str, patterns) {
    return patterns.some(pattern => str.search(pattern) > -1);
}


I'm happy with that.

The ante was raised next: write a function which returned the position of the first match (not just a boolean as to whether there was a match or not).

To me this was a reduction operation: take an array and - given some algorithm applied to each element - return a final single value.

I tried, but I could not make it very clean. This is what I got:

function firstMatch(str, patterns) {
    return patterns.reduce(function(first, pattern){
        var thisMatch = str.search(pattern);
        return first == -1 ? thisMatch : thisMatch == -1 ? first : Math.min(first, thisMatch);
    }, -1);
}


I'm happy with the reduce approach, but I'm really not happy with the expression to identify the lowest match (or continue to return -1 if no match is found):

first == -1 ? thisMatch : thisMatch == -1 ? first : Math.min(first, thisMatch)


To be clear, if we expanded that out, the logic is:

if (first == -1) {
    return thisMatch;
}
if (thisMatch == -1) {
    return first;
}
return Math.min(first, thisMatch);

That's actually "cleaner", but I still think I'm missing a trick.

So the puzzle to solve here is: what's a clever (but also Clean - in the RC Martin sense) way returning the lowest non -1 value of first and thisMatch, or fall back to -1 if all else fails.

Using this code (or some approximation thereof):

function firstMatch(str, patterns) {
    return patterns.reduce(function(first, pattern){
        var thisMatch = str.search(pattern);
        // return result of your code here
    }, -1);
}

console.log(firstMatch("Super Savvy", ["Savvy", "Smart", "Sweet"]));
console.log(firstMatch("Super Svelt", ["Savvy", "Smart", "Sweet"]));


One should get 6 and -1 for the respective "tests".

Use any language you like.

Oh and hey... if you have any interesting solutions for the two functions themselves, feel free to post 'em too.


Righto.

--
Adam