Friday 18 June 2021

CFML: emulating query-of-query group-by with higher-order functions

G'day:

A week or so ago this question came up on the CFML slack channel:

Does QoQ support month() and year() such as in the following?

queryExecute("
      SELECT MONTH(settlementdate) AS month, year(settlementdate) AS year,
      SUM(LongTermGainLoss) AS ltgl, SUM(ShortTermGainLoss) AS stgl
      FROM data
      group by MONTH(settlementdate), year(settlementdate) 
    ",{},{
      dbtype: 'query'
})

The answer to the question is: Lucee supports MONTH and YEAR SQL functions in their QoQ SQL implementation, but ColdFusion does not. Here's a quick repro:

dates = queryNew(
    "date",
    "date",
    [
        [createDate(2011, 3, 24)],
        [createDate(2016, 8, 17)]
    ]
)

dateParts = queryExecute(
    "SELECT MONTH(date) AS month, YEAR(date) AS year FROM dates",
    {},
    {dbtype="query"}
)

writeDump(dateParts)

This works as expected on Lucee, but on ColdFusion errors with "Encountered "MONTH. Incorrect Select List, Incorrect select column".

The question intrigued me, and I thought… I reckon one could do something interesting with higher order functions to do what John needed here.

Firstly I needed some test data, so I banged out a quick function to do so:

function getUngroupedRecords(required numeric rows) {
    createRows = number => arrayNew(1).resize(number) // doesn't work in Lucee https://luceeserver.atlassian.net/browse/LDEV-2417
    date = () => now().add("d", randRange(-365, 365))
    amount = () => randRange(1,10000) / 100

    fakedDbData = queryNew(
        "settlementDate,LongTermGainLoss,ShortTermGainLoss",
        "Date,Double,Double",
        createRows(rows).map(_ => [date(), amount(), amount()])
    )
    return fakedDbData
}
ungroupedRecords = getUngroupedRecords(20)

NB: Lucee requires parentheses around even a single parameter in an arrow function, which is shouldn't so this breaks where indicated. This is a ColdFusion-only exercise here, so I'm running with a ColdFusion-only solution as the code is a bit nicer than if I make it work with Lucee too.

I thought there must be some better way of creating an array of a specific length just natively in CFML but I couldn't think of one, hence the arrayNew(1).resize(number) hack. It doesn't matter what's in the array, I just need that number of elements to re-map with my data.

Right so I have some data that possibly resembles the sort of thing John had.

From that I reduce that query to a struct keyed on "#year#-#month#", and for each row of the query I accumulate the gain/loss figures into their appropriate struct key entry:

ungroupedRecords.reduce((grouped={}, row) => {
    y = row.settlementDate.year()
    m = row.settlementDate.month()
    key = "#y#-#m#"
    grouped[key] = grouped[key] ?: {stgl = 0, ltgl = 0}
    grouped[key].stgl = grouped[key].stgl + row.ShortTermGainLoss
    grouped[key].ltgl = grouped[key].ltgl + row.LongTermGainLoss

    return grouped
})

At this point we have a "grouped-by" struct, but we need a query. So we reduce the struct back the other way now, to a query with the rows John spec'ed out in his original question:

.reduce( // breaks in Lucee https://luceeserver.atlassian.net/browse/LDEV-2523
    (records, key, values) => records.addRow({
        month = key.listLast("-"),
        year = key.listFirst("-"),
        ltgl = values.ltgl,
        stgl = values.stgl
    }),
    queryNew("month,year,ltgl,stgl", "Integer,Integer,Double,Double")
)

The only complexity there is we need to expand-out the struct key to be individual values again, cos they year and month have their own columns in the query.

Oh and as mentioned Lucee breaks on this. It has a bug in that when an array entry is null, it skips it when performing reduce (or map, etc) operation. So the end result is that it considers the result of the first reduce to be null, so this second one breaks cos one cannot reduce null.

Finally for good measure I sort the results by date:

.sort((r1, r2) => {
    yearDiff = r1.year - r2.year
    if (yearDiff != 0) {
        return yearDiff
    }
    return r1.month - r2.month
})

Initially I tried to be a smart-arse and do all that in one expression:

.sort((r1, r2) => (yearDiff = r1.year - r2.year) ? yearDiff : r1.month - r2.month)

And whilst Lucee was OK with that, ColdFusion could not make sense of the inline assignment expression there. I was just doing that so I only wanted r1.year - r2.year evaluated once, but I needed it in two places.

All of that is just the one expression in the end, when taken together:

groupedRecords = ungroupedRecords.reduce((grouped={}, row) => {
    y = row.settlementDate.year()
    m = row.settlementDate.month()
    key = "#y#-#m#"
    grouped[key] = grouped[key] ?: {stgl = 0, ltgl = 0}
    grouped[key].stgl = grouped[key].stgl + row.ShortTermGainLoss
    grouped[key].ltgl = grouped[key].ltgl + row.LongTermGainLoss

    return grouped
}).reduce( // breaks in Lucee https://luceeserver.atlassian.net/browse/LDEV-2523
    (records, key, values) => records.addRow({
        month = key.listLast("-"),
        year = key.listFirst("-"),
        ltgl = values.ltgl,
        stgl = values.stgl
    }),
    queryNew("month,year,ltgl,stgl", "Integer,Integer,Double,Double")
).sort((r1, r2) => {
    yearDiff = r1.year - r2.year
    if (yearDiff != 0) {
        return yearDiff
    }
    return r1.month - r2.month
})

The final output is along these lines:

I have to admit I did not TDD this work as I just wrote it in trycf.com, so I can't use TestBox, and can only write scripts on that anyhow. I did benchcheck the results by hand, and they're correct as far as I can tell.

I'm not sure how useful it is to have gone through this experience, but it was kinda fun I guess.

Righto.

--
Adam