Thursday 8 September 2022

CFML: looking at how CFWheels messes up a loop


This exercise came about from a bug in CFWheels we encountered today.

We got a notification that a user had received a 500-error, and I was looking into it. It was happening in /wheels/global/internal.cfm, in this bit of code:

for (local.key in application.wheels.cache[arguments.category]) {
    if (Now() > application.wheels.cache[arguments.category][local.key].expiresAt) {

On the second line we are getting an error: key [81AB90EB428410D6222B18AAE12EC014] doesn't exist.

It should be obvious to even a journeyman dev what the problem is here.

When the first line of code runs, there is a key 81AB90EB428410D6222B18AAE12EC014 in application.wheels.cache. By the time the second line of code runs, some other request has come along and removed 81AB90EB428410D6222B18AAE12EC014 from application.wheels.cache, and hence we get the error. This is why everyone (?) knows that one must synchronise access to shared scopes like the application scope, so that if a process is writing to it, then other processes can't read from it. Or competing writes can't be happening simultaneously. IE: Chuck a lock block around this sort of thing. This is CFML 101 sort of stuff, CFWheels Team.

On the face of it, this was just super bad timing. What are the odds that one request will run the first line, and then before that request gets to the second line, another request has blitzed what the first line "knew" was there to use.

But it's not really super bad timing. It's just bad coding. This code is a loop. When does a for (key in struct) loop decide which keys exist in the struct, and accordingly which keys to loop over? At the first time the loop is entered. And only then. So if there are 10000 items in that struct when our request first ran, any other process has from the first iteration until the 9999th iteration to delete the 10000th key, and we'll get this error. That's a lot of opportunity for failure. Also, due to the very nature of this code, there's a good chance that a bunch of requests are gonna be running it simultaneously: the key deletions come after a determination that a shared cache is full and needs culling (let's not get into matter of why a function called addToCache is deleting stuff anyhow. This is not the worst thing about this cluster🦆, I assure you), and at the moment in time the cache seems to be full, then every call to addToCache (sic) will try to delete stuff. Ugh.

To demonstrate what's going on here, I've knocked a quick example together (code on github):

// Application.cfc
component { = "testLoopingOverAppScopedStructs"
// firstRequest.cfm
application.numbers = [
    "one" = "tahi",
    "two" = "rua",
    "three" = "toru",
    "four" = "wha"

for (en in application.numbers) {
    writeOutput("keys @ top of loop: #application.numbers.keyList()#<br>")
    writeOutput("current key: #en#<br>")
    writeOutput("current value: #application.numbers[en]#<br>")
    writeOutput("keys after sleep: #application.numbers.keyList()#<br>")
// secondRequest.cfm
application.numbers["five"]  = "rima"

All very straight forward. In secondRequest.cfm I only add a key so there's a visual cue in the output of firstRequest.cfm when the second request ran.

The test is to run firstRequest.cfm, then whist that's running, run secondRequest.cfm. The output from a typical run of firstRequest.cfm would be like:

keys @ top of loop: one,two,three,four
current key: one
current value: tahi
keys after sleep: one,two,three,four
keys @ top of loop: one,two,three,four
current key: two
current value: rua
keys after sleep: one,two,three,five <--- here is where I ran secondRequest.cfm
keys @ top of loop: one,two,three,five
current key: three
current value: toru
keys after sleep: one,two,three,five
keys @ top of loop: one,two,three,five
current key: four

Error Occurred While Processing Request
Element FOUR is undefined in a CFML structure referenced as part of an expression.

You can see that the keys the loop iterates over is set at the beginning, given it's unaffected by changes to the struct's keys during the loop.

What should the CFWheels codebase do to mitigate this. Well: here's a list of things that are… erm… "less good than they could be "… in that code.

  • Framework functionality should be in classes, not in include files. In this case cache-handing code should be in a CachingService or somesuch.
  • internal.cfm is an awful name for a library. A filename should describe what the code does (like, say "CachingService"), not how it's used ("internally", apparently. Sounds uncomfortable, not to mention a bit unhygienic).
  • Related to this, that file is a hotchpotch mess of all sorts of shite that should be homed somewhere else. This describes most files in the CFWheels codebase, unfortunately.
  • The only code that touches the application scope should be in an ApplicationScopeProxy object. BTW this makes controlling the scope-locking way easier to manage. See also: SessionScopeProxy / ServerScopeProxy / etc
  • Within that proxy, access to any shared scope should be appropriately synchronised (with a lock). This should also be extended to mitigating race conditions like we have here.
  • If your function is called addToCache then stick to doing that. Code that removes stuff from the cache does not belong there.
  • All the code in that method is conditional. All of it. There is no "mainline" or happy path. Simplify the conditions and bail out if they're not met. Don't put the code you actually expect to run inside a conditional block.
  • Connected to the two preceding points: most of the code in that method is not actually anything to do with adding something to a cache.
  • For the love of god stop prefixing functions with $ to indicate they're private. Just make them private.

I don't think anything there is particularly controversial or complicated. And whilst it's beyond "CFML 101", I think it's within the low bar of "Writing Frameworks 101".

Anyway. There you got. A coupla hours in the life of someone who has to use CFWheels.