Friday, 19 April 2013

Question: when to lock scopes

G'day:
I was chatting with some other CFML developers over the last few days about when one should / should not use <cflock> to lock scopes.  My position is "hardly ever", and "certainly not as often as people tend to, and not for the reasons they think they have to".

What follows is my understanding of the situation, but it occurs to me that my understanding comes from anecdote, blog articles and inference. Nothing concrete. That said, I am 95% confident that I understand things correctly, but I'm suddenly annoyed about possibly missing that last 5%. So this article is to articulate my understanding, and solicit input from other people if I've missed anything, or got anything wrong.


I will also invite engineers from both ColdFusion and Railo to comment. Although all I can do is "invite", I can't guarantee anything. I'll promise a beer or two to the first one who gives a comprehensive answer though! ;-)

Back in the pre-CF-runs-on-Java days - CF5 and before - the shared scopes in ColdFusion had to be treated with caution, and any usage of them at all had to be thoroughly locked. One had to have code like this:

<cflock scope="session" type="read" timeout="2" throwontimeout="true">
    <cfif not isDefined("session.foo")>
        <cflock scope="session" type="exclusive" timeout="2" throwontimeout="true">
            <cfset session.foo = "bar">
        </cflock>
    </cfif>
</cflock>

We need to lock for both reading and writing there: the isDefined() check reads from the session scope, and the <cfset> writes to it.

That's an egregious example, but it's the sort of thing that was necessary. It could be summarised as:

<cflock scope="session" type="exclusive" timeout="2" throwontimeout="true">
    <cfparam name="session.foo" default="bar">
</cflock>

But that always has an exclusive lock whether it ends up being needed or not, so is less than ideal.

Even if just reading a shared-scope variable, one had to lock it:

<cflock scope="session" type="read" timeout="2" throwontimeout="true">
    <cfset request.localCopyOfSession = duplicate(session)>
</cflock>

Note: a lot of people would have just done this:

<cflock scope="session" type="read" timeout="2" throwontimeout="true">
    <cfset request.localCopyOfSession = session>
</cflock>

Which is no good as that just makes a reference to session, so it does not mitigate any shared-scope usage concerns.

Why did one need to do this? Because the implementation of the shared scopes in ColdFusion up to and including CF5 was not thread-safe. This meant that multiple requests could be trying to update the value of a given session variable at the same time, and basically mess it up. Or they could - apparently - even have problems accessing session variables adjacent to each other in memory: due to how the session scope's memory was managed, they could collide with each other.

Basically the shared scopes were very unstable and a bit of a headache (and a performance bottleneck) to deal with.

This was all sorted out when ColdFusion was ported to Java: internally, the shared scopes are synchronised so that all the locking shenanigans are done at the Java level, so there's no need to do them at the CFML level. Cool.

That's not the whole story though. Protecting shared-scope variables for basic set/get operations is only part of the locking picture. What we've looked at so far is just working around the basic mechanics of how shared scopes need to be implemented to even work properly. There's still the consideration of situations our own CFML code can create: race conditions.

A race condition is a situation in which there is a sequence of operations which - if not treated atomically - can lead to unexpected results. Here's a pseudo-code example:
  1. The current value of session.count is 1
  2. Take the current value of session.count as variables.count
  3. Add one to variables.count
  4. Set session.count to be the updated value of variables.count
There's a potential race condition in that.

Consider the normal sequence of events, in which each request passes through that process in turn:

REQ1/STEP1: // session.count is currently 1
REQ1/STEP2: variables.count = session.count (variables.count is 1)
REQ1/STEP3: increment variables.count (variables.count is 2)
REQ1/STEP4: session.count = variables.count (session.count is 2)
REQ2/STEP1: // session.count is currently 2
REQ2/STEP2: variables.count = session.count (variables.count is 2)
REQ2/STEP3: increment variables.count (variables.count is 3)
REQ2/STEP4: session.count = variables.count (session.count is 3)

This is what we want: each request increments the counter.

However consider two requests running that process pretty much simultaneously:

REQ1/STEP1: // session.count is currently 1
REQ2/STEP1: // session.count is currently 1
REQ1/STEP2: variables.count = session.count (variables.count is 1)
REQ2/STEP2: variables.count = session.count (variables.count is 1)
REQ1/STEP3: increment variables.count (variables.count is 2)
REQ2/STEP3: increment variables.count (variables.count is 2)
REQ1/STEP4: session.count = variables.count (session.count is 2)
REQ2/STEP4: session.count = variables.count (session.count is 2)

So we've had two requests, same as in the first example, but we haven't ended up counting one of them. That's a race condition.
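For what it's worth, here's how I'd sketch those four steps as actual CFML, with no locking. This is just an illustration: it assumes session.count has already been set somewhere earlier (say, in onSessionStart), and that two requests from the same session can hit this code at about the same time.

```cfm
<!--- No lock: a sketch of the four steps above. session.count is assumed to already exist. --->
<cfset variables.count = session.count>          <!--- step 2: read the shared value --->
<cfset variables.count = variables.count + 1>    <!--- step 3: increment the local copy --->
<cfset session.count = variables.count>          <!--- step 4: write it back --->
```

Each of those statements is safe on its own. It's the gap between the read and the write that lets a second request sneak in.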

That might seem like a very contrived situation, but it's actually a very real one. It's exactly what happens when one uses the ++ operator. I discuss this in an earlier article.

This is where locking comes back in. The solution here is to modify the process slightly:

  1. Create a lock
  2. The current value of session.count is 1
  3. Take the current value of session.count as variables.count
  4. Add one to variables.count
  5. Set session.count to be the updated value of variables.count
  6. Release the lock
Now the previous example works like this:

REQ1/STEP0: create the lock
REQ2/STEP0: the code is locked. Wait
REQ1/STEP1: // session.count is currently 1
REQ1/STEP2: variables.count = session.count (variables.count is 1)
REQ1/STEP3: increment variables.count (variables.count is 2)
REQ1/STEP4: session.count = variables.count (session.count is 2)
REQ1/STEP5: release the lock
REQ2/STEP0: the code is unlocked so proceed... lock it again
REQ2/STEP1: // session.count is currently 2
REQ2/STEP2: variables.count = session.count (variables.count is 2)
REQ2/STEP3: increment variables.count (variables.count is 3)
REQ2/STEP4: session.count = variables.count (session.count is 3)
REQ2/STEP5: release the lock

Perfect. If you are using ++ on a session variable, you need to lock it:

<cflock scope="session" type="exclusive" timeout="2" throwontimeout="true">
    <cfset session.counter++>
</cflock>


So - bottom line - the only time one needs a lock these days is when there's one of these race conditions possible. Which is basically - I suppose - when the write to a shared scope variable is based on an earlier read of a variable (the same one, or a different one) in a scope that can be accessed by more than one thread.

I use the term "in a scope that can be accessed by more than one thread" rather than "shared scope", because there are situations in which variables in scopes that are not traditionally considered "shared-scope" can be accessed by separate threads simultaneously.
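One such situation (and this is my own illustration, not an exhaustive list) is the variables scope of a CFC instance that has been stored in a shared scope. The component below is hypothetical: HitCounter.cfc, its recordHit() method and the lock name are all made up for the example, and it assumes a single instance has been created and stored as application.hitCounter.

```cfm
<!--- HitCounter.cfc: hypothetical component, assumed to be created once and stored as application.hitCounter --->
<cfcomponent output="false">

    <!--- one variables scope per instance, so every request using the cached instance shares it --->
    <cfset variables.hits = 0>

    <cffunction name="recordHit" access="public" returntype="numeric" output="false">
        <cfset var result = 0>
        <!--- read-then-write on variables.hits, so treat it as one atomic operation --->
        <cflock name="HitCounter_hits" type="exclusive" timeout="2" throwontimeout="true">
            <cfset variables.hits = variables.hits + 1>
            <cfset result = variables.hits>
        </cflock>
        <cfreturn result>
    </cffunction>

</cfcomponent>
```

There's no scope lock that fits here, as the variables scope is not one of the scopes <cflock> can lock by scope, so a named lock is the only option.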

And I'd also observe that a scope-lock is only really appropriate if all the components in the race condition are in the same scope, which is not necessarily going to be the case. I'd err towards using a named lock for these situations, irrespective of which scope is involved. And definitely if more than one shared scope is involved (well: you'd need to then, as any shared scope could contribute to the race condition).
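As a rough sketch of that last case, here's a race condition that spans two shared scopes: the read is from the application scope, and the write is to both the application and session scopes. The variable names and the lock name are hypothetical, purely for illustration.

```cfm
<!--- Hypothetical example: give each session the next visitor number from an application-wide counter --->
<cflock name="assignVisitorNumber" type="exclusive" timeout="2" throwontimeout="true">
    <cfif not structKeyExists(session, "visitorNumber")>
        <cfparam name="application.visitorCount" default="0">
        <!--- read application.visitorCount, then write to both scopes, all inside the one lock --->
        <cfset application.visitorCount = application.visitorCount + 1>
        <cfset session.visitorNumber = application.visitorCount>
    </cfif>
</cflock>
```

Neither a session-scope lock nor an application-scope lock on its own describes what's being protected here, whereas the named lock covers the whole operation. The trade-off is that every piece of code touching those variables in a read-then-write fashion needs to use the same lock name.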

How does that sound compared to everyone else's understanding of locking, both in the context of shared-scopes, and in general?

--
Adam