Friday 8 March 2013

Sean prompts me to look at ColdFusion threading some more

G'day:
I got a bit of a slap-down from Sean y/day regarding that code I posted using <cfthread> (well: thread, but samesame). I've yet to clarify whether he was referring to the code I was triaging from CFLib, or my scratch code demonstrating a decoupled approach to the same thing, but it's encouraged me to look into things a bit more thoroughly anyhow. I appreciate the slap-down, because it identified a gap in my knowledge, giving me the opportunity to fill that gap.

So, anyway, Sean made a coupla interesting observations which I'll look at here.


First up was a reminder that arrays are copied by value whenever one makes an assignment in ColdFusion. Here's a simple example:

a1 = ["tahi", "rua"];
a2 = a1;
arrayAppend(a1, "toru");
arrayAppend(a2, "wha");

writeDump(variables);

In ColdFusion, the assignment a2 = a1 makes an entirely new array for a2, so a1 and a2 are two different data structures, and the output is:

struct
  A1: array
    1: tahi
    2: rua
    3: toru
  A2: array
    1: tahi
    2: rua
    3: wha

Just to recap: in CFML, complex data types are generally copied by reference, so a similar example using structs gives a different result:

st1 = {one="tahi", two="rua"};
st2 = st1;
st1.three = "toru";
st2.four = "wha";

writeDump(variables);

struct
  ST1: struct
    FOUR: wha
    ONE: tahi
    THREE: toru
    TWO: rua
  ST2: struct
    FOUR: wha
    ONE: tahi
    THREE: toru
    TWO: rua

Notice how both st1 and st2 are referring to the same data structure in memory, so adding a key to one also adds it to the "other" one, as they're actually both the same bits of data.

The quirk of ColdFusion's CFML is that (for some reason best known to someone other than me) arrays - despite being complex data types - are copied by value, not by reference. I cover this stuff in greater depth in an earlier article, if you want to read it.
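To make the difference concrete outside CFML, here's a minimal Java sketch (Java because both ColdFusion and Railo run on the JVM). It models the two assignment behaviours with two hypothetical helpers, assignByValue and assignByReference; it's an illustration of the semantics, not how either engine actually implements assignment.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayAssignment {

    // ColdFusion-style array assignment: the target gets an independent copy.
    static List<String> assignByValue(List<String> source) {
        return new ArrayList<>(source);
    }

    // Railo-style (and CFML struct) assignment: both names share one object.
    static List<String> assignByReference(List<String> source) {
        return source;
    }

    public static void main(String[] args) {
        List<String> a1 = new ArrayList<>(List.of("tahi", "rua"));
        List<String> a2 = assignByValue(a1);
        a1.add("toru");
        a2.add("wha");
        System.out.println(a1 + " " + a2); // [tahi, rua, toru] [tahi, rua, wha]

        List<String> b1 = new ArrayList<>(List.of("tahi", "rua"));
        List<String> b2 = assignByReference(b1);
        b1.add("toru");
        b2.add("wha");
        System.out.println(b1 + " " + b2); // [tahi, rua, toru, wha] [tahi, rua, toru, wha]
    }
}
```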

Railo, on the other hand, acts more sensibly: arrays are treated the same as all other complex data types, so the output from that first code snippet is:

Scope
  A1: Array
    1 (string): tahi
    2 (string): rua
    3 (string): toru
    4 (string): wha
  A2: Array
    1 (string): tahi
    2 (string): rua
    3 (string): toru
    4 (string): wha

So a1 and a2 are both referencing the same data structure in memory. Props to Railo for doing this sensibly.

Secondly, Sean pointed out that attributes passed into a thread are all copied, irrespective of their data type. I did not know that; I'd just assumed the normal rules applied. So that's good to know.
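Here's a rough Java model of the behaviour Sean describes. The hypothetical runThread helper duplicates every attribute (structs and arrays alike, recursively) on the calling thread before the new thread starts, so nothing the thread does can reach the caller's data. It's an assumption-laden sketch of the observable behaviour, not ColdFusion's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class CopiedThreadAttributes {

    // Recursively duplicate structs (maps) and arrays (lists); leaf values such as
    // strings are immutable in Java, so they can be shared safely.
    @SuppressWarnings("unchecked")
    static Object deepCopy(Object value) {
        if (value instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet()) {
                copy.put(e.getKey(), deepCopy(e.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<Object>) value) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        return value;
    }

    // Hypothetical stand-in for thread action="run" with attributes: the copy is
    // made before the thread starts, and the body only ever sees the copy.
    @SuppressWarnings("unchecked")
    static Thread runThread(Map<String, Object> attributes, Consumer<Map<String, Object>> body) {
        Map<String, Object> privateCopy = (Map<String, Object>) deepCopy(attributes);
        Thread t = new Thread(() -> body.accept(privateCopy));
        t.start();
        return t;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("a", new ArrayList<Object>());

        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final int n = i;
            threads.add(runThread(data,
                    copy -> ((List<Object>) copy.get("a")).add("Appended by thread " + n)));
        }
        for (Thread t : threads) {
            t.join(); // like thread action="join"
        }
        System.out.println(data); // {a=[]}: only the private copies were changed
    }
}
```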

He also observed that the code I was triaging for CFLib did all this threading shenanigans in case the arrays being processed were so large as to cause a performance hit; and if that's the case, passing those big arrays in and out of threads is going to chew through a lot of memory, making the whole process a bit risky. Good point. Whilst I baulked at the idea of having the threading embedded in the function, it was from the perspective of "it's the wrong place for that code", not "in the situation it's trying to mitigate, it might be making things worse". Or at least shifting the risk to another resource.

So cheers for that, Sean. I've got some feedback to give on your comment, but I'll do that separately.



After digesting what Sean had said, I knocked together some code to demonstrate to myself (with a view to writing it up here) what's going on with these threads. And indeed to verify what Sean was saying. I don't just assume people are right when they say things.

First up, here's a pared-down version of the code from y/day:

a = [];

request.sequence = 0;    // used to demonstrate which order the threads did the work

for (i=1; i <= 3; i++){
    thread name=i action="run" a=a i=i {
        sleep(randRange(10,20));
        arrayAppend(a, "Appended by thread #i# @ sequence #++request.sequence#");
        thread.a = a;
    }
}

thread action="join" name="1,2,3";

writeDump(var=cfthread, label="cfthread");
writeDump(var=variables.a, label="variables.a");

This passes an array into each of three threads, and each thread appends an element to it. There's a random delay in each thread so the appends happen in a different order from the order the threads were started (just to make it clearer what's happening when). We wait for the threads to finish, and then look at what we've got. The desired result is an array with three elements in it, one added by each thread. However, according to CF's rules of array copying, and what Sean said, this won't do what we want:

cfthread - struct
  1: struct
    A: array
      1: Appended by thread 1 @ sequence 2
    ELAPSEDTIME: 20
    NAME: 1
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:07:52'}
    STATUS: COMPLETED
  2: struct
    A: array
      1: Appended by thread 2 @ sequence 2
    ELAPSEDTIME: 18
    NAME: 2
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:07:52'}
    STATUS: COMPLETED
  3: struct
    A: array
      1: Appended by thread 3 @ sequence 1
    ELAPSEDTIME: 10
    NAME: 3
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:07:52'}
    STATUS: COMPLETED
variables.a - array [empty]

And indeed it doesn't. Each array passed to the threads is a new array, so each new element goes into that copy of the array, not the original one.
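The copies aren't lost, mind: each thread hands its copy back via the thread scope (that's what thread.a = a does), so one workaround would be to merge them after the join. Here's a minimal Java sketch of that pattern using an ExecutorService; the class and method names are hypothetical. Note it still copies the array once per thread, so it does nothing about Sean's memory concern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MergeAfterJoin {

    // Each task works on its own copy (as each thread did above) and returns it;
    // the caller merges whatever each task added once all of them have finished.
    static List<String> runAndMerge(List<String> input, int threadCount)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 1; i <= threadCount; i++) {
                final int n = i;
                List<String> privateCopy = new ArrayList<>(input);
                futures.add(pool.submit(() -> {
                    privateCopy.add("Appended by thread " + n);
                    return privateCopy; // roughly what thread.a = a hands back
                }));
            }
            List<String> merged = new ArrayList<>(input);
            for (Future<List<String>> f : futures) {
                List<String> result = f.get(); // waits for that task, like a join
                // Keep only the elements this task added on top of the input.
                merged.addAll(result.subList(input.size(), result.size()));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndMerge(new ArrayList<>(), 3));
        // [Appended by thread 1, Appended by thread 2, Appended by thread 3]
    }
}
```

The merged order follows thread number rather than completion order, because the futures are read back in the order they were submitted.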

On Railo, the same code works absolutely fine and as we intend it to:

cfthread - Struct
  1: Struct
    ELAPSEDTIME (number): 20
    NAME (string): 1
    OUTPUT (string): [empty]
    PRIORITY (string): NORMAL
    STARTTIME (Date Time, Europe/London): {ts '2013-03-08 13:12:49'}
    STATUS (string): COMPLETED
    STACKTRACE (string): [empty]
    A: Array
      1 (string): Appended by thread 3 @ sequence 1
      2 (string): Appended by thread 2 @ sequence 2
      3 (string): Appended by thread 1 @ sequence 3
  2: Struct
    ELAPSEDTIME (number): 20
    NAME (string): 2
    OUTPUT (string): [empty]
    PRIORITY (string): NORMAL
    STARTTIME (Date Time, Europe/London): {ts '2013-03-08 13:12:49'}
    STATUS (string): COMPLETED
    STACKTRACE (string): [empty]
    A: Array
      1 (string): Appended by thread 3 @ sequence 1
      2 (string): Appended by thread 2 @ sequence 2
      3 (string): Appended by thread 1 @ sequence 3
  3: Struct
    ELAPSEDTIME (number): 19
    NAME (string): 3
    OUTPUT (string): [empty]
    PRIORITY (string): NORMAL
    STARTTIME (Date Time, Europe/London): {ts '2013-03-08 13:12:49'}
    STATUS (string): COMPLETED
    STACKTRACE (string): [empty]
    A: Array
      1 (string): Appended by thread 3 @ sequence 1
      2 (string): Appended by thread 2 @ sequence 2
      3 (string): Appended by thread 1 @ sequence 3
variables.a - Array
  1 (string): Appended by thread 3 @ sequence 1
  2 (string): Appended by thread 2 @ sequence 2
  3 (string): Appended by thread 1 @ sequence 3

But this is predictable: CF copies arrays by value, and Railo copies them by reference. So this doesn't actually test Sean's claim that values passed into threads get copied irrespective of data type. To test that, I modified the code to use a struct instead:

data = {a=[]};

request.sequence = 0;    // used to demonstrate which order the threads did the work

for (i=1; i <= 3; i++){
    thread name=i action="run" data=data i=i {
        sleep(randRange(10,20));
        arrayAppend(data.a, "Appended by thread #i#");
        data["sequence_#++request.sequence#"] = "Created by thread #i#";
        thread.data = data;
    }
}

thread action="join" name="1,2,3";

writeDump(var=cfthread, label="cfthread");
writeDump(var=variables.data, label="variables.data");

Now, according to CF's data-copying rules, structs are copied by reference, so all things being equal this should do what we want. However, if Sean is right about values passed into threads always being copied, it still won't work:

cfthread - struct
  1: struct
    DATA: struct
      A: array
        1: Appended by thread 1
      sequence_2: Created by thread 1
    ELAPSEDTIME: 20
    NAME: 1
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:15:56'}
    STATUS: COMPLETED
  2: struct
    DATA: struct
      A: array
        1: Appended by thread 2
      sequence_3: Created by thread 2
    ELAPSEDTIME: 18
    NAME: 2
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:15:56'}
    STATUS: COMPLETED
  3: struct
    DATA: struct
      A: array
        1: Appended by thread 3
      sequence_1: Created by thread 3
    ELAPSEDTIME: 11
    NAME: 3
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:15:56'}
    STATUS: COMPLETED
variables.data - struct
  A: array [empty]

And it didn't. Even though we were passing in a struct, CF copied the whole thing, so each thread got a different struct, and the main struct was never touched. Again, Railo does exactly what we intended here:

cfthread - Struct
  1: Struct
    ELAPSEDTIME (number): 19
    NAME (string): 1
    OUTPUT (string): [empty]
    PRIORITY (string): NORMAL
    STARTTIME (Date Time, Europe/London): {ts '2013-03-08 13:16:49'}
    STATUS (string): COMPLETED
    STACKTRACE (string): [empty]
    DATA: Struct
      a: Array
        1 (string): Appended by thread 2
        2 (string): Appended by thread 3
        3 (string): Appended by thread 1
      sequence_1 (string): Created by thread 2
      sequence_2 (string): Created by thread 3
      sequence_3 (string): Created by thread 1
  2: Struct
    ELAPSEDTIME (number): 19
    NAME (string): 2
    OUTPUT (string): [empty]
    PRIORITY (string): NORMAL
    STARTTIME (Date Time, Europe/London): {ts '2013-03-08 13:16:49'}
    STATUS (string): COMPLETED
    STACKTRACE (string): [empty]
    DATA: Struct
      a: Array
        1 (string): Appended by thread 2
        2 (string): Appended by thread 3
        3 (string): Appended by thread 1
      sequence_1 (string): Created by thread 2
      sequence_2 (string): Created by thread 3
      sequence_3 (string): Created by thread 1
  3: Struct
    ELAPSEDTIME (number): 18
    NAME (string): 3
    OUTPUT (string): [empty]
    PRIORITY (string): NORMAL
    STARTTIME (Date Time, Europe/London): {ts '2013-03-08 13:16:49'}
    STATUS (string): COMPLETED
    STACKTRACE (string): [empty]
    DATA: Struct
      a: Array
        1 (string): Appended by thread 2
        2 (string): Appended by thread 3
        3 (string): Appended by thread 1
      sequence_1 (string): Created by thread 2
      sequence_2 (string): Created by thread 3
      sequence_3 (string): Created by thread 1
variables.data - Struct
  a: Array
    1 (string): Appended by thread 2
    2 (string): Appended by thread 3
    3 (string): Appended by thread 1
  sequence_1 (string): Created by thread 2
  sequence_2 (string): Created by thread 3
  sequence_3 (string): Created by thread 1

Good ole Railo.

What a pain in the arse ColdFusion is sometimes. We can, however, circumvent this nonsense by not passing the array into the threads at all, and instead making it available to them via the cfthread scope:

cfthread.a=[];

request.sequence = 0;    // used to demonstrate which order the threads did the work

for (i=1; i <= 3; i++){
    thread name=i action="run" i=i {
        sleep(randRange(10,20));
        arrayAppend(cfthread.a, "Appended by thread #i# @ sequence #++request.sequence#");
        thread.data = cfthread.a;
    }
}

thread action="join" name="1,2,3";

writeDump(var=cfthread, label="cfthread");

This does the trick:

cfthread - struct
  1: struct
    DATA: array
      1: Appended by thread 3 @ sequence 1
      2: Appended by thread 2 @ sequence 2
      3: Appended by thread 1 @ sequence 3
    ELAPSEDTIME: 23
    NAME: 1
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:19:13'}
    STATUS: COMPLETED
  2: struct
    DATA: array
      1: Appended by thread 3 @ sequence 1
      2: Appended by thread 2 @ sequence 2
    ELAPSEDTIME: 18
    NAME: 2
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:19:13'}
    STATUS: COMPLETED
  3: struct
    DATA: array
      1: Appended by thread 3 @ sequence 1
    ELAPSEDTIME: 17
    NAME: 3
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:19:13'}
    STATUS: COMPLETED
  A: array
    1: Appended by thread 3 @ sequence 1
    2: Appended by thread 2 @ sequence 2
    3: Appended by thread 1 @ sequence 3

One interesting thing to note here is that the substructs keyed by thread name (the 1, 2, 3 ones) seem to contain a COPY of the thread scope as it stood at the end of each thread. Note how each of them has the state of the array as it was when that thread finished, not its final state at the point the dump was done. I guess this fits with CF's notion of copying everything all over the show.
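Here's a minimal Java sketch of that arrangement, assuming my reading of the dump is right: one shared list (standing in for cfthread.a) that every thread appends to, plus a per-thread snapshot taken by copying the list at the end of each thread. Unlike the CFML above, the append and the copy are wrapped in a single synchronized block, so each snapshot is guaranteed to be exactly one element longer than the one before it. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SharedWithSnapshots {

    // Returns one snapshot per thread. All threads append to the same shared list
    // (like cfthread.a), then keep a copy of it (like each thread's own substruct).
    static List<List<String>> run(int threadCount) throws InterruptedException {
        List<String> shared = new ArrayList<>();
        List<List<String>> snapshots = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= threadCount; i++) {
            final int n = i;
            Thread t = new Thread(() -> {
                List<String> snapshot;
                // Append and copy under one lock so no other thread can slip in between.
                synchronized (shared) {
                    shared.add("Appended by thread " + n);
                    snapshot = new ArrayList<>(shared);
                }
                snapshots.add(snapshot);
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return snapshots;
    }

    public static void main(String[] args) throws InterruptedException {
        for (List<String> snapshot : run(3)) {
            System.out.println(snapshot.size() + " " + snapshot);
        }
    }
}
```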

Another interesting thing I spotted was this (from a different run of the same code):

cfthread - struct
  1: struct
    DATA: array
      1: Appended by thread 3 @ sequence 1
      2: Appended by thread 2 @ sequence 1
      3: Appended by thread 1 @ sequence 2
    ELAPSEDTIME: 16
    NAME: 1
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:20:55'}
    STATUS: COMPLETED
  2: struct
    DATA: array
      1: Appended by thread 3 @ sequence 1
      2: Appended by thread 2 @ sequence 1
    ELAPSEDTIME: 14
    NAME: 2
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:20:55'}
    STATUS: COMPLETED
  3: struct
    DATA: array
      1: Appended by thread 3 @ sequence 1
    ELAPSEDTIME: 13
    NAME: 3
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:20:55'}
    STATUS: COMPLETED
  A: array
    1: Appended by thread 3 @ sequence 1
    2: Appended by thread 2 @ sequence 1
    3: Appended by thread 1 @ sequence 2

This is worrying. To me this says that ++request.sequence is not an atomic operation. I can't see how, from the code being run, request.sequence could ever have the same value in two threads by the time I'm outputting it. This happens about 5% of the time.

Railo's output is curious here too. I just get:

cfthread - Struct
  A: Array
    1 (string): Appended by thread 3 @ sequence 1
    2 (string): Appended by thread 1 @ sequence 2
    3 (string): Appended by thread 2 @ sequence 3

Where's the data from each thread?

Also note that I can replicate that issue with request.sequence on Railo too:

cfthread - Struct
  A: Array
    1 (string): Appended by thread 3 @ sequence 1
    2 (string): Appended by thread 1 @ sequence 1
    3 (string): Appended by thread 2 @ sequence 2

Finally, here's the equivalent code using a struct instead of an array:

cfthread.data = {a=[]};

request.sequence = 0;    // used to demonstrate which order the threads did the work

for (i=1; i <= 3; i++){
    thread name=i action="run" i=i {
        sleep(randRange(10,20));
        arrayAppend(cfthread.data.a, "Appended by thread #i#");
        cfthread.data["sequence_#++request.sequence#"] = "Created by thread #i#";
        thread.data = cfthread.data;
    }
}

thread action="join" name="1,2,3";

writeDump(var=cfthread, label="cfthread");

This has puzzling results too, when compared to the array version:

cfthread - struct
  1: struct
    DATA: struct
      A: array
        1: Appended by thread 1
        2: Appended by thread 3
        3: Appended by thread 2
      sequence_1: Created by thread 1
      sequence_2: Created by thread 3
      sequence_3: Created by thread 2
    ELAPSEDTIME: 14
    NAME: 1
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:29:40'}
    STATUS: COMPLETED
  2: struct
    DATA: struct
      A: array
        1: Appended by thread 1
        2: Appended by thread 3
        3: Appended by thread 2
      sequence_1: Created by thread 1
      sequence_2: Created by thread 3
      sequence_3: Created by thread 2
    ELAPSEDTIME: 22
    NAME: 2
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:29:40'}
    STATUS: COMPLETED
  3: struct
    DATA: struct
      A: array
        1: Appended by thread 1
        2: Appended by thread 3
        3: Appended by thread 2
      sequence_1: Created by thread 1
      sequence_2: Created by thread 3
      sequence_3: Created by thread 2
    ELAPSEDTIME: 13
    NAME: 3
    OUTPUT: [empty string]
    PRIORITY: NORMAL
    STARTTIME: {ts '2013-03-08 13:29:40'}
    STATUS: COMPLETED
  DATA: struct
    A: array
      1: Appended by thread 1
      2: Appended by thread 3
      3: Appended by thread 2
    sequence_1: Created by thread 1
    sequence_2: Created by thread 3
    sequence_3: Created by thread 2

Remember how before, in the thread-name-specific substructs, we could see the array being built up index by index? With a struct, we only see the final result. So it seems that in this case those substructs all reference the same struct in memory (which is why we see only the final result when we dump the thing out at the end of the process).
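If that reading is correct, it's simply the difference between storing a copy and storing a reference. A minimal Java sketch of the reference case, with hypothetical names: every per-thread entry points at the same map, so inspecting any of them after the join shows the final state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedReference {

    // Every thread writes into the same map and records a reference to it (not a
    // copy), so every per-thread entry ends up showing the final state.
    static Map<String, Map<String, String>> run(int threadCount) throws InterruptedException {
        Map<String, String> data = new LinkedHashMap<>();                      // like cfthread.data
        Map<String, Map<String, String>> perThread = new ConcurrentHashMap<>(); // like cfthread[name].data
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= threadCount; i++) {
            final String name = String.valueOf(i);
            Thread t = new Thread(() -> {
                synchronized (data) {
                    data.put("thread_" + name, "Created by thread " + name);
                }
                perThread.put(name, data); // a reference to the shared map, not a copy
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return perThread;
    }

    public static void main(String[] args) throws InterruptedException {
        run(3).forEach((name, data) -> System.out.println(name + " -> " + data.size() + " keys"));
        // each line reports 3 keys
    }
}
```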

This version can also fall foul of the same-valued request.sequence issue:

cfthread - struct
  A: array
    1: Appended by thread 1
    2: Appended by thread 2
    3: Appended by thread 3
  sequence_1: Created by thread 1
  sequence_2: Created by thread 3

(I just dumped cfthread.data this time)

Note how sequence_2 has been written twice here: once by thread 2, and then overwritten by thread 3 (but the sequence is still 2!).
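One way this can happen is that ++ is a read followed by a separate write, and another thread can get in between the two. This Java sketch replays such an interleaving by hand on a single thread, so it's deterministic: threads 2 and 3 both read 1 before either writes, both compute 2, and thread 3's key overwrites thread 2's. The exact step ordering is an assumption about what happened, not something the dump proves; the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LostUpdate {

    // Replays, step by step on one thread, an interleaving that would produce the
    // dump above: each ++ is a read followed by a separate write.
    static Map<String, String> replay() {
        Map<String, String> data = new LinkedHashMap<>();
        int sequence = 0;

        // Thread 1 runs on its own: reads 0, writes 1, uses 1.
        int read1 = sequence;
        sequence = read1 + 1;
        data.put("sequence_" + sequence, "Created by thread 1");

        // Threads 2 and 3 both read the counter before either has written it back.
        int read2 = sequence; // 1
        int read3 = sequence; // 1
        int value2 = read2 + 1;
        sequence = value2;    // thread 2 writes 2
        int value3 = read3 + 1;
        sequence = value3;    // thread 3 also writes 2: thread 2's increment is lost

        data.put("sequence_" + value2, "Created by thread 2");
        data.put("sequence_" + value3, "Created by thread 3"); // same key, overwrites thread 2
        return data;
    }

    public static void main(String[] args) {
        System.out.println(replay());
        // {sequence_1=Created by thread 1, sequence_2=Created by thread 3}
    }
}
```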

Railo works exactly the same as CF does here.

So I think I've got straight in my head now how passing data to threads works in CF (and Railo), and I think that sequencing thing is a bug? But I'm less sure about that, and if anyone knows what I'm missing, gimme another slapdown!

Update / onanism
Anyone wanting to slap me down will have to wait, as I've done it to myself! The ++ operators (and their ilk) are known to be non-thread safe. I'll write another quick article instead of updating this one with the details.
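For the record, the usual JVM-level fix is to make the read-and-increment a single atomic step. Here's a minimal Java sketch using java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() does exactly that: however the threads interleave, each one gets a distinct sequence number. In CFML, the analogous approach would presumably be wrapping the increment in an exclusive named lock (<cflock>), but that's an assumption on my part and isn't tested here. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicSequence {

    // Each thread takes a sequence number with incrementAndGet(), which reads and
    // increments in one atomic step, so no two threads can be handed the same value.
    static List<Integer> run(int threadCount) throws InterruptedException {
        AtomicInteger sequence = new AtomicInteger(0);
        List<Integer> issued = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> issued.add(sequence.incrementAndGet()));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        List<Integer> sorted = new ArrayList<>(issued);
        Collections.sort(sorted);
        return sorted; // always 1..threadCount, with no repeats or gaps
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5)); // [1, 2, 3, 4, 5]
    }
}
```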


Cheers.

--
Adam