Tuesday, 16 February 2016

ColdFusion 2016: trying to write about one thing; end up writing about another

G'day:
I was gonna do an article about ColdFusion 2016's new query iteration functions, but my fondness for using Maori as my sample data language has pointed me in the direction of a possible bug in ColdFusion 2016's CLI.

I've distilled it down to this:

CLI.writeln("kuputuhi tauira ki pūāhua nako");

If I run that via the ColdFusion 2016 CLI, I get this:


D:\src\CF12\cli\utf8>cf
CLI.writeln("kuputuhi tauira ki pūāhua nako");
^Z
kuputuhi tauira ki puahua nako
D:\src\CF12\cli\utf8>

Hmmm... where have the diacritic marks gone?

Initially I put this down to a tweak I've made to the CLI batch file which allows me to put code straight into STDIN instead of a file and run it, so I ran the code old-school:

D:\src\CF12\cli\utf8>cf cfmlExample.cfm
kuputuhi tauira ki p┼½─?hua nako
D:\src\CF12\cli\utf8>

Yikes! Worse!

I figured there was a chance that Windows wasn't handling encoding in its own CLI box that well, so decided to see what PHP made of this:

<?php
echo "kuputuhi tauira ki pūāhua nako";

And this yields:

D:\src\CF12\cli\utf8>php phpExample.php
kuputuhi tauira ki pūāhua nako
D:\src\CF12\cli\utf8>

OK, so ColdFusion is behaving no differently from PHP here when running a file. On a whim I decided to try Ruby too, and this fared better:

puts "kuputuhi tauira ki pūāhua nako"


D:\src\CF12\cli\utf8>ruby rubyExample.rb
kuputuhi tauira ki pūāhua nako

D:\src\CF12\cli\utf8>


So it clearly wasn't impossible to display the right characters in a Windows CLI box, but I decided to google it anyhow.

I found a PHP answer which said I needed to change the code page in the CLI box to support UTF-8, by doing this:


D:\src\CF12\cli\utf8>chcp 65001
Active code page: 65001

D:\src\CF12\cli\utf8>


This is a new one on me, but it seemed to sort PHP's issues out:


D:\src\CF12\cli\utf8>php phpExample.php
kuputuhi tauira ki pūāhua nako
D:\src\CF12\cli\utf8>
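
As far as I can tell, all the code page does is decide how the console turns the bytes it receives into glyphs. Here is a rough Python sketch of that idea. It assumes the script emits raw UTF-8 bytes and that the console started out on code page 437; the starting code page isn't recorded above, so that part is a guess, and the render() helper is made up purely for illustration.

```python
SAMPLE = "kuputuhi tauira ki pūāhua nako"


def render(raw_bytes, codepage):
    """Decode bytes the way a console on the given code page would display them."""
    return raw_bytes.decode(codepage, errors="replace")


utf8_bytes = SAMPLE.encode("utf-8")  # ū is C5 AB and ā is C4 81 in UTF-8

# Assumption: the console started out on code page 437.
print(render(utf8_bytes, "cp437"))  # kuputuhi tauira ki p┼½─ühua nako

# Code page 65001 is Windows' identifier for UTF-8.
print(render(utf8_bytes, "utf-8"))  # kuputuhi tauira ki pūāhua nako
```

Same bytes both times; only the interpretation changes, which is why a chcp call on its own can fix the display without touching the script.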


But it didn't really help ColdFusion:


D:\src\CF12\cli\utf8>cf cfmlExample.cfm
kuputuhi tauira ki pū�?hua nako
D:\src\CF12\cli\utf8>

On a further whim, I decided to extend my test bed to Python:

print("kuputuhi tauira ki pūāhua nako")


Without the chcp call, I got this:

D:\src\CF12\cli\utf8>py pythonExample.py
kuputuhi tauira ki pūāhua nako

D:\src\CF12\cli\utf8>


Which is exactly the same as ColdFusion's output. But once I make the chcp call:


D:\src\CF12\cli\utf8>chcp 65001
Active code page: 65001

D:\src\CF12\cli\utf8>py pythonExample.py
kuputuhi tauira ki pūāhua nako

D:\src\CF12\cli\utf8>

As you can see: it's fine.

So my conclusion here is that it's completely legit that ColdFusion might not be able to render UTF-8-encoded characters without the code page being actively changed, but even after that something is not right with the CLI. I suspect that when they load the file in, they don't give any thought to encoding at all, so the source code actually gets "corrupted" before it's run. I have to admit that my understanding of encoding is OK but not fantastic, and when one throws Windows CLI code pages into the mix too... I dunno what my expectations ought to be. But it ain't working, that's fer sure.
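
That hunch about the file being mangled on the way in can at least be sanity-checked with a bit of Python. This is only a sketch, and it leans on several assumptions I can't confirm from anything above: that the source file is saved as UTF-8, that the JVM behind the CLI reads the file and writes its output as windows-1252 (a common default on a Western European Windows install), that bytes with no mapping become U+FFFD on the way in, that unencodable characters become "?" on the way out, and that the console started on code page 437. None of this is taken from ColdFusion's actual code, and simulate_cli() is a made-up name.

```python
SAMPLE = "kuputuhi tauira ki pūāhua nako"


def simulate_cli(source_text, console_codepage):
    """Hypothetical model of the round trip; not ColdFusion's real implementation."""
    # What is on disk (assumed to be UTF-8): ū is C5 AB, ā is C4 81.
    file_bytes = source_text.encode("utf-8")
    # Assumed step 1: the file is decoded as windows-1252.
    # Byte 0x81 has no mapping there, so it is replaced with U+FFFD.
    misread = file_bytes.decode("cp1252", errors="replace")
    # Assumed step 2: output is encoded as windows-1252 again.
    # U+FFFD cannot be encoded, so it is written out as "?".
    out_bytes = misread.encode("cp1252", errors="replace")
    # Step 3: the console renders those bytes using its active code page.
    return out_bytes.decode(console_codepage, errors="replace")


print(simulate_cli(SAMPLE, "cp437"))  # kuputuhi tauira ki p┼½─?hua nako
print(simulate_cli(SAMPLE, "utf-8"))  # kuputuhi tauira ki pū�?hua nako
```

If those assumptions hold, the two results line up with what ColdFusion printed before and after the chcp call. That doesn't prove anything, but it is at least consistent with the ā being lost when the file is read, before the code ever runs.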

Could someone please try this on *nix and see what they get when using a more robust shell?

Now it's too late for me to look at query iteration functions, so that might be a job for before work tomorrow.

Righto.

--
Adam