Showing posts with label Lucee/CFWheels/Docker series. Show all posts
Showing posts with label Lucee/CFWheels/Docker series. Show all posts

Saturday 17 April 2021

Using Docker to strum up an Nginx website serving CFML via Lucee

G'day

OK so this is not the blog article I expected to be writing, had you asked me two weeks ago. But here we are. I'll go into the reason why I'm doing this a bit later.

This will be a CFML-oriented version of the "VueJs/Symfony/Docker/TDD series", and has its own tag: "Lucee/CFWheels/Docker series":

  • Nginx website.
  • Proxying for Lucee as the CFML-processing application layer.
  • Running inside Docker containers.
  • TDD the whole enterprise.

If I have time (and any will-to-live remaining), I will add this lot into the mix:

  • Work out how Forgebox works, which seems to be CFML's equivalent of Composer / NPM
  • Use that to install Testbox (CFML-based Jasmine-ish testing framework)
  • And also install CFWheels, a CFML-based framework akin to Ruby on Rails.

I'll also be returning to SublimeText for the first time in seven-or-so years. Apparently it's still a reasonable text editor to use for CFML code.

For those few of you that have started paying attention to me more recently: CFML is not new to me. I spent over a decade as a CFML developer (2001-2013). I shifted to PHP because my erstwhile employer (HostelBookers, CFML shop), was bought by Hostelworld (PHP shop) back then. I've been doing PHP since. That said, I am very rusty with CFML, and - well, hopefully - they CFML landscape has moved on since then too. So whilst I'm not a newbie with CFML stuff, getting Lucee running in a container, Forgebox and CFWheels is entirely new to me.

I'm still gonna be using PHP to do the initial testing of things, because I won't have Testbox running for the first while. So I'll need a PHP container too. I'll refactor this out once I get Testbox in.

It needs a PHP container for running tests

There's nothing new here, and what I've done is largely irrelevant to this exercise, so I'll just list the files and link through to that current state of the files in source control:

adam@DESKTOP-QV1A45U:/mnt/c/src/cfml-in-docker$ tree -a --dirsfirst -I "vendor|.git|.idea"
.
├── docker
│   ├── php-cli
│   │   ├── root_home
│   │   │   ├── .bash_history
│   │   │   ├── .bashrc
│   │   │   ├── .gitignore
│   │   │   └── .vimrc
│   │   └── Dockerfile
│   ├── .env
│   └── docker-compose.yml
├── test
│   └── php
│       └── SelfTest.php
├── .gitignore
├── LICENSE
├── README.md
├── composer.json
├── composer.lock
└── phpunit.xml.dist

5 directories, 14 files
adam@DESKTOP-QV1A45U:/mnt/c/src/cfml-in-docker$

The test this just this:

/** @testdox Tests PHPUnit install */
class SelfTest extends TestCase
{
    /** @testdox it self-tests PHPUnit */
    public function testSelf()
    {
        $this->assertTrue(true);
    }
}

And it passes:

root@18c5eabeb9f2:/usr/share/cfml-in-docker# composer test
> vendor/bin/phpunit --testdox
PHPUnit 9.5.4 by Sebastian Bergmann and contributors.

Tests PHPUnit install
it self-tests PHPUnit

Time: 00:00.002, Memory: 6.00 MB

OK (1 test, 1 assertion)
root@18c5eabeb9f2:/usr/share/cfml-in-docker#

In this instance I could not actually run the test before I implemented the work, for what should seem obvious reasons. However I followed the TDD mindset of just doing the least amount of work possible to make the test pass. I also monkeyed around with the test itself to see it fail if I had an assertion that was no good (I changed the argument to that assertion to false, basically).

The TDD lesson here is: I've set myself a case - "It needs a PHP container for running tests" - and only resolved that case before pausing and assessing the situation. I also didn't move any further forward than I needed to to address that case.

It returns a 200-OK from requests to /gdayWorld.html

Next I need an Nginx container running, and serving a test file. Well: I need the test for that.

/** @testdox Tests Nginx is serving html */
class NginxTest extends TestCase
{
    /** @testdox It serves gdayWorld.html as 200-OK */
    public function testReturns200OK()
    {
        $client = new Client(['base_uri' => 'http://cfml-in-docker.backend/']);

        $response = $client->get('gdayWorld.html');

        $this->assertEquals(200, $response->getStatusCode());
        $content = $response->getBody()->getContents();
        $this->assertMatchesRegularExpression("/^\\s*G'day world!\\s*$/", $content);
    }
}

Once again, I'll largely just list the added files here, and link through to source control:

adam@DESKTOP-QV1A45U:/mnt/c/src/cfml-in-docker$ tree -a --dirsfirst -I "vendor|.git|.idea"
.
├── docker
│   ├── nginx
│   │   ├── root_home
│   │   │   ├── .gitignore
│   │   │   ├── .profile
│   │   │   └── .vimrc
│   │   ├── sites
│   │   │   └── default.conf
│   │   ├── Dockerfile
│   │   └── nginx.conf
│   └── [...]
├── public
│   └── gdayWorld.html
├── test
│   └── php
│       ├── NginxTest.php
│       └── [...]
├── var
│   └── log
│       └── nginx
│           ├── .gitkeep
│           ├── access.log
│           └── error.log
└── [...]

12 directories, 25 files
adam@DESKTOP-QV1A45U:/mnt/c/src/cfml-in-docker$

The contents of gdayWorld.html should be obvious from the test, but it's just:

G'day world!

OK so that was all stuff I've done a few times before now. Next… Lucee

It has a Lucee container which serves CFML code via its internal web server

I'm kinda guessing at this next case. I'm gonna need to have a Lucee container, this is a cert. And I recollect Adobe's ColdFusion CFML engine ships with an wee stubbed web server for dev use. I can't recall if Lucee does too. I'm assuming it does. You can see how prepared I am for all this: I've not even RTFMed about the Lucee Docker image on DockerHub yet (I did at least make sure there was one though ;-). The idea is that there's a two-step here: getting the Lucee container up and doing "something", and after that, wire it through from Nginx. But that's a separate case.

Right so this is all new to me, so I'll actually list the files I've created. First the test:

/** @testdox Tests Lucee is serving cfml */
class LuceeTest extends TestCase
{
    /** @testdox It serves gdayWorld.cfm as 200-OK on Lucee's internal web server */
    public function testReturns200OK()
    {
        $client = new Client(['base_uri' => 'http://cfml-in-docker.lucee:8888/']);

        $response = $client->get('gdayWorld.cfm');

        $this->assertEquals(200, $response->getStatusCode());
        $content = $response->getBody()->getContents();
        $this->assertMatchesRegularExpression("/^\\s*G'day world!\\s*$/", $content);
    }
}

It's the same as the HTML one except I'm hitting a different host, and on port 8888 (I have now done that RTFM I mentioned, and found the port Lucee serves on by default).

The Dockerfile is simple:

FROM lucee/lucee:5.3

RUN apt-get update
RUN apt-get install vim --yes

COPY ./root_home/.bashrc /root/.bashrc
COPY ./root_home/.vimrc /root/.vimrc

WORKDIR  /var/www

EXPOSE 8888

It's more complex than it needs to be as I always like vi installed in my containers because I inevitably need it (this is prescient as it turns out: I definitely did need it).

And the relevant bit from docker-compose.yml:

lucee:
    build:
        context: ./lucee
    volumes:
        - ../public:/var/www
        - ../var/log/tomcat:/usr/local/tomcat/log
        - ../var/log/lucee:/opt/lucee/web/logs
        - ./lucee/root_home:/root
    ports:
        - "8888:8888"
    stdin_open: true
    tty: true
    networks:
        backend:
            aliases:
                - cfml-in-docker.lucee

That's mostly just me mapping logging directories back to my host for convenience-sake.

Currently my test file - gdayWorld.cfm - is just plonked in the web root, which is not where one would normally put CFML files (except the application entry point file I mean), but it'll do for now:

<cfset message="G'day world!">
<cfoutput>#message#</cfoutput>

And that's it. After rebuilding my containers and running the tests, everything passes now:

root@a034afe670d4:/usr/share/cfml-in-docker# composer test
> vendor/bin/phpunit --testdox
PHPUnit 9.5.4 by Sebastian Bergmann and contributors.

Tests Lucee is serving cfml
It serves gdayWorld.cfm as 200-OK on Lucee's internal web server

Tests Nginx is serving html
It serves gdayWorld.html as 200-OK

Tests PHPUnit install
it self-tests PHPUnit

Time: 00:00.028, Memory: 6.00 MB

OK (3 tests, 5 assertions)
root@a034afe670d4:/usr/share/cfml-in-docker#

It proxies .cfm requests from Nginx to Lucee

OK so Lucee is working. Painless. Now I need to tell Nginx about it. I have NFI how to do that… I hope Google and/or Stack Overflow does.

After some googling, my recollection that some sort of connector was needed to run between the web server and the application server seems outdated, and all I need to do is use proxy_pass from Nginx to the address Lucee has configured Tomcat to listen on (Lucee runs atop of Tomcat: it's basically a Java Servlet). I can never remember the syntax for this, but fortunately Nando Breiter has documented it in article "Using Nginx With ColdFusion or Lucee". It's also reminded me a few other cases I need to test for, but first the baseline. Well actually first the test:

/** @testdox It proxies a CFM request to Lucee */
public function testCfmReturns200OK()
{
    $client = new Client(['base_uri' => 'http://cfml-in-docker.frontend/']);

    $response = $client->get('gdayWorld.cfm');

    $this->assertEquals(200, $response->getStatusCode());
    $content = $response->getBody()->getContents();
    $this->assertMatchesRegularExpression("/^\\s*G'day world!\\s*$/", $content);
}

This is the same as the previous one except I'm using the Nginx website's host, and on port 80. Also note I've changed the name of the host to be cfml-in-docker.frontend not cfml-in-docker.backend. This is cosmetic, and just to distinguish between references to stuff happening on the network within the containers (called backend), and addresses browsed from the public-facing websites.

The implementation for this case is simply this, in the website config default.conf:

location ~ \.(?:cfm|cfc) {
    proxy_pass  http://cfml-in-docker.lucee:8888$fastcgi_script_name;
}

Adding this and restarting Nginx has that test passing, as well as not interfering with any non-CFML requests (ie: the other Nginx tests still pass).

This config has some shortfalls though. Well I say "shortfalls". Basically I mean it doesn't work properly for a real-world situation. More test cases…

It passes query values to Lucee

The test demonstrates this:

/** @testdox It passes query values to Lucee */
public function testCfmReceivesQueryParameters()
{
    $client = new Client([
        'base_uri' => 'http://cfml-in-docker.frontend/',
        'http_errors' => false
    ]);

    $response = $client->get('queryTest.cfm?testParam=expectedValue');

    $this->assertEquals(200, $response->getStatusCode());
    $content = $response->getBody()->getContents();
    $this->assertSame("expectedValue", trim($content));
}

and queryTest.cfm is just this:

<cfoutput>#URL.testParam#</cfoutput>

If I run this test I get a failure because the 500 INTERNAL SERVER ERROR response from Lucee doesn't match the expected 200. This happens because Lucee can't see that param value. Because Nginx is not passing it. Easily fixed.

location ~ \.(?:cfm|cfc) {
    proxy_pass  http://cfml-in-docker.lucee:8888$fastcgi_script_name$is_args$args;
}

It passes the upstream remote address to Lucee

As it currently stands, Lucee will be receiving all requests as it they came from Nginx, rather than from whoever requested them. This is the nature of proxying, but we can work around this. First the test to set expectations:

/** @testdox It passes the upstream remote address to Lucee */
public function testLuceeReceivesCorrectRemoteAddr()
{
    $directClient = new Client([
        'base_uri' => 'http://cfml-in-docker.lucee:8888/',
        'http_errors' => false
    ]);
    $response = $directClient->get('remoteAddrTest.cfm');
    $expectedRemoteAddr = $response->getBody()->getContents();

    $proxiedClient = new Client([
        'base_uri' => 'http://cfml-in-docker.frontend/',
        'http_errors' => false
    ]);

    $testResponse = $proxiedClient->get('remoteAddrTest.cfm');

    $this->assertEquals(200, $testResponse->getStatusCode());
    $actualRemoteAddr = $testResponse->getBody()->getContents();
    $this->assertSame($expectedRemoteAddr, $actualRemoteAddr);
}

And remoteAddrTest.cfm is just this:

<cfoutput>#CGI.remote_addr#</cfoutput>

This is slightly more complicated than the previous tests, but only in that I can't know what the remote address is of the service running the test, because it could be "anything" (in reality inside these Docker containers, if they're brought up in the same order with the default bridging network, then it'll always be the same, but we don't want to break these tests if unrelated config should happen to change). The best way is to just check what the remote address is if we make the call directly to Lucee, and then expect that value if we make the same call via the Nginx proxy. As of now it fails because Lucee correctly sees the request as coming from the PHP container when we hit Lucee directly; but it sees the request as coming from the Nginx container when using Nginx's proxy. No surprise there. Fortunately Nando had the solution to this baked into his blog article already, so I can just copy and paste his work:

location ~ \.(?:cfm|cfc) {
    proxy_http_version  1.1;
    proxy_set_header    Connection "";
    proxy_set_header    Host                $host;
    proxy_set_header    X-Forwarded-Host    $host;
    proxy_set_header    X-Forwarded-Server  $host;
    proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;     ## CGI.REMOTE_ADDR
    proxy_set_header    X-Forwarded-Proto   $scheme;                        ## CGI.SERVER_PORT_SECURE
    proxy_set_header    X-Real-IP           $remote_addr;
    expires             epoch;

    proxy_pass  http://cfml-in-docker.lucee:8888$fastcgi_script_name$is_args$args;
}

And if I restart Nginx: all good. One more issue to deal with…

It passes URL path_info to Lucee correctly

Something too few people know about, is there's an optional part of a URL between the script name and the query: path info. An example is: http://example.com/script/name/path/document.html/additional/path/info?queryParam=paramValue. That path is nothing to do with the script to be executed or where it's located, it's just… some extra pathing information for the script to do something with. It's seldom used, but it's part of the spec (RFC-3875, section 4.1.5). The spec says this:

The PATH_INFO variable specifies a path to be interpreted by the CGI script. It identifies the resource or sub-resource to be returned by the CGI script, and is derived from the portion of the URI path hierarchy following the part that identifies the script itself.

Anyway, from what I could see of what I have in the Nginx config, I suspected that we're not passing that on to Lucee, so its CGI.path_info value would be blank. A test for this is easy, and much the same as the earlier ones:

/** @testdox It passes URL path_info to Lucee correctly */
public function testLuceeReceivesPathInfo()
{
    $client = new Client([
        'base_uri' => 'http://cfml-in-docker.frontend/',
        'http_errors' => false
    ]);

    $response = $client->get('pathInfoTest.cfm/additional/path/info/');

    $this->assertEquals(200, $response->getStatusCode());
    $content = $response->getBody()->getContents();
    $this->assertSame("/additional/path/info/", trim($content));
}

And pathInfoTest.cfm is similarly familiar:

<cfoutput>#CGI.path_info#</cfoutput>

And as I predicted (although as we'll see below, not for the reasons I thought!) the test errors:

> vendor/bin/phpunit --testdox '--filter=testLuceeReceivesPathInfo'
PHPUnit 9.5.4 by Sebastian Bergmann and contributors.

Tests Nginx proxies CFML requests to Lucee
It passes URL path_info to Lucee correctly
  
   Failed asserting that 404 matches expected 200.
  
   /usr/share/cfml-in-docker/test/php/NginxProxyToLuceeTest.php:71
  

Time: 00:00.090, Memory: 8.00 MB


FAILURES!
Tests: 1, Assertions: 1, Failures: 1.
Script vendor/bin/phpunit --testdox handling the test event returned with error code 1
root@29840662fdf9:/usr/share/cfml-in-docker#

At this point I disappeared down a rabbit hole of irritation, as detailed in article "Repro for Lucee weirdness". There are two bottom lines to this:

  1. For reasons best known to [someone other than me], Lucee only handles path_info on requests to index.cfm, but not to any other .cfm file! This can be shown by changing that test by renaming pathInfoTest.cfm to index.cfm, and calling that instead.
  2. Actually Nginx already handles it correctly anyhow. In that the value is passed on already, and I don't need to do anything extra to make it work (as far as Nginx is concerned, anyhow).

I can fix the situation for pathInfoTest.cfm if I hack Lucee's web.xml file (this is down at line 4643):

<servlet-mapping>
    <servlet-name>CFMLServlet</servlet-name>
    <url-pattern>*.cfm</url-pattern>
    <url-pattern>*.cfml</url-pattern>
    <url-pattern>*.cfc</url-pattern>
    <url-pattern>/index.cfm/*</url-pattern>
    <url-pattern>/index.cfc/*</url-pattern>
    <url-pattern>/index.cfml/*</url-pattern>
</servlet-mapping>

I could slap a special mapping for it in there. But that's a daft way to deal with this. I'm going to just mark that test as "incomplete", and move on.

Thanks to Pete Freitag, Adam Tuttle, Zac Spitzer and Sean Corfield for putting me on the right direction for working out this particular "WTF, Lucee?" episode.


Speaking of "moving on", I said I'd get the code this far, but only progress onto the more CFML-oriented stuff if I still had will to live. Well Lucee has eroded that for now, so I'll get back to that part later, when I've stopped shaking my fist at the screen.

NB: this has become part of a series of articles, as things get more complicated, and require more effort on my part to achieve my end goal: Lucee/CFWheels/Docker series.

Righto.

--
Adam

Friday 16 April 2021

Repro for Lucee weirdness

G'day:

I'm just having to install Lucee on my machine (see the series of articles labelled with Lucee/CFWheels/Docker series), and have got its Docker version up and running, but I'm seeing some weirdness with it. I was just wondering if someone else could take the time to try a quick experiment for me, and report back.

  1. In a browser-accessible directory, save this code in index.cfm:
    <cfdump var="#{
        script_name = CGI.script_name,
        path_info = CGI.path_info,
        query = url
    }#">
    
  2. Browse that file as http://[your test domain, etc]/path/to/that/index.cfm. You should see something like:
  3. Browse that file as http://[your test domain, etc]/path/to/that/index.cfm/extra/path/info?param=value. You should see something like:

    Note how it's correctly extracting the path_info value.

Now repeat the exercise, except instead of index.cfm, call the file testPathInfo.cfm, and browse to that instead.

For me, this works as expected if I'm using index.cfm. But if I use anything else, I just get this error if I have additional path info in the URL:

Note how it's seeing the path_info as part of the script_name, rather than separating it out.

My Lucee install is a fresh one from the the Lucee Docker image on DockerHub. I am only using the built-in web server ATM. However this is stopping me from sorting out the proxy_pass from Nginx… I want to get this ironed out before I move onwards with that.

Also, if anyone fancied running the experiment on CF instead of Lucee, that would be good too. But I'm mostly interested in seeing if this is just me doing something daft (if so: I'm buggered if I know what!), or if there is an issue.

Further investigations after feedback

FWIW I bit the bullet and downloaded and installed ColdFusion. It handles this situation fine:

Also thanks to Sean, Pete and Adam's guidance below; they've identified the issue as being in Lucee's web.xml file:

    <servlet-mapping>
        <servlet-name>CFMLServlet</servlet-name>
        <url-pattern>*.cfm</url-pattern>
        <url-pattern>*.cfml</url-pattern>
        <url-pattern>*.cfc</url-pattern>
        <url-pattern>/index.cfm/*</url-pattern>
        <url-pattern>/index.cfc/*</url-pattern>
        <url-pattern>/index.cfml/*</url-pattern>
    </servlet-mapping>

So the way Lucee works is that only index.cfm (or variant) can have path_info. That's pretty weird.

I've also looked at the servlet spec, and one can only have the single wildcard in the url-pattern, so it's not possible to solve this as *.cfm/* etc. I find it odd that the servlet spec includes the path_info in the "URL" it checks for the pattern. It should only be the script_name as far as I can tell, but they do specifically use everything after the context (the first part of the URL, omitted for Lucee), up to but not including the query part of the URL. If I was a betting person, I'd say the intent here is that the pattern should be an entire subdirectory (so widgets/*), or a file type, based on extension (so *.myServlet). And the people writing the servlet spec didn't think that the file extension is not necessarily the last thing before the ? or the end of the URL.

Still: the spec is clear in how it works, and what Lucee is trying to do with it doesn't work. I suspect they have decided path_info is only for old-skooly human-friendly URLs like this: http://example.com/index.cfm/fake/friendly/url/path/here (as opposed to just http://example.com/actually/friendly/url/path/here/). I've not seen someone use URLs like that since the early 2000s, and they should not be encouraged anyhow.

Am gonna have a quick look at what ColdFusion does with those mappings…

How ColdFusion handles it

Adobe have cheated and seemed to have patched the URL matcher so it accepts two wildcards, so in web.xml it's got this sort of thing (for each file extension variant):

<servlet-mapping id="coldfusion_mapping_6">
    <servlet-name>CfmServlet</servlet-name>
    <url-pattern>*.cfm/*</url-pattern>
</servlet-mapping>

Gets the job done.

Cheers and I appreciate the help.

Righto.

--
Adam