Friday 16 April 2021

Repro for Lucee weirdness

G'day:

I'm having to install Lucee on my machine (see the articles labelled "Lucee/CFWheels/Docker series"), and have got its Docker version up and running, but I'm seeing some weirdness with it. I was wondering if someone else could take the time to try a quick experiment for me, and report back.

  1. In a browser-accessible directory, save this code in index.cfm:
    <cfdump var="#{
        script_name = CGI.script_name,
        path_info = CGI.path_info,
        query = url
    }#">
    
  2. Browse that file as http://[your test domain, etc]/path/to/that/index.cfm. You should see a dump of the script_name, path_info and query values.
  3. Browse that file as http://[your test domain, etc]/path/to/that/index.cfm/extra/path/info?param=value. You should see the same dump, this time with path_info set to /extra/path/info and query containing param=value.

    Note how it's correctly extracting the path_info value.

Now repeat the exercise, except instead of index.cfm, call the file testPathInfo.cfm, and browse to that instead.

For me, this works as expected if I'm using index.cfm. But if I use anything else, I get an error whenever I have additional path info in the URL.

In that error, note how it's seeing the path_info as part of the script_name, rather than separating it out.

My Lucee install is a fresh one from the Lucee Docker image on DockerHub. I am only using the built-in web server ATM. However this is stopping me from sorting out the proxy_pass from Nginx… I want to get this ironed out before I move onwards with that.

Also, if anyone fancied running the experiment on CF instead of Lucee, that would be good too. But I'm mostly interested in seeing if this is just me doing something daft (if so: I'm buggered if I know what!), or if there is an issue.

Further investigations after feedback

FWIW I bit the bullet and downloaded and installed ColdFusion. It handles this situation fine.

Also thanks to Sean, Pete and Adam's guidance below; they've identified the issue as being in Lucee's web.xml file:

    <servlet-mapping>
        <servlet-name>CFMLServlet</servlet-name>
        <url-pattern>*.cfm</url-pattern>
        <url-pattern>*.cfml</url-pattern>
        <url-pattern>*.cfc</url-pattern>
        <url-pattern>/index.cfm/*</url-pattern>
        <url-pattern>/index.cfc/*</url-pattern>
        <url-pattern>/index.cfml/*</url-pattern>
    </servlet-mapping>

So the way Lucee works is that only index.cfm (or variant) can have path_info. That's pretty weird.

I've also looked at the servlet spec, and a url-pattern can only contain a single wildcard, so it's not possible to solve this as *.cfm/* etc. I find it odd that the servlet spec includes the path_info in the "URL" it checks against the pattern. It should only be the script_name as far as I can tell, but they do specifically use everything after the context path (the first part of the URL, omitted for Lucee), up to but not including the query part of the URL. If I was a betting person, I'd say the intent here is that the pattern should be either an entire subdirectory (so /widgets/*), or a file type based on extension (so *.myServlet), and the people writing the servlet spec didn't consider that the file extension is not necessarily the last thing before the ? or the end of the URL.

Still: the spec is clear in how it works, and what Lucee is trying to do with it doesn't work. I suspect they have decided path_info is only for old-skooly human-friendly URLs like this: http://example.com/index.cfm/fake/friendly/url/path/here (as opposed to just http://example.com/actually/friendly/url/path/here/). I've not seen someone use URLs like that since the early 2000s, and they should not be encouraged anyhow.
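To make the matching rules above concrete, here is a minimal Python sketch of how a servlet container picks a mapping, as I read the spec: exact match first, then the longest path-prefix pattern, then an extension pattern checked against the last path segment only. The names match_servlet and LUCEE_PATTERNS are made up for this illustration, and this is a simplification, not Tomcat's or Lucee's actual code. The patterns are the ones from the web.xml above.

```python
def match_servlet(path, patterns):
    """Match a request path (context path and query string already removed)
    against servlet url-patterns.

    Returns (pattern, servlet_path, path_info), or None if nothing matches.
    """
    # 1. Exact match: a pattern with no wildcard must equal the whole path.
    for pattern in patterns:
        is_wildcard = pattern.startswith("*.") or pattern.endswith("/*")
        if not is_wildcard and pattern == path:
            return (pattern, path, None)

    # 2. Path-prefix match ("/something/*"): the longest prefix wins, and
    #    whatever follows the prefix becomes the path info.
    best_prefix = None
    for pattern in patterns:
        if pattern.startswith("/") and pattern.endswith("/*"):
            prefix = pattern[:-2]
            if path == prefix or path.startswith(prefix + "/"):
                if best_prefix is None or len(prefix) > len(best_prefix):
                    best_prefix = prefix
    if best_prefix is not None:
        path_info = path[len(best_prefix):] or None
        return (best_prefix + "/*", best_prefix, path_info)

    # 3. Extension match ("*.ext"): only the LAST segment of the path is
    #    checked for an extension, so trailing path info defeats it.
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        extension = last_segment.rsplit(".", 1)[-1]
        for pattern in patterns:
            if pattern == "*." + extension:
                return (pattern, path, None)

    return None


# The url-patterns from Lucee's web.xml (hypothetical variable name).
LUCEE_PATTERNS = [
    "*.cfm", "*.cfml", "*.cfc",
    "/index.cfm/*", "/index.cfc/*", "/index.cfml/*",
]

for request_path in [
    "/index.cfm/extra/path/info",
    "/testPathInfo.cfm",
    "/testPathInfo.cfm/extra/path/info",
]:
    print(request_path, "->", match_servlet(request_path, LUCEE_PATTERNS))
```

The last request matches nothing: its final segment is "info", which has no extension, and there is no /testPathInfo.cfm/* prefix pattern to catch it.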

Am gonna have a quick look at what ColdFusion does with those mappings…

How ColdFusion handles it

Adobe have cheated and seem to have patched the URL matcher so it accepts two wildcards, so in web.xml it's got this sort of thing (for each file extension variant):

    <servlet-mapping id="coldfusion_mapping_6">
        <servlet-name>CfmServlet</servlet-name>
        <url-pattern>*.cfm/*</url-pattern>
    </servlet-mapping>

Gets the job done.
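As a rough illustration of what a matcher accepting a two-wildcard pattern like *.cfm/* would need to do, here is a Python sketch that splits the path at the first segment ending in the extension. This is purely an assumption about the general idea, not Adobe's implementation (which isn't shown here); the function name split_two_wildcard and its return shape are hypothetical.

```python
def split_two_wildcard(path, pattern):
    """Split a request path into (script_name, path_info) using a
    two-wildcard pattern such as "*.cfm/*".

    Returns None if no path segment ends with the pattern's extension.
    """
    if not (pattern.startswith("*.") and pattern.endswith("/*")):
        raise ValueError("expected a pattern shaped like '*.ext/*'")
    extension = pattern[1:-2]  # "*.cfm/*" -> ".cfm"

    segments = path.split("/")
    for index, segment in enumerate(segments):
        # The first segment that ends in the extension is treated as the script.
        if segment.endswith(extension) and len(segment) > len(extension):
            script_name = "/".join(segments[: index + 1])
            remainder = segments[index + 1 :]
            # Everything after the script becomes path_info (empty if none).
            path_info = "/" + "/".join(remainder) if remainder else ""
            return (script_name, path_info)

    return None


print(split_two_wildcard("/path/to/testPathInfo.cfm/extra/path/info", "*.cfm/*"))
```

With this approach the file extension no longer has to be the last thing in the path, which is exactly the case the standard extension mapping can't express.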

Cheers and I appreciate the help.

Righto.

--
Adam