An interesting blog article fell in front of me this morning: "Capitalization for us Mc’s and Mac’s!", by Brian McGarvie. It mentions a UDF on CFLib.org which handles... well, as per his blog title: capitalising his name as "McGarvie" rather than "Mcgarvie", as other capitalise() functions might do.
The UDF is thus:
function celticMcCaps(lastName) {
    var capLastName = lCase(lastName);
    if (left(lastName,2) eq "Mc") {
        capLastName = uCase(left(lastName,1)) & lCase(mid(lastName,2,1)) & uCase(mid(lastName,3,1)) & lCase(right(lastName,len(lastName)-3));
        return capLastName;
    }
    else if (left(lastName,3) eq "Mac") {
        capLastName = uCase(left(lastName,1)) & lCase(mid(lastName,2,1)) & lCase(mid(lastName,3,1)) & uCase(mid(lastName,4,1)) & lCase(right(lastName,len(lastName)-4));
        return capLastName;
    }
    else if (left(lastName,2) eq "O'") {
        capLastName = uCase(left(lastName,1)) & "'" & uCase(mid(lastName,3,1)) & lCase(right(lastName,len(lastName)-3));
        return capLastName;
    }
    else return lastName;
}
(thanks to Kyle MacNamara for submitting it, btw).
I had a look at that, and thought "that's a lot of logic when all we're doing is string manipulation".
I have to admit I didn't spot the fact it handles the "O'" prefix at first, and very quickly came out with this:
function celticMcCaps(name){
    return reReplaceNoCase(name, "^([M])([a]?c)([a-z])(.*)$", "\U\1\E\L\2\E\U\3\E\L\4\E", "ONE");
}
Which does two-thirds of the trick. Then, when writing this article, I spotted the "O'" handling, so revised it to this:
function celticMcCapsRevised(name){
    return reReplaceNoCase(name, "^([MO])((?:[a]?c)|')([a-z])(.*)$", "\u\1\L\2\E\u\3\L\4\E", "ONE");
}
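If that replacement string looks like line noise, here is a quick sketch of its escapes on their own. The input strings are made up purely for illustration, and I'm assuming an engine whose reReplace() supports the Perl-style case escapes (the functions above already rely on that):

```cfml
<cfscript>
// Illustrative inputs only; none of these strings come from the UDF.

// \u upper-cases just the next character of the replacement
writeOutput(reReplace("hello world", "^([a-z])", "\u\1") & "<br>"); // Hello world

// \l lower-cases just the next character
writeOutput(reReplace("HELLO WORLD", "^([A-Z])", "\l\1") & "<br>"); // hELLO WORLD

// \U upper-cases everything up to the matching \E
writeOutput(reReplace("hello world", "^([a-z]+) ", "\U\1\E ") & "<br>"); // HELLO world

// \L lower-cases everything up to the matching \E
writeOutput(reReplace("HELLO WORLD", "^([A-Z]+) ", "\L\1\E ") & "<br>"); // hello WORLD
</cfscript>
```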
The trick to all this is regular expression replacements can perform case-conversion. \u and \l will convert the next character to their respective cases; \U and \L will convert all subsequent characters to their respective cases, until a \E is encountered. So I use \u to upper-case the first letter, plus the one after the prefix, and \L to lowercase the rest.
Running a test compare on this and the old one suggests it covers the same ground:
writeOutput('<table border="1"><thead><tr><th>Value</th><th>Original function</th><th>Revised function</th></tr></thead><tbody>');
for (name in [
    "cameron",   // control
    "CAMERON",   // control
    "Cameron",   // control
    "Oswald",    // control
    "oswald",    // control
    "OSWALD",    // control
    "McGarvie",  // already OK
    "MacDonald", // already OK
    "O'Shea",    // already OK
    "Mcgarvie",  // should change
    "Macdonald", // should change
    "O'shea",    // should change
    "mcgarvie",  // should change
    "macdonald", // should change
    "o'shea",    // should change
    "MCGARVIE",  // should change
    "MACDONALD", // should change
    "O'SHEA"     // should change
]){
    writeOutput("<tr><td>#name#</td><td>#celticMcCaps(name)#</td><td>#celticMcCapsRevised(name)#</td></tr>");
}
writeOutput("</tbody></table>");
(I'm in a rush today, so didn't bother with TDD... oops!)
This outputs:
Value | Original function | Revised function |
---|---|---|
cameron | cameron | cameron |
CAMERON | CAMERON | CAMERON |
Cameron | Cameron | Cameron |
Oswald | Oswald | Oswald |
oswald | oswald | oswald |
OSWALD | OSWALD | OSWALD |
McGarvie | McGarvie | McGarvie |
MacDonald | MacDonald | MacDonald |
O'Shea | O'Shea | O'Shea |
Mcgarvie | McGarvie | McGarvie |
Macdonald | MacDonald | MacDonald |
O'shea | O'Shea | O'Shea |
mcgarvie | McGarvie | McGarvie |
macdonald | MacDonald | MacDonald |
o'shea | O'Shea | O'Shea |
MCGARVIE | McGarvie | McGarvie |
MACDONALD | MacDonald | MacDonald |
O'SHEA | O'Shea | O'Shea |
All good?
This just demonstrates that when one is manipulating text... using regular expressions is probably the place to start, before writing a bunch of string-manipulation logic.
And also - from a TDD perspective - this would cut down the number of tests from four (one for each branch of the logic) to one. Obviously I'd still run an "eyeball" test like the one I wrote above.
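That one test could be table-driven. A minimal sketch, assuming no test framework at all: the cases array and failures list are hypothetical names of mine, the expected values are lifted from the results table above, and the revised function is repeated only so the snippet runs on its own. In real life the comparison would be done with the assertions of whichever framework you use.

```cfml
<cfscript>
// The revised function from this article, repeated so the sketch is self-contained.
function celticMcCapsRevised(name){
    return reReplaceNoCase(name, "^([MO])((?:[a]?c)|')([a-z])(.*)$", "\u\1\L\2\E\u\3\L\4\E", "ONE");
}

// Hypothetical input/expected pairs, taken from the results table above.
cases = [
    {input="cameron",   expected="cameron"},   // control: left alone
    {input="OSWALD",    expected="OSWALD"},    // control: left alone
    {input="McGarvie",  expected="McGarvie"},  // already OK
    {input="mcgarvie",  expected="McGarvie"},
    {input="MACDONALD", expected="MacDonald"},
    {input="o'shea",    expected="O'Shea"}
];

failures = [];
for (testCase in cases) {
    actual = celticMcCapsRevised(testCase.input);
    // compare() is case-sensitive; the eq operator is not, so it would
    // happily treat "Mcgarvie" and "McGarvie" as equal.
    if (compare(actual, testCase.expected) != 0) {
        arrayAppend(failures, "#testCase.input#: expected #testCase.expected#, got #actual#");
    }
}

writeOutput(arrayLen(failures) ? arrayToList(failures, "<br>") : "All cases passed");
</cfscript>
```

The one thing to watch is the comparison: any assertion that compares strings case-insensitively would pass even if the function did nothing at all.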
Anyway... that's it. Unless anyone spots any shortfalls in the revised approach, I might update the UDF on CFLib.
I'm hoping Peter Boughton reads this and sets me straight about any dodginess in my regex. If you think Ben Nadel knows a thing or two about regular expressions (and, hey, he does), then he seems like a journeyman compared to Peter, who is a true regex guru.
--
Adam