I'm hoping Peter Boughton or Ben Nadel might see this. Or someone else I'm unaware of who is good at regular expression patterns.
Here's the challenge...
Given this string:
Lorem ipsum dolor sit
I want to extract the leading sub-string which is:
- no more than n characters long;
- breaks at the previous whole word, rather than in the middle of a word;
- if no complete word fits within n characters, then matches at least the first word, even if the length of the sub-string is greater than n.
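To pin those three rules down independently of any regex, here is a minimal loop-based sketch. The function name trimToWordPlain() and the file name are hypothetical, and it assumes words are separated by single spaces with no leading whitespace; it is only a reference for the expected behaviour, not a proposed solution.

```cfml
// trimToWordPlain.cfm (hypothetical name, for illustration only)
string function trimToWordPlain(required string string, required numeric index){
    var length = len(string);
    // the whole string fits within the limit: nothing to trim
    if (index >= length){
        return string;
    }
    // walk back from the limit to the last word character that is followed by a space
    for (var i=index; i >= 1; i--){
        if (mid(string, i, 1) != " " && mid(string, i+1, 1) == " "){
            return left(string, i);
        }
    }
    // no whole word fits within the limit: return the first word, whatever its length
    var firstSpace = find(" ", string);
    return firstSpace > 1 ? left(string, firstSpace - 1) : string;
}
```

With the sample string, trimToWordPlain("Lorem ipsum dolor sit", 3) and trimToWordPlain("Lorem ipsum dolor sit", 10) should both give "Lorem", and a limit of 11 should give "Lorem ipsum".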
I've come up with this:
// trimToWord.cfm
string function trimToWord(required string string, required numeric index){
    return reReplace(string, "^((?:.{1,#index#}(?=\s|$)\b)|(?:.+?\b)).*", "\1", "ONE");
}
It works, but that regex is a bit hairy.
Here's a visual representation of it (courtesy of regexper.com), by way of explanation:
[Image: railroad diagram of the regex above, generated with regexper.com]
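In text form, the pattern reads as two alternatives inside one capture group, followed by a catch-all that throws away the rest of the string. The sketch below assembles exactly the same pattern from commented parts; the function name trimToWordExplained() and the variable names are hypothetical, and the behaviour should be identical to trimToWord().

```cfml
// the same pattern as in trimToWord(), assembled from commented parts
string function trimToWordExplained(required string string, required numeric index){
    // 1 to n characters (greedy), ending where whitespace or end-of-string follows, on a word boundary
    var wholeWords = "(?:.{1,#index#}(?=\s|$)\b)";
    // fallback: as few characters as possible up to the first word boundary, i.e. the first word
    var firstWord = "(?:.+?\b)";
    // capture whichever alternative matches, then consume the rest so the replace discards it
    var pattern = "^(#wholeWords#|#firstWord#).*";
    return reReplace(string, pattern, "\1", "ONE");
}
```

The first alternative is tried first, so the fallback only kicks in when no whole word fits within n characters.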
Anyone fancy improving it for me?
Here are some unit tests to run your suggestions through:
// TestCase.cfc
component extends="testbox.system.BaseSpec" {

    function beforeAll(){
        include "trimToWord.cfm";
        variables.sample = "Lorem ipsum dolor sit";
    }

    function run(){
        describe("Tests for trimToWord()", function(){
            it("works when the trim point is smaller than the first word 'Lorem'", function(index){
                for (var i=1; i <= 5; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 5), "trimming @ #i#");
                }
            });
            it("works when the trim point is between the first and second words 'Lorem ipsum'", function(index){
                for (var i=6; i <= 10; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 5), "trimming @ #i#");
                }
            });
            it("works when the trim point is between the second and third words 'Lorem ipsum dolor'", function(index){
                for (var i=11; i <= 16; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 11), "trimming @ #i#");
                }
            });
            it("works when the trim point is between the third and fourth words 'Lorem ipsum dolor sit'", function(index){
                for (var i=17; i <= 20; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 17), "trimming @ #i#");
                }
            });
            it("works when the trim point is at the end of the string 'Lorem ipsum dolor sit'", function(index){
                expect(
                    trimToWord(sample, 21)
                ).toBe(sample);
            });
        });
    }
}
Cheers.
--
Adam