NSLog();

Regular Expression Removal of .php

Thanks to someone who will remain nameless (simply so he's not inundated with other annoying questions like my own), I found and replaced 1,292 links during The Sand Trap's move to WordPress (and its new URL scheme).

In BBEdit, search for:

("http://thesandtrap[.]com/.+?)([.]php")

and replace with:

\1/"
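For anyone doing this outside BBEdit, a rough Python equivalent of the same find/replace might look like the sketch below. The function name strip_php and the sample link are hypothetical. Python's re.sub happens to accept the same \1 backreference syntax that BBEdit uses.

```python
import re

# The BBEdit pattern from above: group 1 is the opening quote plus the URL up to
# (but not including) ".php"; group 2 is the trailing '.php"'.
PATTERN = re.compile(r'("http://thesandtrap[.]com/.+?)([.]php")')

def strip_php(html: str) -> str:
    """Hypothetical helper: turn "....php" links into ".../" links."""
    # Keep group 1, drop '.php', and close the attribute with '/"'.
    return PATTERN.sub(r'\1/"', html)

# Made-up sample link for illustration.
print(strip_php('<a href="http://thesandtrap.com/some-article.php">Putting</a>'))
# <a href="http://thesandtrap.com/some-article/">Putting</a>
```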

If you're using Perl or PHP, write the backreference as $1 instead of \1, so the replacement becomes $1/". The tricky part is using .+? instead of .+. The lazy .+? matches as little as possible, while the greedy .+ matches as much as possible, so on a line containing several links, .+ would match from the start of the first link all the way to the .php" of the last one. It's the sort of trap that slips past you if you only test against a sample file with one intended match: the pattern seems to work fine. In the grand scheme of things, the minimally matching .+? and .*? quantifiers are relatively recent additions to many tools; ten years ago, patterns like this were harder to write.
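To see the difference, here is a minimal Python sketch (Python rather than BBEdit, Perl, or PHP; the two sample links are made up) that runs the lazy and greedy versions of the pattern over a single line containing two links:

```python
import re

# One line with two links: the case where greedy and lazy quantifiers differ.
line = ('<a href="http://thesandtrap.com/a.php">A</a> '
        '<a href="http://thesandtrap.com/b.php">B</a>')

lazy = re.compile(r'("http://thesandtrap[.]com/.+?)([.]php")')
greedy = re.compile(r'("http://thesandtrap[.]com/.+)([.]php")')

# .+? stops at the first '.php"', so each link is fixed separately.
fixed_lazy = lazy.sub(r'\1/"', line)

# .+ runs to the LAST '.php"' on the line, producing one big match that
# swallows the first link, so only the second link gets fixed.
fixed_greedy = greedy.sub(r'\1/"', line)

print(fixed_lazy)
# <a href="http://thesandtrap.com/a/">A</a> <a href="http://thesandtrap.com/b/">B</a>
print(fixed_greedy)
# <a href="http://thesandtrap.com/a.php">A</a> <a href="http://thesandtrap.com/b/">B</a>
```

With only one link per test line, both versions would produce the same output, which is exactly why the greedy bug is easy to miss.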

I have a book on regular expressions, and I plan to read it as soon as I finish the Aperture book I'm reading now. I've put off learning regex far too long.

3 Responses to "Regular Expression Removal of .php"

  1. You could also do: "(http://thesandtrap[.]com/[^"]+)[.]php"

    (Changed the .+? to a more traditional [^"]+)

    Since you're matching the opening quote, the closing quote has to be at the end of the match. So the URL cannot contain a ". It's not much different, but maybe a little easier to understand.

  2. I hope you're using something like my MT to WP redirect template instead of polluting your .htaccess with 1292 unnecessary lines. 🙂

  3. In reply to comment 2: "I hope you're using something like my MT to WP redirect template instead of polluting your .htaccess with 1292 unnecessary lines. :)"

    No need for "MT to WP Redirect" at all. Two lines in .htaccess strip ".php" and "archives" from the URL. Though the old links therefore remain "valid," the least I could do was correct them internally; the post above was about correcting the links within our own articles.

    I'm disappointed that we won't be able to do the same "Related Articles" via pingback that we can in Movable Type. WordPress pingbacks don't carry the same information as MT pingbacks: namely, the article name. They just show up as "The Sand Trap." But that's off-topic for this discussion.
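The [^"]+ alternative from the first comment can be checked the same way. In this hedged Python sketch, the sample line is made up, and because the commenter didn't give a replacement string, "\1/" is an assumption: group 1 excludes the opening quote, so both quotes have to be put back in the replacement.

```python
import re

# The first comment's variant: [^"]+ cannot cross a double quote, so a match
# can never run past the end of one href value, even though + is greedy.
QUOTE_BOUNDED = re.compile(r'"(http://thesandtrap[.]com/[^"]+)[.]php"')

line = ('<a href="http://thesandtrap.com/a.php">A</a> '
        '<a href="http://thesandtrap.com/b.php">B</a>')

# Assumed replacement: group 1 is the bare URL, so re-add both quotes.
fixed = QUOTE_BOUNDED.sub(r'"\1/"', line)
print(fixed)
# <a href="http://thesandtrap.com/a/">A</a> <a href="http://thesandtrap.com/b/">B</a>
```

Greedy or lazy doesn't matter here: the negated character class does the bounding, which is why the commenter calls it the more traditional form.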