The Danger of 404s with WordPress
Posted October 31st, 2006 @ 12:24am by Erik J. Barzeski
Since converting this blog to WordPress about a week ago, I've had problems with the site. Because the MT->WP upgrade took place shortly (mere minutes) after upgrading the server's software components (PHP, Perl, MySQL, Apache, etc.), I thought perhaps something had gone wrong in the process.
The problems? MySQL would spike to 25% CPU or higher. Server loads would hit 10, 15, or even 25. Messages about failures to allocate memory would spew into my SSH terminal and my email client's error logs, and all manner of things would go wonky and haywire.
When I was able to log on via SSH, I could kill Apache for a few minutes and things would quiet down. I could restart MySQL and things would quiet down. But occasionally, the server was so far gone I couldn't do anything but ride it out for anywhere from five minutes to a few hours - I couldn't even SSH in to try to stop things.
Tonight I may have discovered the solution.
This site runs on a FreeBSD-based VPS account, just like The Sand Trap .com. The Sand Trap runs a copy of vBulletin, and this site - by far the busiest within my account on this particular VPS server - now runs WordPress with WP-Cache to try to minimize MySQL queries. In other words, both servers are running a similar software package.
Running tail -f access_log on each of the servers showed that the two sites were about equally busy. Yet the site with the database-heavy features - the forum - wasn't seeing MySQL spikes. It was sitting at around 1% CPU. On the other hand, this site - NSLog(); - was still being hit by a bunch of spammers attempting to get at the old /mt/mt-tb.cgi and /mt/mt-saysomething.cgi scripts. These spammers accounted for a good portion of the "traffic."
MySQL usage hung out at around 10-20% CPU. Three to four times per day, it'd spike to 40% or higher. Server loads hung out in the 5-8 range and would spike to 15, 20, or higher. I thought perhaps the 404s (the missing MT scripts) were causing a lot of problems, so I replaced the 404.php file in my current WordPress theme with one that simply read <?="go away";?>.
Not much changed. MySQL still hung out at nearly 20% CPU.
A few minutes later, it occurred to me that anything I could do to minimize the time spent handling access to the old MT scripts - since they were likely to be spammers and spammers alone - would be A Good Thing. So I created a RewriteRule that reads:
RewriteRule mt/mt - [F]
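This just throws a 403 "Forbidden" (that's what the [F] flag means) at anything trying to access "mt/mt". The pattern is a regular expression, and it isn't anchored, so it matches "mt/mt" anywhere in the requested path. Since my cgi-bin directory was named "mt" and because every MovableType script begins with "mt," this is a unique - and short - catch-all URL scheme.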
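Here is a minimal sketch of where a rule like this could sit in a site-root .htaccess file. The layout is an assumption for illustration, not a copy of the file used on this server: the WordPress block shown is the stock permalink block, and your own file may differ. The point is that the [F] rule is evaluated first, so a matching request is refused by Apache and never handed to WordPress's index.php.

```apache
# Sketch only: assumes mod_rewrite is enabled and that .htaccess
# overrides are allowed for this directory.
<IfModule mod_rewrite.c>
RewriteEngine On

# Refuse anything whose path contains "mt/mt" with a 403.
# "-" means "do not rewrite the URL"; [F] sends Forbidden and
# stops further rule processing for this request.
RewriteRule mt/mt - [F]
</IfModule>

# BEGIN WordPress (stock permalink block, shown for context)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Anything that is not a real file or directory goes to WordPress,
# which is where the expensive 404 handling happens.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```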
Bam! Instant reduction to between 1% and 2% (with some soft spikes to 5%) for MySQL.
WordPress's 404 script must query the database and search through every row looking for a potential page, regardless of the output. It makes sense that it would, really. It also hurts. Spammers were spiking my usage by hitting hundreds of 404s per minute, causing lengthy query after lengthy query to back up. Those semi-random times when my server would crap out? They likely coincided with a dedicated spam attack. WP-Cache had no effect because 404s aren't cached (nor should they be, of course).
So, knock on wood, but there you have it. MovableType continues to screw me even after I've left it. As if the fact that their mt*.cgi scripts took 45+ seconds to execute themselves wasn't bad enough, now spammers thinking I'm still using that pile of crap are continuing to wreak havoc.
Quick Note on Doing This For Yourself
If you want to do this for yourself because you're routinely hit with 404s on a WordPress-powered blog, the steps are pretty easy. Simply figure out the unique path to the file, then incorporate it into the RewriteRule:
RewriteRule path/here - [F]
For example, if you're moving from MovableType to WordPress, you don't want to list "xmlrpc" because it's not unique - WordPress uses xmlrpc.php. Instead, "mt-xmlrpc.cgi" or "cgi-bin/mt" may be unique on your setup. After you've set up your Rule, test thoroughly.
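As a rough sketch of what "unique" can look like in practice, the variants below tighten the pattern so it is less likely to catch a legitimate URL. The paths are hypothetical (they assume the old scripts lived in a directory named "mt" under the document root), so substitute your own. In an .htaccess file the pattern is matched against the path without its leading slash, which is why the anchored version starts with ^mt/ rather than ^/mt/.

```apache
RewriteEngine On

# Anchored variant: only paths that START with "mt/mt-" are refused,
# e.g. /mt/mt-tb.cgi, but not /archives/format/mt-notes/.
# (Hypothetical directory name; adjust to your old install.)
RewriteRule ^mt/mt- - [F]

# Single-script variant: the dot is escaped so it matches a literal
# "." and the $ anchors the match to the end of the path.
# This matches mt-xmlrpc.cgi but leaves WordPress's xmlrpc.php alone.
RewriteRule mt-xmlrpc\.cgi$ - [F]
```

A quick sanity check after saving the file is to request one of the old script URLs and one ordinary post URL: the first should come back 403 and the second should load normally.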
P.S. Thanks to Steve
Thanks go out to Steve in tech support for Interland web.com for spending such a large amount of time with me on the phone this evening. I owe him an email in a few days to update him, but this blog entry will likely be the main "gist" of it.
Posted 31 Oct 2006 at 10:17am #
While reading this, I see you're running VB on Sandtrap. Have you gone and optimized the VB/server settings with a little help from the VB community? There is one guru that will give the best settings with a few key data pieces from you to make the VB run like a Ferrari. Good to see you put the MT .cgi stuff to bed too... cgi used to cause me problems with infopop, but that was some time ago.
Posted 31 Oct 2006 at 1:19pm #
This is great, thanks. We were having a raft of those requests, even 2 years+ since leaving MT for WordPress. Wow, a real improvement.