A few days ago I promised that I'd post my 404 Search Function code. Here it is (below). Let me cover a few things first. Initially, I did a preg_match after having done a file(). The file() function returns an array. That's a bit costly, so I just wrap file() with an implode() to give me the whole string. I looked, and I haven't done this before so it's possible I overlooked it, but there doesn't seem to be a way to get a remote URL (the search results page) as a string in one function. Bummer.
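For what it's worth, PHP 4.3.0 and later ship file_get_contents(), which returns a whole file or URL as one string and so does the same job as the implode()/file() pair. Here is a minimal sketch, assuming allow_url_fopen is enabled for remote URLs. The helper name fetch_as_string() is made up for illustration, and a local temp file stands in for the search results page so the example runs without a network:

```php
<?php
// Hypothetical helper: fetch a file or URL as a single string.
// Remote URLs need allow_url_fopen enabled in php.ini.
function fetch_as_string($location)
{
    $contents = @file_get_contents($location);
    // file_get_contents() returns false on failure; normalize that to ''.
    return ($contents === false) ? '' : $contents;
}

// Demo against a local temp file (a stand-in for the search results URL).
$tmp = tempnam(sys_get_temp_dir(), 'srch');
file_put_contents($tmp, "line one\nline two\n");

$via_implode = implode("", file($tmp));      // the approach used in the post
$via_single_call = fetch_as_string($tmp);    // single-call equivalent

unlink($tmp);
```

Both variables should end up holding the same string, since file() keeps the line endings on each array element.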

At any rate, I'll revise this code as necessary. Submit your suggestions. Consider this code in the public domain, and use it as you see fit. If you feel like being nice, credit me. If you feel like being really nice, send me a case of Coke. If you feel like the code is crap and there's nothing to be nice about, do me the favor of telling me why.

Wrap this up in a 404.php file and set your .htaccess file to read "ErrorDocument 404 /404.php" or something.

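// Take the requested path, minus the leading slash, as the search term.
// $REQUEST_URI relies on register_globals; on newer PHP use $_SERVER['REQUEST_URI'].
$search_term = substr($REQUEST_URI,1);
$search_term = urldecode(stripslashes($search_term) );
$search_url = '';
$full_search_url = $search_url . $search_term;
// Fetch the search results page as one string.
$full_page = implode("", file($full_search_url) );
// Each result is expected to be marked up as <h3 class="title"><a href="...">.
$search_string = '/<h3 class="title"><a href="([^"]*)"/';
$count = preg_match_all($search_string, $full_page, $matches);
if(1 == $count) {
    // Exactly one hit: go straight to it.
    header("Location: {$matches[1][0]}");
} else {
    // Otherwise send the visitor to the full results page.
    header("Location: $full_search_url");
}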
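If you want to test the "exactly one hit" decision without making HTTP requests, it can be pulled out into a small function. This is only a sketch: the function name single_result_url() and the sample archive paths are invented for illustration, and it assumes your search template marks up each result the same way the pattern above expects.

```php
<?php
// Hypothetical helper: given the search results HTML, return the URL to
// redirect to when there is exactly one hit, or null otherwise.
function single_result_url($html)
{
    // Same pattern as in the post; assumes each hit is marked up as
    // <h3 class="title"><a href="...">.
    $pattern = '/<h3 class="title"><a href="([^"]*)"/';
    $count = preg_match_all($pattern, $html, $matches);
    return (1 === $count) ? $matches[1][0] : null;
}

// Made-up sample markup with one and two results.
$one = '<h3 class="title"><a href="/archives/000123.html">Post</a></h3>';
$two = $one . '<h3 class="title"><a href="/archives/000456.html">Other</a></h3>';

// In 404.php the result would be used roughly like this:
//   $target = single_result_url($full_page);
//   if ($target !== null) { header("Location: $target"); }
```

With zero or several hits the function returns null, and the script can fall back to showing the results page.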

Got questions? Check out Vinay's source or Christopher Holland's source.

40 Responses to "404 Search Function Code"

  2. It seems to work perfectly. Thank you so much, this is awesome.

  4. Works nicely. Of course, now you realize you can add the logic in here to redirect people with old URLs to the new location ... right? 🙂

    Something I think I'll be doing on my own site.
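The old-URL idea could be sketched as a lookup that runs before the search. The paths in the map and the function name moved_location() are hypothetical; a real site would fill in its own retired URLs.

```php
<?php
// Hypothetical map of retired paths to their new homes.
$moved = array(
    '/old/about.html' => '/about/',
    '/blog/index.php' => '/',
);

// Return the new location for a moved path, or null to fall through to search.
function moved_location($request_uri, $moved)
{
    // Ignore any query string when looking up the path.
    $path = parse_url($request_uri, PHP_URL_PATH);
    return (is_string($path) && isset($moved[$path])) ? $moved[$path] : null;
}

// At the top of 404.php this might be used as:
//   $new = moved_location($_SERVER['REQUEST_URI'], $moved);
//   if ($new !== null) { header("Location: $new", true, 301); exit; }
```

A 301 status tells browsers and crawlers the move is permanent, which seems like the right signal for retired URLs.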

  7. My only complaint, but I don't think you can control this.

    When someone does a search like this, MT's activity log shows the IP of the server hosting MT as doing the search since, technically, it is. It would be nice to somehow trick MT into posting the IP of the person doing the search as opposed to the host computer's IP.

  8. I predict this functionality will become extremely common on web sites. It's a no brainer.

  10. Sweet! Probably a good idea to stripslashes() on $search_term, though: a search for what's up returns no results for what\'s up.

  11. You are right and I've added this as of, well, just now.
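A quick illustration of the stripslashes() point, assuming the apostrophe arrives backslash-escaped (as it can on older PHP setups with magic quotes turned on):

```php
<?php
// The search term as it arrived, with the quote escaped: what\'s up
$raw = "what\\'s up";

// stripslashes() removes the escaping backslash.
$clean = stripslashes($raw);
```

After this, $clean holds what's up, which is what the search should actually receive.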

  12. Hi, nice work. I tried to use this for my site without MT 🙂

    One remark on the value of $search_string:

    It matched nothing for me unless the <, > and " were escaped. Then it worked 😉

  14. OK, this is one of the coolest things I've seen in a long time. Has now been implemented on my page 🙂

  15. Great idea! I never considered anything like this, though I am well aware of the neatness of's search function. Let me show you what I used, quite similar:

    $keyword = urlencode(substr($_SERVER["REQUEST_URI"], 1));


    As a side note, if anyone plans to write the search program, make sure you urldecode whatever's being sent before you process it. Try searching for something with spaces in it and you'll see what I mean. (Except of course foo bar as it leads directly to a very useful page, hehe -- coincidence? Oh well.)

    Best regards,

    Simon Shine
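To make the encoding point concrete, here is a small round trip. The term foo bar comes from the comment above; the variable names are just for illustration.

```php
<?php
// A space in the requested path shows up percent-encoded in REQUEST_URI.
$path_term = 'foo%20bar';

// Decode it before searching, so the search sees a real space.
$decoded = urldecode($path_term);

// Encode it again before putting it into a query string.
// urlencode() writes spaces as '+', rawurlencode() writes them as '%20'.
$query_value = urlencode($decoded);
$raw_value = rawurlencode($decoded);

// urldecode() also turns '+' back into a space.
$from_plus = urldecode('foo+bar');
```

Skipping the decode step means the search is run on the literal text foo%20bar, which is why searches with spaces come back empty.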

  20. Anyone noticed strange PHP interactions with this? If I make my 404 error page a PHP script, the REQUEST_URI when the script is evaluated points not to the user's munged URL, but rather to my error page. For what it's worth, the virtual server approach my hosting service uses requires a full URL for the ErrorDocument to work; this might be the source of the problem.

  21. Turns out my problem is due to hosting service weirdness; I can only do an ErrorDocument with an absolute URL, and doing so loses the referer info.

  27. Etan, you can change one line in the code to get it to correctly search from your IP address.

    In the last line of code change:

    echo $full_page;

    to:

    header("Location: $full_search_url");

    Works for me.

    Also the script doesn't work really well if your blog directory isn't the base directory of the server, since the substr just chops the first slash off. I've edited it to handle blogs in subdirectories—or, more specifically, a certain subdirectory (in my case /mt/exordium/). I'm not exactly a PHP guru so there might be a better way—maybe just split the string by "/" and then just use the last item. Anyway, for now just change the first lines to read...

    $URI_prefix = "/mt/exordium/";

    $search_preterm = str_replace($URI_prefix,"",$_SERVER['REQUEST_URI']);

    //$search_term = substr($_SERVER['REQUEST_URI'],1);

    $search_term = stripslashes($search_preterm);

    (Also, I changed to PHP 4.3 compliant global variable names, so you may have to change it back to 'REQUEST_URI' if you don't have a new version of PHP)
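Both subdirectory ideas from this comment can be sketched as small helpers. The function names are hypothetical. The first strips the prefix only when it sits at the start of the URI (str_replace() would remove it anywhere it appears), and the second is the "split by / and use the last item" variant.

```php
<?php
// Hypothetical helper: strip a leading blog prefix from the request URI
// and return the remainder, URL-decoded, as the search term.
function search_term_from_uri($request_uri, $prefix = '/')
{
    // Only strip the prefix when it is at the very start of the URI.
    if ($prefix !== '' && strpos($request_uri, $prefix) === 0) {
        $request_uri = (string) substr($request_uri, strlen($prefix));
    }
    return urldecode(ltrim($request_uri, '/'));
}

// Hypothetical helper: use only the last path segment as the search term.
function last_path_segment($request_uri)
{
    $parts = explode('/', rtrim($request_uri, '/'));
    return urldecode(end($parts));
}

// Example with the prefix from the comment:
//   search_term_from_uri('/mt/exordium/foo%20bar', '/mt/exordium/')
```

The last-segment version needs no configuration, but it throws away any earlier path segments, so /2003/02/foo would search for foo only.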

  32. check out this RSSphp script

  33. Some security and scalability concerns have been raised about this solution at ScriptyGoddess. You might want to take a look at their discussion for some ideas on how to improve on this great idea.

  34. None of the points raised in the discussion at ScriptyGoddess are important. As Etan primarily points out, anyone wishing to "hurt" your server can hit up mt-search.cgi directly. Adding checks for load balancing, etc. are nice perks, but I'm also a fan of KISS. I'm already running a modified version of this script anyway.

  37. I need to Search the words from a website...I need to write a search function for it... plz help me out

  38. I last wrote about the 404 Search in February 2003 (both here and here). Since that time, I've been using the 404 search code quite heavily on every site with MovableType (or any other blogging package). It's undergone some improvements,...

  39. I prefer a 404 error page; I don't like being redirected.
    Cheers and good work