Posted December 11th, 2002 @ 03:04pm by Erik J. Barzeski
Spam. It's not just for dinner anymore… it's for dinner, lunch, breakfast, and, if my case is considered normal (it's not), it's for while you're sleeping. I receive about 350 emails per day. A little under half are spam. If I had to sort through spam at a pretty fast rate of one second per spam, I'd lose about three minutes per day. Multiply that out by a year, and all of a sudden I'm down about a thousand minutes. In hours, that's an entire waking day (plus some - 18.25 hours to be exact)!
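For anyone who wants to check the back-of-the-envelope math, here it is as a few lines of Python. This is only a sketch: the three minutes per day is the rounded figure from above (roughly 175 spams at one second each), not something measured, and the variable names are made up for the example.

```python
# Rounded figures from the paragraph above; these are estimates, not measurements.
MINUTES_PER_DAY = 3      # roughly 175 spams at about one second each
DAYS_PER_YEAR = 365

minutes_per_year = MINUTES_PER_DAY * DAYS_PER_YEAR
hours_per_year = minutes_per_year / 60

print(f"{minutes_per_year} minutes per year")   # 1095 minutes per year
print(f"{hours_per_year} hours per year")       # 18.25 hours per year
```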
For several months now I've inspected spam that's come in and added the domain name or address to a list on my server that blocks mail on all of my accounts (and those of anyone else who has an account on my server). I toyed with the idea of doing a SpamAssassin-type thing. Finally, I settled on something called SpamSieve. I was actually prompted to pick it up because its author, Michael Tsai, joined my software company, Freshly Squeezed Software.
SpamSieve works on Bayesian filters, something users of Mac OS X's "Mail" app got for free in Jaguar (10.2). The basics (actually, quite a bit more) are explained in this Paul Graham article, which is pretty widely linked to already.
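To give a flavor of what "Bayesian" means here, below is a toy sketch in Python. It is not how SpamSieve or Mail work internally, and it is simpler than the scheme in the Graham article: just a plain naive Bayes classifier with add-one smoothing. The class name `TinyBayesFilter`, the whitespace tokenizer, and the sample messages are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class TinyBayesFilter:
    """Toy naive Bayes spam filter (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    def train(self, text, label):
        # Record which words show up in messages the user marked spam or good.
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["good"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "good"):
            # Prior: how common this kind of mail is (add-one smoothed).
            log_p = math.log((self.messages[label] + 1) / (total + 2))
            n_tokens = sum(self.counts[label].values())
            for tok in tokenize(text):
                # Likelihood of each word under this label; smoothing keeps
                # unseen words from producing a zero probability.
                log_p += math.log(
                    (self.counts[label][tok] + 1) / (n_tokens + len(vocab) + 1)
                )
            log_score[label] = log_p
        # Turn the two log scores back into a probability of spam.
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        good = math.exp(log_score["good"] - top)
        return spam / (spam + good)


f = TinyBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("lunch meeting moved to noon", "good")
f.train("notes from the meeting attached", "good")

print(f.spam_probability("buy cheap pills"))        # above 0.5
print(f.spam_probability("meeting notes at noon"))  # below 0.5
```

The point of the sketch is the feedback loop: every message you re-mark becomes another call to `train`, so the filter's idea of what your spam looks like keeps adjusting without you maintaining a list by hand.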
The problem with the blocking system is that I'll always be taking the time to add spammers to the list. At a cost of about fifteen seconds at my fastest to block one spammer, sure, it may save me five or ten seconds (20-25 spams) over the life of the filter, but that's not an exceedingly great return. SpamSieve and Bayesian filters require me to do nothing other than re-mark incorrectly filtered messages (false negatives and positives). Plus, this approach allows ALL mail to get through. So perhaps if some day someone at netscape.net (one of the domains I was previously blocking) sends me legitimate mail, I'll receive it. Even if it's tagged as spam (I suspect most emails from netscape.net will be), it'll be sitting in my "• Junk" folder waiting for me to retrieve it.
Some people trust Bayesian filters so much that they just let them delete all spam. I've got 62 false positives out of about 1300 messages right now. 845 good, 512 spam. Why not more? Because some of my other mail filters come in first, to put mail into other folders in my mail client. Only mail that doesn't match any of those rules hits SpamSieve. So instead of spending two or three minutes per day "dealing with" spam, I now spend about 15 seconds per day looking to see whether I have any false positives (about one a day) and re-marking false negatives and positives.
SpamSieve costs $20. Mail is free, of course, if you want to use that client. What'd $20 get me? Well, about 16 hours per year. At my billing rate, let me tell you: that's a steal.
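One way to arrive at roughly that figure, sketched in Python: take the 18.25 hours per year of hand-sorting and subtract the 15 seconds per day still spent checking the junk folder. This is an assumption about how the numbers fit together, not a precise accounting, and the variable names are invented for the example.

```python
# Estimates from the post; the subtraction itself is an assumed reading.
HOURS_SORTING_BY_HAND = 18.25    # about three minutes a day, every day
SECONDS_CHECKING_PER_DAY = 15    # daily glance for false positives
DAYS_PER_YEAR = 365
PRICE_DOLLARS = 20

hours_still_spent = SECONDS_CHECKING_PER_DAY * DAYS_PER_YEAR / 3600
net_hours_saved = HOURS_SORTING_BY_HAND - hours_still_spent
dollars_per_hour_saved = PRICE_DOLLARS / net_hours_saved

print(f"{hours_still_spent:.2f} hours still spent per year")
print(f"{net_hours_saved:.2f} hours saved per year")
print(f"${dollars_per_hour_saved:.2f} per hour saved in the first year")
```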