Blacklisting

Posted October 21st, 2004 @ 01:37pm by Erik J. Barzeski

I'm working on my Blacklist today. Lowering the URL limit to 3 seems to have had some good effects, so I'm leaving that alone. Right now I'm replacing some blacklist strings: swapping four or five "health-insurance-from-us" type domains for the single URLPattern "\binsurance\b", and adding patterns like "\bsex\b" and "\blipitor\b".
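A minimal sketch of what a three-URL limit does, in Python. It assumes the filter simply counts http(s) links in the comment body and holds any comment with more than three. The names `MAX_URLS`, `count_urls`, and `exceeds_url_limit` are hypothetical, and the blog's actual filter may count links differently (for example, only linked anchors).

```python
import re

# Hypothetical threshold mirroring the post's URL limit of 3.
MAX_URLS = 3

# Rough matcher: "http://" or "https://" followed by non-space characters.
# Trailing punctuation may be swallowed, which doesn't affect the count.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def count_urls(comment: str) -> int:
    """Return how many http(s) URLs appear in the comment text."""
    return len(URL_RE.findall(comment))


def exceeds_url_limit(comment: str, limit: int = MAX_URLS) -> bool:
    """True if the comment contains more URLs than the limit allows."""
    return count_urls(comment) > limit
```

A comment with four links trips the limit; an ordinary comment with one link does not.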

I have a list of about 1000 entries - what's a good way to find more of these patterns so that I can condense several strings into one entry? I replaced 28 entries with "\bpoker\b" but have basically just been scrolling up and down the list to try to find common words.

Surely there's some software that can analyze a list and present some choices, no?
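One low-tech answer, sketched in Python: split every blacklist entry on punctuation and count how many entries share each token. Tokens near the top (like "poker" or "insurance") are candidates for a single `\b...\b` pattern. The function name, the stopword set, and the three-character minimum are assumptions for illustration, and every suggestion still needs a human check for false positives.

```python
import re
from collections import Counter

# Assumed noise tokens that appear in many domains but say nothing about spam.
STOPWORDS = {"com", "net", "org", "info", "www", "http", "https"}


def token_counts(entries, min_len=3):
    """Count how many blacklist entries contain each token.

    Entries are lowercased and split on anything that is not a letter or
    digit, so 'health-insurance-from-us.com' yields health, insurance,
    from, us, com. Short tokens and STOPWORDS are dropped.
    """
    counts = Counter()
    for entry in entries:
        tokens = {
            t
            for t in re.split(r"[^a-z0-9]+", entry.lower())
            if len(t) >= min_len and t not in STOPWORDS
        }
        counts.update(tokens)  # a set, so each entry counts a token once
    return counts
```

Printing `token_counts(entries).most_common(20)` over the full list gives a short list of words worth reviewing by hand.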

Update: I'm down to about 100 items, having simply deleted any URL that had been hit fewer than ten times. I'm sure this will result in a slight spike in comment spam in the coming month, but my three-URL limit may help offset it.

The top spammy domain: us.com with over 1500 comment spam attempts.
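The pruning step in the update amounts to a threshold filter over hit counts. A minimal Python sketch, assuming the hit counts can be exported as a dict mapping each entry to its number of hits; the names `prune_blacklist` and `top_entries` are hypothetical.

```python
def prune_blacklist(hits, min_hits=10):
    """Keep only entries hit at least min_hits times.

    Mirrors the update above: anything hit fewer than ten times is dropped.
    """
    return {entry: n for entry, n in hits.items() if n >= min_hits}


def top_entries(hits, k=1):
    """Return the k most-hit entries as (entry, hits) pairs, highest first."""
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)[:k]
```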

6 Responses to "Blacklisting"

Ken
Posted 21 Oct 2004 at 2:39pm

MT-Blacklist maintains a list of comment spam keywords at

http://www.jayallen.org/comment_spam/blacklist.txt

It's generally updated several times a week, but the software's author just took a job at Six Apart and is moving, so the list isn't as up to date as usual. I presume MT-Blacklist will be even better integrated with MT in the near future.
Erik J. Barzeski
Posted 21 Oct 2004 at 3:17pm

Ken, thanks, I know. That's not really what I seek, though.
Etan
Posted 21 Oct 2004 at 4:23pm

Please don't blacklist "sex." My domain is TooMuchSexy.org, and all...
isle.yi.org
Posted 21 Oct 2004 at 6:59pm

Have you considered leveraging DNSBLs in your exploits? They're intended for mail, but the spammers who hit comment forms here are likely sending email too, and DNSBL is a quick, simple protocol (really, really simple) that should be easy to integrate as a plugin.

If you do leverage it, I recommend DJB's dnscache, a local caching resolver, to make things much nicer for the rest of us with regard to speed. Of course, if you're already using a locally caching BIND, you won't need it.
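For the curious, a DNSBL lookup reverses the IPv4 octets, appends the list's zone name, and resolves the result: any A record means "listed", while NXDOMAIN means "not listed". A minimal Python sketch; `dnsbl.example.org` is a placeholder zone (check a real list's usage policy before pointing at it), and the helper names are hypothetical.

```python
import ipaddress
import socket
from functools import lru_cache

# Placeholder zone; substitute a real DNSBL after checking its usage policy.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the zone.

    192.0.2.1 with zone dnsbl.example.org -> 1.2.0.192.dnsbl.example.org
    Raises ValueError for anything that is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


@lru_cache(maxsize=4096)
def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """True if the zone returns an A record for this IP (i.e. it is listed).

    Results are cached in-process so a repeat commenter doesn't trigger a
    repeat query during one run.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN or another resolver error: treat the IP as not listed.
        return False
```

The `lru_cache` is per-process and ignores DNS TTLs, so it is no substitute for dnscache or a caching BIND. Per RFC 5782, IPv4 DNSBLs must list 127.0.0.2 as a test entry, so `is_listed("127.0.0.2", zone)` against a real zone is a handy sanity check.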
Erik J. Barzeski
Posted 22 Oct 2004 at 10:34am

Etan, I didn't blacklist "sex." I blacklisted "\bsex\b" - which is different.
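The difference is the `\b` word-boundary anchors: `\bsex\b` matches "sex" only when the characters on either side are not letters, digits, or underscores, so it hits "free-sex-..." but not "TooMuchSexy". A quick Python check, with Python's `re` standing in for the blacklist's regex engine and case-insensitive matching assumed (whether the blacklist ignores case is an assumption). The `.example` domains are made up.

```python
import re

# Two patterns from the post, compiled case-insensitively (an assumption).
PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"\bsex\b", r"\bpoker\b")]


def blacklisted(url: str) -> bool:
    """True if any boundary-anchored pattern matches somewhere in the URL."""
    return any(p.search(url) for p in PATTERNS)


# 'Sex' in 'TooMuchSexy' sits between 'h' and 'y', so no boundary: no match.
print(blacklisted("http://TooMuchSexy.org/"))
# Hyphens and dots are non-word characters, so these match.
print(blacklisted("http://free-sex-pics.example/"))
print(blacklisted("http://online-poker.example/"))
# Run-together words have no boundary before 'poker': no match.
print(blacklisted("http://cheappoker.example/"))
```

The last case shows the trade-off: run-together domains like "cheappoker" slip past `\bpoker\b`, so boundary patterns complement literal entries rather than replacing every one of them.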
Chris
Posted 22 Oct 2004 at 6:43pm

I blogged about this a couple of days ago. I just up and decided to be very aggressive and straight-up block the most common spam words in referring URLs.

I manually went through the list and cherry-picked the terms I felt were most common. Programmatically, I suppose you could scan Jay Allen's list for dictionary words, count each word's occurrences in a hashmap (word -> count), and then dump the map sorted by count. I was thinking of doing something like that, but I took the lazy route.
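Chris's idea could look roughly like this in Python: look for dictionary words inside each entry as substrings (which also catches run-together domains with no punctuation), count how many entries contain each word, and sort by that count. The word list is an assumption (many Unix systems ship one at /usr/share/dict/words), as are the function names and the four-letter minimum.

```python
from collections import Counter


def dictionary_word_counts(entries, words, min_len=4):
    """For each dictionary word, count how many entries contain it.

    Every substring of length >= min_len is checked against a set of
    words, which is fast because blacklist entries are short.
    """
    vocab = {w.lower() for w in words if len(w) >= min_len}
    counts = Counter()
    for entry in entries:
        e = entry.lower()
        found = {
            e[i:j]
            for i in range(len(e))
            for j in range(i + min_len, len(e) + 1)
            if e[i:j] in vocab
        }
        counts.update(found)  # each entry counts a word at most once
    return counts


def ranked(counts):
    """Words sorted by occurrence, most frequent first (ties alphabetical)."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Substring matching is noisy: with a full dictionary, "poker" also yields "poke", so expect overlapping hits and prune the output by eye.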

I had no false positives from my database of 1,000 or so legit comments, but your weblog is more popular, with a more diverse audience, so YMMV.

NSLog();

The Weblog of Erik J. Barzeski
