Blacklisting
Posted October 21st, 2004 @ 01:37pm by Erik J. Barzeski
I'm working on my Blacklist today. Lowering the URL limit to 3 seems to have had some good effects, so I'm leaving that alone. Right now I'm consolidating some blacklist strings: replacing four or five "health-insurance-from-us" type domains with the URLPattern "\binsurance\b", and adding patterns like "\bsex\b" and "\blipitor\b".
I have a list of about 1000 entries - what's a good way to find more of these patterns so that I can condense several strings into one entry? I replaced 28 entries with "\bpoker\b" but have basically just been scrolling up and down the list to try to find common words.
Surely there's some software that can analyze a list and present some choices, no?
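Lacking that, even a few lines of Python would do the basic counting. A rough sketch, assuming the blacklist can be exported as plain text with one entry per line (the filename blacklist.txt is made up):

    import re
    from collections import Counter

    # Count how often each word appears across all blacklist entries.
    # Entries like "health-insurance-from-us.com" are split on non-letters.
    counts = Counter()
    with open("blacklist.txt") as f:  # hypothetical filename
        for line in f:
            for word in re.split(r"[^a-z]+", line.strip().lower()):
                if len(word) > 2:  # fragments like "com" and "www" will surface too; skim past them
                    counts[word] += 1

    # Words shared by many entries are candidates for a single
    # \bword\b URLPattern that replaces them all.
    for word, n in counts.most_common(30):
        print(f"{n:4d}  {word}")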
Update: I'm down to about 100 items, having simply deleted any URL that had been hit fewer than ten times. I'm sure this will result in a slight spike in comment spam in the coming month, but my three-URL limit may help as well.
The top spammy domain: us.com with over 1500 comment spam attempts.
Posted 21 Oct 2004 at 2:39pm #
MT-Blacklist maintains a list of comment spam keywords at
http://www.jayallen.org/comment_spam/blacklist.txt
It's generally updated several times a week, but the software's author just took a job at Six Apart, so he's in the middle of moving and the list isn't as up to date as usual. I presume there will be even better integration of MT-Blacklist and MT in the near future.
Posted 21 Oct 2004 at 3:17pm #
Ken, thanks, I know. That's not really what I seek, though.
Posted 21 Oct 2004 at 4:23pm #
Please don't blacklist "sex." My domain is TooMuchSexy.org, and all...
Posted 21 Oct 2004 at 6:59pm #
Have you considered leveraging DNSBL in your efforts? While it's intended for mail, the kind of host that's spamming comments here is usually also sending out email spam, and DNSBL is a quick, simple protocol (really, really simple) that would be easy to integrate as a plugin.
If you do leverage it, I recommend DJB's dnscache to make things much nicer for the rest of us with regard to speed. Of course, if you're already running a locally caching BIND, you won't need it.
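For reference, a DNSBL query is just an ordinary DNS A lookup: reverse the octets of the commenter's IP and append the list's zone. A minimal sketch in Python, using sbl-xbl.spamhaus.org purely as an example zone (check any list's usage policy before pointing real traffic at it):

    import socket

    def dnsbl_listed(ip, zone="sbl-xbl.spamhaus.org"):
        # Reverse the octets and query under the zone:
        # 1.2.3.4 -> 4.3.2.1.sbl-xbl.spamhaus.org
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)  # resolves (usually to 127.0.0.x) if listed
            return True
        except socket.gaierror:          # NXDOMAIN: not listed
            return False

    # 127.0.0.2 is the conventional "always listed" test address.
    if dnsbl_listed("127.0.0.2"):
        print("listed -- hold the comment for moderation")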
Posted 22 Oct 2004 at 10:34am #
Etan, I didn't blacklist "sex." I blacklisted "\bsex\b" - which is different.
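(The \b anchors are regex word boundaries, so the pattern matches "sex" only as a whole word. A quick illustration in Python; the first domain is invented:)

    import re

    pattern = re.compile(r"\bsex\b")

    print(bool(pattern.search("buy-sex-pills.example")))  # True: hyphens mark word boundaries
    print(bool(pattern.search("toomuchsexy.org")))        # False: no boundary around "sex" in "sexy"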
Posted 22 Oct 2004 at 6:43pm #
I blogged about this a couple of days ago. I just up and decided to be very aggressive and outright kill the most common spam words from referring URLs.
I manually went through the list and cherry-picked the terms I felt were most common. Programmatically, I suppose you could scan Jay Allen's list for dictionary words, tally each word's occurrences in a hashmap, and then dump the map sorted by occurrence. I was thinking of doing something like that, but I took the lazy route.
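A rough sketch of that approach, assuming a Unix word list at /usr/share/dict/words and a local copy of the list saved as blacklist.txt (both paths are assumptions):

    import re
    from collections import Counter

    # Load a set of real English words (the path varies by system).
    with open("/usr/share/dict/words") as f:
        dictionary = {w.strip().lower() for w in f if len(w.strip()) > 3}

    # Tally occurrences of dictionary words across the blacklist entries.
    counts = Counter()
    with open("blacklist.txt") as f:  # local copy of the blacklist
        for line in f:
            for word in re.split(r"[^a-z]+", line.lower()):
                if word in dictionary:
                    counts[word] += 1

    # Dump the map sorted by occurrence, most frequent first.
    for word, n in counts.most_common():
        print(n, word)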
I had no false positives from my database of 1,000 or so legit comments, but your weblog is more popular with a more diverse audience so YMMV.