

inkdroid (3294)


inkdroid is a person, not a robot. however, inkdroid likes ink. inkdroid likes perl too.

Journal of inkdroid (3294)

Thursday October 24, 2002
11:52 AM

Bayesian spam filtering

[ #8569 ]

I've started using a spam filtering tool written in Perl by Gary Arnold. It applies Paul Graham's idea of using Bayesian classification to filter spam.

All you need is a directory of spam emails and a directory of good emails to serve as corpora. These are used once at install time to build a BerkeleyDB of statistics on the words found in good and bad email.
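For the curious, here's a rough sketch of what "statistics on words" could look like. It loosely follows the per-word probability from Paul Graham's essay, and it is not Gary's code, which I haven't reproduced here. The function names are made up for illustration, the corpora are plain lists of strings rather than directories of mail, the result is an in-memory dict rather than a BerkeleyDB, and lowercasing the tokens is my own simplification.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, dollar signs, apostrophes, dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def count_tokens(messages):
    """Count how often each token occurs across a list of messages."""
    counts = Counter()
    for msg in messages:
        counts.update(tokenize(msg))
    return counts


def spam_probabilities(good_msgs, bad_msgs, min_count=5):
    """Map each sufficiently common token to an estimated spam probability."""
    good = count_tokens(good_msgs)
    bad = count_tokens(bad_msgs)
    ngood = max(len(good_msgs), 1)  # avoid dividing by zero on an empty corpus
    nbad = max(len(bad_msgs), 1)
    probs = {}
    for token in set(good) | set(bad):
        g = 2 * good[token]  # good counts are doubled to bias against false positives
        b = bad[token]
        if g + b < min_count:
            continue  # too rare to say anything useful
        good_ratio = min(1.0, g / ngood)
        bad_ratio = min(1.0, b / nbad)
        p = bad_ratio / (good_ratio + bad_ratio)
        probs[token] = max(0.01, min(0.99, p))  # never fully certain either way
    return probs
```

Doubling the good counts and clamping to the 0.01 to 0.99 range both come from Graham's write-up; the point is to make it hard for a single word to condemn a legitimate message.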

Once you've got your BerkeleyDB, you add a line to your .qmail file so that incoming messages are filtered through Gary's program, which redirects spam to a particular maildir.
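Here's roughly how the filtering step could decide whether a message is spam, again sketched from Paul Graham's description rather than taken from Gary's program. The names `spam_score` and `is_spam` are hypothetical, the `probs` argument is assumed to be a dict of per-word spam probabilities, and the defaults (0.4 for unknown words, the 15 most telling words, a 0.9 cutoff) are the values from Graham's essay as I understand them.

```python
import re

# Hypothetical tokenizer: runs of letters, digits, dollar signs, apostrophes, dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def spam_score(text, probs, unknown=0.4, top_n=15):
    """Combine per-word spam probabilities into one score between 0 and 1."""
    tokens = {t.lower() for t in TOKEN_RE.findall(text)}  # each word counts once
    ps = [probs.get(t, unknown) for t in tokens]
    # Keep only the most "interesting" words: those farthest from a neutral 0.5.
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    ps = ps[:top_n]
    prod = 1.0
    inv = 1.0
    for p in ps:
        prod *= p
        inv *= 1.0 - p
    if prod + inv == 0.0:
        return 0.5  # degenerate table containing both 0.0 and 1.0; stay neutral
    return prod / (prod + inv)


def is_spam(text, probs, threshold=0.9):
    """True when the combined score is above the cutoff."""
    return spam_score(text, probs) > threshold
```

A message with no recognizable words scores a neutral 0.5 and gets delivered normally, which seems like the safer default for a filter sitting in front of your inbox.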

The cool thing is that you can check your spam mailbox periodically, remove any false positives (if there are any), and then rebuild your BerkeleyDB from the updated mail directories. So if you cron the DB rebuild process, qmail will magically learn what is spam from how you classify your email.
