All the Perl that's Practical to Extract and Report

Matts (1087)

I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high performance network software, string matching algorithms, and other cool stuff mostly in Perl and C.

Journal of Matts (1087)

Monday May 14, 2007
04:38 PM

How my anti-spam job has changed

[ #33276 ]

My job as an anti-spammer has changed a number of times over the years. Initially, of course, it was actually coming up with ways to stop spam, back when SpamAssassin was immature and nobody had heard of Bayes. (It's a little-known fact that I created one of the first email Bayes systems back then, way before Paul Graham started writing about it.)

At one point we had no quarantine system, so I had to go off and build that. Thankfully that project is now developed by a much larger team of programmers.

Then I re-wrote our anti-spam engine to be pluggable. I'd had a great experience playing with qpsmtpd and so I emulated a lot of the ideas in there for building a pluggable codebase that allows me to roll out a single "plugin" RPM and have it automatically "just work".
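To make the "drop it in and it just works" idea concrete, here is a minimal sketch of a hook-based plugin loader. It is not our engine and not qpsmtpd's API; it is written in Python purely for illustration, and every name in it (`Engine`, `register`, `load_plugins`, the `headers` hook, the per-plugin `register(engine)` convention) is a hypothetical stand-in. The assumption is simply that installing a plugin means placing one file in a known directory, and the engine picks it up on the next start.

```python
import importlib.util
from pathlib import Path


class Engine:
    """Toy engine: plugins attach callables to named hooks."""

    def __init__(self):
        self.hooks = {}  # hook name -> list of callables

    def register(self, hook, func):
        """Attach func to the named hook."""
        self.hooks.setdefault(hook, []).append(func)

    def load_plugins(self, plugin_dir):
        """Import every *.py file in plugin_dir and let it register itself."""
        for path in sorted(Path(plugin_dir).glob("*.py")):
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            # Hypothetical convention: each plugin exposes register(engine).
            module.register(self)

    def run(self, hook, message):
        """Sum the scores returned by every plugin attached to this hook."""
        return sum(func(message) for func in self.hooks.get(hook, []))
```

A plugin in this sketch is just a file that defines a check and a `register(engine)` function calling `engine.register("headers", check)`. The core never needs to know the plugin's name in advance, which is what lets a single packaged file be rolled out without touching the engine.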

That led to a number of updated and new plugins, new techniques, and new ideas for stopping spam. A better signature system here, a better DNS query tool there, and various bits of header analysis here and there.

But ultimately that core system has now settled down. We update plugins infrequently, mostly just for bug fixes (new heuristics are a different matter). Every now and then there's something new (such as broken date formats), but for the most part the core engine is stable.

Our role in the spam team now mainly consists of building tools that help us identify more quickly what to signature, or find new heuristics to apply.

This seems like good progress to me. We have a stable core and we focus on improving our performance in updating the data that we pass to that core. This is a far better position to be in than having to continually update (and potentially break) the core engine.

I don't suspect core engine updates are gone for good, but now they are in many ways more interesting: esoteric and unusual bugs that we have to fix, or radically new features that we need ASAP. Fun times.

  • Fun times.

    I think “what a waste of human effort” is more like it… but what can you do.

  • it's funny, we've broadly gone the same way in open source-land with SA -- the engine was very stable for a while there, with most of the change going on in plugins and rules. (we need to accelerate the rate of rule development and publishing, though.)