All the Perl that's Practical to Extract and Report

  • Yeah, I think this highlights why Bayesian analysis is better than rule-based approaches. It's harder to tweak the message so it slips by. Also, it's my understanding that bogofilter scales much better on a high-volume MTA than SpamAssassin... but I don't know this for a fact, so perhaps I should keep quiet.
    • No, it's true. SpamAssassin's rules are slow.

      On the flip side, bogofilter is a personal filter, so it's not going to perform that well on larger installations, which kind of defeats the point of being so much faster, doesn't it?
      • Speed is important when you're dealing with lots of messages.

        BTW, it is also important to keep server load down if you're implementing some automated way to handle spam.

        I've been thinking about writing something that runs from aliases such as $USER-spam and $USER-ham. It would have to check the origin of the message, since only the user himself should be able to send messages to these addresses, and then train the user's database with that message as spam or ham accordingly. After that, procmail or something similar can compare incoming messages against that database and add a header. The user will have to create the filtering rules in his MUA.

        It will help a lot and the user will have some way to interact with the program using only email.
        --
        -- Godoy.
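A rough sketch of the alias idea above, in Python for brevity. Everything named here is hypothetical: the `UserDB` class, the `owners` mapping from user to address, and the `X-Bayes-Status` header are made up for illustration, and the scoring is plain naive Bayes with add-one smoothing, not what bogofilter or SpamAssassin actually compute. It also assumes the From: header can be trusted, which it can't in practice.

```python
import math
import re
from collections import Counter
from email import message_from_string
from email.utils import parseaddr

TOKEN_RE = re.compile(r"[a-z0-9$'-]{3,}")


class UserDB:
    """Hypothetical in-memory stand-in for one user's on-disk word database."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.total = {"spam": 0, "ham": 0}


def tokens(text):
    # Distinct lowercase words of 3+ characters; each counts once per message.
    return set(TOKEN_RE.findall(text.lower()))


def body_text(msg):
    # Sketch only: multipart messages yield a list here and are ignored.
    payload = msg.get_payload()
    return payload if isinstance(payload, str) else ""


def train(dbs, owners, raw_message):
    """Handle mail sent to <user>-spam or <user>-ham; return True if learned."""
    msg = message_from_string(raw_message)
    local_part = parseaddr(msg.get("To", ""))[1].split("@")[0]
    user, _, label = local_part.rpartition("-")
    if label not in ("spam", "ham") or user not in owners:
        return False
    # Assumption: From: is trusted. It is trivially forged, so a real setup
    # would check the envelope sender or require authenticated submission.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender != owners[user].lower():
        return False
    db = dbs.setdefault(user, UserDB())
    db.counts[label].update(tokens(body_text(msg)))
    db.total[label] += 1
    return True


def spam_probability(db, text):
    # Naive Bayes in log-odds form, with add-one smoothing on every count.
    log_odds = math.log((db.total["spam"] + 1) / (db.total["ham"] + 1))
    for tok in tokens(text):
        p_spam = (db.counts["spam"][tok] + 1) / (db.total["spam"] + 2)
        p_ham = (db.counts["ham"][tok] + 1) / (db.total["ham"] + 2)
        log_odds += math.log(p_spam / p_ham)
    log_odds = max(-700.0, min(700.0, log_odds))  # keep math.exp in range
    return 1.0 / (1.0 + math.exp(-log_odds))


def tag(dbs, user, raw_message, threshold=0.8):
    """Delivery-time step: score against the user's database, add a header."""
    msg = message_from_string(raw_message)
    db = dbs.setdefault(user, UserDB())
    is_spam = spam_probability(db, body_text(msg)) >= threshold
    msg["X-Bayes-Status"] = "spam" if is_spam else "ham"
    return msg.as_string()
```

With something like this, `train` would be what the $USER-spam and $USER-ham aliases pipe into, `tag` would be called from procmail or a similar delivery filter, and the user's MUA rule only has to match on the added header.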
        • With SpamAssassin 2.50 (nearly ready) you can have per-user Bayesian databases. But that doesn't scale to a company with (say) 20,000 users. You can't expect everyone to train their systems.