All the Perl that's Practical to Extract and Report

use Perl

schwern (1528)

schwern
  (email not shown publicly)
http://schwern.net/
AOL IM: MichaelSchwern
Jabber: schwern@gmail.com

Schwern can destroy CPAN at his whim.

Journal of schwern (1528)

Wednesday January 26, 2005
08:52 AM

Some spam numbers.

[ #22894 ]

I recently retrained my spam filters (sa-learn) by the simple expedient of pulling down about three weeks' worth of mail and hand-sorting it. It came out something like this:

Ham: 960 messages ~6 megs
Spam: 4100 messages ~120 megs
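
For anyone wanting to repeat the exercise, the retraining step might look roughly like the sketch below. It assumes the hand-sorted mail sits in two directories (the paths are placeholders) and uses sa-learn's `--ham`, `--spam` and `--dump magic` options. The exact commands behind the numbers above are not given, and `SA_LEARN` is a hypothetical override added here so the commands can be previewed first.

```shell
# Sketch only: feed hand-sorted mail to SpamAssassin's Bayes trainer.
# The two directory arguments are assumptions; use wherever the sorted mail lives.
# SA_LEARN is a hypothetical hook: SA_LEARN=echo prints the commands instead of running them.
retrain() {
    ham_dir=$1
    spam_dir=$2
    "${SA_LEARN:-sa-learn}" --ham "$ham_dir" || return 1
    "${SA_LEARN:-sa-learn}" --spam "$spam_dir" || return 1
    # Dump the Bayes database counters to confirm the run registered.
    "${SA_LEARN:-sa-learn}" --dump magic
}

# Usage (placeholder paths):
#   SA_LEARN=echo retrain ~/sorted/ham ~/sorted/spam   # preview
#   retrain ~/sorted/ham ~/sorted/spam                 # real run
```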

Out of the ham there were about 50 I actually kept to read. The rest was mailing list threads I wasn't particularly interested in.

Realize this is *after* my mail is filtered through pobox.com's RBL-based filtering. That knocks out between 5,000 and 10,000 messages a month (6,700 in the last 30 days).
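
To put those figures on a common footing, here is a rough back-of-the-envelope calculation. It treats "about three weeks" as exactly 21 days and scales the 30-day pobox.com count down to the same window; both are assumptions, so the results are ballpark only. The function name `spam_stats` is made up for this sketch.

```shell
# Rough rates from the numbers in the post. Assumptions are marked.
spam_stats() {
    awk 'BEGIN {
        ham = 960; spam = 4100      # hand-sorted counts from the post
        days = 21                   # assumption: "about three weeks" = 21 days
        rbl30 = 6700                # messages pobox.com blocked in the last 30 days
        rbl = rbl30 * days / 30     # assumption: blocked mail arrives at a steady rate
        printf "ham/day: %.0f\n", ham / days
        printf "spam:ham after pobox.com: %.1f:1\n", spam / ham
        printf "spam:ham before pobox.com: %.1f:1\n", (spam + rbl) / ham
    }'
}

spam_stats
```

Under those assumptions this works out to about 46 ham messages a day, a spam:ham ratio of roughly 4.3:1 for what reaches the local filter, and roughly 9.2:1 once the mail blocked upstream is counted.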

I always assume I get an uncommonly large amount of spam. My address is over six years old and is posted all over the Internet via mailing lists and Perl documentation. This is why I usually scoff when someone suggests "just use foo.com's built-in filtering" whenever my email toolchain has a hiccup.

Do other folks see this much crap?

P.S. If you ever need to hand-filter a large amount of email, sorting by subject helps a lot.
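
Most mail clients can sort by subject directly. For mail stored one message per file (a Maildir, say), a minimal sketch of the same idea follows. The function name `list_by_subject` and the example path are assumptions, and the header parsing is deliberately naive: it ignores folded (multi-line) subjects and does not strip "Re:" prefixes.

```shell
# Sketch: list the messages in a directory sorted by Subject, one per line,
# as "subject<TAB>filename". Assumes one message per file.
list_by_subject() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Stop at the first blank line (end of headers); print the first Subject header.
        subj=$(sed -n -e '/^$/q' -e 's/^[Ss]ubject: *//p' "$f" | head -n 1)
        printf '%s\t%s\n' "$subj" "$f"
    done | LC_ALL=C sort -f
}

# Usage (placeholder path):
#   list_by_subject ~/Maildir/cur | less
```

Spam runs and mailing list threads then cluster together, which makes it quicker to move whole batches at once.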

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • sorting by subject helps a lot

    Clever :-)

    Meanwhile, when do you plan on reading (and answering) those 50 messages? (I sent you an email, but I don't know if it got through...)

  • Do other folks see this much crap?

    Yep. Since January 1st I've got about 12,000 emails in my caughtspam folder. I really need to implement some kind of SMTP-time blocking mechanism.

  • I get about 2,000 spam messages a day, which includes a lot of the same messages sent to every address for The Perl Review as well as my Stonehenge address and the Panix address that I've had for 8 years and don't bother to obfuscate or munge (here it is spammers -> comdog@panix.com -> I won't see your message though).
  • I don't think 960 ham messages in three weeks is a common figure. I probably get a fifth of that.

    Going by your numbers, though, you have a spam:ham ratio of 10:1, and I'd say that yes, that's pretty common. It's certainly close to what I'm getting.

    • You must not subscribe to many lists. Schwern's number is only 45 ham messages a day, so you're only getting 9? I probably get 2000 ham messages in 3 weeks, maybe more, most of it mailing list traffic and some automated messages. Of course the vast majority of that gets filtered into folders and deleted with only a glance at the subject line. I doubt that 960 is out of the ordinary for geeks.
      • I lost my perspective here; put as 45 mails/day, it's not that much. I don't have much email traffic, but I do have a huge blogroll. Newsfeeds have the advantage of being a spam-free medium so far.
  • for i in ~/Maildir/.backup.2004.*/cur ~/Maildir/.spam.2004.*/cur ; do echo "$i:" $(ls "$i" | wc -l) ; done
    /home/cjcollier/Maildir/.backup.2004.01/cur: 957
    /home/cjcollier/Maildir/.backup.2004.02/cur: 616
    /home/cjcollier/Maildir/.backup.2004.03/cur: 489
    /home/cjcollier/Maildir/.backup.2004.04/cur: 324
    /home/cjcollier/Maildir/.backup.2004.05/cur: 417
    /home/cjcollier/Maildir/.backup.2004.06/cur: 563
    /home/cjcollier/Maildir/.backup.2004.09/cur: 1872
    /home/cjcollier/Maildir/.backup.2004.10/cur: 595
    /home/cjcollier/Maildir
  • I get about 200 spam messages a day, but I don't have a domain of my own. This is just on my normal everyday email address. Just a few months ago it was "only" half that much.

    I fear that, when I go on holiday for a few weeks, my normal mailbox will just overflow. The sheer bulk of it is just too much for even the multi-megabyte mailbox I have at my disposal.