I recently retrained my spam filters (sa-learn) by the simple expedient of pulling down about three weeks' worth of mail and hand-sorting it. It came out something like this:
Ham: 960 messages ~6 megs
Spam: 4100 messages ~120 megs
Of the ham, there were only about 50 messages I actually kept to read. The rest were mailing list threads I wasn't particularly interested in.
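To put those numbers in proportion, here is a quick back-of-the-envelope sketch. It uses only the rough figures quoted above and assumes a "meg" is 1024 KB, so treat the results as approximate:

```python
# Counts and sizes as quoted above; the sizes are the rough "megs" figures.
ham_count, ham_mb = 960, 6
spam_count, spam_mb = 4100, 120
kept = 50  # approximate number of ham messages actually worth reading

total = ham_count + spam_count

# Share of the hand-sorted sample that was spam, and that was worth reading.
spam_share = spam_count / total
kept_share = kept / total

# Average message size, assuming 1 meg = 1024 KB.
avg_ham_kb = ham_mb * 1024 / ham_count
avg_spam_kb = spam_mb * 1024 / spam_count

print(f"spam share of messages: {spam_share:.1%}")
print(f"messages worth reading: {kept_share:.1%}")
print(f"average ham size:  {avg_ham_kb:.1f} KB")
print(f"average spam size: {avg_spam_kb:.1f} KB")
```

By this rough math, about four out of five messages in the sample were spam, roughly one in a hundred was worth reading, and the average spam was several times the size of the average ham.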
Keep in mind this is *after* my mail has been filtered through pobox.com's RBL-based filtering. That knocks out between 5,000 and 10,000 messages a month (6,700 in the last 30 days).
I always assume I get an uncommonly large amount of spam. My address is over six years old and is posted all over the Internet via mailing lists and Perl documentation. This is why I usually scoff when someone suggests "just use foo.com's built-in filtering" whenever my email toolchain has a hiccup.
Do other folks see this much crap?
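PS: If you ever need to hand-filter a large amount of email, sorting by subject helps a lot. Repeated spam runs and mailing list threads clump together, so you can sort them a batch at a time.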
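If your mail client can't sort by subject, the idea is easy to script. This is only a minimal sketch, not what I actually used: `subject_key` and `sort_by_subject` are made-up helper names, the sample subjects are invented, and the normalization (dropping leading Re:/Fwd: prefixes, ignoring case) is an assumption about what makes threads and spam runs land next to each other.

```python
import re
from email import message_from_string

# Leading reply/forward prefixes such as "Re:", "RE: Fwd:", "Fw:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def subject_key(subject):
    """Normalize a subject: drop Re:/Fwd: prefixes, collapse whitespace, lowercase."""
    stripped = _PREFIX.sub("", str(subject or ""))
    return " ".join(stripped.split()).lower()


def sort_by_subject(messages):
    """Return messages ordered by normalized subject (stable for equal subjects)."""
    return sorted(messages, key=lambda m: subject_key(m["Subject"]))


# Invented sample subjects, purely for illustration.
raw_subjects = [
    "Re: [p5p] patch for foo",
    "CHEAP MEDS!!!",
    "[p5p] patch for foo",
    "cheap meds!!!",
    "Fwd: Re: lunch?",
]
messages = [message_from_string(f"Subject: {s}\n\nbody\n") for s in raw_subjects]

for msg in sort_by_subject(messages):
    print(msg["Subject"])
```

The same key function should work on messages read from a real mailbox, for instance via Python's standard `mailbox` module, though I haven't tried it against three weeks of actual spam.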