Thursday June 10, 2004
fighting foreign spam
I've been getting an increasing amount of foreign-language spam lately - foreign as in German, not the typical Asian variants. ordinarily I don't notice when my spam volume goes up, but these messages have been getting past my filter and landing in my inbox. I suspect that's because I haven't taught my filters to pick up on the non-English tokens.
do people with .de addresses get more native-language spam?
this got me thinking - why do I have to train my filters at all? I mean, certainly there is enough spam (English and non-English) floating around the world that a decent corpus could be assembled, regularly added to, and made available. this would have a number of advantages, like a larger corpus for more accurate results. it would also let SA users react to new forms of spam more quickly than they could on their own - users contribute spam consistently and download a new database regularly and *poof* you have one killer spam-fighting machine.
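
to make the idea a bit more concrete, here is a rough sketch of how a downloaded token database might be folded into a local one. this is not how SA actually stores or scores anything: the `TokenDB` class, its `merge` method, and the add-one smoothing are all assumptions made up for illustration, and the sample messages are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased runs of letters (Unicode-aware, so umlauts survive).
    return re.findall(r"[^\W\d_]+", text.lower())


class TokenDB:
    """Hypothetical per-token message counts for spam and ham."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0        # spam messages learned
        self.n_ham = 0         # ham messages learned

    def learn(self, text, is_spam):
        # Count each token once per message.
        tokens = set(tokenize(text))
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def merge(self, other):
        # Fold a downloaded (shared) database into this one by adding counts.
        self.spam.update(other.spam)
        self.ham.update(other.ham)
        self.n_spam += other.n_spam
        self.n_ham += other.n_ham

    def spam_score(self, text):
        # Naive-Bayes-style log-odds with add-one smoothing;
        # a score above 0 leans spam, below 0 leans ham.
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score


# A local database trained only on English mail...
local = TokenDB()
local.learn("cheap pills buy now", True)
local.learn("lunch meeting tomorrow at noon", False)

# ...and a shared one that has already seen German spam.
shared = TokenDB()
shared.learn("billige pillen jetzt kaufen", True)
shared.learn("jetzt billige uhren kaufen", True)

message = "jetzt kaufen billige pillen"
before = local.spam_score(message)
local.merge(shared)
after = local.spam_score(message)
print(before, after)
```

before the merge the local database has never seen any of the German tokens, so they carry no evidence either way; after the merge the same message scores higher. a real shared corpus would also need plenty of ham, and some way to keep people from poisoning it, which this sketch ignores.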