Matts

I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high performance network software, string matching algorithms, and other cool stuff mostly in Perl and C.

Journal of Matts (1087)

Thursday June 05, 2003
11:27 AM

Bzzzz

[ #12634 ]

I feel like I have dropped off the public face of this planet for my latest set of deadlines. I've all but stopped my contributions to open source, including all my modules and AxKit too. This does not feel good, but something had to give, and those weren't giving anything back to me, so they were the first things to go.

This week I am supposed to finish this quarantine system. Unicode bugs have forced me to admit that it won't get done. Why can't email systems generate RFC-compliant emails yet??? Ugh.

Beyond that, I have to face up to the fact that I will probably have to lump the database into MS SQL Server for rollout. This will of course cause delays, because I've made PostgreSQL-specific assumptions in the DB layer. Luckily there is only ONE layer there to fix. We have to go with MS SQL Server not just because it's the company standard, but because this is a Very Large database, so we need 24/7 support, which we only have with SQL Server.

I am still concerned about MS SQL Server handling the job. I am very sold on Pg's locking model (readers don't wait for writers, and writers don't wait for readers) compared with MS SQL's (readers and writers wait for each other). But databases aren't my day job, and our DB guys tell me that it can cope. I guess I'll find out for sure soon enough.

The general scaling of this project is also looking "interesting" for various values of "interesting". We'll have about 3 million records a day going into the database (and a similar number coming out) at launch. We'll be storing 30 days' worth, so that's 90 million rows. Not unreasonable.

But spam is doubling every 2 months at the moment. So in 13 months that will be 9 billion rows. Pretty much un-queryable at that point. You have to change the architecture. So this will be a fairly short-lived design unless the spam increase slows down a lot.
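
As a back-of-the-envelope check on those numbers, here is a minimal Python sketch. It assumes pure exponential growth with a fixed doubling period and a constant 30-day retention window; the function name `projected_rows` and its parameters are made up for illustration and are not part of the real system.

```python
def projected_rows(rows_per_day, retention_days, doubling_months, months_ahead):
    """Rows held in the retention window after `months_ahead` months.

    Assumption: volume doubles every `doubling_months` months, with no
    slowdown, and the retention window stays the same length.
    """
    current = rows_per_day * retention_days
    growth = 2 ** (months_ahead / doubling_months)
    return current * growth


# Figures from the post: 3 million rows/day, 30 days kept, doubling every 2 months.
today = projected_rows(3_000_000, 30, 2, 0)
later = projected_rows(3_000_000, 30, 2, 13)

print(f"{today:,.0f} rows at launch")
print(f"{later:,.0f} rows after 13 months")
```

Under these assumptions the 13-month figure comes out at a little over 8 billion rows, the same order of magnitude as the 9 billion quoted above. Either way the conclusion holds: the table grows roughly 90-fold in just over a year.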

Anyway, my brane hurts thinking about it. But at least it's an interesting problem to have.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • But spam is doubling every 2 months at the moment. So in 13 months that will be 9 billion rows. Pretty much un-queryable at that point. You have to change the architecture. So this will be a fairly short-lived design unless the spam increase slows down a lot.

    If spam is currently 50% of e-mail (by number of messages), and it doubles every 2 months, then in 1 year's time 98% of e-mail will be spam. I can't see that this is sustainable - if spammers keep on doubling their productivity, e-mail infrastructure i

    • Well, I predict that it simply can't. And I predict that the thing that will stop it getting to that point is strong legislation. Slap a few spammers into prison and you'll see spam suddenly slowing down to processable rates again.
      • Slap a few spammers into prison

        In practical terms, how do you define an offence that spammers are committing? As far as I can tell, it's only recently that they've started committing (existing) criminal offences to spam, by using zombies on compromised machines as mail relays. Prior to that, were they only breaching contracts by sending spam in violation of terms of use? Are the spammers physically located in countries that will pass effective anti-spam legislation? And can they be identified given that ofte

        • Spam is a form of denial of service attack on a network. The fact that it's advertising doesn't change that.

          To find a spammer, just follow the money. You need a subpoena to do that.