
All the Perl that's Practical to Extract and Report


perrin (4270)


Perrin is a contributor to various Perl-related projects like mod_perl, Template Toolkit, and Class::DBI. He is a frequent speaker at OSCON, YAPC, and ApacheCon, and a contributor to several Perl-related books.

Journal of perrin (4270)

Friday April 28, 2006
02:06 PM

how large sites scale their databases

[ #29467 ]

I've been following Tim O'Reilly's series on how large sites scale their databases, as well as this article about topix.net. The sites seem to fall into two camps:

  1. Using flat files, typically accompanied by lots of attitude about how much smarter they are for not using an RDBMS and frequent invocations of Google.
  2. Using MySQL, with replication to scale reads, and data partitioning to scale writes (users A-H on this cluster, I-P on that one...)
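To make the second approach concrete, here is a minimal sketch of how an application might route queries under that scheme: writes go to the master of whichever cluster owns the user's letter range, and reads rotate across that cluster's replicas. Everything specific here is an assumption for illustration: the host names, the three letter ranges (the post only mentions A-H and I-P), the `Shard` class, and the choice to send non-alphabetic names to the last cluster. None of it comes from the sites discussed.

```python
from bisect import bisect_left
from itertools import cycle


class Shard:
    """One hypothetical cluster: a master for writes, replicas for reads."""

    def __init__(self, name, master, replicas):
        self.name = name
        self.master = master
        # Fall back to the master if a cluster has no replicas.
        self._replicas = cycle(replicas or [master])

    def host_for(self, is_write):
        # Writes must hit the master; reads rotate round-robin over replicas.
        return self.master if is_write else next(self._replicas)


# Inclusive upper bound of each letter range (assumed: A-H, I-P, Q-Z).
SHARD_UPPER_BOUNDS = ["H", "P", "Z"]
SHARDS = [
    Shard("a-h", "db-ah-master", ["db-ah-r1", "db-ah-r2"]),
    Shard("i-p", "db-ip-master", ["db-ip-r1", "db-ip-r2"]),
    Shard("q-z", "db-qz-master", ["db-qz-r1", "db-qz-r2"]),
]


def shard_for(username):
    """Pick the cluster that owns this user, based on the first letter."""
    first = username[:1].upper()
    if not ("A" <= first <= "Z"):
        # Arbitrary choice for this sketch: digits, symbols, and empty
        # names all land on the last cluster.
        return SHARDS[-1]
    return SHARDS[bisect_left(SHARD_UPPER_BOUNDS, first)]


def route(username, is_write):
    """Return the host that should serve a query for this user."""
    return shard_for(username).host_for(is_write)


if __name__ == "__main__":
    print(route("alice", is_write=True))
    print(route("alice", is_write=False))
    print(route("ivan", is_write=True))
```

The obvious weakness of splitting by letter is that the ranges are unlikely to hold equal numbers of users, and moving a range to a new cluster later means migrating data. Replication lag also means a read sent to a replica may not yet see a write that just went to the master.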

Amazingly, Craigslist uses MyISAM tables. I guess it's nearly all reads, but I just didn't think the table-level locking used for MyISAM tables would hold up to traffic like that. A primary reason why I use InnoDB is the row-level locking and the multi-version concurrency system, which means that readers don't block writers.

Two interesting things here are that none of them use PostgreSQL, despite a few of them being fairly new, and that none of them have tried commercial offerings for database clustering, like the stuff IBM and Oracle sell.

In fact, I've never met anyone who had tried the Oracle or DB2 clustering. Even the people who have the money seem to avoid it. Can anyone offer any personal anecdotes about it? Does it work at all?

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • In fact, I've never met anyone who had tried the Oracle or DB2 clustering. Even the people who have the money seem to avoid it. Can anyone offer any personal anecdotes about it? Does it work at all?

    Sure it works. The reason you don't hear a lot of anecdotes is that most of the people who can afford it aren't out telling the public how they do it.

    Oracle "clustering" is probably used far more in internal, critical infrastructure than in external, disposable content servers.

    One of Oracle's advantages

  • Two interesting things here are that none of them use PostgreSQL, despite a few of them being fairly new

    I agree here, PostgreSQL is not popular for scaling large websites. Its strengths are not well suited to that task. It is not nearly as fast as MySQL on reads, and is not as friendly as MySQL to set up for web developers. It is the hidden P in LAMP (although my version of LAMP is Linux Apache mod_perl PostgreSQL).

    PostgreSQL is best suited for applications which require higher than 10 to 1 ratio

  • I did find it very interesting to have technical reports of how these popular companies try to scale. The flat files are slightly more scary than MySQL, which is a known commodity at least. I am surprised that replication and backups are still so hard to pull off - it should be all plug and play already. I thought this was the future!