
Perrin is a contributor to various Perl-related projects like mod_perl, Template Toolkit, and Class::DBI. He is a frequent speaker at OSCON, YAPC, and ApacheCon, and a contributor to several perl-related books.

Journal of perrin (4270)

Monday November 29, 2004
01:25 PM

missing the boat on performance

I'm frequently surprised by the way best practices for good performance do not get picked up by people. A couple of weeks back at ApacheCon 2004, I listened to one of the SpamAssassin developers say that their SQL RDBMS storage was faster than their Berkeley DB storage. How could that be? My tests have always shown Berkeley DB to be significantly faster than the fastest query on MySQL (which, incidentally, is much faster than the same query on SQLite). I checked their code and there's the answer: it uses the slowest possible interface to Berkeley DB, the DB_File module with an external locking system. Using the BerkeleyDB module with direct method calls and letting BerkeleyDB manage locking would be many times faster.
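A minimal sketch of the faster approach described above: the BerkeleyDB module with direct method calls, letting Berkeley DB's own Concurrent Data Store handle locking instead of wrapping a tied DB_File in an external flock. The paths, filenames, and keys here are illustrative assumptions, not taken from the SpamAssassin code.

```perl
use strict;
use BerkeleyDB;

# Let Berkeley DB manage locking via its Concurrent Data Store
# (DB_INIT_CDB) rather than serializing access with an external flock.
my $env = BerkeleyDB::Env->new(
    -Home  => '/tmp/bdb-env',    # assumed environment directory
    -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
) or die "can't open env: $BerkeleyDB::Error";

my $db = BerkeleyDB::Hash->new(
    -Filename => 'tokens.db',    # illustrative filename
    -Flags    => DB_CREATE,
    -Env      => $env,
) or die "can't open db: $BerkeleyDB::Error";

# Direct method calls avoid the tie() overhead that DB_File incurs.
$db->db_put('some_token', '42');
my $status = $db->db_get('some_token', my $count);
```

The win comes from two places: skipping Perl's tie magic on every access, and letting the library take fine-grained locks internally instead of locking the whole file around each operation.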

In a similar vein, people are still recommending Cache::Cache or things based on IPC::ShareLite, when BerkeleyDB or Cache::FastMmap would be about ten times as fast. Hopefully my upcoming article based on my talk at ApacheCon will help point people in the right direction on that.
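For comparison, a Cache::FastMmap sketch — one mmap'ed file shared by all processes on the machine, with locking handled internally. The share file name, TTL, and keys are assumptions for illustration.

```perl
use strict;
use Cache::FastMmap;

# A single shared, memory-mapped cache file; no separate locking
# layer or serialization daemon required.
my $cache = Cache::FastMmap->new(
    share_file  => '/tmp/app-cache.mmap',   # assumed path
    expire_time => 600,                     # seconds
);

$cache->set( 'session:42', { user => 'perrin' } );
my $session = $cache->get('session:42');
```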

The most recent and most surprising was a big performance bug in Maypole that I discovered while helping Jesse Sheidlower with some performance tuning. People who have used Template Toolkit with mod_perl in a high-performance environment should know that you have to keep the TT object around between requests so that you don't blow the cache and recompile the templates on every hit. Maypole was throwing away the TT object. I gave Jesse a very small patch to fix this and he reported speedups of 250-500% on his application.
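The fix amounts to the standard mod_perl idiom for Template Toolkit: build the Template object once per child process and reuse it, so the in-memory compiled-template cache survives between requests. This is a generic sketch of that idiom, not the actual Maypole patch; package name, paths, and options are assumptions.

```perl
package My::App::View;
use strict;
use Template;

# Created once per child process, reused across requests, so compiled
# templates stay cached in memory instead of being rebuilt on every hit.
my $tt;

sub template {
    $tt ||= Template->new({
        INCLUDE_PATH => '/web/templates',   # assumed template root
        COMPILE_DIR  => '/tmp/ttc',         # also cache compiled code on disk
        COMPILE_EXT  => '.ttc',
    });
    return $tt;
}

sub handler {
    my $r = shift;
    template()->process( 'page.tt', { request => $r }, $r )
        or die template()->error;
    return 0;    # Apache2::Const::OK
}

1;
```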

What's the lesson in all of this? Probably that being engaged on mailing lists like mod_perl and TT and sites like perlmonks.org has a tangible payoff in terms of knowing what the best practices are. Maybe also that we need to repeat them more often.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Did you use transactions when you tested SQLite for query speed? It's a lot faster if you do. By orders of magnitude.

    David
    • That would only speed things up for situations where multiple statements can be batched, right? My tests are for use as cache storage, so it's just a single read or write. I can't even keep a transaction open for reads, because that would block other people's writes. I did use "PRAGMA synchronous = OFF" and I see in the optimization FAQ that there are a couple of new things that have shown up since the last time I looked at it. I'll try another test soon.
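The transaction point in this thread can be sketched with DBI: wrapping a batch of writes in one transaction means one fsync for the batch rather than one per INSERT, which is where SQLite loses most of its time. The schema and data here are illustrative.

```perl
use strict;
use DBI;

my %data = ( k1 => 'v1', k2 => 'v2', k3 => 'v3' );   # illustrative data

my $dbh = DBI->connect( 'dbi:SQLite:dbname=cache.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

# One transaction around the whole batch: a single journal sync
# instead of one per statement.
$dbh->begin_work;
my $sth = $dbh->prepare('INSERT INTO cache (k, v) VALUES (?, ?)');
$sth->execute( $_, $data{$_} ) for sort keys %data;
$dbh->commit;
```

As the reply notes, this only helps when writes can be batched; a cache doing single reads and writes gets little benefit.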
  • George Schlossnagle posted some interesting performance benchmarks [schlossnagle.org] on reading from embedded databases.

    I was surprised to see how crappy SQLite was, especially in the wake of all the earlier raves I'd read about it compared to MySQL for SELECTs.

    -adam

    • SQLite's a full database though. I suspect it would be fairly nippy if you could go straight to its btree API.

      It also makes a difference how you execute the query. i.e. do you cache the statement handle, do you use bound variables for the output, etc.
      • In my tests I did use all the DBI speed tricks. There seem to be a couple of new SQLite-specific things I should try, but I suspect this (simple primary key lookups) is just a hard thing to compete with MySQL on.
        • Probably so. If you want to send me the code I'll take a look and see if there are any obvious problems.
          • I should clarify - I meant if there are any obvious problems in DBD::SQLite :-)
            • What I'm doing is releasing a cache abstraction module to CPAN (tentatively named Cache::Wrapper) which will include a DBD::SQLite version. Then I'll put out my benchmark code that uses it, and people can use it for their own tuning and comparisons.
    • I'm not quite sure I trust these, since GDBM is definitely not faster than BerkeleyDB when you use the right API.
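The "DBI speed tricks" mentioned in this thread usually mean a cached statement handle and bound output columns for the hot lookup path. A sketch, assuming an already-connected $dbh and an illustrative schema:

```perl
# Hot-path primary-key lookup: prepare_cached() reuses the compiled
# statement across calls, and bind_columns() avoids building a new
# row structure on every fetch.
my $key = 'some_token';    # illustrative key
my $sth = $dbh->prepare_cached('SELECT v FROM cache WHERE k = ?');
$sth->execute($key);
$sth->bind_columns( \my $value );
$sth->fetch;
$sth->finish;
```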
  • The one thing you're missing is the history.

    SpamAssassin originally used Any_DBM (huge mistake, but that one predates me). Of the modules that Any_DBM uses, DB_File was the best choice for stability, portability, and performance reasons and it had the same interface as Any_DBM so we've stuck with it for a few versions. We're likely to eventually deprecate (or discourage) DB_File in favor of SDBM_File which is now a better option since other changes have made its deficiencies a non-issue. SDBM_File is f

    • Okay, so compatibility is ultimately more important than speed here. That makes perfect sense to me.

      I would like to try a BerkeleyDB version. It would be much faster than SDBM_File. If I get something working, I'll send in the patch.

  • That is definitely the key. Word needs to get out. People need to be told. Over and over, until they start to repeat it to others themselves.

    Good performance at macroscopic scale is all about understanding the interactions between layers of abstraction. Unfortunately, that is the hardest kind of knowledge to come by, for a number of reasons. Good practices are therefore not going to establish themselves; they need insistent advocates.

  • ...from being an option, is that it fails when used on an NFS filesystem. The errors are quite weird and do not disclose that this is the problem, so you have to be careful when advocating use of BerkeleyDB in unknown environments.