
All the Perl that's Practical to Extract and Report


Journal of chromatic (983)

Monday August 11, 2003
10:58 PM

Caching Design Decisions in Everything

[ #14050 ]

Everything is optimized for writing, so reading can be a little slow. I've been fixing this while trying not to ruin the features that make it so powerful.

Any good web application that has to serve more than one user per second and serves dynamic content has to have a good caching scheme, or at least spend ridiculous amounts of money on hardware. The classic rule-of-thumb tradeoff applies. To run faster, you'll probably have to use more memory.

(Some of my recent tricks have actually been designed to improve speed and reduce the memory footprint at the same time. Any time you can take advantage of shared memory pages, do it! I think the parent process size is now larger, but much, much more of it is actually shared across the children, so the overall effect is a big win.)

The node cache was an earlier attempt to save memory and time. Any time the engine needed to fetch a node, it would check the cache first. The simple approach was to keep a cache in each child process, using an LRU scheme with a tunable maximum cache size. Because each child kept its own cache, each child could potentially cache the same nodes, and each child paid the memory cost of caching them. Worse, there's no child affinity, so at best the cache helped only when several users were all accessing the same nodes multiple times.
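In rough terms, a per-child LRU node cache looks something like this sketch. The real cache in the engine is more involved; the package name, key names, and node shape here are purely illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A minimal per-child LRU cache sketch: nodes are hashrefs keyed by
# node_id, and an ordered list tracks recency for eviction.
package NodeCache;

sub new {
    my ($class, $max_size) = @_;
    return bless { max => $max_size, nodes => {}, order => [] }, $class;
}

sub get {
    my ($self, $id) = @_;
    return unless exists $self->{nodes}{$id};

    # Move this node to the back of the list: most recently used.
    @{ $self->{order} } = grep { $_ != $id } @{ $self->{order} };
    push @{ $self->{order} }, $id;
    return $self->{nodes}{$id};
}

sub set {
    my ($self, $id, $node) = @_;
    $self->{nodes}{$id} = $node;
    $self->get($id);    # mark it most recently used

    # Evict the least recently used nodes when over the limit.
    while (@{ $self->{order} } > $self->{max}) {
        my $victim = shift @{ $self->{order} };
        delete $self->{nodes}{$victim};
    }
    return $node;
}

package main;

my $cache = NodeCache->new(2);
$cache->set(1, { title => 'node one' });
$cache->set(2, { title => 'node two' });
$cache->get(1);                                # touch node 1
$cache->set(3, { title => 'node three' });     # evicts node 2
print defined $cache->get(2) ? "hit\n" : "miss\n";   # prints "miss"
```

Multiply the memory this hash eats by the number of Apache children and you can see why per-child caching gets expensive fast.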

My clever "let's make it a little faster without rewriting too much code" scheme, compil-o-cache, compiled dynamic content and saved it in otherwise-unused keys in cached nodes. For frequently accessed nodes, this would save the fetching, parsing, and compiling steps, at the expense of looking in the cache.
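The trick looks roughly like this (the `_compiled` key and the node layout are made up for illustration): compile the node's embedded code once with `eval`, stash the resulting coderef in a spare key of the cached node hash, and call it directly on later hits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A rough sketch of the compil-o-cache idea: compile a node's dynamic
# content on first use and reuse the coderef on every later cache hit.
my %node_cache;

sub run_node_code {
    my ($node) = @_;

    # Compile only on the first hit; reuse the coderef afterwards.
    unless ($node->{_compiled}) {
        my $code = $node->{doctext};
        $node->{_compiled} = eval "sub { $code }"
            or die "compile failed: $@";
    }
    return $node->{_compiled}->();
}

my $node = { node_id => 42, doctext => 'return 2 + 2' };
$node_cache{ $node->{node_id} } = $node;

print run_node_code($node), "\n";   # compiles, then runs: prints 4
print run_node_code($node), "\n";   # skips straight to the coderef
```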

Of course, that eliminated one entire class of optimizations. Sharing plain hashes between children is not difficult with some of the caching modules out there. Sharing actual code references is much harder. Even if you can do it, at some point it defeats the purpose: you'll have to recompile on each cache fetch, keep a secondary cache, or something along those lines.
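Here's a quick demonstration of the problem, using the core Storable module as a stand-in for the serialization that shared-cache modules do under the hood: a plain node hash round-trips fine, but a code reference refuses to freeze:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# A plain node hash serializes and deserializes without complaint.
my %node = ( node_id => 42, title => 'root node' );
my $copy = thaw( freeze(\%node) );
print "shared hash ok: $copy->{title}\n";

# Add a compiled coderef and Storable croaks by default
# ("Can't store CODE items"), so it never reaches a shared cache.
my %compiled_node = ( %node, _compiled => sub { 'dynamic content' } );
my $frozen = eval { freeze(\%compiled_node) };
print "code refs don't freeze: $@" if $@;
```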

I've largely solved the need to store compiled code in the cache, so now we can consider other options for improving the cache.

One good option remains the caching modules. I've never used them, but there's a simplicity and appeal there. Keeping the cache in sync with updates going on in the database could be tricky, though. Another option is moving the caching to the database layer. We could use DBD::Proxy to connect to a layer around the database that caches nodes. That might mean parsing SQL, however, to determine which queries are requesting nodes and which queries are updating nodes. It also means running another process on the server, which isn't onerous as much as it is just one more thing to install and monitor.
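One way to tackle the sync problem, sketched here with plain hashes standing in for the database (the versioning scheme and all the names are mine, not Everything's): cache each node tagged with a version number, bump the version on every update, and have each fetch do one cheap version check before trusting the cached copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-ins for a version table and node storage; in a real
# deployment these would be two database tables, and the version check
# would be one cheap indexed SELECT.
my %db_version = ( 42 => 1 );                         # node_id => version
my %db_node    = ( 42 => { title => 'root node' } );
my %cache;                                            # node_id => [version, node]

sub fetch_node {
    my ($id) = @_;
    my $current = $db_version{$id};

    # Cheap check: is the cached copy still at the current version?
    if (my $entry = $cache{$id}) {
        return $entry->[1] if $entry->[0] == $current;
    }

    # Stale or missing: refetch and recache at the current version.
    my $node = $db_node{$id};
    $cache{$id} = [ $current, $node ];
    return $node;
}

sub update_node {
    my ($id, $node) = @_;
    $db_node{$id} = $node;
    $db_version{$id}++;     # every child's next fetch sees the bump
}

print fetch_node(42)->{title}, "\n";     # prints "root node"
update_node(42, { title => 'renamed node' });
print fetch_node(42)->{title}, "\n";     # prints "renamed node"
```

The extra query per fetch is the price; whether it beats refetching whole nodes depends on how large nodes are and how often they change.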

It may turn out that we don't really need improved caching, though I really doubt that's the case.

These are the reasons I don't fall asleep well. Scary, isn't it? Suggestions that don't involve graham crackers and warm milk are welcome.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I tend to use a variation of this module:

    (use "anon" for username, no password).

    It's quite fast, doesn't take a lot of code, and supports complex data structures, metadata, and a couple of other things.

    (Develooper::DB::db_open is an Apache::DBI-like dbh caching thing.)

    -- ask bjoern hansen, !try; do();

  • caching

    by inkdroid (3294) on 2003.08.12 8:37 (#23017)
    This would make a nice topic for an article, chromatic.
  • I did a talk on caching modules at OSCON a couple of years back. I've been meaning to write it up into an article for a while now. The basic results of my benchmarking were that IPC::MM, BerkeleyDB (native locking), Cache::Mmap, and MLDBM::Sync are the fastest at sharing data. DBD::Proxy was a neat idea, but has terrible performance. You'd be better off just doing an extra fetch to check a cache table.
  • Got memcached?

    Seriously, for caching of simple key-value pairs, it rocks. It's superfast and dead simple.