# Try to get the value from the cache
my $result = $cache->get('key');
# If it isn't there, compute and set in the cache
if (!defined($result)) {
    # Assign to the outer $result; a second "my" here would shadow it
    $result = do_something_to_compute_result();
    $cache->set('key', $result, $expiration_time);
}
This pattern is popular because it is easy to wrap around existing code, and easy to understand.
Unfortunately, it suffers from two problems:
Miss Stampedes: When a cache item expires at the specified time, any processes trying to get it will start to recompute it. If it is a popular cache item and if recomputation is expensive, you may get many recomputations for the same item, which is at best wasteful and at worst can bog down a server.
I originally recognized this problem while working on Mason's busy locking, but am obviously not alone in experiencing it. The term "miss stampede" comes from this memcached list discussion - definitely worth a read.
Recomputation Latency: When a cache item is recomputed, the client (whether that be a browser, command-line, whatever) has to wait for the computation to complete. Since caching keeps average latencies down, there is a tendency to ignore the unfortunate customer that gets stuck with one or more cache misses.
Here are some ways of tweaking the usage pattern above to address one or both of these problems. I've added the initials of the problems that each one addresses, and mentioned relevant features from CHI, if any.
Instead of specifying a single expiration time, specify a range of time during which expiration might occur. Then each cache get makes an independent probabilistic decision as to whether the item has expired. The probability starts out low at the beginning of the range and increases to 1.0 at the end of the range. What this means for popular cache items is that only one or a handful of gets will most likely expire at the same time.
CHI supports this with the expires_variance parameter. It may be passed to individual set commands or as a default for all sets. Personally, I plan to default it to 0.2 or so in almost all my caches.
Drawbacks: Since this is probabilistic, you get no guarantee of how well stampedes will be avoided (if at all), and you have to try to guess the right variance to use.
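To make the probabilistic decision concrete, here is a minimal sketch in plain Perl. It is not CHI's code: the helper names expiration_probability and is_expired are made up for illustration, and the linear ramp from 0 to 1 across the window is an assumption (the text above only says the probability starts low and reaches 1.0 at the end).

```perl
use strict;
use warnings;

# Hypothetical helper: probability that a get at time $now treats the item
# as expired, given an expiration window [$early, $final].
# Assumption: the probability ramps linearly from 0 to 1 across the window.
sub expiration_probability {
    my ( $now, $early, $final ) = @_;
    return 0 if $now < $early;
    return 1 if $now >= $final;
    return ( $now - $early ) / ( $final - $early );
}

# Each get makes its own independent decision. $roll is injectable so the
# decision can be tested deterministically; it defaults to rand().
sub is_expired {
    my ( $now, $early, $final, $roll ) = @_;
    $roll = rand() unless defined $roll;
    return $roll < expiration_probability( $now, $early, $final ) ? 1 : 0;
}

# Example: a 100-second lifetime with a variance of 0.2 gives a window
# that opens at t=80 and closes at t=100.
my ( $set_at, $lifetime, $variance ) = ( 0, 100, 0.2 );
my $final = $set_at + $lifetime;
my $early = $final - $variance * $lifetime;
printf "t=%d p=%.2f\n", $_, expiration_probability( $_, $early, $final )
  for ( 70, 80, 90, 100 );
```

With many concurrent readers, only the fraction whose roll falls under the current probability will recompute, which is why the stampede thins out rather than disappearing outright.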
When a cache item expires, flag the item for a short time, either by upping its expiration time or by setting an associated value in the cache. Subsequent misses will see the flag and return the old value instead of duplicating the recompute effort.
CHI supports this with the busy_lock parameter, stolen from Mason. It works by temporarily setting the expiration time forward by the specified amount of time.
Drawbacks: Setting a busy lock involves a separate write. If you use this feature liberally, you'll double the number of write operations you do. Some backends will suffer from a race condition, a small window of time in which many processes may decide to recompute, before the first lock has been successfully set.
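The busy-lock idea can be sketched with a toy in-memory store. Everything here is hypothetical (the %store hash, get_with_busy_lock, set_value, and the explicit $now argument used in place of time()); it only illustrates the "push the expiration forward" variant described above, not how CHI or Mason actually implement it.

```perl
use strict;
use warnings;

# Toy in-memory store, standing in for a real cache backend.
# Each entry is { value => ..., expires_at => ... }.
my %store;

# Hypothetical get with a busy lock: the first caller to see an expired
# entry pushes its expiration forward by $busy_lock seconds and reports a
# miss; callers arriving during that interval get the stale value.
sub get_with_busy_lock {
    my ( $key, $now, $busy_lock ) = @_;
    my $entry = $store{$key} or return;
    if ( $now >= $entry->{expires_at} ) {
        $entry->{expires_at} = $now + $busy_lock;    # the extra write
        return;                                      # this caller recomputes
    }
    return $entry->{value};
}

sub set_value {
    my ( $key, $value, $now, $expires_in ) = @_;
    $store{$key} = { value => $value, expires_at => $now + $expires_in };
    return $value;
}

set_value( 'key', 'old', 0, 60 );
my $first  = get_with_busy_lock( 'key', 61, 30 );    # miss: recompute
my $second = get_with_busy_lock( 'key', 62, 30 );    # stale value returned
set_value( 'key', 'new', 65, 60 );                   # recompute finishes
my $third = get_with_busy_lock( 'key', 66, 30 );     # fresh value returned
```

In a single process the check and the write cannot interleave. On a shared backend the two steps are separate operations, which is where the race window mentioned above comes from.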
When a cache item expires, return the old value immediately, then kick off a recomputation in the background. This spares the client from the cost of the recompute.
This requires a non-traditional usage pattern, since the get and set are effectively happening as part of one operation. In CHI it will look like this:
my $result =
    $cache->compute( 'key', sub { do_something_to_compute_result() },
        $expiration_time );
CHI already has a working compute API, but doesn't yet know how to run things in the background. Coming soon.
Drawbacks: Requires a non-traditional and somewhat ugly code pattern; background processes are harder to track and debug.
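Here is a rough sketch of the "serve stale, refresh later" behavior. It is an illustration under loud assumptions, not CHI's compute: the compute function below is a plain sub over a hash, it takes an explicit $now argument, and the @pending queue drained by run_pending merely stands in for a real background process.

```perl
use strict;
use warnings;

my %store;      # key => { value => ..., expires_at => ... }
my @pending;    # deferred recompute jobs; stands in for a background worker

# Hypothetical compute(): returns the cached value, and when that value is
# stale, returns it anyway and queues the recompute instead of running it
# inline. Only a first-time miss is computed synchronously.
sub compute {
    my ( $key, $code, $expires_in, $now ) = @_;
    my $entry = $store{$key};
    if ( !$entry ) {
        $entry = $store{$key} =
          { value => $code->(), expires_at => $now + $expires_in };
    }
    elsif ( $now >= $entry->{expires_at} && !$entry->{refreshing} ) {
        $entry->{refreshing} = 1;    # avoid queueing duplicate jobs
        push @pending, sub {
            my ($run_at) = @_;
            $store{$key} =
              { value => $code->(), expires_at => $run_at + $expires_in };
        };
    }
    return $entry->{value};
}

# Stand-in for the background process: run every queued job.
sub run_pending {
    my ($now) = @_;
    while ( my $job = shift @pending ) { $job->($now) }
    return;
}

my $calls = 0;
my $code  = sub { return 'result-' . ++$calls };
my $v1 = compute( 'key', $code, 60, 0 );     # first miss: computed inline
my $v2 = compute( 'key', $code, 60, 61 );    # stale: old value, job queued
run_pending(62);
my $v3 = compute( 'key', $code, 60, 63 );    # refreshed value
```

The refreshing flag also gives some stampede protection for free, since only one job is queued per stale item.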
Recompute cache items entirely from an external process, either when change events occur or when items approach their expiration time. Items never actually expire as the result of a client request. This is the most efficient and client-friendly solution, if you can manage it.
Drawbacks: Requires extra external processes (more moving parts). Code to recompute caches must be available from the external process, which can result in some unwanted code separation, API contortions, or repetition. It is also difficult to know which items to keep repopulating, and when exactly to recompute them.
Use a periodic external process to trigger events that will naturally utilize your caches (e.g. write a cron job that hits common pages on your website), but pass a special flag making items more likely to expire. This makes it less likely that expiration will occur during a real client request.
This is not yet supported in CHI, but the idea would be to add some kind of easily accessible lever that temporarily treats all expiration times as reduced, e.g.:
# Reduction ends when $lex goes out of scope
my $lex = CHI->reduced_expirations(0.5);
Drawbacks: Requires extra external processes (more moving parts). Triggers and their run frequencies must be carefully chosen.
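One way such a lever could work is a scope guard: an object that changes a global factor on creation and restores it in DESTROY. The sketch below is purely hypothetical (the Sketch::ReducedExpirations package, its $Factor variable, and effective_expiration are invented for illustration and are not part of CHI).

```perl
use strict;
use warnings;

package Sketch::ReducedExpirations;    # hypothetical, not part of CHI

our $Factor = 1;    # multiplier applied to every expiration time

sub new {
    my ( $class, $factor ) = @_;
    my $self = bless { previous => $Factor }, $class;
    $Factor = $factor;
    return $self;
}

# Restore the previous factor when the guard object is destroyed.
sub DESTROY { my $self = shift; $Factor = $self->{previous}; return }

# What a cache would consult when deciding whether an item has expired.
sub effective_expiration {
    my ( $class, $set_at, $expires_in ) = @_;
    return $set_at + $expires_in * $Factor;
}

package main;

my $normal = Sketch::ReducedExpirations->effective_expiration( 0, 100 );
my $reduced;
{
    my $lex = Sketch::ReducedExpirations->new(0.5);
    $reduced = Sketch::ReducedExpirations->effective_expiration( 0, 100 );
}    # $lex goes out of scope here, restoring the factor
my $after = Sketch::ReducedExpirations->effective_expiration( 0, 100 );
```

Because Perl destroys the guard as soon as its last reference goes away, the reduction cannot leak past the enclosing block, even if the block exits via die.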
What other techniques have you used, and what success/failures have you had with them?
file: $CPAN/authors/id/J/JS/JSWARTZ/CHI-0.03.tar.gz
size: 62313 bytes
md5: ec828f2466ba266e11cd6d1dd5ca2913
CHI provides a unified caching API, designed to assist a developer in persisting data for a specified period of time. It is intended as an evolution of DeWitt Clinton's Cache::Cache package, adhering to the basic Cache API but adding new features and addressing limitations in the Cache::Cache implementation.
You might think of it as a fledgling "DBI for caching".
Driver classes already exist for in-process memory, plain files, memory mapped files and memcached. Other drivers such as BerkeleyDB and DBI will be coming soon. Fortunately, implementing drivers is fairly easy, on the order of creating a TIE interface to your data store.
Special thanks to the Hearst Digital Media group, where CHI was first designed and developed, for blessing the open source release of this code.
There's lots more in store for this module, so stay tuned! Feedback welcome here or on the Perl cache mailing list.
In addition, there must be CPAN modules that have interesting things to say but choose not to log at all, because they don't want to invent another logging mechanism or become dependent on an existing one.
This situation is pretty much the opposite of what I want when developing a large application. I want a single way to turn logging on and off, and to control where logs get sent, for all of the modules I'm using.
This being Perl, there are many fine logging frameworks available: Log::Log4perl, Log::Dispatch, Log::Handler, Log::Agent, Log::Trivial, etc. So why do CPAN modules eschew the use of these and invent their own mechanisms that are almost guaranteed to be less powerful?
A Common Log API
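One thing to notice is that while the logging frameworks all differ in their configuration and activation API, and in the set of features they support, the API to log messages is generally quite simple. At its core it consists of methods to log a message at a given level (e.g. $log->debug, $log->error) and methods to check whether a level is active (e.g. $log->is_debug).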
I expect most CPAN modules would happily stick to this API, and let the application worry about configuring what's getting logged and where it's going. Therefore...
Proposed Module: Log::Any
I propose a small module called Log::Any that provides this API, with no dependencies and no logging implementation of its own. Log::Any would be designed to be linked by the main application to an existing logging framework.
A CPAN module would use it like this:
package Foo;
use Data::Dumper;    # provides Dumper(), used below
use Log::Any;
my $log = Log::Any->get_logger(category => __PACKAGE__);
$log->error("an error occurred");
$log->debug("arguments are: " . Dumper(\@_))
    if $log->is_debug();
By default, methods like $log->debug would be no-ops, and methods like $log->is_debug() would return false.
As a convenient shorthand, you can use
package Foo;
use Log::Any qw($log);
to create the logger, which is equivalent to the first example except that $log is (necessarily) a package-scoped rather than lexical variable.
How does an application activate logging? The low-level way is to call Log::Any->set_logger_factory (better name pending) with a single argument: a subroutine that takes a log category and returns a logger object implementing the standard logging API above. The log category is typically the class doing the logging, and it may be ignored.
For example, to link with Log::Log4perl:
use Log::Any;
use Log::Log4perl;
Log::Log4perl->init("log.conf");
Log::Any->set_logger_factory(sub { Log::Log4perl->get_logger(@_) });
To link with Log::Dispatch, with all categories going to the screen:
use Log::Any;
use Log::Dispatch;
use Log::Dispatch::Screen;
my $dispatcher = Log::Dispatch::Screen->new(...);
Log::Any->set_logger_factory(sub { $dispatcher });
To link with Log::Dispatch, with different categories going to different dispatchers:
use Log::Any;
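use Log::Dispatch;
use Log::Dispatch::Screen;
use Log::Dispatch::File;
my $dispatcher_screen = Log::Dispatch::Screen->new(...);
my $dispatcher_file = Log::Dispatch::File->new(...);
sub choose_dispatcher {
    my $category = shift;
    # Noisy low-level modules go to the file; everything else to the screen
    return $category =~ /DBI|LWP/ ? $dispatcher_file : $dispatcher_screen;
}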
Log::Any->set_logger_factory(\&choose_dispatcher);
This API is a little awkward for the average user. One solution is for logging frameworks themselves to provide more convenient mixins, e.g.:
use Log::Dispatch; # this also defines Log::Any::use_log_dispatch
my $d = Log::Dispatch::File->new(...);
Log::Any->use_log_dispatch($d); # calls set_logger_factory for you
use Log::Log4perl; # this also defines Log::Any::use_log4perl
Log::Any->use_log4perl(); # calls set_logger_factory for you
set_logger_factory would be implemented so as to take effect on all existing as well as future loggers. Any $log objects already created inside modules will automatically be switched when set_logger_factory is called. (i.e. $log will probably be a thin proxy object.) This means that Log::Any need not be initialized by the time it is used in CPAN modules, and it allows set_logger_factory to be called more than once per application.
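The thin-proxy idea might look something like the sketch below. It is only an illustration of the proposal: the package names Sketch::LogAny, Sketch::LogAny::Proxy and Sketch::RecordingLogger are made up, only four levels are generated, and looking up the factory on every call is one possible design rather than a settled one.

```perl
use strict;
use warnings;

package Sketch::LogAny;    # hypothetical stand-in for the proposed module

my $factory;               # set by the application, if ever

sub set_logger_factory { ( undef, $factory ) = @_; return }

sub get_logger {
    my ( $class, %args ) = @_;
    return bless { category => $args{category} }, 'Sketch::LogAny::Proxy';
}

package Sketch::LogAny::Proxy;

# Generate the level methods. Each call looks up the current factory, so a
# proxy created before set_logger_factory picks up the change.
for my $level (qw(debug info warn error)) {
    for my $method ( $level, "is_$level" ) {
        no strict 'refs';
        *{ __PACKAGE__ . "::$method" } = sub {
            my $self = shift;
            return unless $factory;    # default: no-op / false
            return $factory->( $self->{category} )->$method(@_);
        };
    }
}

package Sketch::RecordingLogger;    # toy backend for the demonstration

sub new      { return bless { lines => [] }, shift }
sub is_debug { return 1 }
sub debug    { my $self = shift; push @{ $self->{lines} }, "@_"; return }

package main;

my $log = Sketch::LogAny->get_logger( category => 'Foo' );
$log->debug('dropped');    # no factory yet, so this is a no-op
my $backend = Sketch::RecordingLogger->new;
Sketch::LogAny->set_logger_factory( sub { $backend } );
$log->debug('kept');       # same $log object, now routed to $backend
```

Calling the factory on every log call keeps the proxy trivial but costs a subroutine call each time; a real implementation would probably cache the underlying logger per category and invalidate that cache when the factory changes.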
Promoting Use
For Log::Any to be useful, a substantial number of modules - especially major modules - would have to adopt its use. Fortunately, with its minimal footprint and standalone nature, authors should not find Log::Any a difficult dependency to add. Existing logging mechanisms, such as LWP::Debug and $DBI::tfh, could easily be converted to write *both* to their existing output streams and to Log::Any. This would preserve backward compatibility for existing applications, but allow new applications to benefit from more powerful logging. I would be willing to submit such patches to major module authors to get things going.
Feedback welcome. Thanks!