Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Alias (5735)

  (email not shown publicly)

Journal of Alias (5735)

Wednesday July 21, 2010
09:27 PM

Profiling your website while live on a production cluster

[ #40457 ]

As I mentioned in my last post, by day I wrangle a large web application that occasionally verges on being too complex to for mere humans to understand.

Curiously, because it is private and the income rate is high (we probably average something like $5 in gross turnover per page view) we don't have to deal with a lot of servers to deliver it, by internet standards anyway.

But the application is still way too big to profile easily by normal methods, and certainly on production it's way too heavy, even if we applied targeted profiling using something like Aspect::Library::NYTProf.

Between the web servers, transaction server, database, search engine and cache server, we are probably only dealing with 12 servers and 30 CPUs. Of course, these servers are all horribly expensive, because they are all server-virtualised, network-virtualised, doubly redundant (high-availability + disaster-recovery) and heavily monitored with high end support contracts.

One of our most sensitive issues is database load.

We have a ton of database tables (about 200) and lots of medium sized queries running across them. One of our main failure modes is that some deep change to code boosts the quantity of some significant query, which stresses the database enough to cause contention and lock-storms, leading to site failure.

Complicating things, big parts of some pages are embedded in other pages. So attributing load and lag to one part of the website, or to Oracle, is tricky and hard to benchmark in advance (although we do load test the main load paths to catch the worst cases).

For a long time, we've had a mechanism for zoned profiling the production site, so we can allocate wallclock costs to different general areas of the site.

But it is fragile and unreliable, requiring perfect maintenance and every developer to remember to write this kind of thing everywhere.

# Embed a foo widget in the current page

Since you don't know this profiling system exists unless you've seen it somewhere else in the code before, and it's hard to care about something that is orthogonal to the problem you are actuall solving, this mechanism has degraded over time. While we still get some pretty cacti graphs showing load breakdown, they are highly unreliable and you can never be entirely sure if the load attribution is correct.

This kind of performance monitoring as a "cross-cutting concern" is a textbook use case for Aspect-Oriented Programming, and so in our Christmas 2009 code freeze window I set about trying to rewrite the push/pop_timing profiler using

Unfortunately, turned out to be a bit too slow and naive for my needs. But after a 6 month delay to do a 90% rewrite of, I now finally have something sophisticated enough (and fast enough) to meet my needs.

So today I'm releasing the shiny new Aspect::Library::ZoneTimer, which will serve as the main plank of our new production profiling system.

The idea behind ZoneTimer is to define each performance zone as a pointcut. The aspect will track the movement of your code across each zone boundaries and build a running total of the exclusive time spent in each performance zone.

When your program exits the top-most performance zone, the totals will be sent to a handler function.

A typical use of the ZoneTimer aspect looks something like this

use Aspect;
aspect ZoneTimer => (
    zones => {
        main      => call 'MyProgram::main' & highest,
        search    => call 'MyProgram::search',
        widgets   => call qr/^MyProgram::widget_.*/,
        templates => call 'MyProgram::render',
        dbi       => call qr/^DB[DI]::.*?\b(?:prepare|execute|fetch.*)$/,
        system    => (
            call qr/^IPC::System::Simple::(?:run|runx|capture)/
            call 'IPC::Run3::run3'
            call qr/^Capture::Tiny::(?:capture|tee).*/
    handler => sub {
        my $top       = shift; # "main"
        my $start     = shift; # [ 1279763446, 796875 ]
        my $stop      = shift; # [ 1279763473, 163153 ]
        my $exclusive = shift; # { main => 23123412, dbi => 3231231 }
        print "Profiling from zone $top\n";
        print "Started recording at " . scalar(localtime $start) . "\n";
        print "Stopped recording at " . scalar(localtime $stop)  . "\n";
        foreach my $zone ( sort keys %$exclusive ) {
            print "Spent $exclusive->{$zone} microseconds in zone $zone\n";

This example breaks out the cost of a typical small web application into a general zone, a special zone for the search page, and then splits the costs of generating widgets, rendering the HTML template, waiting for the database, and making calls to the underlying system.

Start and stop times are returned as a two element array exactly as returned by C, and the exclusive zone totals are returned as integer microseconds (all math is done in these integer microseconds to prevent floating point corruption).

The use of Aspect allows us to easily mix special cases via the pointcuts, such as the use of "highest" which makes sure that "main" only matches the first time it is seen, and any widget which that does a re-entry into main still has that cost attributed to the widget. We've also hooked into multiple system call modules to measure system call cost, because we know different modules our program consumes will use different methods for interacting with the system.

While the handler I've shown here will just print out a summary of the call, in our environment at work the profile report handler will format the exclusive times into a special log message and then send it via the super-quick and non-blocking Log::Syslog::Fast module to the UDP localhost syslog service, where it is mirrored to disk for debugging use, and then forwarded on to our main company-wide syslog server.

On our setup, we can then use an excellent commercial log analysis product called Splunk (limited free version also available if you want to try it out) to do our tracking and trending of the performance for the entire application across the entire cluster.

The nicest thing about the Aspect system is that it scales the performance cost and complexity risk directly to the complexity of the use case you are using it for.

If it turns out that the Aspect hooks are too intrusive and causing a performance drain, or accidentally causing some weird edge case behaviour, you can simply turn off that the Aspect and restart the servers and the performance penalty just vanishes.

Maintenance of the profiler zones is really easy, because they are all listed clearly in one place.

Depending on your use case, you could even define the performance zones in your website configuration, and then adjust the profiler zone boundaries directly on production (although I'm not sure I'd want to do that in our specific case, since we're fairly paranoid about untested code going into production).

This ZoneTimer is just the first of several Aspect-based tools I plan to release over the next few months. If you get a chance to try out any of these new toys, I'd love to hear feedback on how well they work for you.

In addition to the value of the ZoneTimer aspect itself, one other thing that some people might find of interest is just how little code it took to get this all working.

The entire Aspect::Library::ZoneTimer was implemented in about 80 lines of code, which you can see at y/

This is small enough that you could just clone the module and tweak it for your own use case, if you wanted to creat a new and tricky customised profiler.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.