Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

  (email not shown publicly)
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Friday August 29, 2008
05:40 AM


[ #37302 ]

No, that's not its name and I've no idea what to call it, much less how to release it, but due to some serious production problems we've had in the past couple of days and lack of some obvious tools we've needed, I got fed up and started ripping apart log files and shoving them into an SQLite database. Here's the schema:

    CREATE TABLE status (
        id      INTEGER PRIMARY KEY,
        code    INTEGER NOT NULL,
        message VARCHAR(30) NOT NULL

    CREATE TABLE path (
        path VARCHAR(500) NOT NULL

    CREATE TABLE client (
        id      INTEGER PRIMARY KEY,
        host    VARCHAR(255) NOT NULL,
        agent   VARCHAR(255)

    CREATE TABLE log (
        id        INTEGER PRIMARY KEY,
        name      VARCHAR(255) NOT NULL,
        md5_hex   VARCHAR(32)  NOT NULL,
        last_line INTEGER      NOT NULL

    CREATE TABLE request (
        id             INTEGER PRIMARY KEY,
        request_date   DATE,
        response_time  INTEGER,
        method         VARCHAR(6),
        protocol       VARCHAR(10),
        bytes_received INTEGER,
        path_id        INTEGER,
        remote_user    VARCHAR(255),
        referer        VARCHAR(255),
        status_id      INTEGER,
        client_id      INTEGER,
        log_id         INTEGER

Of course, this being SQLite, we don't have real foreign key constraints and the schema is a bit of a hack. I could write triggers to fake foreign keys, but at fewer than 100 inserts per second (and yes, I'm using one big transaction), a 2 million line log file can take almost 6 hours to process. Adding those triggers will make this even worse. I'm probably going to have to switch to MySQL to (maybe MyISAM since this is really read-only).

One problem I faced was how to prevent someone parsing the same log file more than once. Well, you'll note that we have log.md5_hex up there. A log file to be processed must have at least five lines or else it's rejected. Otherwise, we calculate the MD5 hex value of the first five lines and use that as a signature. We also track the last line number processed. Throw a log file at it which it's already seen and it will only process new lines.

On the plus side, it's nice to be able to write SQL queries against my log file. Naturally, this was initially just a quick and easy hack. Regrettably, I now see other packages out on the 'net which do this. Sigh.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.
  • If you can afford it, splunk looks good. []

    If not, I like []

    for log collecting and then use the hashed spread groups to insert into multiple MyISAM tables in parallel, or in multiple servers.

    If using a single server, a merge table on top gives you a nice global view for quick and dirty stuff. For more complex processing, you can use a simple map-reduce-style operation where you query your multiple mysql log databases in parallel and re

    life is short