
grantm
http://www.mclean.net.nz/

Just a simple [cpan.org] guy, hacking Perl for fun and profit since way back in the last millennium. You may find me hanging around in the monastery [perlmonks.org].

What am I working on right now? Probably the Sprog project [sourceforge.net].

GnuPG key Fingerprint:
6CA8 2022 5006 70E9 2D66
AE3F 1AF1 A20A 4CC0 0851

Journal of grantm (164)

Sunday August 19, 2007
08:23 PM

Wellington.PM get social

Dave and his wife Gill were passing through Wellington on their vacation, so a group of us (Andy, Matt, Michael, Sam, my wife Anna and I) met up for a drink and dinner. And a very pleasant evening it was too. Interestingly, it seems that London.PM meetings are predominantly social, whereas Wellington.PM meetings have been pretty much exclusively technical. I'd be keen to attend more social Wellington.PM events - I'd just prefer that someone else organised them :-).

If you're passing through Wellington, drop me a line and we'll use you as an excuse :-).

Wednesday August 15, 2007
07:16 PM

Nokia Batteries

A minor clarification to help you identify potentially faulty batteries.

Friday August 03, 2007
06:27 PM

More blind driving

Schwern's blind driver analogy struck a chord with me. I'd argue that the analogy could also be turned completely around, with the end-user in the driver's seat. Unfortunately, more often than not they won't even have someone in the back seat shouting directions. Their 'navigation' will consist of zigzagging through menus, links and buttons, trying to decipher the available options and find a path to their intended destination.

Unlike Adrian, I'm not particularly enamoured with Don Norman's writing - I've found it to be long on anecdotes and short on practical advice or solutions.

One fascinating essay I stumbled upon recently is Bret Victor's Magic Ink: Information Software and the Graphical Interface. For anyone interested in the interface between people and computers, it's a 'must read'. Be warned, though: it's quite long.

Thursday August 02, 2007
04:19 AM

Infinite Loop?

Here's some behaviour that surprised me. I would expect this program to loop forever:

#!/usr/bin/perl

use strict;
use warnings;

my $file1 = './test_file_1';
my $file2 = './test_file_2';

my $count = 1;
while (1) {
    # Create the first file
    unlink($file1);
    open(my $fh1, '>', $file1) or die "open($file1): $!";
    close($fh1);

    # Wait until the system clock has ticked past its mtime
    my $t0 = (stat($file1))[9];
    while (time() <= $t0) {
        sleep 1;
    }
    my $t1 = time();

    # Create the second file
    unlink($file2);
    open(my $fh2, '>', $file2) or die "open($file2): $!";
    close($fh2);

    # The second file should now always have the later mtime
    if ((stat($file2))[9] <= (stat($file1))[9]) {
        die sprintf(
            "Timestamps: %u %u  t0: %u t1: %u\n",
            (stat($file2))[9], (stat($file1))[9], $t0, $t1
        );
    }
    printf("Loop count = %u\n", $count++);
}

Given that test_file_2 is not created until the system time is greater than the modification time on test_file_1, I would expect the die statement never to be reached. In fact, it fails for me on multiple systems after running for only a few minutes.

My test systems are all running Linux, and using local filesystems. I have seen this sort of funky effect when using network filesystems, presumably due to clock drift between client and fileserver. I've also run into problems where updating a file on a local NTFS partition under Windows would either not update the time or would update it to have an earlier value.

This came up because Andreas Koenig did some extensive testing to highlight random failures in XML::Simple's test suite. Now that I know it's the test itself that's at fault, I just need to make it a bit more robust.
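One way to make that sort of test robust (a sketch only - not necessarily the fix XML::Simple ended up with) is to stop depending on the system clock altogether and set the timestamps explicitly with utime():

# Using $file1 and $file2 from the test program above: instead of
# sleeping and hoping the clock cooperates, force the second file
# to appear strictly newer than the first.
my $now = time();
utime($now - 10, $now - 10, $file1) or die "utime($file1): $!";
utime($now,      $now,      $file2) or die "utime($file2): $!";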

Thursday July 26, 2007
03:21 AM

The "Perl end" of XML

In theory, XML allows two parties to agree on an unambiguous definition of a format for data exchange. Low-level rules define what is and is not XML. Optional layers on top of that define a schema for the elements in the XML document, and once again it is relatively easy to take an XML document and confirm whether or not it complies with the agreed schema.
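In Perl terms, both layers are easy enough to check mechanically (a sketch using XML::LibXML; the file names are invented):

use XML::LibXML;

# Parsing enforces the low-level rules: parse_file() dies
# if the document is not well-formed XML.
my $parser = XML::LibXML->new();
my $doc    = $parser->parse_file('invoice.xml');

# Validating enforces the optional layer: validate() dies with
# an explanation if the document doesn't comply with the schema.
my $schema = XML::LibXML::Schema->new(location => 'invoice.xsd');
$schema->validate($doc);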

In practice, things are quite different from theory.

In my experience, there's always politics. There's always one party which is either unwilling or unable to comply with the rules - or, in extreme cases, even to acknowledge that rules exist. The other party inevitably has to bend over and take it. This has led me to postulate the following 'law':

Where XML data must be exchanged between two parties, the party at the "Perl end" of the pipe will inevitably have to adapt to whatever non-compliant tag-soup gunk the other party emits or expects.

Wednesday July 25, 2007
04:25 AM

God takes a back seat

I live in New Zealand's bible belt. I don't think I was aware of that before we bought the house and don't really think it would have affected our decision even if I had known.

Tawa is a suburb of Wellington - New Zealand's capital city. It's much more 'churchy' than other Wellington suburbs. In fact, there are 8 churches within a two-block radius of our house.

When I take the train to work in the morning, by far the most common reading material for people getting on at the Tawa stops is the Bible. When I was reading Dawkins recently, I worried a little that one of my fellow passengers might take exception to my choice of book, and wondered whether an alternative cover might be in order. Fortunately, they were all too deeply engrossed in their own reading to worry about mine.

This week, however, the black, leather-bound, gilt-edged books are less in evidence. Instead, an entirely different book is everywhere. If it's the same people, then their preachers are obviously not sending quite the same message as some of their Southern US counterparts. I only got to start reading it today, after my son Thomas finished it yesterday. Must get back to it.

Tuesday May 15, 2007
04:47 AM

Adventures with Squid

One of our production servers started behaving as if it were the victim of a DoS attack. I don't think it actually was an attack - that's just how it looked. It may be related to a new AJAX thingy we've just deployed, a browser bug, problems with one of our routers, or maybe even worms/malware on client PCs.

What we were seeing was 90-100 TCP connections from one client machine to port 80 on our server. No request was ever sent over these connections and they stayed connected until the server timed them out after 15 minutes. It happened more than once from more than one source IP. In each case, the access logs showed that a real person had been browsing the site from that IP minutes before. The browser in the earlier session was always IE (6.0 or 7.0). But in the case of these dodgy connections, no request was ever sent so we can't be certain the connections were coming from IE.

Unfortunately, our Apache config was fairly naive, so lots of connections translated into lots of Apache/mod_perl processes: almost all the swap space was in use, Apache was refusing connections after hitting the MaxClients limit, and all the while the load average was barely above zero.

Putting aside the cause, it's really not acceptable for our server to be so vulnerable to such a simple 'attack'. A really simple answer would be to turn down the Timeout value. Unfortunately, we have some large files that get delivered to remote clients on the far side of the world, and 15 minutes really is about as short as we can set it without adversely affecting valid user sessions.

The preferred configuration for a mod_perl server is to have some sort of 'reverse proxy' in front of it to reduce the number of big, heavy processes required. In the past I've used Apache+mod_proxy, but in this case that wouldn't help much: a front-end Apache process would use a little less memory but would otherwise behave in exactly the same way. Ideally, the timeout for the initial request phase would be configurable independently of the total request-handling time, but that's been on the Apache todo list for some years.

I decided to try out Squid in HTTP-accelerator mode. It has the dual advantages of a lower memory footprint per connection and an independently configurable timeout for the initial request phase. It has the disadvantage of being unfamiliar to me, and in my estimation it's not particularly well documented.
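For the record, the heart of an accelerator setup boils down to a couple of lines in squid.conf (Squid 2.6 syntax; the site name is a placeholder, and Apache has been moved to port 81 as described below):

# Listen on port 80 and forward requests to the real Apache on port 81
http_port 80 accel defaultsite=www.example.com vhost
cache_peer 127.0.0.1 parent 81 0 no-query originserver name=backend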

Installation was a breeze - it's a Debian server, so 'apt-get install squid' was all that I needed. Configuring it was not too bad, except for 'acls'. These are very poorly explained in the manual, are used for many different functions (not just access control), and when they aren't right you just get a 'permission denied' error with no indication of why. A bit of googling eventually turned up this magic config setting:

debug_options ALL,1 33,2

With this in place, Squid logs which acls each request matched and why it was refused. It immediately became obvious that I hadn't designated port 81 (where I'd moved Apache to) as a 'Safe' port. With that fixed, I was able to access Apache, but I had to fight some more to stop Squid from caching stuff (including "500 Server Error" responses from my buggy test script).
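The port fix was a one-liner, and disabling caching turned out to be another acl-based rule (both from memory, so treat this as a sketch):

# Declare the back-end Apache port as safe to connect to
acl Safe_ports port 81

# 'all' is predefined in the default config; this Squid is a
# proxy, not a cache, so refuse to cache anything
cache deny all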

The next issue was that Squid has its own access log format. It can be configured to use Common Log Format, but not the 'combined' format which includes the referrer information needed by our reporting package. It is possible to log referrers separately, but those requests get logged in the order they are received, unlike the access log, which is written in the order request handling completes - and that makes tying the two logs together rather tricky. We could of course just continue to do the reporting off the Apache logs, but as far as Apache is concerned, all requests now come from 127.0.0.1.
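For reference, the CLF emulation is a single squid.conf directive (in the Squid 2.x series at least):

# Write access.log in Apache's Common Log Format
emulate_httpd_log on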

More googling turned up mod_rpaf, which takes the client details from the X-Forwarded-For header and 'fixes up' the request so that Apache sees the real client for logging purposes. Once again, a simple apt-get was all that was needed to install the module; then I pasted a couple of lines into the Apache config and it was all working.
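The couple of lines were along these lines (mod_rpaf's directives, with Squid running on the same host):

# Rewrite the remote address from X-Forwarded-For, but only
# trust the header on requests arriving from the local Squid
RPAFenable    On
RPAFproxy_ips 127.0.0.1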

The next hurdle is SSL. The live server has some directories which are handled over an SSL connection, with mod_rewrite rules to ensure a smooth transition between secure and non-secure sections. Squid is capable of talking SSL, so on the face of it I should be able to get away with only one Apache virtual host in the background. The first obstacle was that the Debian build of Squid does not have SSL enabled. I don't pretend to understand the politics, but the OpenSSL license is deemed to be incompatible with the Squid license, so Debian can't distribute a binary package that includes both. Thankfully, it's pretty easy to build your own Squid package with SSL enabled:

  1. Download and unpack the source with: apt-get source squid
  2. Download and install the build dependencies with: sudo apt-get build-dep squid
  3. cd into the source directory and edit the debian/rules file to add '--enable-ssl' to the ./configure command
  4. Run debchange -i to generate a new changelog entry, from which the package version will be determined automatically
  5. Build the package(s) with: dpkg-buildpackage -rfakeroot

I deployed the new packages on the staging server and was able to configure Squid to use an SSL certificate - or at least I would have been able to if I had one. Unfortunately, we've never had SSL set up on the staging server but this seemed like the perfect opportunity to fix that.

Obviously there was no question of going to Verisign to buy certs for each of our 4 staging environments. I could have just used self-signed certs, but that always ends up with messy warning dialogs every time you connect. Instead, I signed up with CAcert. They allow new users to generate certs with a 6-month expiry, but by visiting four of my colleagues (and presenting appropriate credentials) I was able to get fully "assured" and generate a two-year cert - one of the advantages of working in an Open Source shop :-)
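Once the cert existed, the Squid side was just one more port directive (a sketch - the paths and hostname are placeholders):

# Terminate SSL in Squid and pass plain HTTP to the back-end Apache
https_port 443 accel cert=/etc/squid/server.pem key=/etc/squid/server.key defaultsite=staging.example.com vhost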

With the cert installed, the staging server is now happily running SSL. The bit that's missing is the redirect magic to move between HTTP and HTTPS. Unfortunately, when Squid passes the request on to the Apache server, there doesn't seem to be any indication in the headers that the request came in via SSL. It looks like I may be able to use acls (what else!) to get Squid to replace a client header with arbitrary data if the request came in on the SSL port, but I haven't yet worked out how to add a header. I thought I'd call it quits for the day, blog it, and wait for the lazyweb to solve that problem for me.

Thursday May 10, 2007
06:06 AM

New SSHMenu

I finally got to spend a bit of time hacking and pushed out a new release of SSHMenu. The main new feature is integrated support for bcvi, which I'm finding extremely handy. A number of people have told me they like the bcvi concept, but I suspect I'm the only person on the planet using it right now.

Tuesday May 08, 2007
11:51 PM

Wellington.PM May Meeting

The May meeting of Wellington.PM was last night. Our two speakers were both first-timers - in this forum at least. Dan Jacka talked about Win32::OLE, which was good. Unfortunately, not many people present use Windows. Maybe if we had more talks like Dan's we'd get more interest from Windows folk, but it's the old chicken-and-egg thing. I know that if I were still stuck with Windows, I'd be keeping Perl very close.

Next up, Paul Chilton talked about his SiteLife project. It's essentially a proxy service through which you can view your web site. The proxied view gives you some widgets you can use to access interesting info and tools. I was particularly interested, since I came up with essentially the same idea some years ago. The main difference is that Paul got off his butt and made it happen, whereas I just had this cool idea kicking around in my head and some rough sketches on paper.

Once again, it was mentioned that people love the lightning talk meetings. I do too, but experience has shown that, as coordinator, it takes me the same amount of effort to get someone to commit to a 5-minute talk as it does for a 20-minute talk. So organising a lightning talk meeting means at least 5 times as much work for me :-(

Saturday March 03, 2007
11:31 PM

On JSON

«My bank undoubtedly has a massively complex, strongly-typed, fault-intolerant system handling my financial transactions, but it shows those transactions to me in a web page which throws sixty-three validation errors»

James Bennett in "I can’t believe it’s not XML!"