One of our production servers started behaving as if it were the victim of a DoS attack. I don't think it actually was an attack - that's just how it looked. It may be related to a new AJAX thingy we've just deployed, a browser bug, a problem with one of our routers, or maybe even worms/malware on client PCs.
What we were seeing was 90-100 TCP connections from one client machine to port 80 on our server. No request was ever sent over these connections and they stayed connected until the server timed them out after 15 minutes. It happened more than once from more than one source IP. In each case, the access logs showed that a real person had been browsing the site from that IP minutes before. The browser in the earlier session was always IE (6.0 or 7.0). But in the case of these dodgy connections, no request was ever sent so we can't be certain the connections were coming from IE.
Unfortunately our Apache config was fairly naive, so lots of connections translated into lots of heavyweight Apache/mod_perl processes: almost all the swap space was in use and Apache was refusing connections because it had hit the MaxClients setting, yet all the while the load average was barely above 0.
Putting aside the cause, it's really not acceptable for our server to be so vulnerable to such a simple 'attack'. A really simple answer would be to turn down the Timeout value. Unfortunately we have some large files that get delivered to remote clients on the far side of the world and 15 minutes really is about as short as we can set it without adversely affecting valid user sessions.
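For context, the relevant knobs in our Apache config look something like this (the numbers are illustrative, not our real values):

```apache
# httpd.conf (prefork MPM) - illustrative values only
Timeout 900              # 15 minutes: about as low as we can go without
                         # breaking large downloads to far-away clients
<IfModule mpm_prefork_module>
    MaxClients 150       # each connection ties up a heavyweight mod_perl process
</IfModule>
```

The problem in a nutshell: Timeout governs both idle connections that never send a request and slow-but-legitimate transfers, so we can't lower it to shed the former without hurting the latter.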
The preferred configuration for a mod_perl server is to have some sort of 'reverse proxy' in front of it to reduce the number of big heavy processes required. In the past, I've used Apache+mod_proxy but in this case that wouldn't help much since a front-end apache process would use a little less memory but would otherwise behave exactly the same way. Ideally, the timeout for the initial request phase should be configured independently of the total request handling time but that's been on the Apache todo list for some years.
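Had we gone the Apache+mod_proxy route, the front-end vhost would have looked roughly like this (backend port and address are examples):

```apache
# Front-end Apache vhost - sketch of the mod_proxy approach we decided against
ProxyRequests Off                       # reverse proxy only, never a forward proxy
ProxyPass        / http://127.0.0.1:81/
ProxyPassReverse / http://127.0.0.1:81/
```

The front-end processes would be lighter than mod_perl ones, but each idle connection would still pin a whole process for the full Timeout period.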
I decided to try out Squid in HTTP-Accelerator mode. It has the dual advantages of a lower memory footprint per connection and an independently configurable timeout for the initial request phase. It has the disadvantage of being unfamiliar to me and in my estimation it's not particularly well documented.
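The accelerator setup boils down to a few lines of squid.conf - something like this (hostnames and ports are examples, and this is Squid 2.6-era syntax):

```squid
# squid.conf - HTTP accelerator in front of Apache on port 81
http_port 80 accel defaultsite=www.example.com
cache_peer 127.0.0.1 parent 81 0 no-query originserver

# Drop clients that connect but never send a request - this timeout is
# independent of how long a slow download may take once it has started
request_timeout 30 seconds
```

That request_timeout line is the whole point of the exercise: the idle-connection window shrinks from 15 minutes to 30 seconds without touching legitimate long transfers.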
Installation was a breeze. It's a Debian server so 'apt-get install squid' was all that I needed. Configuring it was not too bad except for 'acls'. These are very poorly explained in the manual, they are used for many different functions (not just access control), and when they aren't right, all you get is a 'permission denied' error with no indication of why. A bit of googling eventually turned up this magic config setting:
debug_options ALL,1 33,2
With this in place, the server logs which acls the request matched and why the request was refused. It immediately became obvious that I hadn't designated port 81 (where I'd moved Apache to) as a 'Safe' port. With that fixed, I was able to access Apache but I had to fight some more to stop Squid from caching stuff (including "500 Server Error" responses from my buggy test script).
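For the record, the fixes amounted to something like the following (directive names shift a bit between Squid versions, so treat this as a sketch):

```squid
# squid.conf additions - let Squid talk to Apache on its new port
acl Safe_ports port 81          # backend Apache port, alongside the stock entries
http_access deny !Safe_ports    # already present in the default config

# While testing, don't cache anything (including error responses)
acl all src 0.0.0.0/0.0.0.0
cache deny all
```

Caching can be switched back on selectively later, once the backend is behaving.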
The next issue was that Squid has its own access log format. It can be configured to use Common Log Format, but not the 'combined' format which includes the referrer information needed by our reporting package. It is possible to log referrers separately, but the referrer log is written in the order requests are received, while the access log is written in the order request handling completes, which makes tying the two logs together rather tricky. We could of course just continue to do the reporting off the Apache logs, but as far as Apache is concerned, all requests now come from 127.0.0.1.
More googling turned up mod_rpaf which takes the client details from the X-Forwarded-For header and 'fixes up' the request so Apache sees that as the client for logging purposes. Once again, a simple apt-get was all that was needed to install the module then I pasted a couple of lines into the Apache config and it was all working.
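The couple of lines in question look something like this (the module path is where Debian puts it; adjust to taste):

```apache
# Apache config for mod_rpaf - trust X-Forwarded-For only from Squid
LoadModule rpaf_module /usr/lib/apache2/modules/mod_rpaf.so
RPAFenable      On
RPAFsethostname On
RPAFproxy_ips   127.0.0.1
```

With that in place the Apache access logs (and anything else that looks at the remote address) see the real client IP again, so the existing reporting setup keeps working unchanged.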
The next hurdle is SSL. The live server has some directories which are handled over an SSL connection, with mod_rewrite rules to ensure a smooth transition between the secure and non-secure sections. Squid is capable of talking SSL, so on the face of it I should be able to get away with only one Apache virtual host in the background. The first snag was that the Debian build of Squid does not have SSL enabled. I don't pretend to understand the politics, but the OpenSSL license is deemed to be incompatible with the Squid license, so Debian can't distribute a binary package that includes both. Thankfully, it's pretty easy to build your own Squid package with SSL enabled.
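The rebuild goes roughly like this (a sketch - the exact version number and configure flags will depend on the Debian release you're on):

```sh
# Rebuild the Debian squid package with SSL support
apt-get source squid
sudo apt-get build-dep squid
cd squid-2.6.*/                   # version number will vary
# edit debian/rules and add --enable-ssl to the ./configure options
dpkg-buildpackage -rfakeroot -uc -b
sudo dpkg -i ../squid_*.deb ../squid-common_*.deb
```

The only real work is the one-line edit to debian/rules; everything else is standard Debian package rebuilding.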
I deployed the new packages on the staging server and was able to configure Squid to use an SSL certificate - or at least I would have been able to if I had one. Unfortunately, we've never had SSL set up on the staging server but this seemed like the perfect opportunity to fix that.
Obviously there was no question of going to Verisign to buy certs for each of our 4 staging environments. I could have just used self-signed certs, but that always ends up with messy dialogs every time you connect. Instead, I signed up with CAcert. They allow new users to generate certs with a 6-month expiry, but by visiting four of my colleagues (and presenting appropriate credentials) I was able to get fully "assured" and generate a two-year cert - one of the advantages of working in an Open Source shop.
With the cert installed, the staging server is now happily running SSL. The bit that's missing is the redirect magic to move between HTTP and HTTPS. Unfortunately, when Squid passes the request on to the Apache server, there doesn't seem to be any indication in the headers that the request came via SSL. It looks like I may be able to use acls (what else!) to get Squid to replace a client header with arbitrary data if the request came in on the SSL port, but I haven't yet worked out how to add a header. I thought I'd call it quits for the day, blog it and wait for the lazyweb to solve that problem for me.
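If I can get the front end to mark SSL requests somehow, the Apache side should be straightforward. Assuming a header along the lines of "X-Forwarded-Proto: https" (that header name is my assumption - I haven't confirmed Squid can be made to send it), the rewrite rules would look something like this:

```apache
# Apache side (sketch): redirect the secure area to HTTPS unless the
# proxy has marked the request as having arrived over SSL.
# The header name and /secure path are illustrative assumptions.
RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !=https
RewriteRule ^/secure(/.*)?$ https://www.example.com/secure$1 [R=302,L]
```

A matching rule going the other way would bounce non-secure areas back to plain HTTP, which is what the existing mod_rewrite setup on the live server does today.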