
Matts (1087)

I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high-performance network software, string-matching algorithms, and other cool stuff, mostly in Perl and C.

Journal of Matts (1087)

Wednesday July 26, 2006
06:42 PM

Hacking fun


A few days ago I read James Duncan Davidson's The Web As A Pipe (I think it was linked from a blog entry here). It got me thinking about AxKit 2.0. The question I was asking myself was: why do I need Apache?

All the deployments of AxKit in serious setups probably use a front-end proxy, because that's how mod_perl applications are recommended to be set up. So if the backend is just a system for running bits of Perl, why do I need all the extra "apache" stuff?

The other avenue that led me to this is that I've been working a lot on scalable servers lately. I wrote an epoll/kqueue SMTP server for our spamtrap (now part of qpsmtpd), and I hack on various log parsers that issue DNS queries as they parse, requiring tens of thousands of parallel DNS lookups. For all of this I use Danga::Socket - a sort of POE-like framework for epoll/kqueue, but without all the overhead of POE. I also recently started using djabberd for a Jabber server - a very cool project you should check out.
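
For anyone who hasn't seen Danga::Socket, here's a minimal sketch of the style it encourages - not code from this post, just an illustration: you subclass it, react to event callbacks, and one process multiplexes every connection (the port number is arbitrary):

    #!/usr/bin/perl
    # Minimal Danga::Socket echo server sketch (illustration only).
    use strict;
    use warnings;
    use IO::Socket::INET;
    use Danga::Socket;

    package EchoClient;
    use base 'Danga::Socket';

    sub new {
        my ($class, $sock) = @_;
        my $self = $class->SUPER::new($sock);
        $self->watch_read(1);           # ask the event loop for read events
        return $self;
    }

    sub event_read {
        my $self = shift;
        my $bref = $self->read(8192);   # scalar ref to the bytes, undef on EOF
        return $self->close('eof') unless defined $bref;
        $self->write($$bref);           # buffered, non-blocking write back
    }

    package main;

    my $listener = IO::Socket::INET->new(
        LocalPort => 7777, Listen => 128, ReuseAddr => 1, Blocking => 0,
    ) or die "listen: $!";

    # Watch the listening socket and accept new clients from the event loop.
    Danga::Socket->AddOtherFds(fileno($listener) => sub {
        my $client = $listener->accept or return;
        $client->blocking(0);
        EchoClient->new($client);
    });

    Danga::Socket->EventLoop;           # epoll, kqueue or poll underneath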

In working on scalable servers (all written in Perl) I got to thinking: "Why isn't there a highly scalable HTTP server written in Perl, with decent pluggability like mod_perl, but without all the hassle of setting up Apache and mod_perl?" So I wrote one.

I borrowed bits of code from Danga's Perlbal project (a Perl HTTP load balancer that they put in front of the LiveJournal servers, with each Perlbal instance pushing about 40Mb/s of traffic), took some of qpsmtpd's ideas, and rewrote a lot of it as my own.

So now I have an HTTP server. It does the basic HTTP stuff: it receives requests and sends responses back. It has pluggable logging (like qpsmtpd). It has an expandable config system, so any plugin can define its own config directives in a MUCH easier manner than Apache. It can run CGIs. In benchmarking I can deliver about 1500 req/s (just flat files, and that's unoptimised as I haven't done the fancy AIO stuff yet), but more importantly it's scalable - it can handle 100,000 concurrent connections if you want it to, and easily send large files to slow clients.
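
The post doesn't show the plugin API, but given the qpsmtpd heritage, a plugin presumably looks something like the following hypothetical sketch - hook subs plus a plugin-defined config directive. Every name here (the hook, the config accessor, the directive) is invented for illustration, not the server's actual API:

    # plugins/deny_dotfiles -- hypothetical plugin sketch in the qpsmtpd
    # mould; hook and method names are invented, not the real API.
    # OK/DECLINED constants are assumed to be supplied by the plugin
    # loader, as they are for qpsmtpd plugins.

    sub init {
        my ($self, $config) = @_;
        # A plugin-defined config directive, e.g. "DenyDotfilesStatus 404"
        $self->{status} = $config->directive('DenyDotfilesStatus') || 403;
    }

    sub hook_uri_translation {
        my ($self, $request, $uri) = @_;
        return DECLINED unless $uri =~ m{/\.};    # not ours; let others run
        $request->response->code($self->{status});
        return OK;                                # request handled, stop here
    }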

Anyway, I'm not going to reveal the source for it just yet (well, some people have seen it - join #axkit-dahut on irc.perl.org if you want to play), but I wanted to blog what I'd been hacking on.

Comments
  • I love the HTTP-as-pipe idea. It's so much more sensible than FastCGI, which I now hate, having used it for Rails.

    Everybody in the Rails world seems to be looking at Mongrel [rubyforge.org] recently. One of its claims to fame is an extremely fast HTTP parser (written in C), which helps it scale well. I wonder if it would be worth appropriating that code for your project...

    -Dom

    • The bottlenecks are elsewhere.

      If I can serve 2000 reqs/sec (localhost to localhost, using apachebench) of just static files, and this server is for applications, I'm not going to worry about parsing headers too much.
    • What did you hate about FastCGI? It seems to be getting more popular at the moment.
      • The complete lack of introspection. It's a binary protocol and I found it quite difficult to debug without resorting to tools like strace. Whereas with HTTP it's laughably easy to see what's going on.

        -Dom

  • You're such a tease, but I love the fact you're hacking on this ;-)
  • Won't you end up with something pretty similar to lighttpd + FastCGI?
    • Not really.

      Say you want to build an app - you don't want to fuss about deployment just yet. With this you can just download it, and run "./axkit" and start building your app. No downloading extra httpds and configuring them.

      When you want to deploy, you can either deploy standalone like this without an extra httpd, or you can stick lighttpd or apache up front, giving you whatever extra features you might need from those (e.g. SSL).

      All this is easier to debug than FastCGI, and can utilise proxy caching at the front end.
      • I have read the original article and I think he's full of it. FastCGI's protocol is too hard to understand, so let's write an HTTP server? It doesn't make any sense. Writing a good network server is much harder than figuring out how to debug FastCGI hiccups.

        It sounds to me like you're solving a different problem -- how to have a quick dev server. Most projects are doing this with HTTP::Server::Simple. I personally think it's a bad idea to develop on a server that isn't identical to what you deploy on.
        • It's not just about hiccups with FastCGI. I don't know all the ins and outs of that protocol, but can you make the frontend cache the results with FastCGI if you send the right HTTP headers? If not, then that's a huge reason right there.

          I also wanted a server that can scale decently if you need it to. I don't believe HTTP::Server::Simple (or anything currently on CPAN, with the possible exception of Perlbal, which doesn't do dynamic content) is scalable.
          • The caching issue is probably dependent on the FastCGI implementation. With Apache 2 and the new caching stuff, it should be possible. Not sure about lighttpd.

            HTTP::Server::Simple doesn't scale at all. It's just a quick dev server.

            The main difficulty in using a single-threaded server for dynamic content generation is how you handle slow things that are hard to split up, like database queries. I talked to a couple of people at OSCON (Stas Bekman, Artur Bergman) about this and they were both following the same approach.
            • "It seemed very similar to a lighttpd/FastCGI model ultimately."

              Sort of - except there's a reason smart people are going down this single-threaded server route - even if you have some bits hanging off as other daemons, ultimately your scalability is still better. And with all the AJAX stuff going on now, high parallelism is becoming even more of a big deal. (See the worker-offload sketch after these comments.)
        • Why is it appreciably harder to write a good dæmon vs. a FastCGI frontend? That doesn’t make any sense to me.

          • Have you ever tried to write a reliable HTTP server that deals well with all the things that can happen with networks and broken clients, and has very high performance? There's a reason why Apache httpd took more than a few hours to write. Now compare that to simply adding a little debug code to an already working FastCGI implementation.
            • No, but I've written a couple of SMTP servers that do that :-)
            • So what you’re objecting to is not any of the stated goals, but just the fact that it would require effort that cannot build on something existing?

              • The stated goal that I saw in that article was "fix my (unspecified) FastCGI problems" and the solution was to write an HTTP server. It didn't make sense to me, and it still doesn't. Matt's goals (quick bundled dev server with decent production capabilities) don't seem related to what the article was talking about.

                In the article, Davidson complains about "zombies, mysterious crashes, and other annoyances" -- sort of like the kind you see when you try to write a new network server. He complains that you...
                • You haven't answered the point about frontend caching when sending the right headers. That can be a big win. Honestly, the FastCGI protocol seems like a solution in search of a problem - the HTTP protocol is perfectly suited to whatever its inventors were trying to achieve. (A caching-header sketch follows these comments.)
                  • Apache's mod_cache can cache any URL, so I assume FastCGI would be fine. I don't think lighttpd has caching built in, or proxying for that matter.

                    FastCGI doesn't seem very useful to me either, since I already have mod_perl (and many other Perl options for that matter), but some of the people I talked to at OSCON seemed to prefer FastCGI for its simplicity compared to running another separate httpd daemon.

                    The bottom line for me is that it seems like a strange choice for Ruby. They don't have a mod...
                • I don’t see how database server binary protocols are in any way a relevant example. They are closed proprietary wire formats, they have a single implementation for each end of the wire, and the code for both ends is written by the same entity. No one expects the client library of MySQL to interoperate with the PostgreSQL server, yet people fully expect to be able to run the same FastCGI-fronted application under lighttpd, multiple different FastCGI implementations for Apache, and who knows what else.
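
On the frontend-caching question raised in the thread: because the backend speaks plain HTTP, making a response cacheable by any HTTP-aware frontend (squid, Apache's mod_cache, and so on) is just a matter of emitting standard headers. A minimal CGI-style sketch, not from the original discussion (header values are arbitrary):

    #!/usr/bin/perl
    # Sketch: any HTTP-speaking frontend proxy can cache this response;
    # no FastCGI-specific machinery is involved.
    use strict;
    use warnings;

    print "Content-Type: text/html\n";
    print "Cache-Control: public, max-age=300\n";   # cacheable for 5 minutes
    print "Last-Modified: Wed, 26 Jul 2006 00:00:00 GMT\n";
    print "\n";
    print "<html><body>Generated once, served many times.</body></html>\n";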
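
And on the single-threaded-server-versus-slow-database-queries problem: the usual answer is to hand blocking work to another process and watch the result descriptor from the event loop, so the server keeps serving other clients meanwhile. A toy sketch using Danga::Socket, with a two-second sleep standing in for the query (a real server would keep a pool of workers):

    #!/usr/bin/perl
    # Sketch: keep the event loop unblocked by pushing slow work into a
    # child process and watching the result pipe for readability.
    use strict;
    use warnings;
    use IO::Handle;
    use Danga::Socket;

    pipe(my $reader, my $writer) or die "pipe: $!";
    $writer->autoflush(1);

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                    # child: free to block
        close $reader;
        sleep 2;                        # stand-in for a slow DB query
        print $writer "42\n";           # ship the result back
        exit 0;
    }

    close $writer;
    $reader->blocking(0);

    # Parent: the event loop keeps running while the child works.
    Danga::Socket->AddOtherFds(fileno($reader) => sub {
        my $result = <$reader>;
        print "query result: $result" if defined $result;
        Danga::Socket->SetPostLoopCallback(sub { 0 });  # exit loop (demo only)
    });

    Danga::Socket->EventLoop;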