For the last few months, I've been fading away a bit in the Perl/CPAN communities.
Certainly I haven't committed much of anything of note to my svn.ali.as repository in a while, other than doing a couple of releases for work mostly done by other people.
Instead, I've been participating in the orgy of hacking going on around the Australian Government's new open data kick.
The main thrust of my work has been at the intersection of mapping and the structure of government, two subjects which I knew little about before the last few months.
With dozens of new mashup sites and many more experimental hacks appearing over the last 3 months, I've found that there is a clear capability gap in the govhacker community.
A number of good commercial tools exist for geo work, which the professionals use. A number of good free tools (Google Maps and friends) also exist for storing and displaying maps.
However, if you remove the desktop tools (which are rich but can't be merged into larger applications) and the Google Maps-like APIs (which are limited but can be merged into larger applications), you find a complete lack of anything high-level which is both free to use AND can be merged in as a component of a larger system.
The result is that most govhackers are like Perl hackers without a CPAN: they are scrabbling around in the dirt, doing tons of (usually bad) site-specific reimplementation of things that should really be standard tools.
Being someone who is mainly a toolsmith by nature, I've set out to fix what I see as the most pressing problem.
By mixing a ton of information from the census, electoral commission and various other databases, I (with my former business partner Jeffery Candiloro) have built http://geo2gov.com.au/.
This is a web service which takes a location description in a wide range of formats, and then identifies which federal, state and local governments (and which parts of those governments) are related to that location. It's built using Catalyst and the geo-magic is done using PostGIS with DBIx::Class sitting between the two.
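At its core, that geo-magic is a point-in-polygon test. As a rough illustration (the table and column names here are hypothetical, not the real geo2gov schema), this is the general shape of the SQL that would ultimately be handed to PostGIS for a lat/long lookup — shown here as a small Perl helper that composes the query:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: compose the point-in-polygon SQL that a
# service like this would hand to PostGIS for a lat/long lookup.
# The "jurisdiction" table and "boundary" column are illustrative
# names only, not the actual geo2gov schema.
sub jurisdiction_lookup_sql {
    my ($lat, $long) = @_;
    return sprintf(
        "SELECT name, level FROM jurisdiction "
      . "WHERE ST_Contains(boundary, "
      . "ST_SetSRID(ST_MakePoint(%f, %f), 4326))",
        $long, $lat,   # PostGIS points are (x, y), i.e. (long, lat)
    );
}

my $sql = jurisdiction_lookup_sql( -35.3081, 149.1245 );   # Canberra
print "$sql\n";
```

In practice DBIx::Class would generate and execute something along these lines against each mapping layer, rather than it being written by hand.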
To bridge the speed gap between traditional GIS (expensive, but acceptably fast for a single-user desktop application) and the interweb (cheap per-transaction, infinitely scalable) I've built a database schema which is VERY non-traditional, then indexed, clustered and analyzed the hell out of it. It's tuned very specifically for the read-only lookup task, and not intended for general storage and manipulation of the data.
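To make the "indexed, clustered and analyzed" part concrete: that kind of read-only spatial tuning in PostgreSQL/PostGIS typically boils down to a few commands along these lines (table and index names here are illustrative, not the real schema):

```sql
-- Illustrative only: the general shape of read-only geo tuning.
-- Spatial index so ST_Contains lookups don't scan every polygon.
CREATE INDEX jurisdiction_boundary_gist
    ON jurisdiction USING GIST (boundary);

-- Physically reorder the table to match the index, so spatially
-- nearby geometries share disk pages. Only worth it because the
-- data is read-only after the import.
CLUSTER jurisdiction USING jurisdiction_boundary_gist;

-- Refresh planner statistics after the reorganisation.
ANALYZE jurisdiction;
```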
As a result, I can identify the location of a lat/long in government terms at every level in less than half a wall-clock second (worst case), for any GPS point or street address in Australia, including the various weird offshore islands and territories.
A richer call which also maps the 15 Census codes used to identify a location runs in around 1.5 wall-clock seconds.
Thanks to the wonderfully complete DateTime library, I also plan to shortly implement all of Australia's 8 timezones in mapping terms (including the silly ones like the single town in Western Australia that decided it wanted a timezone all of its own).
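The Western Australian oddity referred to here is the Australia/Eucla zone, sitting at UTC+8:45. DateTime (via DateTime::TimeZone) already knows about it, which is what makes this feasible — a quick sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;

# DateTime::TimeZone ships with the full Olson database, including
# the odd Australian zones like Australia/Eucla (UTC+8:45).
my $dt = DateTime->new(
    year      => 2010,
    month     => 1,
    day       => 1,
    time_zone => 'Australia/Eucla',
);

# Offset from UTC in seconds: 8h45m = 31500
print $dt->offset, "\n";            # 31500
print $dt->time_zone->name, "\n";   # Australia/Eucla
```

The mapping work is then just associating each zone name with the polygon of territory it covers, same as any other layer.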
However, as I approach 100% coverage of the different concepts, I find the scope gradually creeping outwards, and this thing wants to become "The database that ate the Universe".
This wasn't a problem with CPANDB, because there's really only so much data you need to care about. But countries have FAR more data than the CPAN does. The SVN checkout for the entire geo2gov repository (which contains both all the code and cached copies of all our input data) now weighs in at around 3-4 gig, and I have another 10 gig of data that might potentially be useful to add.
The current dev version of the "product" schema (the compiled form that powers the actual service) is in the vicinity of half a gig. It contains every legal jurisdiction, every house of parliament, all the electorate divisions, all the members of parliament and the nested key structure for the entire census (plus mapping layers for all of the above).
I've also merged in a complete database of polling places (for "Where to Vote" type uses) and I'm seriously debating including an entire copy of the census at the lowest published level (just because I can).
That said, I'm reaching some of the same limits I did on CPANDB. For one, the main import process is now starting to consume more than the maximum available system memory on my machine. So many of the same tricks I had to do with Process.pm are starting to become needed for geo2gov as well.
P.S. Now that I think about it, "The web service that ate Australia" would make a great title for a talk...