
All the Perl that's Practical to Extract and Report

http://ali.as/

Journal of Alias (5735)

Thursday May 20, 2010
12:57 PM

Quote of the Week

With apologies for any paraphrasing from my flawed memory.

"My god, you guys actually get it. I need to take some Perl people to every meeting I have, because nobody else seems to understand the importance of designing for back-compatibility properly."

-- Rob Mensching (creator of the Windows Installer XML (WiX) toolset, the first open source project ever at Microsoft).

Monday May 03, 2010
09:08 PM

Frugal Innovation - The Economist discovers ::Tiny

The April 15th edition of The Economist includes a special report on innovation in emerging markets.

http://www.economist.com/specialreports/displaystory.cfm?story_id=15879359

Half of the report focuses on what is essentially a restatement of the principles behind ::Tiny, but for real world products rather than software.

They term this "Frugal Innovation": taking the kinds of devices consumed by the rich world and reinventing the idea behind them to produce entirely new and novel devices (NOT just simple copies) that achieve the same effect, or most of it, for much less.

They trumpet the same magic formula that ::Tiny uses, which is to provide a similar but effective function at a 90% reduction in cost, compared to existing products that have evolved via "Fat Innovation" (the typical rich world innovation which results in having new models every year, with more features and more complexity).

Frugal Innovation doesn't just involve reducing the input and overhead costs; it sometimes means reinventing the process used to create the product, and ruthlessly stripping out anything that isn't universally necessary.

Some of the same principles are now also being applied to services, with one Indian medical centre setting up what is essentially a production line for heart operations, applying the same principles of specialisation and copious use of less-skilled support staff that created Ford's original production lines.

There's a great podcast from The Economist on the same topic here...

http://feedproxy.google.com/~r/economist/audio_all/~3/e078AnUezYE/20100416_sr_innovations_4EQO.mp3

One interesting side note is that for software this same principle appears to feed into the measure of long term evolution I've mentioned in a couple of my talks, which is "Information processed, per unit space, per time".

This might be something that's actually interesting to measure in its own right. You could compute this "Software Density" score by taking the amount of work some code does, and then looking at how much of the code is never (or rarely) run.

This would be something similar to coverage testing, but for regular operation.
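One rough way to approximate that today, assuming a conventional Perl toolchain rather than any purpose-built Software Density tool, would be to run the application under Devel::Cover during normal operation instead of under the test suite, then look at how much of the code was never exercised (myapp.pl and density_db are placeholder names):

```shell
# Collect coverage data during ordinary operation, not during testing
perl -MDevel::Cover=-db,density_db myapp.pl

# Summarise which statements and branches never ran
cover -report text density_db
```

The "density" score would then be something like the proportion of statements actually executed, weighted by how often.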

Thursday April 29, 2010
08:16 PM

I will be in the US from the 12th to the 23rd of May

On the 13th and 14th of May I will be in Redmond, Washington at the two-day kick-off workshop for the Microsoft-funded "Common Opensource Application Publishing Platform" (CoApp).

CoApp is an attempt to replicate something similar to Debian's dpkg/apt within the native Microsoft MSI installer: a single standard dependency and packaging system for installing the large and complex dependency trees that exist in Open Source software.

While I don't have any official role in the project, I'll be providing advice wearing a mix of Perl toolchain, CPAN, Strawberry, OpenJSAN and CCAN hats.

I'm also hoping to firm up the fairly obvious idea of building a binary package repository for Strawberry Perl similar to ActiveState's PPM repository, except one which would be based on native MSI packages and would support the concept of cross-language dependencies into C library space (reliable installation of shared common libfoo.dll libraries is the initial primary target of the CoApp project).

Strawberry has never had this binary package repository primarily because it hasn't made a lot of sense to put in the enormous programming and computational effort when we can't actually provide anything better than ActiveState's system (it's better that we just leverage off it).

CoApp represents a new model for binary packages that has the potential to be significantly better than PPM, and so it's a much more interesting option to explore.

I'll also be in Framingham (near Boston) from the 17th to the 22nd of May at the Staples HQ (who are buying up the company I work at) doing some exploratory meetings to meet my counterparts in the US and look for interesting technology to steal and bring back to the Australian office.

Between and around these dates I should have some day trip and evening catchup opportunities with Perl folks. I'd also be quite happy to do any talks for local Perl monger groups if we can organise transport and a projector somewhere.

This trip is at short notice and my schedule is still firming up, but if you are interested in doing something, mail me on my adamk@cpan address and let me know.

Wednesday April 14, 2010
01:48 PM

Mojo vs Dancer Week 2 - Templates and Images

Welcome back.

My apologies for the delay. I blame trying to keep up with the QA hackathon from half a world away, and another unavoidable event from which I could not escape.

I've also delayed another day because I'm a bit concerned that my review this week would paint Mojo in an unfair light, and I wanted to sleep on it.

As I get deeper into both, I'm finding that many things I don't like about one turn out to have similar behaviour in the other. As a result, whichever one I review first is the one that a stream-of-consciousness commentary will paint in the worse light.

And I've wanted to start each week with Mojo first, as the more established competitor. But I think this might be unfair given the emerging similarities.

So each week I will reverse the order I review them in, and this week I shall attempt to emulate finding my annoyances in Dancer first :)

This week, I had originally planned to look at configuration and database.

As in many projects though, this turned out to be way too trivial, because I'm basing the website on the CPANDB module, which is zero-configuration.

So in BOTH applications, I only had to add the following and I have my database layer finished :)

use CPANDB {
  # Don't check for updates more than once a day
  maxage => 86400,
};

Instead, I'm going to look at the second phase of most website projects for a newbie: bootstrapping from helloworld.pl into an equivalent "proper" website with templates, images and CSS, but no actual content.

Dancer - Bootstrapping a website

Since my review last week, a couple of new releases of Dancer hit the CPAN claiming to fix installation on Win32. Just to prove it, I've done this week's testing on my conference presentation laptop instead of my desktop machine.

Dancer installed first time cleanly on the new machine. And the hello world script from last week runs correctly. So all good there.

After a more-complete-than-last-week read through the main Dancer search.cpan.org page, one thing jumps out quite sharply about the Dancer API in general: it isn't object-oriented, which is a rare thing these days.

Or at least, it doesn't LOOK object-oriented. Judging from the distribution page there are plenty of classes, and I'm sure the internals are largely done in an OO fashion.

The main API that you get with "use Dancer" sports a similar kind of "Do What I Mean" command interface that reminds me a bit of the Module::Install command interface.

This means that the code to show the index template is going to look like this.

get '/' => sub {
    template 'index';
};

I'm not entirely sure what I think about this idea, despite the fact that I'm the maintainer of M:I and created its Domain Specific Language interface. This kind of thing usually raises red flags.

I can see this API strategy either descending into API chaos or becoming one of Dancer's best and most loved features. Or quite possibly both depending on the scale of the project.

For the moment, given Dancer is meant to be taking a micro-framework approach (which should be optimised more for small websites) I'm willing to suspend my disbelief and run with it until I can make a better judgement.

The documentation in general is oddly structured. For a command-oriented API like this I would expect to find documentation for each of the available commands. This section of the documentation does exist, but it doesn't contain a list of all the commands.

Instead, some of the commands are described down in sub-headings related to a logical area instead. And if you look at the table of contents on the search.cpan page the logical grouping doesn't appear very, well, logical.

There are some other indications the documentation has built up in bits and pieces rather than being addressed in a complete fashion. Some features are glossed over quickly, leaving me a bit stumped still. Others go into way more detail than is needed for a small website, to the point of information overload.

The section on config files seems to suggest I should make three config files, a global config and then a pair of development/production config files that overlay the global. And these files go in different directories for some reason, unless I "put it all at the top of the program" (without saying how I embed this YAML in the program). A bit later on I realised they meant embed via the "set" command.
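For what it's worth, the "set" approach turned out to look something like this (a minimal sketch; the logger and warnings keys are just the sort of settings the YAML files hold):

```perl
#!/usr/bin/perl
use Dancer;

# Embed the configuration at the top of the program via 'set',
# instead of spreading it across config.yml and the environment files.
set logger   => 'console';
set warnings => 1;
```
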

Overall, I think the documentation is reasonably thorough but just needs some love to clean it up into a more learning-friendly structure, with some sections shrunk down in the main Dancer page to just the most basic and common uses and references out to other pages for using more advanced features.

Stuck on something I wasn't sure about, I resorted to running the built in skeleton generator (mentioned prominently on the Dancer::Cookbook page) to at least get a better idea of what I was supposed to be doing.

C:\Users\Adam>dancer -a top100
+ top100
+ top100\views
+ top100\views\index.tt
+ top100\views\layouts
+ top100\views\layouts\main.tt
+ top100\environments
+ top100\environments\development.yml
+ top100\environments\production.yml
+ top100\config.yml
+ top100\app.psgi
+ top100\public
+ top100\public\css
+ top100\public\css\style.css
+ top100\public\css\error.css
+ top100\public\images
+ top100\public\404.html
+ top100\public\dispatch.fcgi
+ top100\public\dispatch.cgi
+ top100\public\500.html
+ top100\top100.pm
+ top100\top100.pl

For a few seconds this shocked me, because compared to the simplicity of helloworld.pl this is a lot more files.

At this point I hadn't even remotely considered the idea of site-customising my error pages, and there are 5 entirely different types of Perl application files in this list. I (newbie me) don't even know what PSGI is, let alone why I need one. And I don't get why I have a dispatch.cgi in addition to my top100.pl script.

Frankly, I don't even really WANT to know what all these things are (yet?). But this did at least confirm where all the files should be living and the demo app did actually start and run properly. So in my case, it solved the problem I needed to solve.

But I certainly don't want to use it as the basis of my Top 100 site. It's just a bit overwhelming for my level of experience, and I don't want to have to go exploring and work out what all these different files do, so I know which ones are safe to delete.

I'm not ready yet, I have an index page to get showing first.

Templating was largely straightforward, mostly because the docs do everything short of installing it for me to steer me towards Template Toolkit (yes, you have to install Template Toolkit yourself to get the generated skeleton website working).

I suspect the reason for this is because their embedded default templates are not PARTICULARLY featured. The documentation sums up the feature set of the built in templates as the following.

<% var1 %>

No other features are described, making Template Toolkit the obvious choice. :)
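For completeness, here's a sketch of what using that single feature looks like (var1 is the placeholder name from the docs; the route and value are invented):

```perl
#!/usr/bin/perl
use Dancer;

# views/index.tt contains just: <% var1 %>
get '/hello' => sub {
    # The hashref supplies values for the template's tokens
    template 'index', { var1 => 'Hello from Dancer' };
};

dance;
```
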

I'm totally fine with this, but what I'm not entirely sure about is why on earth they have chosen to make Dancer's default tag style for Template Toolkit different to Template Toolkit's default tag style for Template Toolkit.

The only reason I could uncover from the documentation is that it makes TT compatible with their built in template engine. The one whose list of features is "variables".

The only guess I can make, other than "we like it more that way", is that it's done to be compatible with Sinatra, the Ruby toolkit I've never heard of but which I'm told was the inspiration for Dancer.

Switching it back to regular Template Toolkit tag style requires a slightly annoying block of config file entries I can easily imagine myself repeating in every Dancer program I will ever write.

To help you in gauging my annoyance level on this topic, allow me to show you now the complete working Dancer application at the end of this week.

#!/usr/bin/perl
 
use Dancer;
use Template;
use CPANDB {
    maxage => 3600 * 24 * 7,
};
 
# Configuration block
set template => 'template_toolkit';
set engines  => {
    template_toolkit => {
        start_tag => '[%',
        stop_tag  => '%]',
    },
};
 
# Route block
get '/' => sub {
    template 'index';
};
 
dance;

It's not a problem as such, but I'm sure it's one of those things that's going to niggle at me. If I wasn't trying to emulate a newbie, I'd probably switch over to the Template::Tiny plugin in the short term. Unfortunately, it's not particularly well known and so is probably out of bounds for this competition.

Given their strong preference for Template Toolkit, in their situation I'd probably have inlined the entire Template::Tiny package as the "house template" engine. Then you still have a direct upgrade path when people hit a feature T:Tiny doesn't support, but you get to keep the default tag style. But the current engine predates Template::Tiny, and so be it.

I also had a momentary confusion over whether I should call my template "index" or "index.tt" or "index.html" or "index.html.tt", but I put that mostly down to playing with Mojo's double-dotted templates (which I'll get to later). A quick look at the generated skeleton made it pretty clear what the naming is.

Adding the static files was very simple and easy to understand, following the convention of "if there's a "public" directory, everything in it is a static file".

However, I did hit a problem when I tried to add my logo image.

Weirdly, although my Dancer application was happily sending me CSS and my favicon.ico file, it didn't appear to support PNG images. Bemused, I paid my first (legitimate) visit to the #dancer channel to ask a What The Hell.

Dancer uses MIME::Types under the covers, and the bug only happens on Windows, and they confirmed they can replicate it. Beyond that there's no more information yet, but I've worked around the problem temporarily on my Dancer application by converting the PNG files to GIF.

And given the speed with which they chased down the previous Windows problems, I'm hopeful that by next week the problem will have gone away.

The final issue I foresee in any future life with Dancer is the way in which the templates and the static files combine.

Although I'm capable of writing HTML by hand, I don't like to. I've been a constant Dreamweaver user since version 1.0, because you gain all the benefits of seeing your page design as you work on it, but without any risk of damaging templates.

(Historical Note: Dreamweaver's "Round Trip HTML" parsing ability was the primary source of inspiration for PPI's "Round Trip Perl" parsing)

By having all the static files in /public and all the templates in /views but serving everything from root, you can't really open a template in a GUI HTML editor and have the static files be consumed by the templates.

This problem is compounded by Dancer's "layout" feature, which is sort of like an inside-out INCLUDE block. I can see the attraction for things like "skinning" websites or having lite/print rendering, but it would also make it even harder to edit content using anything other than raw HTML.
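For context, the layout mechanism itself is only a couple of lines on the application side (a minimal sketch; 'main' matches the views/layouts/main.tt file the generator created):

```perl
#!/usr/bin/perl
use Dancer;

# Wrap every rendered template in views/layouts/main.tt,
# which injects the page at its content token.
set layout => 'main';

get '/' => sub {
    template 'index';    # rendered inside the layout
};

dance;
```

The attraction and the problem are the same thing: the shared chrome lives in one file, but neither the layout nor the page is a complete HTML document that a GUI editor can open.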

I know a lot of Perl people don't like the GUI crowd, but the newbie web crowd is really going to live and die by GUI tools (which explains the popularity of PHP really).

Overall, the process was relatively straightforward, with the exception of the PNG problem and a general feeling that the documentation/tools dump you in the deep end a bit too quickly and need polishing.

Mojo - Bootstrapping a website

Having looked a little deeper into both Mojo and Dancer, one thing that strikes me is the level of similarity between the two. I'm assuming that there's a fair bit of feature and convention theft going on between the two, and thus presumably Ruby Sinatra as well.

In particular, Mojo also uses separate /public and /templates directories, and the documentation prominently mentions layout support. So the problem of templates that can't be edited in a WYSIWYG fashion also applies to Mojo.

In fact, having seen both doing this I can't help but wonder if this is part of the cause of the simplicity and/or stagnation I see on both websites.

When all site modifications have to be done in the raw, it makes it harder for casual contributors to provide ad-hoc content or look-and-feel improvements to the site.

The main Mojolicious::Lite documentation isn't bad, but it's rather short on written commentary and chock full of examples, with no separation into topics or sub-headings.

It comes across as a sort of giant SYNOPSIS, and I would have appreciated a bit more information beyond example code.

The curious file naming of templates in Mojo comes in for specific attention here. Template files use an interesting double-extension mechanism. Templates are referred to by their base name in code, and then the extensions indicate content type and template format respectively.

The most awesome thing that emerges from this is the idea that Mojo can use the same identical code to support multiple different content type requests. From the Mojo mega-SYNOPSIS docs.

# /detection.html
# /detection.txt
get '/detection' => sub {
    my $self = shift;
    $self->render('detected');
};
 
__DATA__
 
@@ detected.html.ep
<!doctype html><html>
    <head><title>Detected!</title></head>
    <body>HTML was detected.</body>
</html>
 
@@ detected.txt.ep
TXT was detected.

The flexible option of either embedded or standalone template files is also very cool, allowing complete websites to live in a single file.

As a bonus challenge to push out this potentially legendary flexibility, I decided to try and get Mojo installed and working on my Google Nexus One phone. The Android Scripting Environment comes with a trivially easy ability to add Perl support (5.10.1 no less), and all I had to do was drop the lib directory from the Mojo distribution onto the phone via USB, along with the Mojo helloworld.pl file.

Unfortunately, it failed the test, although arguably not through any fault of its own. The Mojo deep internals make use of the core module Encode.pm, and unfortunately Android Perl does NOT come with Encode.pm, and making the Encode dependency optional isn't really an option for them due to the depth at which it is needed.

But back to templates.

And here's where I get negative about Mojo. While the in-house templating in Dancer is an afterthought they sweep under the rug in preference to Template Toolkit (which is, oddly, almost a positive), Mojo backs its in-house templating 100% so it stays zero-dependency.

And the Mojo template style is really, really messy. It's essentially an "embedded Perl" tag-flipping system which sees your template's HTML deeply mixed with Perl fragments.

Template features are provided via a fairly powerful Perl API in those fragments, which at least makes the best of things.

Now while I'm absolutely against mixing code and content, I do recognise that this approach might be useful in small contexts (it tends to be something that fails mostly at scale).

But what really makes me cringe is this.

<% Inline Perl %>
<%= Perl expression, replaced with result %>
<%== Perl expression, replaced with XML escaped result %>
<%# Comment, useful for debugging %>
% Perl line
%= Perl expression line, replaced with result
%== Perl expression line, replaced with XML escaped result
%# Comment line, useful for debugging

That's right, there are EIGHT different tag types. Four different ordinary tags alone would push some complexity budgets, but the real crime is the idea of duplicating those four into another four that are enhanced by significant whitespace!

These are further complicated by an optional closing modifier as seen in the following.

<%= All whitespace characters around this expression will be trimmed =%>

This introduces another disturbing idea, that of symmetrical escaping where each side means something different to the other.

And it gets even scarier once you add block capturing.

<%{ my $block = %>
    <% my $name = shift; =%>
    Hello <%= $name %>.
<%}%>
<%= $block->('Sebastian') %>
<%= $block->('Sara') %>
 
%{ my $block =
% my $name = shift;
Hello <%= $name %>.
%}
%= $block->('Baerbel')
%= $block->('Wolfgang')

And for good measure, it seems to "enhance" HTML with Unix-style line continuations!

This is <%= 23 * 3 %> a\
single line

The sum total of all these features is to be utterly hostile to editing templates in anything other than raw text, and I have genuine fears that if I unleashed my mental list of people on these templates they would very rapidly descend into sigil soup and line noise.

If I do go with Mojo at the end of this process, it will absolutely be with the provision that under no circumstances will I allow myself to look at the full template documentation, to prevent tempting myself into using some of the features described above.

And the really sad part is I'm really not sure I actually NEED any of the features. I'm fairly certain most of the complexity could be dropped without major ill effects.

The other sad thing about all this is that once you get past the horrendous tag style, IF you can get past it, the APIs provided to the Perl code chunks have some fairly neat conveniences in them.

Each route/handler allows a name to be associated with it, which identifies the route for things like the url_for tag functions, but which also serves as a template name for the handler if that is all that is provided.

This gives Mojo far, far more concise code compared to Dancer for the simple site so far. Here is the complete Mojo code for the bootstrap site.

#!/usr/bin/perl
 
use Mojolicious::Lite;
use CPANDB {
    maxage => 3600 * 24 * 7,
};
 
get '/' => 'index';
 
shagadelic;

Indeed, at this scale it actually IS kind of "shagadelic". But of course I don't expect the meme to hold once I've added several hundred lines of actual website functionality.

Week 2 Results

Best Skeleton Generator - Mojo

Mojo's skeleton generator offers two options, heavy and lite. This, plus the fact I found Dancer's skeleton generator to be overwhelming, gives it to Mojo.

Notably though, I didn't actually end up using either of them. In both cases, it started introducing concepts I didn't really want to have to deal with so early.

Best Application Layout - No Winner

Not only are the file layouts for both frameworks pretty much identical, but both basically prevent you using GUI HTML editors. Which really sucks, so I'm not even awarding a draw. Just a FAIL.

Best Templating - Not Mojo

Even though their internal templates appear to be so thin as to be non-existent, all their documentation quickly pushes you towards Template Toolkit. As the most popular templating system (by CPAN dependency count at least) this should be familiar territory for many, many people.

But I really don't want to award it to them for positive actions. This one is decided on comparative negatives.

The annoyances/gotchas of having to add the magical "act like normal TT" invocation, and not making Template a proper dep when even their own site generator uses it, simply don't compare to the utter zaniness and Dreamweaver-hostility of Mojo's tag format.

So I'm awarding this one AGAINST Mojo, instead of to Dancer. Both had their own respective disappointments.

Overall Leader after Week 2 - Mojo

Why? The thing is, even though I've been forced to read about it, I haven't actually had to really do anything in my templates yet. So there's no tangible real pain from the Mojo templating system yet. What code is there is still really really tight.

So with a balanced score this week of zero vs zero, and Dancer's dishonourable mention for the PNG bug (if only because nobody ever noticed it), Mojo remains in the lead.

Next week, I try to set up the front page to actually work and try to get a generated Top 100 list added to the site via templating.

Sunday April 11, 2010
11:26 AM

Competition and CPAN Errata

A few housekeeping issues.

1. I may be a day behind on this weekend's competition update. I've completed the Mojo half, but not the Dancer half. I'd rather post the results at the same time, so I'm delaying by one day.

2. After an email from Andreas to all CPAN authors to clean up our author directories, I've deleted about 1000 files. Unfortunately, that accidentally included all production versions of Class::Adapter, breaking Padre and a number of other things. A new 1.07 has been uploaded and this situation should be resolved shortly.

Thursday April 08, 2010
02:58 AM

I don't want to be forced into your damned warning policy

Update: I am informed that while I may imply a timeline in which the pragma modules mentioned below acted before Moose, in fact Moose acted first and the pragma modules came later.

Last weekend in my first round comparison between Mojo and Dancer, I noted that neither project used strict or warnings.

At the time, I suspected this was done for clarity. After all, it can get annoying when use strict and use warnings get in the way of having a nice clean synopsis.

It was to my great surprise that I discovered both web frameworks had decided that use strict and use warnings were good enough for everybody, and they would silently turn both of them on.

This makes them the third group of modules to decide how I should write my code.

First are your $Adjective::Perl style modules.

This I can live with; playing with pragmas seems quite reasonable, since it's a style pragma module itself. By saying "use perl5i" I'm explicitly buying into their view of what code should be written and formatted like.

Then Moose decided that they would turn on strict and warnings as well.

This makes me a bit uncomfortable, since I use Moose for its object model. I don't really want it imposing its views on how I should write the rest of my code.

I can hear you already saying "But wait! You can turn it off with no warnings, but you shouldn't do that because it's best practice (and Best Practice) to always have them both on, and anyway it's only enabled until you say no Moose;"

Or is it? That alone is an interesting question.

Are Moose's views on strictness and warnings able to escape their scope, and will they be imposed on me even when I tell it to go away with no Moose;?

Or if they do go away, does that mean I've accidentally been running a whole bunch of code without strict and warnings on by mistake?

But I digress, now where was I... oh right!

<rant>

I appreciate you are trying to be nice and save me two lines, but dammit I'm not paying you (metaphorically) for that, and now I have to THINK instead because the LACK of an option to your code can be meaningful. It's worse than meaningful whitespace, it's a meaningful unknown. And I can trivially automate the production of those "use strict;" or "use strict;\nuse warnings;" (as you prefer) lines in pretty much any hackable editor written in Perl. Automating the thinking you have to do when there ISN'T something in the code is much harder, or impossible.

This kind of thing with Exporter is one of the (four) provably impossible problems that prevent Perl from being parsable. Gee thanks!

From a perception point of view it's the same kind of situation when a Media Company announces they are going to buy a Mining Company. Why? Because the Mining Company has a lot of cash but little revenue, and the Media Company has a lot of revenue but little cash, so they'd "Go well together".

Before I say any more, you should already be a bit suspicious. And it's probably no surprise when you find out that the part-owner boss of the Media Company is also a part-owner of the Mining Company.

But that kind of thing is an obvious form of Conflict Of Interest. Humans are almost universally tuned to spot that kind of thing and see it as a negative.

It's a much trickier situation when the conflict is between Doing Your Job and things like Trying To Be Nice, or things like Clearly You Probably Meant It, So I'll Just Silently Correct That For You. There's a variety of memes in this situation, different mixed perceptions based on your own personal morality.

But that doesn't remove the technical issue that you've conflated two entirely different functions into one module.

So now with Moose if I don't want warnings, but I do want strict, I'm not sure if I need to do this...

use Moose;
no warnings;
 
...
 
no Moose;

or this...

use Moose;
 
# Lets say I'm nice and allow all my Moose definition code
# to follow their warnings policy, because I like their rigorous approach.
 
no Moose;
use strict;
use warnings;

Neither of these things is particularly pretty, but I'm stuck with the situation because there's a conflict of interest. The Moose authors chose to impose their views in an area outside of their scope, because it's convenient for them and saves them a couple of lines. And Besides, Everyone Should Do It That Way.

But as long as it's only pragma modules that change pragmas (plus Moose) it's just an idiosyncrasy, one of those weird little things modules do sometimes.

Except now we have another problem, because now It's A Trend. Everyone famous is doing it. Clearly it's The Right Thing.

So now we see web frameworks doing it. Mojo does it. Dancer does it.

Clearly it's the right thing to do, because chromatic and Schwern and Damian and the Moose cabal are doing it so it must be awesome.

At this point, you're probably preparing a snarky comment about how I'm just curmudgeonly and nit-picking. How I should only turn off warnings when I know a warning is going to happen inside a SCOPE: { no warnings; ... }, and how we need to set a good example for the less experienced people, and how YOU always want to see the warnings in production, and how warnings create bug reports so you fix more bugs, and so on.

But what about practical issues? What about situations where you need to do big things, complicated things, large scale things in one or many of the dimensions of width, or throughput, or reliability or complexity or code size.

Over the last 10 years, in which 90% of my paid work has been on websites, I can recall three situations in which I found truly important warnings on a production web server that I didn't find in any testing, and that would have led me to a fix I otherwise would have overlooked.

I've found tons of exceptions on production sure, but not that many warnings that mattered.

However, in that same 10 years, I've seen the opposite situation 6-8 times.

I've seen sysadmins blank out a config variable the wrong way, resulting in an undef where there should have been a value, which is checked in an "eq" comparison 20 times per page, each of which produced 2-3k of log file.

Or worse, I've seen this in a foreach ( ... ) { if ( undef eq 'string' ) { ... } } which is operating on several hundred or thousand entries.
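A rough sketch of that failure mode (the config key, value and loop size are invented for illustration): one blanked config value in a hot comparison is enough to emit one warning per iteration, which is exactly what ends up in the logs.

```perl
#!/usr/bin/perl
# Sketch only: the config key and loop are invented for illustration.
use strict;
use warnings;

my %config = ( theme => undef );    # sysadmin blanked the value the wrong way

my $warnings = 0;
local $SIG{__WARN__} = sub { $warnings++ };    # count would-be log lines

for ( 1 .. 1_000 ) {    # imagine several hundred or thousand entries per page
    if ( ( $config{theme} || '' ) eq 'dark' ) { }    # guarded: never warns
    if ( $config{theme}          eq 'dark' ) { }    # unguarded: warns every pass
}

print "would have logged $warnings warnings\n";
```

At a few thousand comparisons per page and 2-3k of log output per warning, the arithmetic to a full /var partition is short.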

Half the time, this happened because someone in the same office, at work at the same time as you, touched something they shouldn't have and uncovered it.

And when you see the load graph on the box spike the next day, you investigate and find it compressing 20gig of log files, all of which contain the same identical warning printed 40 or 50 million times.

But if you aren't so lucky, it happens on a weekend, or at night, or you haven't set something up right in Nagios, and now you've filled your machine's entire /var partition over the weekend, which prevented 2 or 3 other services that need /var too from working, which brought down the service, or the server.

I've seen 10-machine clusters running at high volume overflow every log partition in the cluster at a rate of a gig per host per minute, because a telco outage at night caused a minor backend service to fail, which made a single status string that was never checked for defined-ness go undef, and that status hash value was checked in the hot loop.

I've seen horrible UDP network syslog storms, and boxes dead so fast the Nagios Poll -> Human Alert -> Getting Online lag of 15 minutes wasn't enough to catch it and prevent it.

All of it because in a codebase of 50,000 or 100,000 lines, you only have to miss ONE thing in the wrong place to produce a warning. And nobody's perfect.

Now, by all means I encourage development with warnings on. And I absolutely think warnings should always be on in your test suite, with Test::NoWarnings enabled wherever possible for good measure, and a BEGIN { $^W = 1 } in the test header to really make sure warnings are aggressive.
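A sketch of the kind of warnings-aggressive test header I mean, using only core pragmas (Test::NoWarnings would add a final no-warnings test on top of this):

```perl
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';    # promote every warning to a fatal exception
BEGIN { $^W = 1 }               # also flip the old global flag, for code
                                # compiled without lexical warnings

# Any warning in this scope now dies, so a test cannot quietly pass over it.
my $survived = eval {
    my $x;
    my $msg = "value: $x";      # uninitialized warning -> fatal under FATAL 'all'
    1;
};

print $survived ? "warning slipped through\n" : "warning was fatal\n";
```

That is the behaviour you want in dev and test: maximum noise, nothing swept under the rug.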

If there's a configuration option for it, I'll even leave them on through User Acceptance Testing and Fuzz Testing and Penetration Testing and Load Testing and anything else that isn't Production.

In production I don't want to know about mistakes.

Well, that's not entirely true...

I want to know about mistakes in production, but what I want more is that production absolutely positively NEVER goes down. There's no debugging convenience in the world that should result in even the slight risk of turning into a Denial of Service.

The Spice MUST Flow.

If I go down in production, I want it to be for a reason that has never happened before. Ideally one involving a three- or four-factor failure.

If a plane crashes into a telco NOC, triggering a complete network outage on my side of the city, and we switch to the disaster recovery site but the power ripples from the plane crash caused a transformer to blow, and the generator fails after 20 minutes because of a critical heat event due to a bird nest in the radiator catching on fire, because the maintenance man was on a two-month paternity leave and the stand-in techie doing his job wasn't legally qualified to be on the roof with the electrical gear, THAT I can live with.

If an East European mafia takes a shining to me as a blackmail target, and initiates a 50gig/sec botnet-driven distributed denial of service attack, and we haven't set up the DDOS-protection contract because the financial crisis caused our budget to be cut this year, well, that I can live with too.

If I can't afford any of that fancy stuff, but the Facebook utility my shoestring-budget startup created turns out to be too popular, and despite my best efforts to keep it blazingly fast the success overloads the server someone dropped a default Ubuntu install on because we needed it up quickly, hey, I'm happy to have that kind of problem.

Compared to these kinds of reasons for going down, having a volunteer website administrator who lives in Europe fiddle a setting they shouldn't have while I was asleep, and having the 500gig hard disk overflow with the same identical message repeated over and over 5 billion times really doesn't cut it.

And this same kind of thing seems to happen over and over again about once every 14 months, and only half the time am I lucky enough to do something about it in time.

When my object model is forcing warnings on me, and my web framework is forcing warnings on me, what am I supposed to do?

package Foo;
 
use MyWebFramework;
no warnings;
 
use Moose;
no warnings;
 
use Some::Random::Module;
no warnings; # Can I really be sure they don't enable warnings?
...

Am I supposed to repeat this in every single class?

What about when I want warnings ON, now what?

Unlike exceptions, it's way way harder to catch and manage warnings, to force them always on and force them always off when you are in different environments.

The only way I know of to reliably distinguish between maximum noise and diagnostics and explosions in dev/test/uat, and no noise at all in production (except properly managed exceptions), is to have the code NOT use warnings, and then force warnings on from the top down in the right environments.
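One way to sketch that top-down switch (the environment variable name here is invented; use whatever your deployment already sets):

```perl
#!/usr/bin/perl
# Sketch: decide once, at the top of the entry script, whether warnings are
# globally on. MY_APP_ENV is a hypothetical variable name.
use strict;

BEGIN {
    my $env = $ENV{MY_APP_ENV} || 'production';
    if ( $env ne 'production' ) {
        $^W = 1;             # global runtime warnings, everywhere
        require warnings;
        warnings->import;    # lexical warnings for the rest of this file too
    }
}

# In production ($^W stays 0) nothing below logs; in dev it warns loudly.
my $x;
my $msg = "value: " . ( $x || '' );
print "warnings globally " . ( $^W ? "on" : "off" ) . "\n";
```

The code itself stays silent on the subject of warnings; the entry point, and only the entry point, decides.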

We've seen similar things before, stuff that starts out simple and obvious but just causes pain.

The @EXPORT array was, I'm sure, just a fine idea when it was added. It lets you import whole swathes of functions into your program without that annoying typing.

Of course, since now ANYBODY can fuck with it, if you are trying to write robust code you need to do stupid annoying things like this to avoid accidentally polluting your code.

use Carp          ();
use Cwd           ();
use File::Spec    ();
use File::HomeDir ();
use List::Util    ();
use Scalar::Util  ();
use Getopt::Long  ();
use YAML::Tiny    ();
use DBI           ();
use DBD::SQLite   ();

Why do I need to do that stupid braces shit? Because the alternative is that I have to audit every single dependency to make sure it doesn't export by default, and THEN I have to also trust/hope they don't start exporting by default in the future.

Loading modules the safe and scalable way means doing MORE work than the unsafe and unscalable way.
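To make the pollution concrete, here is a minimal sketch (Legacy::Util is an invented module name, and the inline package stands in for a real dependency):

```perl
#!/usr/bin/perl
# Sketch of default-export pollution; Legacy::Util is an invented module.
package Legacy::Util;
use Exporter 'import';
our @EXPORT = ('slurp');    # handed to anyone who says a bare "use"
sub slurp { return "legacy slurp" }

package main;
use strict;
use warnings;

Legacy::Util->import;       # what a plain "use Legacy::Util;" does
print slurp(), "\n";        # main::slurp() appeared without being asked for

# The defensive form from the list above, "use Legacy::Util ();", calls no
# import at all, so nothing lands in your namespace behind your back.
```

With @EXPORT_OK instead of @EXPORT, the caller has to ask for slurp explicitly, and the empty-parens dance becomes unnecessary.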

DBI gets it right. The default way of using DBI that is documented everywhere is superficially more verbose and anal retentive than I need for simple things.

But as the code gets bigger, the code keeps working just as well and just as safely. I would hypothesise that this diligence on the part of DBI and Tim Bunce has in a single stroke kept Perl web applications industry-wide almost entirely free of SQL injection attacks.

The savings in terms of just the admin workload and security spending and security-forced upgrades done on overtime on the first Tuesday of every month have probably justified Tim's entire career.
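The DBI idiom in question is the placeholder style documented everywhere; a minimal sketch (assuming DBD::SQLite for an in-memory demo database):

```perl
#!/usr/bin/perl
# Sketch of the DBI idiom being praised: placeholders keep user input as
# bound data, never as SQL text. Assumes DBD::SQLite is available.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 } );

$dbh->do('CREATE TABLE users ( name TEXT )');
$dbh->do( 'INSERT INTO users ( name ) VALUES ( ? )', undef, 'alice' );

# A classic injection attempt arrives as plain data and matches nothing.
my $evil = "alice' OR '1'='1";
my ($count) = $dbh->selectrow_array(
    'SELECT COUNT(*) FROM users WHERE name = ?', undef, $evil );

print "rows matched: $count\n";
```

Slightly more typing than string interpolation for the simple case, and exactly the same amount of typing at any scale.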

Has default-import really given us such a large benefit that it overcomes all the times people have had to type () and resolve clashing imports or corrupted OO APIs? Is the time saved not having to type ':ALL' really worth all that?

I say no.

And I say that this nascent fad of screwing around with my pragmas, when your module isn't actually a pragma itself, needs to be nipped in the bud before it gets worse.

</rant>

While this is perhaps a controversial position (and so it won't be factored into the scoring as part of the competition) I have to say I was greatly impressed that the Dancer guys have offered to implement some kind of configuration option so I can explicitly disable their Dancer-imposed warnings in production (which at least mitigates the worst Real World problem, while retaining the magic pragma behaviour).

Saturday April 03, 2010
12:17 PM

Mojo vs Dancer Week 1 - Installer, Support and Hello World

Let's Get Ready To Ruuuumblllleeee *cough* *splutter* ahem. Sorry about that.

Welcome to the Mojo vs Dancer Top 100 Competition.

Over the next month or so I'll be building a replacement for my prototype CPAN Top 100 website simultaneously using the Dancer and Mojolicious micro-web frameworks.

The Competition Rules

While I do have a fair bit of experience with Perl coding, I will be trying wherever possible to behave in a naive and newbie-ish fashion when it comes to community, support and tools.

I hope that during this process we can get some idea of the experience of the typical end user, who won't know the right people to talk to or the right places to go for help.

One round will occur each weekend. I shall address one area each round, progressing until I have a working application (hopefully two).

While each weekend you will be subjected to my newbie code, during the rest of the week I will be inviting the development teams of both web frameworks to "improve" my code. They do so, however, at risk of LOSING existing points, should they try too hard to show off and create something I don't understand.

The week gap also gives plenty of time for each team to respond to my comments, to deny problems, to clarify mistakes, and to fix, upgrade and do new releases.

For each issue I stumble across on my journey, I shall appoint only one winner. Each week may address more than one issue.

However, while I'll be recording the scores issue by issue, ultimate victory will be based entirely on my subjective personal preference for the one I think will be quickest and easiest for me to maintain.

If you'd like to follow along at home you can checkout the code for each project at the following locations.

Mojolicious - http://svn.ali.as/cpan/trunk/Top100-Mojo

Dancer - http://svn.ali.as/cpan/trunk/Top100-Dancer

Mojo - Getting to Hello World

I have some history with Mojo, being present at (and in a small way contributing to) its birth when Sebastian Riedel left the Catalyst project.

I've even attempted to build a Mojo application before, but was told at the time they weren't quite ready for actual users yet.

Their website is clean and efficient, but practically unchanged since I first looked at it.

It's also somewhat broken or at least unmaintained. The "Blog" link just points back to the same page, the "Reference" link points at what looks like a search.cpan.org failed search, and the "Development" link just throws me out to Github (which doesn't seem to really help much if I wanted to write a plugin or something other than hacking the core).

The "Book" link points to another search.cpan error page, and the most recent version at time of writing is 0.999924 which seems weird and makes me wonder how well they run the project.

Although the website doesn't fill me with confidence, the installation process is an entirely different story. One of Mojo's core design principles is to be pure Perl and have zero non-core dependencies.

Installation via the CPAN client is fast, simple, and effortless. And I have full confidence that (if I needed to) I could just drop the contents of lib into my project and upload it to shared hosting somewhere.

I hear some rumors that to achieve this they've done rewrites of some very common APIs that work slightly differently, but I won't be looking into this right now. It will be a matter for another week.

To create my initial Hello World I've taken the most obvious approach and simply cut-and-pasted the code off the front page of the Mojolicious website, then stripped out the param-handling stuff and modified the rest to something obvious-looking. I've also added in strict and warnings, which the sample doesn't have.

Before attempting to run it, I have the following.

#!/usr/bin/perl
 
use strict;
use warnings;
use Mojolicious::Lite;
 
get '/' => 'index';
 
shagadelic;
 
__DATA__
 
@@ index.html.ep
<html>
  <body>
  Hello World!
  </body>
</html>

Looking at this code, it seems that everything is template based. This should be a good thing in general, as I'm a heavy Dreamweaver user and don't much like generating pages from raw code.

So far, it seems fairly simple. My main problem is that I have no idea what the hell "shagadelic" does, although I suspect it's some kind of way of saying "done". Whatever it is for, it annoys me enormously and dates the framework to (I assume) the release date of one of the Austin Powers movies. I get the feeling it is going to make Mojo feel more and more dated over time.

And they don't use strict or warnings, which seems a bit iffy.

When I run this helloworld.pl script, I get a handy little block of quite informative help text for my application for free.

C:\cpan\trunk\Top100-Mojo>perl helloworld.pl
usage: helloworld.pl COMMAND [OPTIONS]
 
Tip: CGI, FastCGI and PSGI environments can be automatically detected very
     often and work without commands.
 
These commands are currently available:
  generate         Generate files and directories from templates.
  inflate          Inflate embedded files to real files.
  routes           Show available routes.
  cgi              Start application with CGI backend.
  daemon           Start application with HTTP 1.1 backend.
  daemon_prefork   Start application with preforking HTTP 1.1 backend.
  fastcgi          Start application with FastCGI backend.
  get              Get file from URL.
  psgi             Start application with PSGI backend.
  test             Run unit tests.
  version          Show versions of installed modules.
 
See 'helloworld.pl help COMMAND' for more information on a specific command.

Running the obvious "perl helloworld.pl daemon" like it says on the website and connecting to http://localhost:3000/ I get a simple working "Hello World!" response first time.

So far so good then, except for the rather dead website. And no need to try any of the support channels yet either.

Dancer - Getting to Hello World

The Dancer website seems quite a bit more enticing than the Mojo website, at least superficially. There's evidence of more attention to some of the visual details, with more design elegance and things like source code syntax highlighting.

Clicking through the links, however, it's clear information is still a bit thin on the ground. And the "latest release" version on the download page is behind the version on CPAN, but not by much.

The website generally has more of a "new and undeveloped" feel to it, compared to Mojo's "mild neglect" feel.

One nice thing about the website is that they've dropped a Hello World example directly on the front page for me to copy and paste.

After some small tweaks for my personal take on Perl "correctness" and legibility (the Dancer guys also don't use strict or warnings...) I have the following.

#!/usr/bin/perl
 
use strict;
use warnings;
use Dancer;
 
get '/' => sub { return <<'END_PAGE' };
<html>
  <body>
  Hello World!
  </body>
</html>
END_PAGE
 
dance;

The Dancer example is smaller and simpler than the Mojo example, and doesn't make template use compulsory. Again, I can't stand this use of non-descriptive functions to end these programs. But at least "dance" is cleaner, is an actual verb, and is a bit less tragic than "shagadelic".

Instead, tragedy strikes for Dancer when I try to install it.

Because it doesn't install. Or at least, it doesn't install on Windows. Or perhaps it's just my Vista machine.

A redirect test is failing with this...

t/03_route_handler/11_redirect.t ............. 1/?
#   Failed test 'location is set to http://localhost/'
#   at t/03_route_handler/11_redirect.t line 36.
#          got: '//localhost/'
#     expected: 'http://localhost/'
 
#   Failed test 'location is set to /login?failed=1'
#   at t/03_route_handler/11_redirect.t line 44.
#          got: '//localhost/login?failed=1'
#     expected: 'http://localhost/login?failed=1'
# Looks like you failed 2 tests of 9.
t/03_route_handler/11_redirect.t ............. Dubious, test returned 2 (wstat   512, 0x200)
Failed 2/9 subtests

As a non-expert, that looks pretty serious. Maybe I'd force the install if it wasn't something as essential as redirecting that fails. But this is a pretty ordinary feature, and it's not working, and forcing in general scares me.

The one saving grace is at least it failed quickly. While not keeping dependencies to zero, they've done a fairly decent job of keeping the dependency list down to a minimum.

But the most damning factor here is not that it failed once, but what I found when I followed up by taking a look at their CPAN Testers results. These show failure rates all over the place, up and down, with some big regressions.

This kind of pattern usually suggests that Dancer is seriously lacking in its QA procedures, or has a complete disregard for platforms and Perl versions other than the newer Perls on the operating systems the developers personally use. That makes Dancer a risky choice for me to bet on, because it could all go wrong down the line unexpectedly.

So at this point, I'm going to stop with Dancer for this week, having failed to get Hello World working. We'll see if the Dancer guys can address this before next weekend.

Week 1 Results

Best Website - Dancer

While this isn't a massive victory, Mojo's many broken links hurt it, while Dancer shows at least some desire to be pretty, which I take as a hint that it might make it easier for me to make my own website pretty too.

Best Installation - Mojo

Zero dependencies and a fast installation that Just Worked contrast enormously with the failed installation of Dancer, and its unreliability over time (according to CPAN Testers, at least).

If left to my own devices, I would probably already have committed to Mojo at this point, although reluctantly given Dancer's more desirable prettiness.

Best Hello World - Dancer

This one was quite close. Mojo suffered a bit because it forced its templating syntax onto me during Hello World, while Dancer suffered a bit because I had to resort to a heredoc in the Hello World.

In the end I'm awarding this to Dancer because of the pain in my brain that the function name "shagadelic" causes. It might have been cool for the first day or two, but long term I just know this is going to become an eyesore in my code.

Overall Leader after Week 1 - Mojo

Despite Dancer beating Mojo two to one on the individual factors, when the time came to do what I needed to do, Mojo installed quickly, gave me some help in the right place, and ran my Hello World without error or argument.

And Dancer did none of these things.

Clearly, there's some QA work for the Dancer guys to do before next week, and the Mojo guys should probably dust some of the cobwebs off their website at the same time.

Next week, the competition will continue with database and ORM integration.

Until then, hopefully the respective teams will be blogging their responses and hopefully dealing with any issues raised.

Thursday March 25, 2010
10:32 PM

Pitting Mojo vs Dancer in a competition to build Top100 2.0

With most of my government hackery now finished, I find myself with a free timeslice for the first time in a while.

I plan to use the time to rewrite my CPAN Top 100 website from statically generated to being dynamically generated.

The main reason for this is that I want to start generating priority and ranking lists for individual authors or, say, the entire dependency tree of Padre in the same way I currently do for the entire CPAN. The underlying theme of the Top100 is around prioritising maintenance, but the toolchain seems to drown out the rest of the modules, by virtue of being needed for everything.

Having written my geo2gov.com.au website in Catalyst (to force myself to finally learn it) I'd like to also take a look at some of the new lightweight toolkits.

This is mainly because the Top100 database (and the CPANDB database it is based on) is available as an ORLite model. And tying a lightweight ORM to a lightweight web framework seems like a natural fit.

The two that seem to stand out as having sufficiently good project managers who understand the need to build communities around their code are Dancer and Mojolicious.

Since neither of the two looks like it particularly outshines the other on face value, my plan is to build the new Top100 website simultaneously in both of them.

At some point during the process (hopefully near the end) I will declare a winner. The code for the loser will be discarded, and the code for the winner will go on to power the final production top100.cpan.org site.

The main judging criteria will be based on simplicity, legibility, and how amenable they are to collective maintenance.

To help evaluate the last criterion accurately, and to spice up the competition rules a bit, I'm going to be building both versions of the site in an open-commit environment.

I will be keeping the code for both implementations in my open repository at http://svn.ali.as/ and I will be giving commit access to the authors of both Dancer and Mojolicious (and anyone else they nominate).

I see many "competitions" (especially in benchmarking) where the person running the competition writes all the code as well (badly) and ends up falsely judging certain entries. And I don't want that in this case.

While I will try to create the initial functionality for each new page or section of the site, both teams are allowed to refactor or improve anything as they see fit so that the code represents what they see as best showcasing their toolkits.

HOWEVER, teams take certain risks in "improving" my newbie code.

Any collective maintenance policy requires that experienced developers try to keep their code understandable for newbies. This keeps the pool of talent both wide and deep, while preventing newbies from accidentally breaking code too complex for them to understand.

So if the changes a team contributes to show off its toolkit make the code harder for me, as a newbie, to maintain, then that team will be judged down accordingly.

In a similar spirit of newbie and small site friendliness, I will be developing the application on Windows without access to a web server, and deploying it onto a remote Linux server running Apache.

If building the application requires dependencies, external changes or added features that don't work in both of these scenarios, it will be penalised.

As in real life, I won't be laying out precise scoring methods in advance.

After each chunk of work, I will assess and comment on my experiences, and assign points for that stage of the work (while trying to remain fair about weighting each task appropriately).

If you have your own web micro-framework you would like to include in the competition as well, you are welcome to petition for inclusion in the comments below.

However, be warned that to qualify for inclusion you must demonstrate sufficiently robust project management and community involvement. If you have a low bus-sensitivity(1) score, or are not sufficiently mature yet to make me think you are a viable long-term choice for my site (5 years or more) you won't make the grade for entry.

So let the fight begin!

Next post, creating the initial application skeletons!

(1) Bus Sensitivity is the number of people that would need to be hit by a bus for your project to effectively collapse.

Tuesday March 23, 2010
09:49 PM

Perl crime application takes 1st prize at apps4nsw hackathon

On Saturday, Nerds for Democracy Jeffery and I launched our bid for victory in the apps4nsw competition with a new look at crime in Sydney.

http://ali.as/crimealert

Although it was done in Perl (and reuses much of the Perl geo toolkit we used to take 2nd place in the Mashup Australia competition) Crime Alert is a lot more about access to data than it is about code.

With our geo2gov.com.au search engine, our strategy for the competition was to target a problem that the government is inherently unable to solve (or could only do so with great difficulty) due to constitutionally-imposed separation of powers between Federal, State and Local government.

For Crime Alert, the problems we are tackling are all about the limitations of statistics, in particular correlation vs causation and the need to maintain anonymity.

Historically, Australian governments have reported crime based on local government boundaries. This is partly for historical cost and organisation reasons (New South Wales only got a central crime database in 1997) and partly because government has a very strong position on anonymity in reporting.

For example, the Australian Census is significantly larger and collects significantly more sensitive material than the US Census, and the Australian Bureau of Statistics goes to extreme lengths to ensure that this information can't then be used for stalking, predatory marketing, etc.

Because crime data is only reported for groups of 50,000+ people, it is depressingly uniform and not particularly interesting. Unless you are part of government or involved in the allocation of police resources, crime statistics serve as nothing more than a curiosity.

Crime Alert wouldn't exist at all, except for the chance discovery of a series of PDF reports issued by the NSW Bureau of Crime Statistics and Research which contain crime "heat maps" for a subset of the state's local government areas (currently, they only cover about a third of local governments).

The creation of these reports changes the game completely, because it means that the government is now comfortable it can release crime information at a resolution as low as 50 metres without violating anonymity.

To get the contours we used for our crime map, we had long discussions with their statisticians to demonstrate that we understood the area and would use the data responsibly, and to come to an agreement on particular crime types and metrics that would be both useful and relevant to the public.

With anonymity preserved, the second challenge was to find a way to present the information that is both simple for the consumer and statistically valid.

The complications here are numerous: the maps we use are from 2008, two years ago; the resolution is reduced to three zones based on one standard deviation either side of the mean; the crime types we use have strong time factors (particularly assaults); and the crime scores don't control for population density.

But the biggest problem is correlation vs causation. Just because there are a lot of assaults where (and when) you are, doesn't mean it's YOU that will be assaulted. So using historical crime density directly as a predictive factor is very, very bad.

That doesn't mean it's not relevant at all though. It just means we can't make blanket statements without on-the-ground information.

So our application limits itself to providing simple low/medium/high factors for both your location and the time of day, both of which are based on the average for the local government area you are in.

This lets us reduce all the complexity of crime down to a single concept, of being in "the wrong place at the wrong time". And we communicate this via two simple "Here" and "Now" boxes, linked so it is clear that both factors are important.

Taken to the next iteration, we would implement what a user experience designer told us is officially called "Ambient Personalisation".

In this form, you don't even show the user an interface.

Instead, software on your phone will run silently in the background (or on a cron) and from time to time it checks your location with the server.

If you happen to wander into a High crime zone during a High period, your phone beeps or sends you an SMS or rings you or vibrates to let you know you've wandered into the wrong place at the wrong time.

But once you've hit this location/time combination once, the device (or the server) knows you are now aware of the problems in this area and it won't bother you again. If you happened to move house into a bad area, after the first week the phone has now hit each of the time warnings, and won't bother you about it again.
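The warn-once rule described above could be sketched like this (zone and time-band names are invented, and the risk lookup would really come from the server; this just shows the deduplication behaviour):

```perl
#!/usr/bin/perl
# Sketch only: illustrates alerting once per (zone, time band) combination.
use strict;
use warnings;

my %already_warned;    # combinations the user has already been told about

sub maybe_alert {
    my ( $zone, $band, $zone_risk, $band_risk ) = @_;
    return 0 unless $zone_risk eq 'high' && $band_risk eq 'high';
    return 0 if $already_warned{"$zone/$band"}++;    # only nag once
    return 1;    # here the real device would beep, SMS or vibrate
}

print maybe_alert( 'nightclub-strip', 'late-night', 'high', 'high' );  # alerts
print maybe_alert( 'nightclub-strip', 'late-night', 'high', 'high' );  # silent
print maybe_alert( 'home-suburb',     'late-night', 'low',  'high' );  # silent
print "\n";
```

The interesting design choice is that the suppression state, not the alert, is the long-lived data.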

But when you later go clubbing somewhere new, or park your car to pick up some groceries after visiting a friend somewhere new, your phone will notice that you've parked somewhere notorious, or that there have been a lot of assaults near this club.

And it's the warnings about these places you have never been that are the key, particularly in a low-crime Western city where the crime is often quite concentrated around particular areas. This is what starts to make the phone an extension of your own ego, instead of just a device you use.

Thursday March 18, 2010
12:26 AM

I've been right to be afraid of statistics

Under most circumstances, it's my habit to take something I've learned that I find useful and am fairly sure I've learned right, find a useful encapsulation boundary, and push that thing onto CPAN so I don't have to remember the details of how to do it right later.

But doing this with statistics has largely eluded me, except for some of the most primitive mechanisms for joining data producers to statistics consumers.

Statistics are deeply confusing, and even though I know they are confusing I still find it hard to make them fit in my head (because the implications are often not even predictably confusing to me).

Fortunately, I can console myself that I'm not the only person for whom this is the case, and that at least I'm correct in avoiding the use of them.

http://www.sciencenews.org/view/feature/id/57091/title/Odds_Are,_Its_Wrong