Another paste-archive of a mini-essay. This one is from a reply to Skud's post on perl-qa. I've seen many variations on this theme.
Kirrily Robert wrote:
> We've got a situation where we have a suite of tests for a web app. It
> starts of testing the lib/ and whatnot, but eventually gets to the point
> where it uses Test::WWW::Mechanize to go fetch stuff from the
> developer's sandbox website and do a sanity check on the web application
> The problem is that all the developer sandbox websites run on one server
> that's groaning under the strain. It's in the process of being replaced
> but we're not there yet. The upshot of this is that on a good day, the
> web tests take ages to run, and on a bad day they time out.
> It's got to the point where the developers just kind of mentally tune
> out failures in the web tests, and I'm worried about the "broken window"
> Any suggestions for how to work around this? All I've got so far is the
> idea of splitting out the web tests into another directory, and treating
> them as "functional tests" that developers would typically run less
> often than the unit tests.
Ahh, the "one test server" problem. Each developer has a perfectly fine and highly overpowered computer sitting on their desk which is relegated to being, essentially, a dumb terminal. You run ssh into the dev box, a web browser and maybe an editor. What a tragic waste of resources.
Instead, each developer's machine should be capable of running a complete copy of the sandbox website. Then the tests fire up the sandbox on the local machine and run against that. No strained single test server to worry about. Individual devs can work isolated from other devs. They can futz around with the sandbox as much as they like without worrying about breaking everybody else.
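Here's a rough sketch of the "fire up the sandbox locally, then test against it" idea. It's in Python purely for illustration; the same shape works in Perl with a local server and Test::WWW::Mechanize. Everything named here (SandboxHandler, start_sandbox, fetch, the page contents) is made up for the example and stands in for your real application.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SandboxHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real web application."""

    def do_GET(self):
        body = b"<html><title>Sandbox</title></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def start_sandbox():
    """Start the sandbox on this machine and return (server, base_url)."""
    # Port 0 asks the OS for any free port, so several sandboxes can coexist.
    server = HTTPServer(("127.0.0.1", 0), SandboxHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


def fetch(url):
    """Fetch a page directly, ignoring any proxy settings in the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


if __name__ == "__main__":
    server, base_url = start_sandbox()
    try:
        status, page = fetch(base_url + "/")
        assert status == 200
        assert "<title>Sandbox</title>" in page
    finally:
        server.shutdown()
        server.server_close()
```

The point of letting the OS pick the port is that nothing in the test assumes a particular machine or a particular sandbox; the test asks the sandbox where it lives.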
This requires that
A) the setup of the web site be automated
B) the code not contain all sorts of hard-coded absolute paths
C) the dev machines contain the software necessary to run the site
A and B themselves have many other benefits outside testing. In general you'll want to move any hard-coded values out of the code and into a config file. Incidentally they also allow a single dev to run multiple sandboxes for multiple branches of code they're working on.
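To make B concrete, here's a minimal sketch of keeping paths out of the code, assuming an INI-style config file and a per-sandbox root directory. The section name, the keys and the load_site_config function are all hypothetical; it's in Python for illustration only.

```python
import configparser
import os

# Hypothetical config: only relative paths, nothing tied to one machine.
EXAMPLE_CONFIG = """
[site]
port = 8080
document_root = htdocs
log_dir = logs
"""


def load_site_config(config_text, sandbox_root):
    """Parse the config and resolve every path against the sandbox root."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    site = parser["site"]
    return {
        "port": site.getint("port"),
        "document_root": os.path.join(sandbox_root, site["document_root"]),
        "log_dir": os.path.join(sandbox_root, site["log_dir"]),
    }


if __name__ == "__main__":
    # Two sandboxes for two branches, same code, different roots.
    trunk = load_site_config(EXAMPLE_CONFIG, os.path.join("sandboxes", "trunk"))
    branch = load_site_config(EXAMPLE_CONFIG, os.path.join("sandboxes", "branch"))
    print(trunk["document_root"])
    print(branch["document_root"])
```

In a real setup each sandbox would also want its own port, which is just one more value in its config file.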
C is a little trickier. If the devs are using the same basic OS as the servers then it's not so bad. Just install the appropriate packages and go. You can even make your project a package and declare all its dependencies.
But if the developers are using Operating System A (just for example, Windows) and the servers are using Operating System B (let's say Linux) then life gets a little trickier, but not impossible. If it's just a few holdouts, then they can use the now not-so-heavily loaded central testing machine and everyone else can use their dev machines.
If most of your software is platform agnostic (Apache, a SQL database, Perl...) then your devs can install it. You can even go so far as to include all dependent source and the means to automatically build it in your repository.
Another route is to go the "lite" software route. Instead of testing with Apache and PostgreSQL, test with HTTP::Server::Simple and SQLite. Easier to install and configure. The downside is you're not testing against your real production environment so something still should test against a staging server.
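As a sketch of the "lite" route, here's a test database that lives entirely in memory, using SQLite. It's in Python for illustration, and the users table and helper functions are invented for the example, not taken from any real application.

```python
import sqlite3


def connect_for_tests():
    """Open a throwaway in-memory SQLite database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def add_user(conn, name):
    """Insert a user and return the new row id."""
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cursor.lastrowid


def find_user(conn, user_id):
    """Return the user's name, or None if there is no such user."""
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = connect_for_tests()
    user_id = add_user(conn, "skud")
    assert find_user(conn, user_id) == "skud"
    assert find_user(conn, user_id + 1) is None
    conn.close()
```

Nothing to install, nothing to configure, and the database vanishes when the test ends. It also shows the downside in miniature: the `?` placeholders and the column types here are SQLite's, and the production database may want something slightly different, which is exactly what the staging server is there to catch.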
If your software isn't platform agnostic, consider something like VMware images. At this point I wave my hands like so and throw a ninja flash bomb *POOF!*
One major difference is that you're going from a homogeneous testing environment -- one server, one install, one version of the dependent software, one environment -- to a heterogeneous one. Many different environments, versions, operating systems, etc.
The homogeneous environment is a seductive one. It's simple and easy to maintain. You don't have to worry about different developers getting different results because they're using different versions of the software. You know that the machine the code was tested on and the production server match because they're built the same way and there's only one to worry about.
But it is an inflexible and all-or-nothing approach. For an example let's look at the great bugaboo of the homogeneous testing system: upgrades.
Let's say you're using Perl 5.6.2. This means EVERYONE is using 5.6.2. Every developer, every system on a single version of Perl. This means everyone is coding for the same bugs, quirks and undocumented features of that particular version of Perl. As long as it all works on that one version nobody thinks there's anything wrong. So everyone continues to write code with subtle mistakes that are more and more specific to that version of Perl.
Now you want to upgrade to 5.8.8. With just one test server there's nothing to do but upgrade it and see what happens, affecting everyone at once. With great dread and trepidation the upgrade is done and KERBLAM! Failures everywhere. All that code that was slightly wrong but just happened to work on 5.6 no longer works on 5.8. New warnings, fixes to bugs you were depending on, module upgrades, undocumented features revealed to be bugs and fixed, experimental features gone. Now what? Your test server is broken. Nobody can test anything. How do you fix the code to do the upgrade without breaking the test server?
The answer is you don't. You rapidly downgrade so people can get work done. Then maybe, if you're really dedicated, you come in after work, upgrade the test server, fix as much as you can and downgrade again before anyone comes into work the next day. More likely you just never upgrade. And then you wake up one day to find yourself running Perl 5.5.4, MySQL 3.22 and Apache 1.3 all on a Red Hat 7.2 box. Deep at the bottom of a steep pit of upgrades.
Another, similar, example is what happens when someone wants to do an experiment. Maybe they want to try a new database, Postgres instead of MySQL. Maybe they want to try Perl compiled differently. Maybe they want to try a different web server. Sorry, can't do it. It would require changing the test server. And anyway your code is so tied to a single environment, a single set of dependencies and a single version of them that it will be very difficult to code flexibility back in.
The heterogeneous environment avoids all this. Different developers can freely use different versions of dependent software and different environments. Inflexibility is immediately spotted and destroyed. A dev can experiment on their own box as they like. Individuals are using slightly different versions and incrementally discovering what breaks from version to version rather than all in one big upgrade leap. The version ball keeps getting moved forward.
The danger is too much flexibility. It's great that your software works on Oracle, MySQL, SQL Server, SQLite, PostgreSQL and DB2 but if it's an in-house app and all you ever use in production is Postgres then all that extra work might have been a waste. Maintaining portability to 2 distinct systems, maybe 3, is enough.
The other danger is in never testing on the same environment as the production server. This is why you need a staging server, a server configured just like the production server where the software is installed and tested before it moves onto the production server.
That's the Big Upgrade Plan. There are all sorts of social and technical things to overcome. Meanwhile, here's a cheap hack: Run the full test suite only on commit. Store and display the results with something like Test::TAP::HTMLMatrix. Here's an example:
This is not ideal, but each commit is tested. The results are saved and displayed. A new failure can be easily traced back to the commit and the developer who made it so they can immediately fix it.
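For the curious, the cheap hack boils down to something like the sketch below: run the suite once per commit, append the outcome to a log, and look up the first commit that failed. This is not how Test::TAP::HTMLMatrix works; it's a bare-bones Python illustration, and the function names, the record fields, the results file and the commit ids are all hypothetical. A real setup would call this from a commit hook, which would supply the commit id and author.

```python
import json
import os
import subprocess
import sys
import tempfile
import time


def run_and_record(commit_id, author, test_command, results_path):
    """Run the test command once and append the outcome to the results file."""
    started = time.time()
    proc = subprocess.run(
        test_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    record = {
        "commit": commit_id,
        "author": author,
        "passed": proc.returncode == 0,  # a zero exit status is treated as a pass
        "seconds": round(time.time() - started, 2),
        "output": proc.stdout,
    }
    with open(results_path, "a") as handle:
        handle.write(json.dumps(record) + "\n")  # one JSON record per line
    return record


def first_failure(results_path):
    """Return the earliest recorded failing run, or None if all passed."""
    with open(results_path) as handle:
        for line in handle:
            record = json.loads(line)
            if not record["passed"]:
                return record
    return None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
    # Stand-ins for a real test suite: one run that passes, one that fails.
    passing = [sys.executable, "-c", "print('ok')"]
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    run_and_record("r100", "alice", passing, path)
    run_and_record("r101", "bob", failing, path)
    culprit = first_failure(path)
    print(culprit["commit"], culprit["author"])
```

Because every commit gets its own record, the slow web tests can take as long as they like without anyone having to sit and watch them.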