
Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Sunday March 08, 2009
06:50 AM

Anecdote Driven Development, or Why I Don't Do TDD

[ #38616 ]
Number 6: Where am I?
Number 2: In the Village.
Number 6: What do you want?
Number 2: We want information.
Number 6: Whose side are you on?
Number 2: That would be telling. We want information... information... information.

(From the opening of every episode of The Greatest TV Series Ever Made)

Just about everyone in the Perl community who does testing knows that I'm a huge testing fan. In fact, to steal a turn of phrase from Adrian Howard, you might even call me a "testing bigot". I wrote the new Test::Harness that ships with Perl's core. I've written Test::Most and Test::Aggregate. I maintain or have co-maintainership on a number of modules in the Test:: namespace. I was invited to Google's first Automated Testing Conference in London and gave a lightning talk (my talk on TAP is about 42 minutes into that). I was also at last year's Perl-QA Hackathon in Oslo, Norway and I'll be at this year's Perl-QA Hackathon in Birmingham, UK. I was also one of the reviewers on Perl Testing: A Developer's Notebook.

In short, I'm steeped in software testing. I've been doing this for years. When I interview for a job, I always ask them how they test their software and I've turned down one job, in part, because they didn't. If I'm just hacking around and playing with an idea, I don't mind buying some technical debt and skipping a bit on testing while I'm exploring a new idea. I've even posted some code to the CPAN which is a bit short on testing. That being said, I wouldn't dream of writing major software without testing and I don't want to release any CPAN code as version 1.00 without comprehensive test coverage to have a minimum baseline of guaranteed functionality.

A number of years ago at OSCON I was attending a talk on testing when Mark Jason Dominus started asking some questions about testing. He was arguing that he couldn't have written Higher Order Perl (a magnificent book which I highly recommend you buy) with test-driven development (TDD). If I recall his comments correctly (and my apologies if I misrepresent him), he was exploring many new ideas, playing around with interfaces, and basically didn't know what the final code was going to look like until it was the final code. In short, he felt TDD would have been an obstacle to his exploratory programming as he would have had to continually rewrite the tests to address his evolving code. I tried to explain alternate strategies to deal with this, including deleting the code and adding tests back in and, in fact, when I created CPAN distributions of three of the HOP modules, I did find a couple of bugs in my testing. Regardless, I was hard-pressed to rebut his arguments.

Now that's not really a terribly heretical idea. Many people realize that exploratory programming and TDD don't always play well together. James Shore has a great blog post about Voluntary Technical Debt and how this helped them launch CardMeeting. It's the same story: tight deadline, new idea, playing with concepts.

But that's not quite what I have in mind when I say "I don't do TDD". In reality, sometimes I do TDD, but not very often. I've tried TDD. I've written a test, added a stub method, written another test, returned a dummy object, written another test ... and so on.

Two words: tee dious (sic).

At work, we get a fairly substantial set of requirements for each task. We're a core application which many other BBC projects rely on for their programme metadata. When we get requirements, they're generally fleshed out enough that when we think through the problem, we have a decent handle on what needs to be done, even if the exact implementation isn't nailed down perfectly.

When I get a new task, I usually start by reading the tests for the code I'm working on and I often write quite a few tests first and then write the code for it. This isn't "pure" TDD in the minds of many people as I'm not writing a single test, then code, then another test, and then code, ad nauseam. However, it's close enough for TDD in my book.

That's not the only way I write code, though. Sometimes we have a complex case where I want to see what's going on in our interface and I'll load some fixture data, fire up a browser and start exploring our REST interface. Then I'll write some code and verify that our title=Doctor Who query parameters are returning correct results. Then I'll write the tests.

To some, this would be heresy. You must always write the tests first, right? My reply: can you provide me with data to back that up?

I recently wrote some code for Class::Sniff which would detect "long methods" and report them as a code smell. I even wrote a blog post about how I did this (quelle surprise, eh?). That's when Ben Tilly asked an embarrassingly obvious question: how do I know that long methods are a code smell?

I threw out the usual justifications, but he wouldn't let up. He wanted information and he cited the excellent book Code Complete as a counter-argument. I got down my copy of this book and started reading "How Long Should A Routine Be" (page 175, second edition). The author, Steve McConnell, argues that routines should not be longer than 200 lines. Holy crud! That's waaaaaay too long. If a routine is longer than about 20 or 30 lines, I reckon it's time to break it up.

Regrettably, McConnell has the cheek to cite six separate studies, all of which found that longer routines were not only not correlated with a greater defect rate, but were also often cheaper to develop and easier to comprehend. As a result, the latest version of Class::Sniff on github now documents that longer routines may not be a code smell after all. Ben was right. I was wrong.
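For readers who want to see what a "long method" check amounts to, here is a minimal sketch. It is written in Python, not Perl, and it is not how Class::Sniff does it; the names `method_line_span` and `long_methods` are made up for illustration, and the default threshold of 200 simply borrows the Code Complete figure quoted above. It measures each method by the line span of its compiled code, so it needs no access to the source file.

```python
import dis
import inspect


def method_line_span(func):
    """Approximate a function's length as the number of source lines
    between its first line and the last line that produced bytecode."""
    code = func.__code__
    line_numbers = [
        line for _, line in dis.findlinestarts(code) if line is not None
    ]
    last_line = max(line_numbers, default=code.co_firstlineno)
    return last_line - code.co_firstlineno + 1


def long_methods(cls, max_lines=200):
    """Return {method_name: line_span} for plain methods defined directly
    on cls whose span exceeds max_lines (a hypothetical threshold)."""
    report = {}
    for name, member in vars(cls).items():
        if inspect.isfunction(member):
            span = method_line_span(member)
            if span > max_lines:
                report[name] = span
    return report
```

Whether anything this reports is actually a smell is, as the studies suggest, a separate question; the threshold is a knob, not a verdict.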

But what does this have to do with TDD? One problem I have with the testing world is that many "best practices" are backed up with anecdotes ("when I write my tests first ..."). Plenty of questionable assertions (ha!) are made about tests and some of these assertions are just plain wrong. For example, tests are not documentation. Tests show what the code does, not what it's supposed to do. More importantly, they don't show why your code is doing stuff and if the business has a different idea of what your code should do, your passing tests might just plain be wrong.

I also don't write many unit tests unless I'm trying to isolate a particular bug or I have code paths which are difficult to demonstrate with integration tests. I prefer integration tests as they demonstrate that various bits of my code play well together. I've long found, anecdotally, that integration tests make it easier to stumble across bugs -- though I admit that they're then harder to track down. Again, eschewing unit tests is heresy to many test advocates, but in my experience, it works fairly well.
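A small, entirely hypothetical illustration of the point (in Python, with invented function names; nothing here comes from real project code): two functions can each pass their own unit tests while the code that joins them is wrong, and only a check on the combined behaviour notices.

```python
def parse_duration(text):
    """Parse 'MM:SS' and return the duration in seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def format_duration(minutes):
    """Render a duration that is given in minutes (not seconds)."""
    return f"{minutes} min"


def describe(text):
    """Glue code with a bug: hands seconds to a function expecting minutes."""
    return format_duration(parse_duration(text))


# Unit tests: each piece honours its own contract, so both pass.
assert parse_duration("45:00") == 2700
assert format_duration(45) == "45 min"

# Integration check: the combined result exposes the unit mismatch.
integration_ok = describe("45:00") == "45 min"
```

Here `integration_ok` ends up false because `describe` reports seconds as minutes. The integration check finds the bug but does not say which function is at fault, which is the trade-off mentioned above.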

I've also long suspected that TDD can prove to be a stumbling block, but I've advocated it as a way of better understanding what your interface should look like. I've really not spoken too much about my objections to it because, quite frankly, the opinion of the testing world seems to be dead set against me and who am I to argue with the collective wisdom of so many? (I usually get bitten pretty hard when I do so. This is why I want a blog where I can delete trollish or rude comments while keeping reasonable comments of those who disagree with me.)

The problem with all of these opinions is that I rarely see hard information about them. I don't see hard numbers. I don't see graphs and statistics and circles and arrows and a paragraph on the back of each one (with a tip o' the keyboard to Arlo Guthrie). So you can imagine my delight when I read a blog post about research supporting the effectiveness of TDD. This is the meat I want and here's a snippet from the actual research paper:

We found that test-first students on average wrote more tests and, in turn, students who wrote more tests tended to be more productive. We also observed that the minimum quality increased linearly with the number of programmer tests, independent of the development strategy employed.

More productive? Minimum quality increased? That sounds great and now we have at least one study to back this up. Of course, more studies should be done, but this is a great start.

Um, or maybe it's not a great start. Jacob Proffitt has a great blog post where he analyzes the results of the study. He agrees with some of the conclusions of the study, namely ...

  • The test-first students on average wrote more tests.
  • Students who wrote more tests tended to be more productive.
  • The minimum quality increased linearly with the number of tests.

But in digging further down, he found that ...

  • The control group (non-TDD or "Test Last") had higher quality in every dimension—they had higher floor, ceiling, mean, and median quality.
  • The control group produced higher quality with consistently fewer tests.
  • Quality was better correlated to number of tests for the TDD group (an interesting point of differentiation that I’m not sure the authors caught).
  • The control group’s productivity was highly predictable as a function of number of tests and had a stronger correlation than the TDD group.

In short, "test last" programmers had higher quality code. However, you should also read the comments to that blog post, including an interesting reply by one of the authors of the cited study.

So does this prove that you should be "testing last" instead of "testing first"? Of course not. This was only one study. Correlation is not causation. These were undergrads, not professional programmers. There should have been a "no testing" control group (I should note that I've become so dependent on tests that I actually write worse code when I'm trying to modify something without tests).

So for the time being, I'm quite comfortable with my testing approach. I've given up on "pure" TDD of writing one test at a time. I'll happily write a block of tests and then the code. I'm also happy to write a block of code and then the tests. I can now even cite a study to show that I'm not a complete moron, even if one study is little more than anecdotal information.

Frankly, I've been secretly embarrassed about the fact that I'm not really a TDD zealot. Also, I've found with my testing strategy that I'm not writing as many tests as others. This has also been a bit of an embarrassment for me, but I've not brought it up before. My code works well and I'm (usually) comfortable with the quality after a few iterations, but since I've joined the testing cult, my apostasy is not something I've been terribly keen to bring up.

Now if you'll excuse me, I have some more episodes of "The Prisoner" to watch.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • The sample sizes were far too small to draw any reasonable conclusion.

    I don't think you're going to find much "evidence" unless there are large scale studies of complex projects. (Which have their own issues as you don't have good controls). This is the reason why much of "social science" isn't really science.

    Speaking personally and anecdotally, however, my hypothesis is that the reported "effectiveness" of TDD is driven largely by two factors: (a) it promotes a well-articulated description of expectation

    • When I am writing several tests first, I do think it's sort of in the spirit of TDD, but purists might take exception. One extreme TDD exercise I read about sounded very frustrating for the participant, but the person coordinating the exercise responded in the comments that he wouldn't use the "one test, one bit of code, repeat" style every time, so I'm glad that he's not overzealous about it.

      Still, I often find myself writing quite a bit of code and then coming back and writing the tests. The times I usua

      • On the "purists" comment -- it could be the normal difference between the teaching environment and the real world. In the teaching environment, the importance of "proper" process tends to get exaggerated. In the real world, with experience, we take shortcuts.

        On API testing -- I've often done "can_ok" either for a class or else for "main" to confirm automatic exports. What I've never really done (well) is confirm that no other methods exist. I might need to look into Class::Sniff or related techniques fo

      • Your idea for detecting accidental overrides seems overly complex. How about just comparing what is meant to be overridden to what Class::Sniff->overridden says has been?
        • Class::Sniff is too heavyweight for this. It also captures code at a snapshot in time. It doesn't tell me if the method cache is invalidated (MRO::Compat will let me do this with mro::get_pkg_gen).

  • I wonder if the TDD group forgot the final phase (refactoring)? You create your tests, create your code to pass those tests, and then you refactor the passing code into best practices. If you forget that last part I can see how you would lag on code quality.

  • My previous comments on method length were not advocating longer methods. I generally prefer shorter methods myself. But there is a tension between what I believe works and the research I have seen.

    The problem with the studies cited in Code Complete is that they were generally done in procedural languages before people adopted object oriented approaches. Does research on what works in procedural programming still apply to code written in object oriented or functional styles? Good question.

    I don't have a

  • For a lot of the HOP code, TDD would have been useful. My OSCON question was specifically about the "linogram" system of chapter 9.

    "Linogram" was a program that was completely unlike any other program I had ever seen. When I started, I had no idea what the input language would be like, no idea how it would work, even no idea of what the program's capabilities would be. If you had asked me "will linogram be able to draw a box with an arrow coming out of the side" I would have said "I think so, but I'm n