Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Wednesday December 10, 2008
11:21 AM

The Big Bucket of FAIL (or is it?)

[ #38061 ]

I lied to chromatic, I think. A long time ago, when we were talking about the new version of TAP, he was concerned that many of the new features we were adding would not be needed and therefore should not be added. I assured him that the new features would be optional and would not change the meaning of core TAP.

I find myself sitting here, staring at my feet, wondering if I'm a bald-faced liar ("bald-faced"?).

So what, precisely, is a test failure? Is the following a failure?

1..1
ok 1 - Whee!  We pass!

Of course not. You have one test. It passes.

Um, not so fast there, cowboy. Now look at the following snippet of code.

END {
    unlink $highly_sensitive_data
      or die "You are soooooo fired, dude.";
}

Yes, you can test that, but you probably didn't. If the code dies (from the TAP::Parser perspective, if it exits with a non-zero exit status), then we consider that the test failed.

Um, not so fast there, cowboy.

Apparently, some languages don't give us control over the exit status (I don't know which ones, but it's a bug report we received), so we've been forced to implement a $parser->ignore_exit method.

But there's also the wait status of the process. That can be non-zero, indicating a failure. We recently fixed a bug with that. I found it while testing Rakudo and Alex Vandiver fixed it (it's in the upcoming 3.15 release. No PI for you!). Failure is tricky.
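To make the exit/wait distinction concrete, here is a minimal sketch of how Perl packs both into $? once a child process has finished. The helper name decode_wait_status is hypothetical and is not part of TAP::Parser; the child sets $? in its END block as a quiet stand-in for the die shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw wait status (the value Perl leaves in $?)
# into its conventional parts.
sub decode_wait_status {
    my ($wait) = @_;
    return {
        wait   => $wait,                     # the raw status word
        exit   => $wait >> 8,                # exit code of the child
        signal => $wait & 127,               # signal that killed it, if any
        core   => ( $wait & 128 ) ? 1 : 0,   # true if a core dump was made
    };
}

# A child that emits passing TAP, then changes its exit status in END.
my @cmd = ( $^X, '-e', 'print "1..1\nok 1 - Whee!\n"; END { $? = 3 }' );

open my $fh, '-|', @cmd or die "Cannot run child: $!";
my @tap = <$fh>;
close $fh;    # closing a pipe waits for the child and sets $?

my $status = decode_wait_status($?);
print @tap;
printf "exit=%d signal=%d wait=%d\n", @{$status}{qw(exit signal wait)};
```

The TAP on its own says "pass"; only the status that comes back from the process says otherwise.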

So pure TAP doesn't quite indicate whether tests failed or not. Or it does, but it doesn't indicate whether the whole test program is considered a pass.

And just to make it more difficult, consider this:

1..4
ok 1
ok
ok 2
ok 3

It's perfectly legal to omit the test number, but it's not legal to have gaps. That's a parse error and is considered to be a failure, even if all tests have passed. This is because we can't trust that output (what does it mean?). Same thing happens if you omit the plan.
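As a rough illustration of those two parse errors, here is a minimal sketch of a sequence and plan check. The function tap_parse_errors and its messages are hypothetical; this is not how TAP::Parser itself is implemented, and it ignores most of the TAP grammar.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of parse errors for a block of TAP.
sub tap_parse_errors {
    my ($tap) = @_;
    my ( $planned, $seen, @errors ) = ( undef, 0 );

    for my $line ( split /\n/, $tap ) {
        if ( $line =~ /^1\.\.(\d+)/ ) {
            $planned = $1;
        }
        elsif ( $line =~ /^(?:not )?ok\b\s*(\d+)?/ ) {
            $seen++;

            # An unnumbered test is assumed to be the next one in sequence.
            my $number = defined $1 ? $1 : $seen;
            push @errors, "test $seen is numbered $number"
              if $number != $seen;
        }
    }

    if ( !defined $planned ) {
        push @errors, 'no plan found';
    }
    elsif ( $planned != $seen ) {
        push @errors, "planned $planned tests but saw $seen";
    }
    return @errors;
}

my @errors = tap_parse_errors("1..4\nok 1\nok\nok 2\nok 3\n");
print "$_\n" for @errors;
```

Every line in that sample says "ok", yet the third and fourth tests carry the wrong numbers, so the output as a whole can't be trusted.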

As a result, a proper "test program failed" method should look something like this:

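# Named differently from failed() so that it doesn't call itself.
sub test_program_failed {
    my $self = shift;
    return
         $self->failed          # individual test failures
      || $self->parse_errors    # e.g. bad sequence or missing plan
      || ( !$self->ignore_exit && ( $self->wait || $self->exit ) );
}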

Thus, pure TAP doesn't quite indicate if a test failed. It might if we add diagnostics, but that means I lied to chromatic. Damn.

As for my App::Prove::History code, this means you have to do this to see which test programs have failed:

SELECT r.suite_id, n.name, r.failed, r.exit, r.wait
FROM   test_result r
JOIN   test_name   n ON r.test_name_id = n.id
WHERE  r.failed > 0
   OR  r.exit  != 0
   OR  r.wait  != 0;

A bit clumsy, no? And I don't even include the 'ignore_exit' bit, though I might have to later.

I'm thinking about a tiny denormalization here, but in reality, I'll probably slap a view over this and see how that works.

Pop quiz: is the following a failure? Why or why not? Is the existing behavior wrong?

1..3
ok 1 - Booting
ok 2 - Got dem boots!
ok 3 - We have foobar # TODO Waiting on foobar shipment

  • Bald-faced means blatant, undisguised

    • I know what it means :) I just don't know why "bald-faced" is the term used.

      • It's a variation of "barefaced" (unconcealed or showing a lack of scruples) or "bold-faced" (impudent).
  • The obvious case for lack of exit status is PHP, where they are streaming TAP over HTTP and the HTTP stream will just stop at some point...

    And anything else doing TAP over HTTP for that matter.

    • Wouldn't that just result in a 0 exit status, though? If we ignore the exit status, we'd have to be doing that for a non-zero exit status, meaning that these don't matter.

      • Well, HTTP doesn't really HAVE an exit status to communicate, since it's SUPPOSED to be transactional... sort of.

        The stream just stops, there's no return value as such.

  • Pop quiz: is the following a failure? Why or why not? Is the existing behavior wrong?

            1..3
            ok 1 - Booting
            ok 2 - Got dem boots!
            ok 3 - We have foobar # TODO Waiting on foobar shipment

    Asking if this is a fail is similar to asking if you run a test.

    In the install context, this is merely a curiosity. It's a difference in expected behaviour, and as such, for the author it's a fail(ure), but it's not a F

    • That's pretty much spot on, though I think the existing behavior might be wrong for authors. If you're an author, I would say that the "unexpectedly succeeded" should be a fail. Otherwise, it should not. We don't want heisenfails. I want to know if code I am developing is susceptible to this, but it shouldn't cause pain for anyone else. However, just like running tests in xt/, this is something the tester must explicitly request.
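
The author/installer split discussed in this thread could be sketched roughly as follows. Both the helper line_is_failure and the $author_mode flag are hypothetical names made up for illustration; they are not existing TAP::Parser options, and the TODO matching here is deliberately crude.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether one TAP test line counts as a failure.
# $author_mode is an assumed opt-in flag that the tester must request.
sub line_is_failure {
    my ( $line, $author_mode ) = @_;

    return 0 unless $line =~ /^(not )?ok\b/;    # not a test line at all
    my $passed = !defined $1;
    my $todo   = $line =~ /#\s*TODO\b/i;

    if ($passed) {
        # "Unexpectedly succeeded" only hurts when the author asked for it.
        return ( $todo && $author_mode ) ? 1 : 0;
    }

    # A failing TODO test is expected; any other "not ok" is a real failure.
    return $todo ? 0 : 1;
}

my $quiz = 'ok 3 - We have foobar # TODO Waiting on foobar shipment';
printf "installer: %d, author: %d\n",
  line_is_failure( $quiz, 0 ), line_is_failure( $quiz, 1 );
```

With the flag off, the pop quiz line is not a failure; with it on, the same line is, which keeps the heisenfails confined to people who explicitly asked to see them.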