
Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Thursday November 17, 2005
04:12 PM

Violating HTML Objects

[ #27618 ]

I finally realized why I was being stupid about how I was testing HTML and XML. The realization was so blindingly obvious that, in retrospect, I'm embarrassed that I didn't appreciate it before. You see, if you squint, HTML and XML documents are instances of objects. I don't care about the whitespace, the attribute order, or whether the nav links are at the top, bottom, left, etc. What I care about is the information these documents present and the things I can do with them.

When I test an object's API, I shouldn't care whether it's implemented with a hashref, inside-out objects, or RPC calls. I should only care whether it does what it promises. Can I click a link and go to the correct page? Does the page even have that link? Is the title correct? Those are the things I care about. I'm embarrassed that I've wasted so much time testing rapidly changing internals when the things I really needed to test weren't changing much at all.
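One way to act on this idea, sketched in Python with only the standard-library `html.parser` (the post itself shows no code, so this is an illustrative assumption, not the author's method): reduce the page to a small model holding just the things a test cares about, here the title and the links, and assert against that model. Pages that differ only in whitespace, attribute order, or wrapping markup reduce to the same model. `PageModel` and `parse_page` are hypothetical names.

```python
from html.parser import HTMLParser


class PageModel(HTMLParser):
    """Hypothetical helper: keeps only what a test cares about, the page
    title and its links (normalized link text -> href). Whitespace,
    attribute order and surrounding layout markup are ignored."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = {}           # normalized link text -> href
        self._in_title = False
        self._href = None         # href of the <a> currently open, if any
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs, so order doesn't matter
            self._href = dict(attrs).get("href")
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace so reflowed text compares equal
            text = " ".join("".join(self._link_text).split())
            self.links[text] = self._href
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._link_text.append(data)


def parse_page(html):
    """Parse an HTML string into a PageModel with a normalized title."""
    page = PageModel()
    page.feed(html)
    page.close()
    page.title = " ".join(page.title.split())
    return page
```

A test then asks the questions from the paragraph above, such as `parse_page(html).title == "Home"` or `"About us" in parse_page(html).links`, and keeps passing when a redesign moves the nav links or reindents the markup.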

Realizations like that really hammer home how much I have to learn.

  • We're getting a disgusting amount of functionality out of this teeny-tiny program I wrote called simple_scan. It basically lets nearly anybody write TAP-based tests vs. anything you can talk to over HTTP.

    You use test specs to tell it what you want: [] /Yahoo!/ Y branded properly
    Run that through simple_scan and it generates a nice Perl test that uses Test::WWW::Simple to check whether the regex matches the page. Dead simple.

    It actually turns out that a lot of what we're worried about is "is the content ther

    • I should break down that test spec:
      1. URL to access
      2. regex to match against it
        • Y - match it
        • N - don't match it
        • TY - TODO, should eventually match
        • TN - TODO, should eventually not match
      3. comment

      And it's all TAP, so you can use Test::Harness to write tools that summarize the results. Should have lots more by CFP time.
  • Sounds like you want something like Test::WWW::Mechanize [], but without the Mechanize part.

  • What are you doing now? Which tools are helping you (Test::WWW::Mechanize, no more XPath-based tests, etc.)?

    By the way, were you testing the HTML page itself or the behaviour of the application generating the HTML page?
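The simple_scan spec format broken down in the comments (URL, regex, Y/N/TY/TN flag, comment) could be handled roughly as follows. This is a hypothetical Python sketch, not simple_scan itself (which is Perl and generates Test::WWW::Simple tests): the exact line grammar, the names `Spec`, `parse_spec`, and `run_specs`, and the caller-supplied `fetch` function are all assumptions. Patterns are interpreted with Python's `re` rather than Perl's regex engine, and TODO specs are marked with TAP's `# TODO` directive.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar inferred from the comment: URL, /regex/, flag, comment.
SPEC_LINE = re.compile(r"^(\S+)\s+/(.*)/\s+(Y|N|TY|TN)\s+(.*)$")


@dataclass
class Spec:
    url: str
    pattern: str
    flag: str      # Y, N, TY (TODO: should match) or TN (TODO: should not)
    comment: str


def parse_spec(line):
    """Split one spec line into its four fields, or raise ValueError."""
    m = SPEC_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a test spec: {line!r}")
    return Spec(*m.groups())


def run_specs(specs, fetch):
    """Return TAP output lines; fetch(url) -> page text is supplied by the
    caller (hypothetical; a real tool would do the HTTP request itself)."""
    lines = [f"1..{len(specs)}"]
    for n, spec in enumerate(specs, start=1):
        matched = re.search(spec.pattern, fetch(spec.url)) is not None
        want_match = spec.flag in ("Y", "TY")
        status = "ok" if matched == want_match else "not ok"
        line = f"{status} {n} - {spec.comment}"
        if spec.flag.startswith("T"):
            line += " # TODO"   # TAP TODO directive: failure is expected
        lines.append(line)
    return lines
```

Because the output is plain TAP, any TAP consumer can summarize it, which is the point made about Test::Harness above.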