A little whoopsie of mine in HTML::TreeBuilder basically broke version 3.12, and yet didn't cause any of the HTML-Tree tests to fail. Michael Koehne is a superstar because he spotted this and told me.
So I rushed out a new version today (3.13), with some more and smarter tests that will stop things like this from happening again.
Most (but not all) of the new tests each take two bits of HTML and make sure that they parse to isomorphic parse trees. Given a wrapper function same, the tests are mostly like
ok(same( '<ul><li>x<li>y</ul>after' => '<ul><li>x</li><li>y</li></ul>after' ));
One thing that Michael Koehne suggested is ensuring continuity across versions by having tests that basically take a bit of HTML, parse it, dump the parse tree as text, and run a checksum on that text. Then the test consists of making sure that the checksum stays the same across different HTML-Tree versions. He suggested MD5 for the checksum algorithm, but I'm hesitant about using it, since that would mean making HTML-Tree depend on the MD5 module. Maybe I'll just make the tests skip on sites that don't have the MD5 module installed. Anyone have other suggestions?
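A minimal sketch of the skip-if-missing idea, assuming Digest::MD5 as the MD5 implementation. The helper name dump_checksum is hypothetical, and the string passed to it only stands in for whatever text dump of the parse tree a real test would produce.

```perl
use strict;
use warnings;

# Hypothetical helper: return the hex MD5 of a text dump, or undef
# when Digest::MD5 cannot be loaded on this machine.
sub dump_checksum {
    my ($dump_text) = @_;
    return unless eval { require Digest::MD5; 1 };
    return Digest::MD5::md5_hex($dump_text);
}

# Stand-in for a dumped parse tree; a real test would dump the tree here.
my $sum = dump_checksum("abc");
if (defined $sum) {
    print "checksum: $sum\n";
} else {
    print "skipped: Digest::MD5 not installed\n";
}
```

A real test would compare the returned value against a checksum recorded from an earlier release, and skip the comparison when the helper returns undef.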
Equality checking (Score:2)
If you don't want to put the MD5 of the canonical version in the test case, why not put the stringified Data::Dumper value in the test? No CPAN dependency that way. :-)
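A rough sketch of what that comparison could look like, assuming a Data::Dumper recent enough to support Sortkeys. The helper name canonical_dump and the two sample hashes are made up for illustration; a real test would pass the parse tree instead.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: stringify a data structure in a stable form.
sub canonical_dump {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;  # emit hash keys in sorted order
    local $Data::Dumper::Indent   = 1;  # fixed, simple indentation
    return Dumper($data);
}

# Stand-in structures with the same content, built in a different key order.
my $got      = { tag => 'ul', content => [ 'x', 'y' ] };
my $expected = { content => [ 'x', 'y' ], tag => 'ul' };

print canonical_dump($got) eq canonical_dump($expected)
    ? "ok\n" : "not ok\n";
```

Sorting the keys matters here, since otherwise two equal hashes could stringify differently and the test would fail for the wrong reason.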
Re:Equality checking (Score:1)
I'd say use the MD5. If they don't already have it, it will at least mean their CPAN.pm will start using it.
---ict / Spoon
MD5 (Score:2)
Digest::MD5 is a core module in 5.8.0, just in case you hadn't noticed that.
Re:MD5 (Score:1)
Checksum (Score:1)
Why not use the unpack() checksum:
$sum = unpack "%32C*", $string;
Re:Checksum (Score:2)
DB<1> sub csum { unpack "%32C*", $_[0] }
DB<2> x csum "+abc-"
0 382
DB<3> x csum "-abc+"
0 382
DB<4>
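What the session above shows is that the "%32C*" checksum only adds up character values, so reordering the characters leaves it unchanged. A small sketch that reproduces this outside the debugger, using the same csum helper:

```perl
use strict;
use warnings;

# Same helper as in the debugger session: a 32-bit sum of character values.
sub csum { return unpack "%32C*", $_[0] }

# Both strings contain the same characters in a different order,
# so the additive checksum cannot tell them apart.
for my $s ("+abc-", "-abc+") {
    printf "%s => %d\n", $s, csum($s);
}
```

Both lines print 382. For a dumped parse tree, that means two dumps that differ only in the order of their characters would get the same checksum, which a digest like MD5 would tell apart.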