petdance
  andy@petdance.com
http://www.perlbuzz.com/

I'm Andy Lester, and I like to test stuff. I also write for the Perl Journal, and do tech edits on books. Sometimes I write code, too.

Journal of petdance (2468)

Wednesday February 25, 2004
11:30 PM

HTML::Tidy 1.00 is finally out

[ #17630 ]
HTML::Tidy, my follow-up to HTML::Lint, has just been released at version 1.00. It does NOT include the Test::HTML::Tidy wrapper, but it DOES include a handy guide on how to build libtidy, and a transition guide for HTML::Lint users.

What would you do with HTML::Tidy? Something like this:

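    use HTML::Tidy;

    my $tidy = HTML::Tidy->new;
    # Skip warnings; other message types are still collected.
    $tidy->ignore( type => TIDY_WARNING );
    # $contents_of_foo holds the HTML text; "foo.html" labels the messages.
    $tidy->parse( "foo.html", $contents_of_foo );

    for my $message ( $tidy->messages ) {
        print $message->as_string, "\n";
    }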

or some other level of automated HTML checking. With Test::HTML::Tidy (which I hope to get out tonight), you'll be able to do

    html_tidy_ok( $html, "HTML is properly tidy" );

in your *.t scripts. Whee!
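
As a rough sketch of what such a *.t script could look like, assuming Test::HTML::Tidy is installed alongside HTML::Tidy and that html_tidy_ok accepts an optional HTML::Tidy object ahead of the HTML string. The sample markup and the single-test plan are made up for illustration:

```perl
use strict;
use warnings;

use Test::More tests => 1;
use HTML::Tidy;          # provides TIDY_WARNING
use Test::HTML::Tidy;    # provides html_tidy_ok

# Hypothetical markup; in a real test this would be your application's output.
my $html = <<'END_HTML';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head><title>Example page</title></head>
<body><p>Hello, world.</p></body>
</html>
END_HTML

# Same filtering as in the HTML::Tidy example: skip warnings, keep the rest.
my $tidy = HTML::Tidy->new;
$tidy->ignore( type => TIDY_WARNING );

# Assumption: the optional first argument is the HTML::Tidy object to check with.
html_tidy_ok( $tidy, $html, 'HTML is properly tidy' );
```

Run it like any other test script. Whether this particular sample passes depends on your libtidy build and its configuration.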

  • So does this have the same functionality as the online W3C validator service (without having to go to the web)? I believe their validator is just a Perl script that calls tidy somehow.

    http://validator.w3.org/ [w3.org]
    • Hrmm, I don't think so.

      Check the source page [w3.org]. No mention of Tidy. Also, it says "OpenSP is the SGML and XML parser used by the service". So I assume it parses the output from that.

    • I believe the W3C Validator does not use tidylib. It uses
      use File::Spec          qw();
      use HTML::Parser   3.25 qw(); # Need 3.25 for $p->ignore_elements.
      ... and some other good stuff. The source [w3.org] is available. I would be very interested to know the similarities and differences between tidylib and this validation. [I could go look it up!]
      • The W3C validator requires an installation of OpenSP [sf.net], which is a fairly heavyweight requirement.

        I'm not sure quite what tidylib does, but I'm going to give it a play and see what it does. If it's faster than onsgmls, then I'm all for it!

        Your other option for validation is to get libxml2 (in its Perl form, XML::LibXML) set up. The disadvantage (which it shares with OpenSP) is that it requires you to have all the catalogs for HTML/XHTML set up correctly. I'm assuming that tidylib has all that sort of st

        • There are CGI tidy interfaces out there, so you can see what tidy reports on. tidy also does cleanup on the HTML, and prettifies it for you, although HTML::Lint doesn't support that.
          --
          xoa