  • I worked on a project some time ago that involved converting ~20,000 HTML pages of inferior quality (coded by civil servants in various departments) to valid, WAI-conformant HTML.

    We used tidy to validate WAI level 1 conformance, and it worked quite well (although we used the command-line version, as the library wasn't really finished then).

    To convert the messy HTML to valid HTML, we used HTML::Parser and a hell of a lot of code tailored specifically to this project. I don't think you can write a general-purpose converter (unless you don't care about the design :-)

    • I wasn't thinking of a converter, more a simple checker that can spot whether links have titles, whether tabindex entries clash, etc. If HTML::Tidy does that, then so much the better. I can't see myself ever having time to write a converter, as handling all the edge cases that might appear would be far too much work. A list of simple warnings is much better, so that you can fix your templates [1] by hand.

      [1] I couldn't envisage editing 20,000 pages!
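A "simple checker" of the kind described above can be sketched with an event-driven parser. This is a minimal sketch, not HTML::Tidy's behaviour: it uses Python's standard-library html.parser as a stand-in for Perl's HTML::Parser, and the names AccessibilityChecker and check_html are hypothetical. It assumes "titles in links" means a title attribute on <a href> elements, and it treats a positive tabindex value used more than once as a "clash" (HTML permits repeats, but focus order then silently falls back to document order, which is usually a template mistake).

```python
from collections import defaultdict
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Hypothetical checker: collects simple warnings while parsing one HTML document."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        # positive tabindex value -> source lines where it appears
        self.tabindex_lines = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        line, _ = self.getpos()  # line number of the tag's opening "<"

        # Warn about links (<a> with an href) that lack a non-empty title attribute.
        if tag == "a" and "href" in attrs and not (attrs.get("title") or "").strip():
            self.warnings.append(f"line {line}: link without a title attribute")

        raw = attrs.get("tabindex")
        if raw is not None:
            try:
                value = int(raw.strip())
            except ValueError:
                self.warnings.append(f"line {line}: invalid tabindex {raw!r}")
            else:
                # 0 and negative values are routinely repeated, so only track positive ones.
                if value > 0:
                    self.tabindex_lines[value].append(line)

    def tabindex_clashes(self):
        """Report every positive tabindex value that is used more than once."""
        clashes = []
        for value, lines in sorted(self.tabindex_lines.items()):
            if len(lines) > 1:
                joined = ", ".join(str(n) for n in lines)
                clashes.append(f"tabindex {value} used on lines {joined}")
        return clashes


def check_html(html):
    """Return a list of warning strings for one HTML document or template."""
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings + checker.tabindex_clashes()
```

Running check_html over each template and printing the returned strings gives the "list of simple warnings" to fix by hand. The standard-library parser is lenient about broken markup, which suits a checker like this better than a strict validator would.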