
Robrt (1414)


robert at perl dot org

Journal of Robrt (1414)

Saturday August 07, 2004
01:54 AM

untilting the planet

[ #20287 ]

A few people (this means you, Vahe) had noticed that planet.perl was formatted funnily. The reason: HTML truncation in some of our inbound feeds. That is, the RSS would include the opening <ul> but not the closing </ul>, which could cause odd indentation further down the page.

A not-so-well-kept secret is that planet.perl is based on a Python tool called planet. So, while watching an episode of the Powerpuff Girls, I subclassed HTMLParser and wrote a utility to add missing closing tags where appropriate. The code is stupid, but tiny and crystal clear.
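
For the curious, here is a minimal sketch of the idea, not the actual code running on planet.perl. It assumes Python 3's html.parser (the 2004 original would have used the Python 2 HTMLParser module), and the names TagCloser and close_tags are made up for illustration. It echoes the markup it sees, keeps a stack of open tags, and closes whatever is still open when the input runs out.

```python
from html.parser import HTMLParser


class TagCloser(HTMLParser):
    """Echo the input and remember which tags are still open.

    Hypothetical illustration only; not the code used by planet.perl.
    """

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of output markup
        self.open_tags = []  # stack of tags opened but not yet closed

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: nothing to balance.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def close_tags(fragment):
    """Return fragment with closing tags appended for anything left open."""
    parser = TagCloser()
    parser.feed(fragment)
    parser.close()
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)


print(close_tags("<ul><li>one"))
```

Note that this naive version treats every start tag as needing a partner, so a bare <br> in the input picks up a </br> at the end.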

One thing Python generally gets right: it is trivial to subclass its core modules.

Result? No more weird indentation! At least until even odder broken HTML comes in....

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Nice trick, I like it.

    Except it's generating </br>, which is not a valid tag. But my browser is more forgiving than I am. ;-)

  • I've fixed the </br> issue. I've got a table of tags that shouldn't be balanced. I could go look at the spec, but it's more fun to guess.
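
A rough sketch of what such a table could look like, assuming Python 3's html.parser. The names UNBALANCED, OpenTagTracker and missing_closers are hypothetical, and the tag list is a guess drawn from HTML's void elements rather than the table actually in use. Tags in the table are never pushed onto the open-tag stack, so they never get a closing tag.

```python
from html.parser import HTMLParser

# Hypothetical table of tags that never take a closing tag.
# A subset of HTML's void elements; the real table is not shown in the post.
UNBALANCED = frozenset(
    ["area", "base", "br", "col", "hr", "img", "input", "link", "meta"]
)


class OpenTagTracker(HTMLParser):
    """Track which tags are still open, skipping the unbalanced ones."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in UNBALANCED:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br/> style tags close themselves

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop down to and including the matching start tag.
            while self.open_tags.pop() != tag:
                pass


def missing_closers(fragment):
    """Return the closing tags that fragment still needs, innermost first."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return "".join("</%s>" % t for t in reversed(tracker.open_tags))


truncated = "<ul><li>one<br>two"
print(truncated + missing_closers(truncated))
```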