Robrt (1414)

robert at perl dot org

Journal of Robrt (1414)

Saturday August 07, 2004
02:54 AM

untilting the planet

[ #20287 ]

A few people (this means you, Vahe) had noticed that planet.perl was formatted oddly. The reason: HTML truncation in some of our inbound feeds. That is, the RSS would include the opening <ul> but not the closing </ul>, which could cause odd indentation further down the page.

A not-so-well-kept secret is that planet.perl is based on a Python tool called planet. So, while watching an episode of the Powerpuff Girls, I subclassed HTMLParser and wrote a utility to add missing closing tags where appropriate. The code is stupid, but tiny and crystal clear.

One thing Python generally gets right -- it is trivial to subclass its core modules.
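
For readers who want to see the shape of the idea, here is a minimal sketch, not the actual planet.perl code. It assumes a modern Python 3, where the class lives in `html.parser`; the names `TagCloser` and `close_open_tags` are made up for illustration. It keeps a stack of open tags and appends closers for whatever is still open when the fragment ends. Note that it naively treats every start tag as needing a close, so a bare <br> would get a </br>.

```python
from html.parser import HTMLParser


class TagCloser(HTMLParser):
    """Hypothetical helper: remember which tags were opened but never closed."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.open_tags = []  # stack of tag names seen but not yet closed

    def handle_starttag(self, tag, attrs):
        # Naive: every start tag is assumed to need a matching end tag.
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Unwind the stack to the most recent matching open tag.
        # Stray closers with no matching opener are ignored.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_open_tags(fragment):
    """Return the fragment with closers appended for any tags left open."""
    parser = TagCloser()
    parser.feed(fragment)
    parser.close()
    # Innermost tag gets closed first.
    return fragment + "".join(f"</{t}>" for t in reversed(parser.open_tags))


if __name__ == "__main__":
    print(close_open_tags("<ul><li>truncated item"))
```

Appending closers to the end leaves the original feed text untouched, which seems the least surprising choice for a truncated fragment.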

Result? No more weird indentation! At least until even odder broken HTML comes in....

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Nice trick, I like it.

    Except it's generating </br>, which is not a valid tag. But my browser is more forgiving than I am. ;-)

  • I've fixed the </br> issue. I've got a table of tags that shouldn't be balanced. I could go look at the spec, but it's more fun to guess.
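
A table like that might look something like the sketch below. This is an assumption, not the table actually used on planet.perl: the set here is the list of void elements from the HTML specification rather than a guessed one, and `VOID_TAGS`, `VoidAwareCloser`, and `missing_closers` are illustrative names. Tags in the table are never pushed onto the open-tag stack, so no closer is ever generated for them.

```python
from html.parser import HTMLParser

# Void elements from the HTML specification: these never take a closing tag.
# (Assumption: the real table may have been a different, hand-guessed list.)
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class VoidAwareCloser(HTMLParser):
    """Hypothetical helper: track open tags, skipping ones that never close."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Tags in the table are not balanced, so they never go on the stack.
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Unwind to the most recent matching opener; ignore stray closers.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def missing_closers(fragment):
    """Return the closing tags needed to balance a truncated fragment."""
    parser = VoidAwareCloser()
    parser.feed(fragment)
    parser.close()
    return "".join(f"</{t}>" for t in reversed(parser.open_tags))


if __name__ == "__main__":
    print(missing_closers("<ul><li>one<br>two"))
```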