Journal of miyagawa (1653)

Monday August 21, 2006
10:14 AM

HTML::ToText::Simple

[ #30708 ]
Lazyweb,

I've been looking for a simple HTML-to-text converter.

HTML::FormatText does most of what I want, but it also does more than I need. For example, it renders HR tags as a horizontal line of dashes ("-----"). I don't like that.

HTML::Element has an as_text() method, which is very close to what I want. But apparently it doesn't do the right thing with the img@alt attribute: <img src="foo.jpg" alt="Bar" /> is dumped as an empty string, not "Bar". And "Foo<br />Bar" is dumped as "FooBar", not "Foo Bar".

I chatted with Yuval (nothingmuch) in #catalyst, and I'd like to write a simple Visitor module that walks a tree generated by HTML::TreeBuilder.
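A rough sketch of what such a visitor could look like is below. It relies only on the documented HTML::TreeBuilder/HTML::Element interface (new_from_content, tag, attr, content_list, delete). The names html_to_text and visit_node, the block-tag list, and the choice to emit a newline after block elements are hypothetical illustrations, not an existing module's API. It has not been tested against messy real-world HTML.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Tags treated as block-level: a line break is emitted after each one.
# This list is an illustrative assumption, not taken from any module.
my %BLOCK = map { $_ => 1 }
    qw(p div h1 h2 h3 h4 h5 h6 li dt dd tr table ul ol dl blockquote pre);

# Subtrees whose content should never appear in the text output.
my %SKIP = map { $_ => 1 } qw(head script style);

# Hypothetical entry point: HTML string in, plain text out.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = '';
    visit_node($tree, \$text);
    $tree->delete;    # HTML::Element trees must be freed explicitly

    $text =~ s/[ \t]+/ /g;        # collapse horizontal whitespace
    $text =~ s/ *\n */\n/g;       # drop spaces around line breaks
    $text =~ s/\n{3,}/\n\n/g;     # keep at most one blank line
    $text =~ s/\A\s+|\s+\z//g;    # trim both ends
    return $text;
}

# Depth-first visitor: text segments are copied, elements dispatch on tag name.
sub visit_node {
    my ($node, $out) = @_;

    # Text segments in content_list are plain strings, not objects.
    if (!ref $node) {
        $$out .= $node;
        return;
    }

    my $tag = $node->tag;
    return if $SKIP{$tag};

    if ($tag eq 'img') {    # use the alt text instead of dropping the image
        my $alt = $node->attr('alt');
        $$out .= $alt if defined $alt;
        return;
    }
    if ($tag eq 'br') {     # "Foo<br />Bar" becomes "Foo Bar"
        $$out .= ' ';
        return;
    }

    visit_node($_, $out) for $node->content_list;
    $$out .= "\n" if $BLOCK{$tag};
}
```

Compared with as_text(), the only real differences are the img and br branches. Adding another rule (say, appending link URLs after anchor text) would mean one more branch keyed on the tag name, which is the main appeal of the visitor shape over HTML::FormatText's fixed formatting.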

If that sounds like a duplicate of someone else's work, let me know.