Journal of miyagawa (1653)

Monday August 21, 2006
09:14 AM

HTML::ToText::Simple

[ #30708 ]
Lazyweb,

I've been looking for a simple HTML-to-text converter.

HTML::FormatText does most of what I want, but it also does more than I want. Rendering HR tags as a horizontal "-----" rule is one example, and I don't like that.

HTML::Element has an as_text() method, which is very close to what I want. But apparently it doesn't do the right thing with the alt attribute of img (<img src="foo.jpg" alt="Bar" /> is dumped as an empty string, not "Bar"), and "Foo<br />Bar" is dumped as "FooBar", not "Foo Bar".

I chatted with Yuval (nothingmuch) in #catalyst and would like to write a simple Visitor module that does this conversion over an HTML::TreeBuilder-generated tree.

If that sounds like a duplicate of someone else's work, let me know.
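
To make the idea concrete, here is a rough sketch of what such a visitor could look like. It is only an illustration under my own assumptions, not an existing module: the names html_to_text and _visit are hypothetical, and the rules (alt text for img, a single space for br, skipping script and style) are just the ones described above plus one obvious extra.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not a CPAN API): convert an HTML string to plain text.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = _visit($tree);
    $tree->delete;                # release the tree explicitly

    $text =~ s/\s+/ /g;           # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;      # trim both ends
    return $text;
}

# Hypothetical recursive visitor: text nodes are plain strings,
# elements are HTML::Element objects.
sub _visit {
    my ($node) = @_;
    return $node unless ref $node;            # plain text segment

    my $tag = $node->tag;
    return ''  if $tag eq 'script' || $tag eq 'style';
    return ' ' if $tag eq 'br';               # "Foo<br />Bar" gives "Foo Bar"

    if ($tag eq 'img') {
        my $alt = $node->attr('alt');         # use alt text in place of the image
        return defined $alt ? $alt : '';
    }

    # Any other element: concatenate whatever its children produce.
    return join '', map { _visit($_) } $node->content_list;
}

print html_to_text('Foo<br />Bar'), "\n";
print html_to_text('<img src="foo.jpg" alt="Bar" />'), "\n";
```

The two calls at the end are the cases mentioned above. Nothing is rendered for HR or any other tag, so this stays closer to as_text() than to HTML::FormatText. A real module would probably also want some separator between block-level elements, which this sketch does not attempt.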