Monday August 21, 2006
10:14 AM
HTML::ToText::Simple
Lazyweb,
I've been looking for a simple HTML-to-text converter.
HTML::FormatText does most of what I want, but it also does more than I want. Rendering HR tags as a horizontal "-----" line is one example. I don't like that.
HTML::Element has an as_text() method, which is very close to what I want. But apparently it doesn't do the right thing with the img@alt attribute (<img src="foo.jpg" alt="Bar" /> is dumped as empty, not "Bar"), and "Foo<br />Bar" is dumped as "FooBar", not "Foo Bar".
I chatted with Yuval (nothingmuch) in #catalyst and would like to write a simple Visitor module that does this over an HTML::TreeBuilder-generated tree.
If that sounds like a duplicate of someone else's work, let me know.
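To pin down the img@alt and <br /> cases above, here is a minimal sketch of the output I'm after. It is written in Python with only the standard library's html.parser rather than in Perl, so that it stands alone; it is not the proposed module. SimpleTextExtractor and html_to_text are made-up names, and collapsing runs of whitespace at the end is my own assumption, not something any of the modules above promise.

```python
from html.parser import HTMLParser


class SimpleTextExtractor(HTMLParser):
    """Hypothetical helper: collect text, keep img alt text, turn <br> into a space."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br /> and <img ... />,
        # because the default handle_startendtag delegates here.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)
        elif tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML fragment (hypothetical function name)."""
    parser = SimpleTextExtractor()
    parser.feed(html)
    parser.close()
    # Assumption: collapse whitespace runs and trim the ends.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(html_to_text('<img src="foo.jpg" alt="Bar" />'))
    print(html_to_text("Foo<br />Bar"))
```

This only covers the two cases I complained about. It does nothing special for HR, block-level elements, or script and style contents, all of which a real module would have to decide on.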
Tried lynx? (Score:1)
Just a thought
Re: (Score:1)
I like vilistextum [dyndns.org]. Decent rendering and very fast.
I do this, sort of (Score:1)
http://www.jrock.us/trac/blog_software/browser/trunk/Angerwhale/lib/Blog/Format
I agree that I should probably replace imgs with their alt (instead of dropping them). If you feel like working on this, I'll probably replace my code with your module.
However, I think I'll add that feature, and process the output with Text::Autoformat, as well. Let me know what you think.