  • Dear Log,
    On the mp3-trola: Grieg, "Cradle Song"

    So I'm trying my hand at writing an HTML::FormatRTF to go along with HTML::FormatPS and HTML::FormatText in the HTML-Format(ter?) dist. Writing pod2rtf was pretty straightforward, but this is proving tricky, because the block-level content models for HTML are so much hairier than the ones for Pod. If I could wave a magic wand and make all the HTML input be XHTML Strict, where a blockquote can only contain block-level elements, that'd be really handy. But for the moment, there are problems like "<blockquote>foo<p>bar</blockquote>" parsing as:

    • blockquote
      • "foo"
      • p
        • "bar"
    whereas what I'd really want is this:
    • blockquote
      • p (implicit)
        • "foo"
    • p
      • "bar"
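A minimal sketch of the implicit-p half of that restructuring, in plain Perl. It deliberately doesn't use HTML::TreeBuilder or HTML::Element: the hashref tree shape, the %IS_BLOCK list, and the names wrap_loose_inline and show are all made up for illustration. It wraps each run of text (or inline elements) sitting directly inside a block container, the way "foo" sits in the blockquote, in an implicit p, and leaves existing block children like the explicit p where they are; whether that p should then stay put or move is a separate decision.

```perl
use strict;
use warnings;

# Hypothetical tree shape for this sketch (NOT HTML::Element's API):
# an element is a hashref { tag => 'p', implicit => 1, content => [...] },
# and a text node is a plain string.
my %IS_BLOCK = map { $_ => 1 } qw(p blockquote ul ol li div pre table);

sub is_block { my ($n) = @_; return ref $n && $IS_BLOCK{ $n->{tag} } }

# Wrap each run of consecutive non-block children of $elem (text or
# inline elements) in a new implicit <p>. Block children stay in place.
sub wrap_loose_inline {
    my ($elem) = @_;
    my (@new, @run);
    my $flush = sub {
        push @new, { tag => 'p', implicit => 1, content => [@run] } if @run;
        @run = ();
    };
    for my $child (@{ $elem->{content} }) {
        if (is_block($child)) { $flush->(); push @new, $child }
        else                  { push @run, $child }
    }
    $flush->();
    $elem->{content} = \@new;
    return $elem;
}

# Render a tree as an indented outline, one node per line.
sub show {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $pad = '  ' x $depth;
    return qq{$pad"$node"\n} unless ref $node;
    my $out = $pad . $node->{tag} . ($node->{implicit} ? ' (implicit)' : '') . "\n";
    $out .= show($_, $depth + 1) for @{ $node->{content} };
    return $out;
}

# "<blockquote>foo<p>bar</blockquote>", as parsed in the first tree above.
my $bq = { tag => 'blockquote',
           content => [ 'foo', { tag => 'p', content => ['bar'] } ] };
print show(wrap_loose_inline($bq));
# Prints:
# blockquote
#   p (implicit)
#     "foo"
#   p
#     "bar"
```

The caller decides which elements get this treatment (say, only containers like blockquote whose Strict content model forbids bare text), which keeps the "what counts as a paragraph" policy out of the parser itself.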
    The best, but trickiest, way is to make HTML::TreeBuilder optionally parse things the second way. It's best because it'd be useful to other people. It's trickiest because, first off, it means messing with a pretty complex module, and second, because my idea of what I want the parse-trees to look like for this purpose is not going to be The Answer to how you'd want them to look for all purposes. For example, for purposes of rendering to RTF, it'd be handy if "<li>foo<p>bar" would parse like:
    • li
      • "foo"
    • p
      • "bar"
    But for rendering to some other formats, you might want it to come out like:
    • li
      • "foo"
      • p
        • "bar"
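One way to avoid picking The Answer is to make the shape a per-renderer switch. Here's a rough sketch, again in plain Perl over a made-up hashref tree rather than HTML::Element; restructure_list, sample, top_level, and the $hoist flag are hypothetical names. With $hoist off, the nested shape is left alone; with it on, each li is cut at its first block-level child, and that child plus whatever follows it become siblings after the li, which gives the RTF-friendly shape. The sample wraps the li in a ul only so it has a parent, and the hoisted result is a rendering-oriented tree, not valid HTML.

```perl
use strict;
use warnings;

# Hypothetical tree shape (NOT HTML::Element's API): an element is a
# hashref { tag => 'li', content => [...] }; a text node is a plain string.
my %IS_BLOCK = map { $_ => 1 } qw(p blockquote ul ol li div pre table);

sub is_block { my ($n) = @_; return ref $n && $IS_BLOCK{ $n->{tag} } }

# $hoist false: return the nested shape unchanged.
# $hoist true:  cut each <li> at its first block-level child; the <li>
# keeps the inline prefix and the rest is hoisted to follow it.
sub restructure_list {
    my ($list, $hoist) = @_;
    return $list unless $hoist;
    my @new;
    for my $child (@{ $list->{content} }) {
        # Anything that isn't an <li> passes through untouched.
        unless (ref $child && $child->{tag} eq 'li') {
            push @new, $child;
            next;
        }
        my @c = @{ $child->{content} };
        my $i = 0;
        $i++ while $i < @c && !is_block($c[$i]);    # index of first block child
        push @new, { %$child, content => [ @c[ 0 .. $i - 1 ] ] }, @c[ $i .. $#c ];
    }
    $list->{content} = \@new;
    return $list;
}

# "<li>foo<p>bar", given a <ul> parent so the <li> has somewhere to live.
sub sample {
    return { tag => 'ul',
             content => [ { tag => 'li',
                            content => [ 'foo', { tag => 'p', content => ['bar'] } ] } ] };
}

# Comma-separated summary of a node's direct children.
sub top_level {
    my ($t) = @_;
    return join ', ', map { ref $_ ? $_->{tag} : qq{"$_"} } @{ $t->{content} };
}

printf "%s\n", top_level(restructure_list(sample(), 1));    # li, p
printf "%s\n", top_level(restructure_list(sample(), 0));    # li
```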
    So, in conclusion: feh. The hardest thing in good programming is accepting that some things are best implemented as ad-hoc solutions, not Grand Solutions To Everything.

    Late news: For HTML::FormatRTF, I'm giving up on the approach that was meshing so badly with the HTML content-models, and doing something a bit stranger, sort of the way HTML::FormatPS does it.