Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • I’m not sure I understand what your code is doing, so I’ll code to your specification instead…

    sub compress_body {
        my ( $html ) = @_;
        my $parser = HTML::TokeParser::Simple->new( \$html );
        my $text = '';
        my $in_body;

        1 while $_ = $parser->get_token and not $_->is_start_tag( 'body' );

        while ( my $token = $parser->get_token ) {
            if( $token->is_text ) {
         

    • by Ovid (2709) on 2005.10.25 23:02 (#44157) Homepage Journal

      The problem is, it gets passed an entire HTML document and has to preserve everything up to the first body tag (inclusive) and after the last body tag (also inclusive).

      • Ah, now suddenly all your contortions make sense.

        sub compress_body {
            my ( $html ) = @_;
            my $parser = HTML::TokeParser::Simple->new(\$html);
            my $out = '';
            my $body = '';

            while ( my $token = $parser->get_token ) {
                if( $token->is_start_tag('body') .. $token->is_end_tag('body') ) {
                    $out .= $token->as_is if $token->is_tag( 'body' );