All the Perl that's Practical to Extract and Report

use Perl

Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Sunday September 19, 2004
04:19 PM

HTML::TokeParser::Simple 3.1 -- major rewrite

[ #20950 ]

I've spent much of the morning doing something I've meant to do for a long time: completely rewriting the internals of HTML::TokeParser::Simple. One problem that's long bugged me is that returned tokens were all blessed into one subclass, even though they are clearly different types. The latest version finally rectifies that. Now extending this module to handle special needs should be a piece of cake.

One sign that the module is much cleaner is the lack of "if" statements. Most of them are in the POD, but I did notice a couple in my HTML::TokeParser::Simple::Token::Tag class after I uploaded it. As soon as I saw that, I realized that this class should actually be two classes -- one for end tags and one for start tags. It's interesting how the mere existence of a keyword points out a design problem. Start tags are what most people are really interested in, but overriding this class means overriding behavior of end tags. Silly me. I should fix that, too.
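To make the idea concrete, here is a minimal sketch of what "one class per token type" can look like. The My::Token::* package names and the make_token helper are hypothetical and chosen only for illustration; they are not the module's actual internals. The raw token layouts follow the array format HTML::TokeParser returns. Note that the type predicates need no "if" at all: each subclass simply overrides the one it cares about.

```perl
use strict;
use warnings;

# Hypothetical class names for illustration only.
package My::Token;
sub is_start_tag { 0 }
sub is_end_tag   { 0 }
sub is_text      { 0 }
sub as_is        { '' }

package My::Token::Tag::Start;
our @ISA = ('My::Token');
sub is_start_tag { 1 }
sub get_tag      { $_[0][1] }    # ["S", $tag, $attr, $attrseq, $text]
sub as_is        { $_[0][4] }

package My::Token::Tag::End;
our @ISA = ('My::Token');
sub is_end_tag { 1 }
sub get_tag    { $_[0][1] }      # ["E", $tag, $text]
sub as_is      { $_[0][2] }

package My::Token::Text;
our @ISA = ('My::Token');
sub is_text { 1 }
sub as_is   { $_[0][1] }         # ["T", $text, $is_data]

package main;

# Map the type marker in slot 0 to a class; unknown types get the base class.
my %class_for = (
    S => 'My::Token::Tag::Start',
    E => 'My::Token::Tag::End',
    T => 'My::Token::Text',
);

sub make_token {
    my ($raw) = @_;
    my $class = $class_for{ $raw->[0] } || 'My::Token';
    return bless $raw, $class;
}

my @tokens = map { make_token($_) } (
    [ 'S', 'p', {}, [], '<p>' ],
    [ 'T', 'hello', 0 ],
    [ 'E', 'p', '</p>' ],
);

print join( '', map { $_->as_is } @tokens ), "\n";
print scalar( grep { $_->is_start_tag } @tokens ), "\n";
```

With start and end tags in separate classes, someone who only cares about start tags can subclass My::Token::Tag::Start without touching end-tag behavior.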

  • Thanks for all the work you've put into this! HTML::* needs all the smart help it can get.
    • HTML::* needs all the smart help it can get.

      And I try to help, too :)

      And feeling rather silly about my failure to break start and end tags into their own classes, I went ahead and did it now and just uploaded it. I've made major changes, so I'm sure there are huge bugs, but I'm pleased at how easy the changes are now. That makes 3 releases of this module in two days. I should really be less impetuous.

  • You really had me wondering why the big version bump, yesterday. It looks like (though I haven't checked) you implemented the kind of change I was expecting, warranting such a version bump, only in 3.1. Oh, and there's even more of that fresh kind of stuff in 3.11, er, 3.12. Timing goes odd, sometimes.
    • The big version change was because of the new interface. While it is still backwards compatible, the new-style constructors, the "get_foo" names replacing "return_foo", and a few other odds and ends are why I went with 3.0. From my standpoint, if I had kept the interface the same and massively reworked the internals, there would have been no real justification for a version bump. Would anyone want MS Office 2005 if it had no new features and ran a touch slower? :)

  • I tried to install the new version 3.12 on 2 Windows PCs today, and while on one it succeeded, on the other, it failed big time (even locking up my console window, forcing me to restart my computer but I'm sure that's not your fault... ;-)).

    Digging into the problem, I tried a manual install, step by step, and I found that -Mblib adds the lib directories under blib to @INC. But the file HTML/TokeParser/Simple.pm wasn't under blib/lib; instead, it was under lib, a sibling directory. That directory is

    • I'm a bit confused as to why adding '../blib' to @INC would cause things to fail. After running perl Makefile.PL; make, the blib directory is built automatically. Did you skip that step and try to run the tests directly? That would cause things to fail since I added the wrong lib.

      Adding '../blib' to @INC is a typo on my part as I generally intend to add '../lib' to @INC to allow me to modify the file directly and have the changes instantly picked up. Further, I can run the tests without even running m

      • I don't know any more... I've tried to build it several times over, deleting the blib directory every time, and I don't get the same results all the time. Sometimes the whole of the lib directory is copied to under blib, but sometimes it isn't, and blib/lib/HTML/TokeParser ends up containing only one file: ".exists".

        So, what's up... No idea. I think that perhaps the whole make circus occasionally goes haywire. I'll try again later, I've now given up for the day.

      • ... I'll have a new version uploaded soon.

        Couldn't you find any excuse to bump it to version 3.14? That sounds like a nice, geekish version number to aim for... :)

        Anyway, I have had the time to update a largish script of mine from HTML::TokeParser to HTML::TokeParser::Simple 3.13. I quite like it. If there's anything I miss, it's the option to extend

        $token->is_start_tag            # is it a start tag
        $token->is_start_tag($tag)      # is it a start tag o

        • That's an interesting idea. I wonder if I should create a new method to deal with this? I've already heavily overloaded this method and overloading methods is not Perl's strong suit :( How about &is_tag_in_list and corresponding start and end method? The method name could be confusing, though:

          if ($token->is_start_tag_in_list) {...}

          That suggests that the token is a start tag when, in fact, it may not be. I guess the overloaded method would be better after all :/

          The above, incidentally, was