I've spent much of the morning doing something I've meant to do for a long time: completely rewriting the internals of HTML::TokeParser::Simple. One problem that's long bugged me is that returned tokens were all blessed into one subclass, even though they are clearly different types. The latest version finally rectifies that. Now extending this module to handle special needs should be a piece of cake.
One sign that the module is much cleaner is the lack of "if" statements. Most of the ones that remain are in the POD, but I did notice a couple in my HTML::TokeParser::Simple::Token::Tag class after I uploaded it. As soon as I saw them, I realized that this class should actually be two classes -- one for end tags and one for start tags. It's interesting how the mere existence of a keyword points out a design problem. Start tags are what most people are really interested in, but overriding this class also means overriding the behavior of end tags. Silly me. I should fix that, too.
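
Here's a rough sketch of the kind of split I mean. It is not the module's actual code: the My::Token::* package names, the 'S' type flag, and the method bodies are all invented for illustration, and the only point is to show the "if" disappearing once the tag type lives in the class name.

```perl
use strict;
use warnings;

# Hypothetical combined class: one class for both kinds of tag, so it
# needs an "if" to decide how to behave.
package My::Token::Tag;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub as_is {
    my $self = shift;
    if ($self->{type} eq 'S') {    # the conditional that hints at two classes
        return "<$self->{name}>";
    }
    return "</$self->{name}>";
}

# Hypothetical split: each class knows what it is, so no conditional.
package My::Token::Tag::Start;

sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub is_start_tag { 1 }
sub is_end_tag   { 0 }
sub as_is        { my $self = shift; return "<$self->{name}>" }

package My::Token::Tag::End;

sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub is_start_tag { 0 }
sub is_end_tag   { 1 }
sub as_is        { my $self = shift; return "</$self->{name}>" }

# A subclass that customizes start tags only; end tags are untouched.
package My::Token::Tag::Start::Upper;

our @ISA = ('My::Token::Tag::Start');
sub as_is { my $self = shift; return '<' . uc($self->{name}) . '>' }

package main;

my @tokens = (
    My::Token::Tag::Start::Upper->new(name => 'p'),
    My::Token::Tag::End->new(name => 'p'),
);
print join('', map { $_->as_is } @tokens), "\n";    # prints "<P></p>"
```

With the combined class, a subclass that wanted uppercase start tags would have to override as_is and re-implement the end tag branch as well. With the split, the override touches start tags and nothing else.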