
Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Monday August 25, 2003
10:32 AM

Regexp::Token - Match arbitrary tokens instead of characters

[ #14305 ]

I've finished an alpha version of my Regexp::Token module and included Regexp::Token::HTML as a "starter kit", if you will. Basically, the module lets you match arbitrarily defined tokens in addition to characters. Frankly, I don't know how useful people will find it. Two readers of my Perlmonks posting have already commented that they don't understand what I'm trying to do, so I'm going to have to take some time to write this up more carefully and come up with some comparative examples.

my $p_token = Regexp::Token::HTML->create_token('<p name="" class="">');
my $p_tag   = Regexp::Token->create($p_token);

my $html = <<END_HTML;
<h1>testing</h1>
<p name="goo" class="ber"> <p CLASS=baz name='easy'>
<h1>end test</h1>
END_HTML
my ($result) = $html =~ /((?:$p_tag )+)/;

my $two_tags = q{<p name="goo" class="ber"> <p CLASS=baz name='easy'> };
is($result, $two_tags, '... and we should be able to capture token text');
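For comparison, here is a rough sketch of what matching the same two tags might look like with a hand-written, character-level regex. It says nothing about how Regexp::Token works internally: `$attr`, `$p_tag_plain` and `match_p_tags` are made-up names for illustration, and the pattern only copes with the simple attribute forms used in the test above.

```perl
use strict;
use warnings;

# Hypothetical character-level equivalent; all names here are illustrative.
# One attribute: name or class, '=', then a double-quoted, single-quoted,
# or bare value.
my $attr = qr/\s+(?:name|class)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/i;

# A <p> tag with two name/class attributes in any order. It does not check
# that each attribute appears exactly once.
my $p_tag_plain = qr/<p$attr$attr\s*>/i;

# Capture a run of such tags separated by whitespace; undef if none match.
sub match_p_tags {
    my ($text) = @_;
    my ($run) = $text =~ /($p_tag_plain(?:\s+$p_tag_plain)*)/;
    return $run;
}

my $html = <<'END_HTML';
<h1>testing</h1>
<p name="goo" class="ber"> <p CLASS=baz name='easy'>
<h1>end test</h1>
END_HTML

my $run = match_p_tags($html);
print defined $run ? "matched: [$run]\n" : "no match\n";
```

Even for one tag with two attributes, quoting and case handling have to be spelled out by hand, which is the kind of detail a token description like `<p name="" class="">` appears intended to hide.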

I'm also getting some weird errors from the module, and I need to find out where my undefined errors are coming from. Also, if anyone is familiar with the pitfalls of forking code, I would love it if you could review what I'm doing and let me know about any dangers.

Update: A slightly updated version of Regexp::Token gets rid of the warnings and passes the tests much better. It also gets rid of an ugly hack and uses (?!) to fail a match.
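For readers unfamiliar with the idiom: the always-failing construct mentioned in the update is presumably Perl's empty negative lookahead, `(?!)`. The sketch below only demonstrates that construct on its own. How Regexp::Token actually uses it is not shown in this post, and `$fail` and `matches` are illustrative names.

```perl
use strict;
use warnings;

# (?!) is an empty negative lookahead. The empty pattern always matches,
# so "not followed by the empty pattern" can never be true.
my $fail = qr/(?!)/;

# Return 1 if the string matches the pattern, 0 otherwise.
sub matches {
    my ($string, $re) = @_;
    return $string =~ $re ? 1 : 0;
}

print matches('anything', $fail), "\n";    # prints 0
print matches('',         $fail), "\n";    # prints 0

# Inside an alternation it vetoes only its own branch: the engine
# backtracks and tries the next alternative.
print matches('abc', qr/^(?:ab(?!)|abc)$/), "\n";    # prints 1
```

Because the failure is an ordinary part of the pattern, the regex engine backtracks normally instead of needing a workaround to abandon a branch.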

  • I guess the main question I have is: what are you trying to achieve? Are you trying to make regular expressions simpler and more descriptive? Or are you trying to do higher-level parsing with regular expressions?

    The problem with doing the latter is that most things people want to parse are too sophisticated for parsing with regular expressions. The Perl5 regex engine is more powerful than standard regular expressions and can match higher-level grammars. I suspect that the regular expressions used woul