  • I stared at this piece of self-documenting code for a while:
            my ($regexp, $reason) = /^(.*):(.*)$/;
            $regexp =~ s/^\.\+//;
            print $re "\t", fixup_re($regexp), "               {return \"$reason\";}\n";
    but I still can't figure out
    1. what the "reason" is for
    2. how to use the resulting compiled XS module from a Perl script

    Could you please provide a 3 (or so) line data file that we could just compile and run in some little demo script? Just to show the intention of it all.

    • Yeah sorry - it's a hack that I didn't have any time to document.

      Input is a file that looks like this:

      (duc|lgh|sw)[0-9]+[ab]*\.old\.fagotten\.ac:generic
      [0-9a-f]+\.myntet\.ac:generic
      [0-9]+[a-z]\.old\.myntet\.ac:generic
      lgh[0-9]+\-p[0-9]+\.nejlikan\.ac:edu

      i.e. "regexp:reason"

      Then to run, it's just:

      use MyModule;
       
      if (my $reason = MyModule::scan($string)) {
          print "Matched: $reason\n";
      }
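
      To make the "regexp:reason" format concrete without compiling anything, here is a minimal pure-Perl sketch of the same idea. It is a stand-in for illustration only, not the generated XS module: build_scanner is a made-up name, the rules are passed inline instead of read from a file, and "first matching rule wins" is an assumption about how overlapping rules behave. It anchors at the start of the string only, since the generated code is a lexer.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: turn "regexp:reason" lines into a scan() closure.
      # Pure-Perl stand-in for illustration, not the generated XS code.
      sub build_scanner {
          my @rules;
          for my $line (@_) {
              # Same split as the generator: greedy, so the LAST colon separates.
              my ($regexp, $reason) = $line =~ /^(.*):(.*)$/ or next;
              # Anchor at the start only (assumption: lexer-style prefix match).
              push @rules, [ qr/\A(?:$regexp)/, $reason ];
          }
          return sub {
              my ($string) = @_;
              for my $rule (@rules) {
                  return $rule->[1] if $string =~ $rule->[0];
              }
              return;    # no rule matched
          };
      }

      my $scan = build_scanner(
          '(duc|lgh|sw)[0-9]+[ab]*\.old\.fagotten\.ac:generic',
          '[0-9a-f]+\.myntet\.ac:generic',
          'lgh[0-9]+\-p[0-9]+\.nejlikan\.ac:edu',
      );

      for my $host ('lgh12-p3.nejlikan.ac', 'sw7a.old.fagotten.ac', 'www.example.org') {
          my $reason = $scan->($host);
          print "$host: ", defined $reason ? "Matched: $reason" : "no match", "\n";
      }
      ```

      The host names in the loop are invented test inputs; swap in your own strings to see which reason comes back.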

      • OK, I tried it... and it works. Kind of. I had to delete the "time" in the system() call, as Windows doesn't support it. Anyway, the sample compiled fast. Very fast. You scared me for no reason. :)

        I'm somewhat disappointed with what the module can do. I was hoping to have a basis to reimplement URI::Find [cpan.org], that is, something that can find matches anywhere in a random text. There are two major reasons why it can't do that. First: it really is a lexer: it can only match prefixes in a string. To use your examp
        • you might want to keep an eye on http://svn.apache.org/viewvc/spamassassin/branches/jm_re2c_hacks/rule2xs/ [apache.org] -- I'm hacking away on it for SpamAssassin, and I think with work you could probably find those features working there.

          Matt, does it really support [classes], (alt|er|nations), and {quantifiers}? wow, I wasn't even expecting that!! holy crap.
        • To match anywhere in the string, prefix the regexp with ".*". I haven't quite figured out how to tie to the end of the string yet, but it should be doable with what the code generates.
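
          A rough way to see the effect of the ".*" prefix, using plain Perl regexps as a stand-in for the generated lexer (prefix_match is a hypothetical helper, and the \A anchor is my assumption of what "prefix only" matching amounts to):

          ```perl
          use strict;
          use warnings;

          # Emulate a lexer-style match: the pattern has to start at offset 0.
          # Hypothetical helper, not part of the generated module.
          sub prefix_match {
              my ($regexp, $string) = @_;
              return $string =~ /\A(?:$regexp)/ ? 1 : 0;
          }

          my $rule = '[0-9a-f]+\.myntet\.ac';
          my $text = 'host 3fa2.myntet.ac refused';

          # The bare rule fails because the host name is not at the start.
          print "plain rule:     ", prefix_match($rule, $text), "\n";
          # With ".*" in front, the leading text is swallowed and the rule matches.
          print "with .* prefix: ", prefix_match(".*$rule", $text), "\n";
          ```

          Note that "." does not match a newline in a plain Perl regexp, so this sketch only finds matches on the first line of a multi-line string.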

          The limitation on just returning the reason is arbitrary - that's all I needed for the given problem domain, but you can definitely return "what matched" and "where in the string?". That should be a simple matter of programming.
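
          As a sketch of what a richer return value could look like, here is the same idea in pure Perl, where the match offsets come for free from @- and @+. This only illustrates the shape of the result: scan_with_position and the hash keys are invented for the example, and the generated XS code would have to track offsets its own way.

          ```perl
          use strict;
          use warnings;

          # Hypothetical pure-Perl stand-in: return the reason plus what matched and where.
          sub scan_with_position {
              my ($rules, $string) = @_;
              for my $rule (@$rules) {
                  my ($re, $reason) = @$rule;
                  next unless $string =~ $re;
                  # @- and @+ hold the start and end offsets of the last successful match.
                  my ($start, $end) = ($-[0], $+[0]);
                  return {
                      reason => $reason,
                      start  => $start,
                      end    => $end,
                      text   => substr($string, $start, $end - $start),
                  };
              }
              return;    # nothing matched
          }

          my @rules = (
              [ qr/[0-9a-f]+\.myntet\.ac/,            'generic' ],
              [ qr/lgh[0-9]+\-p[0-9]+\.nejlikan\.ac/, 'edu' ],
          );

          if (my $hit = scan_with_position(\@rules, 'host 3fa2.myntet.ac refused')) {
              print "Matched: $hit->{reason} '$hit->{text}' at $hit->{start}..$hit->{end}\n";
          }
          ```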

          (the long compile times are for when you have LOTS of regexps - I compile over 15k into one module).