All the Perl that's Practical to Extract and Report

  • This seems like a bug, and it needs to be reported. Can you supply the offending string and regex? It may be a bug in PCRE [wikipedia.org], which is the regex engine that PHP uses for this. Trying the same thing with an equivalent ANSI C program using PCRE may be instructive in pinpointing the problem.

    Regards, Shlomi Fish.

    • That's what I think too. Mind you, I'm using PHP 4.3.10-22 so maybe it's already been fixed. I mentioned it only because it seemed so unlikely.

      If you have a more up-to-date version of PHP or just want to have a play, you can download my test script and two test files (both of which fail at first and then match as they lose length) from http://perltraining.com.au/~jarich/php-pcre.tgz [perltraining.com.au].

      The two files are clean.txt and dirty.txt. dirty.txt is one page of the spam that was successfully tricking the re

      • Running a slightly modified test script against php-cli-5.2.3-10mdv2008.0 with libpcre0-7.2-1mdv2008.0, I'm getting:

        746
        745
        744
        743
        742
        741
        740
        739
        738
        737
        Matched spam.

        So it got worse. I can later try it with grep -P or with pcregrep.
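For anyone who wants to reproduce the shape of that test without the tarball, here is a minimal sketch in Python of the "shrink until it matches" loop described above. It is not the original PHP script: the function name, the sample data, the pattern and the 40-character limit are all made up for illustration, and the length limit is only a stand-in for whatever makes the real engine give up on long input.

```python
import re
from typing import Callable, List, Optional


def shrink_until_match(lines: List[str],
                       matches: Callable[[str], bool]) -> Optional[int]:
    """Drop trailing lines until matches(text) is true.

    Prints the number of lines left after each failed attempt and
    returns the number of lines kept at the first success, or None
    if the text never matches.
    """
    remaining = list(lines)
    while remaining:
        if matches("\n".join(remaining)):
            return len(remaining)
        print(len(remaining))  # this attempt failed; report the size tried
        remaining.pop()        # shorten the input and try again
    return None


# Hypothetical stand-in for an engine that fails on long subjects:
# it refuses anything over LIMIT characters, otherwise runs the regex.
LIMIT = 40
PATTERN = re.compile(r"cheap pills")


def limited_match(text: str) -> bool:
    return len(text) <= LIMIT and PATTERN.search(text) is not None


sample = ["cheap pills"] + ["filler"] * 6
kept = shrink_until_match(sample, limited_match)
print("Matched spam." if kept is not None else "Never matched.")
```

To test a real engine, pass a different callable as `matches`; the loop itself makes no assumption about which regex library is behind it.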

  • I brought this up with Ben Balbo (an excellent PHP programmer) and he mentioned that there are two similar bugs which have been submitted in the past.

    As he says:

    The first suggests it's a limitation of PCRE, and the second simply dismisses it as not implying a bug in PHP itself.

    As the PCRE website [pcre.org] appears to be having problems, I'm at a loss as to how to get this issue fixed. I've worked around it, but really I'd just rather