
All the Perl that's Practical to Extract and Report

  • I'm wondering how this takes regex greediness into account.

    What if you have a regex

    qr/(?:01)*/

    and you have a file which is several MB of random zeros and ones. If you read in

    111101010

    into your buffer, how can you know that the zero at the end of your buffer isn't about to be followed by a one, which would extend the match?

    It seems that if you have greedy regex elements, then you may have to slurp in the whole file to be able to tell whether you've matched the longest possible record separator. One could write even more pathal

    • Greediness: if we get a match that leaves nothing in the buffer, then read some more into the buffer and try again until we either exhaust the file (for a regexp like /.*/s) or have a match that leaves something in the buffer.

      Pathological cases will cause the entire file to be read into memory, but I don't see a way around that. If your record separator is /.*/s then you're saying to Perl "the entire file is my record separator". I don't see a way to handle this except by reading the whole file. That's your own silly fault for having such a bogus record separator.

      --Nat
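
The rule Nat describes (keep reading until a match leaves something in the buffer, or the file runs out) can be sketched roughly as follows. This is only an illustration, not the code under discussion: the helper name `next_record`, the chunk size, and the in-memory filehandle standing in for a real file are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not the code under discussion: split the first record
# off $data using the regex separator $rs, reading $chunk characters at a time.
# Returns (record, separator, number of characters buffered).
sub next_record {
    my ($data, $rs, $chunk) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    my $buf = '';
    while (1) {
        # Append the next chunk to the buffer; read returns 0 at end of file.
        my $got = read($fh, $buf, $chunk, length($buf));
        die "read failed: $!" unless defined $got;
        my $eof = ($got == 0);
        if ($buf =~ $rs) {
            my ($start, $end) = ($-[0], $+[0]);
            # Accept the match once it leaves something in the buffer,
            # or once there is nothing more to read.
            if ($end < length($buf) || $eof) {
                return (substr($buf, 0, $start),
                        substr($buf, $start, $end - $start),
                        length($buf));
            }
        }
        elsif ($eof) {
            return ($buf, '', length($buf));   # no separator found at all
        }
    }
}

my @normal = next_record("foo--bar", qr/-+/, 4);
print "record '$normal[0]', separator '$normal[1]', buffered $normal[2]\n";

my @greedy = next_record("abcdefghij", qr/.*/s, 4);
print "record '$greedy[0]', separator '$greedy[1]', buffered $greedy[2]\n";
```

In the first call the match on "foo-" runs to the end of the buffer, so another chunk is read before "--" is accepted as the separator. In the second call /.*/s always matches to the end of the buffer, so the sketch ends up buffering all 10 characters, which is the pathological case described above.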

      • > if we get a match that leaves nothing in the buffer

        But the point I was attempting to make was that whether or not something is left in the buffer is not the best indication of whether or not the RS matched enough stuff. Maybe I'd have to see the actual code, but from the description of it, it sounds like it could behave differently depending on how input matched up with buffer size. You could use the same data and RS and get different results depending on your buffer size.

        my $record_sep = qr/(?:01)*
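
The buffer-size dependence raised in this comment can be illustrated with a small sketch. It is not the commenter's code, which is cut off above. The sample data, the helper name `separator_in_buffer`, and the two buffer sizes are assumptions, and the separator uses `+` rather than `*` because `(?:01)*` would match the empty string at position 0 of this data.

```perl
use strict;
use warnings;

# Assumed sample data: the buffer from the earlier comment plus two more ones.
my $data = "11110101011";
my $rs   = qr/(?:01)+/;   # assumption: + instead of * to avoid an empty match

# Hypothetical helper: match the separator against only the first $size
# characters, as if that were all that had been read into the buffer so far.
# Returns (separator, text left in the buffer after the match).
sub separator_in_buffer {
    my ($size) = @_;
    my $buf = substr($data, 0, $size);
    return unless $buf =~ $rs;
    my $sep      = substr($buf, $-[0], $+[0] - $-[0]);
    my $leftover = substr($buf, $+[0]);
    return ($sep, $leftover);
}

for my $size (9, 11) {
    my ($sep, $leftover) = separator_in_buffer($size);
    print "buffer size $size gives separator '$sep', leftover '$leftover'\n";
}
```

With a 9-character buffer the match is "0101" and a "0" is left over, so a rule based on "something is left in the buffer" would accept it. With an 11-character buffer the same data yields "010101". Both matches leave something in the buffer, yet the separators differ.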