  • > Changing my $person to
    >    qr/[\x08\x09](G[A-Z]{7})/
    > or
    >    qr/[\x08\x09](S[A-Z]{7})/
    > gives me a bunch of different records. But why would
    >    (S[A-Z]{7})
    > give different results than
    >    ([A-Z]{8})
    > ? I'm stumped.

    I'm not sure if I'm misreading this, but it looks like you have three different regular expressions there:

        qr/  G [A-Z]{7}  /x
        qr/  S [A-Z]{7}  /x
        qr/    [A-Z]{8}  /x

    • > I would expect the three of them to match different values. The first one matches eight-letter uc words starting with 'G'. The second matches eight-letter uc words starting with 'S'. The final regex matches eight-letter uc words starting with any letter.

      Sorry for not being clear. I'd expect /S[A-Z]{7}/ to be a subset of the matches from /[A-Z]{8}/ but instead the latter isn't returning some of the results the former does.
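
      One way this can happen, sketched under assumptions: a regex returns its leftmost match, so if a record contains an earlier tab or backspace followed by eight uppercase letters, the general pattern captures that field and never reaches the S-word. The sample record and variable names below are invented for illustration; the real data may behave differently.

      ```perl
      use strict;
      use warnings;

      # Hypothetical record: two tab-delimited fields of eight uppercase letters.
      my $record = "id42\tGARFIELD\tSHERLOCK";

      my $s_only  = qr/[\x08\x09](S[A-Z]{7})/;
      my $any_cap = qr/[\x08\x09]([A-Z]{8})/;

      # Without /g, each pattern captures only its leftmost match in the record.
      my ($from_s)   = $record =~ $s_only;
      my ($from_any) = $record =~ $any_cap;

      print "S pattern captured:       $from_s\n";
      print "general pattern captured: $from_any\n";

      # With /g in list context, the general pattern returns every such field.
      my @all = $record =~ /$any_cap/g;
      print "all matches: @all\n";
      ```

      If the script keeps only the first capture per record, the general pattern would report the earlier field here and the S-word would seem to go missing.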
      • That would match my expectation as well. Are you 100% positive that there is no other difference in the two regexen (or the two scripts)?

        If so, I'd be stumped too :/

        -matt
      • Glad to see the /x modifier there. /S[A-Z]{7}/ should be a subset of /[A-Z]{8}/, in particular the subset /(?=S)[A-Z]{8}/. If it isn't, it could be a bug in the backtracking logic ... or an issue with binmode?
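
        A quick check of the lookahead form, as a minimal sketch: S[A-Z]{7} consumes eight characters, so the equivalent lookahead pattern needs a count of eight. The patterns are anchored here so whole words are compared, and the test words are made up.

        ```perl
        use strict;
        use warnings;

        my $literal   = qr/^S[A-Z]{7}$/;
        my $lookahead = qr/^(?=S)[A-Z]{8}$/;
        my $general   = qr/^[A-Z]{8}$/;

        # Made-up sample words, not taken from the real data.
        my @words = qw(SHERLOCK GARFIELD SMITH SPRINGFIELD Sherlock);

        my $disagreements = 0;
        for my $word (@words) {
            my $lit  = $word =~ $literal   ? 1 : 0;
            my $look = $word =~ $lookahead ? 1 : 0;
            my $gen  = $word =~ $general   ? 1 : 0;
            $disagreements++ if $lit != $look;   # the two S forms should agree
            $disagreements++ if $lit && !$gen;   # an S match must also be a general match
            printf "%-12s literal=%d lookahead=%d general=%d\n", $word, $lit, $look, $gen;
        }
        print "disagreements: $disagreements\n";
        ```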

        If there is any possibility of accented 'national' characters (which there always is in unconstrained data), \w is much preferred to [A-Za-z] or /[A-Z]/i, though note that \w also matches digits and underscore.
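
        To illustrate the difference, here is a small sketch with an invented name whose first letter is U+0141 (Latin capital L with stroke). The \p{Lu} property (uppercase letters only) is not mentioned above; it is included purely for comparison and is an assumption about what the data might need.

        ```perl
        use strict;
        use warnings;

        # Hypothetical name, written with an escape so the example does not
        # depend on the encoding of the source file.
        my $name = "\x{141}UKASZEK";

        my $ascii_only = $name =~ /^[A-Z]{8}$/  ? 1 : 0;  # 0: first letter is outside A-Z
        my $word_chars = $name =~ /^\w{8}$/     ? 1 : 0;  # 1: \w accepts Unicode letters
        my $upper_only = $name =~ /^\p{Lu}{8}$/ ? 1 : 0;  # 1: all eight are uppercase letters

        # \w also accepts digits and underscore, which \p{Lu} rejects.
        my $w_digits  = "AB_12345" =~ /^\w{8}$/     ? 1 : 0;
        my $lu_digits = "AB_12345" =~ /^\p{Lu}{8}$/ ? 1 : 0;

        print "ascii=$ascii_only word=$word_chars upper=$upper_only\n";
        print "with digits: word=$w_digits upper=$lu_digits\n";
        ```

        With real input, the data also has to be decoded into characters first (which is where binmode or an I/O layer comes in); otherwise these classes see raw bytes.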

        I'd worry that some 'persons' might actually be shorter than 8 chars, or have spaces or lower case in some systems (van Helsing, etc.).
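
        A rough way to look for such cases, assuming the candidate fields have already been split out of each record: list whatever the strict eight-uppercase pattern rejects. The sample values are invented.

        ```perl
        use strict;
        use warnings;

        # Invented sample fields; real 'person' values may look different.
        my @fields = ("SHERLOCK", "SMITH", "van Helsing", "GARFIELD");

        # Keep the fields that the strict eight-uppercase pattern would reject.
        my @rejected = grep { !/^[A-Z]{8}$/ } @fields;

        print "rejected: $_\n" for @rejected;
        ```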

        What strings(1)
        --
        Bill
        # I had a sig when sigs were cool
        use Sig;