All the Perl that's Practical to Extract and Report

    • How, pray tell, is this particular regex going to backtrack needlessly? Or are you just cargo culting the “death to dot star” line?

      (That said, the .* in Robrt’s pattern is superfluous, as the pattern will match the exact same things with or without it.)
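A quick way to check the "superfluous" claim is to run both forms over a few names. This is only a sketch: the original pattern is not quoted in this thread, so /\._.*/ is assumed here from the /.*\._.*/ variant mentioned in a reply, and the sub names and sample filenames are hypothetical.

```perl
use strict;
use warnings;

# Assumed form of the pattern under discussion (not quoted in the thread).
sub with_dot_star    { return $_[0] =~ /\._.*/ ? 1 : 0 }

# The same pattern with the trailing .* dropped.
sub without_dot_star { return $_[0] =~ /\._/   ? 1 : 0 }

# Hypothetical filenames, some containing "._" and some not.
for my $name ('._resource', 'dir/._Icon', 'plain.txt', 'a._', '.hidden') {
    printf "%-12s with=%d without=%d\n",
        $name, with_dot_star($name), without_dot_star($name);
}
```

Since a trailing .* can always match zero characters, both subs should agree on whether a string matches; only the text captured by the match differs.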

      • Yes, I was invoking it in the "Cargo Cult" context. The .* at the end was superfluous. He could have put them at both ends: /.*\._.*/. So, yes, I was cargo culting it. I was trying to draw attention to the fact that .* is almost never what you want to use.
        • “Death to dot star” is about backtracking. At the end of the pattern, the .* won’t backtrack. If you put another one at the front, though, it will. What is your point?

          You would have made your case much better if you just said “the .* there is a noop” instead of throwing in something entirely unrelated that happens to be about dot star.

          “I was trying to draw attention to the fact that .* is almost never what you want to use.”

          But you’re wrong. It is exactly what you wa

  • Let me guess: it has to do with removing those resource fork files that you get on OS X in some circumstances?

  • It’s not Perl that’s line noise, it’s the regex syntax. Last night I wrote this: s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g

    • I love the x modifier. :-)

      s{
          /           # Leading slash.
          (?!\.\.)    # No parent directories of root.
          [^/]+       # Pick out the directory name in root.
          /+\.\.      # Match any number of slashes, then parent directory.
          (?=/|\z)    # Assert that there is a following slash (or end of string).
      }{}gx

      So it cleans up pathnames to remove /foo/../bar/../baz to be ju

      • Even /x only really helps because of your profuse comments, though.

        You guessed correctly: I needed to normalize HTTP URIs to compare them, and URI [cpan.org]’s canonical method doesn’t finish the job. To be precise, this runs in a 1 while s/// loop, which is necessary to handle paths like foo/bar/baz/quux/../../../bar.
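To illustrate why the 1 while s/// loop is needed, here is a minimal sketch using the one-line substitution quoted earlier in the thread. The sub names are hypothetical, and a leading slash is added to the example path as an assumption, since the pattern only matches segments that follow a slash.

```perl
use strict;
use warnings;

# A single pass of the substitution, even with /g.
sub collapse_once {
    my ($path) = @_;
    $path =~ s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

# Repeat the substitution until it no longer matches anything.
sub collapse_all {
    my ($path) = @_;
    1 while $path =~ s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

my $path = '/foo/bar/baz/quux/../../../bar';
print collapse_once($path), "\n";   # /foo/bar/baz/../../bar
print collapse_all($path),  "\n";   # /foo/bar
```

One pass removes only /quux/.., because /g resumes scanning after the text it just matched and never revisits /baz, which has only now become adjacent to a "..". Each further iteration peels off one more directory/.. pair.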

        • It's not just the comments. I also find that breaking the regex into its component groups makes it easier to understand. It's what I do mentally anyway, so it just means that it's done already for me.

          -Dom

      • Hmm, actually, that has a bug. The negative look-ahead must contain a trailing slash, otherwise the pattern will erroneously fail to match something like foo/..fooledya/../bar.
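The bug can be reproduced with a short sketch. The first sub uses the pattern as posted; the second only adds a trailing slash inside the negative look-ahead, which is one reading of the fix described above and not necessarily the final rewritten regex. The sub names are hypothetical, and a leading slash is assumed on the example path.

```perl
use strict;
use warnings;

# Pattern as originally posted: (?!\.\.) rejects any segment that
# merely starts with two dots, such as "..fooledya".
sub collapse_original {
    my ($path) = @_;
    1 while $path =~ s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

# Assumed variant: the look-ahead only rejects a segment that is
# exactly ".." followed by a slash.
sub collapse_with_slash {
    my ($path) = @_;
    1 while $path =~ s{/(?!\.\./)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

my $path = '/foo/..fooledya/../bar';
print collapse_original($path),   "\n";   # /foo/..fooledya/../bar (unchanged)
print collapse_with_slash($path), "\n";   # /foo/bar
```

The variant is untested beyond this case and may have edge cases of its own.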

        At first I thought I needed a more complex assertion than just a trailing slash in there, so I started rewriting the regex extensively. After I realised that it’s not that complex, I noticed that my comments have a noticeably different focus from yours, so I decided to keep the result for comparison:

        s{