All the Perl that's Practical to Extract and Report

  • grepurl looks a lot like my urifind [cpan.org].

    --
    (darren)
    • It is a lot like your urifind!

      I do not get enough time on the net to research things as much as I would like. I figured someone must have done this before.

      I had to sit at a desk for 12 hours doing pretty much nothing, so I made grepurl. Oh well, I had fun with it :)
      • For the first few weeks after I put this on CPAN, I kept waiting for the same message you just got, and it never came, which leads me to believe that there are now exactly two things that do this.

        We should start a club. :)

        urifind uses URI::Find, which, in my benchmarks, is slower than HTML::SimpleLinkExtor, and relies on regexes rather than document structure. That means grepurl will most likely be faster and more accurate (urifind cannot do relative URIs, for example). Then again, I wrote urifind s

        --
        (darren)
        • I looked around for URI::Find for about a week, because I knew something like it existed, but I kept thinking it was called something like Text::Extract::*. I have a minicpan on my computer, but I don't have a good way to search it (like search.cpan.org---the cpan script assumes you have a pretty good idea to start).

          I've got urifind now and I'm going to go back to my tent and look at it. One of the things I wanted to add was different data sources (HTML, text, XML, and so on), and I have a lot of time here recently. :)