NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by saorge (6224) on 2005.10.31 4:01 (#44268)
    Perl is the first relevant word, because the others are "stop words". There are some modules on CPAN for working with these stop words. There are also a lot of modules for indexing text (even if your first intention isn't really to index text in the search-engine sense). The number of occurrences of each term is often saved in the database, because this value is used to compute the ranking of a document after a search (the best known of these methods is TF.IDF). So it could be simpler to query the database. perlindex is a script available on CPAN that indexes the Perl documentation available on your hard disk. One option of this script (-d, a threshold on the occurrence count) asks for the total number of occurrences. On my box, head is the most frequently used word.
    • The joke doesn't work as well when "perl" is at the top of the list for both books.

      I don't think it's simpler to make a database. My script was only 10 lines, including blank ones :)
      • Dear brian,

        Sorry I went pedantic on your joke along with saorge. The perl perl perl ... graf you threatened to add to the next book is funny. I just got caught up in search-boffin blather about stopwords ... a hot-button thing.

        - Bill

        --
        Bill
        # I had a sig when sigs were cool
        use Sig;
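
For anyone following the stop-word and TF.IDF part of the thread, here is a rough sketch of the idea. It is written in Python purely for illustration; it is not brian's ten-line script, and it is not how perlindex works internally. The stop-word list, the function names, and the tf * log(N / df) weighting are all assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration; real lists are far longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "on"}


def tokenize(text):
    """Lowercase the text and pull out runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text, stop_words=STOP_WORDS):
    """Count how often each term occurs, skipping stop words."""
    return Counter(w for w in tokenize(text) if w not in stop_words)


def tf_idf(docs, stop_words=STOP_WORDS):
    """Return one {term: weight} dict per document.

    Weight is raw term frequency times log(N / df), one common
    TF.IDF variant (an assumption; many other variants exist).
    """
    counts = [term_counts(d, stop_words) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]
```

With two made-up "books", "the perl of the perl book" and "the camel and the perl", term_counts puts perl at the top of the first one with a count of 2. Under this weighting, though, a term that appears in every document gets a weight of zero, so perl scores 0 in both, which is roughly the situation described above where "perl" tops the list for both books.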