use Perl

Monday October 31, 2005
12:18 AM

Llama's frequency of "perl"

[ #27378 ]

According to my word count program, "perl" is only the 17th most frequent word in the Llama, 4th Edition.

            the 6494
              a 2862
             to 2710
             of 2235
           that 1689
             in 1658
            you 1571
             is 1518
            and 1394
             it  956
            for  926
             if  917
           this  832
             as  791
             be  706
             or  671
           perl  660
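The post doesn't show the word count program itself. A minimal Perl sketch of the same idea might look like the following. The tokenizing rule (runs of ASCII letters, with internal apostrophes allowed), the subroutine names `count_words` and `top_words`, and the file names in the comments are assumptions for illustration, not the script that produced the numbers above.

```perl
use strict;
use warnings;

# Return a hash of lowercased word => number of occurrences in $text.
# Assumption: a "word" is a run of ASCII letters, optionally with
# internal apostrophes, so "don't" counts as one word.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{ lc $1 }++ while $text =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g;
    return %count;
}

# Return the $n most frequent words, highest count first.
# Ties are broken alphabetically so the output is deterministic.
sub top_words {
    my ( $count, $n ) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
        keys %$count;
    return @sorted > $n ? @sorted[ 0 .. $n - 1 ] : @sorted;
}

# Only act as a program when given file names, e.g. (hypothetical):
#   perl wordcount.pl llama.txt
if (@ARGV) {
    # Slurp each file whole (undef $/), then join them into one string
    my $text = do { local $/; join "\n", <> };
    my %count = count_words($text);
    printf "%15s %5d\n", $_, $count{$_} for top_words( \%count, 17 );
}
```

Run against a plain-text dump of a book, it prints a list in the same shape as the one above. The exact counts depend on how the book source is converted to text and on what you decide counts as a word.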

We're not doing much better in the rewrite of the Alpaca, where "perl" has slipped to 21st. We still have time to change that, but from the looks of it I'll have to include a couple of paragraphs of just "perl perl perl ...".

            the 4977
             to 1952
              a 1917
             of 1340
            you 1331
             in 1083
            and 1081
             is 1048
           that  952
            for  741
             as  636
             it  608
           this  605
            can  543
             if  532
           with  444
             be  442
             an  383
             or  351
           your  351
           perl  321

I could have written something to go through all of our magazine columns, but then I'd have to use a module or something.

  • Perl is the first relevant word, because the others are "stop words". There are some modules on the CPAN for working with stop words. There are also a lot of modules for indexing text (even if your first intention isn't really to index text in the sense of a search engine). The number of occurrences of each term is often saved in the database, because these values are used to compute the ranking of a document after a search (the best known of these methods is TF-IDF). So, it could be simpler to query the database. perlindex is
    • The joke doesn't work as well when "perl" is at the top of the list for both books.

      I don't think it's simpler to make a database. My script was only 10 lines, including blank ones :)
      • Dear brian,

        Sorry I went pedantic on your joke along with saorge. The "perl perl perl ..." graf you threatened to add to the next book is funny. I just got caught up in search-boffin blather about stopwords ... a hot-button thing.

        - Bill

        --
        Bill
        # I had a sig when sigs were cool
        use Sig;
  • Right on. Perhaps even stronger: Perl is the first substantive word, relevant or otherwise. But "stopwords" is the correct searchwonk jargon for this. I suspect philologists have their own term; Larry probably has a word that means "not a helper verb, particle, article, or pronoun". Perl is the first substantive noun, proper or common, on both lists.

    Be happy!

    -- Bill
    former search boffin

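The commenters' point about stop words is easy to check against the numbers in the post. This sketch feeds the Alpaca counts from the table above through a small, hand-picked stop list and ranks whatever survives. The stop list and the helper name `substantive_words` are assumptions for illustration; the stop-word modules on the CPAN ship much longer lists.

```perl
use strict;
use warnings;

# The Alpaca counts from the post
my %alpaca = (
    the  => 4977, to  => 1952, a   => 1917, of   => 1340, you  => 1331,
    in   => 1083, and => 1081, is  => 1048, that => 952,  for  => 741,
    as   => 636,  it  => 608,  this => 605, can  => 543,  if   => 532,
    with => 444,  be  => 442,  an  => 383,  or   => 351,  your => 351,
    perl => 321,
);

# A small, hypothetical stop list: articles, pronouns, prepositions,
# conjunctions, and helper verbs
my %stop = map { $_ => 1 } qw(
    a an the and or if as that this it you your
    to of in for with is be can
);

# Drop the stop words, then rank what's left by descending count
# (ties broken alphabetically)
sub substantive_words {
    my ( $count, $stop ) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
        grep { !$stop->{$_} } keys %$count;
}

my @left = substantive_words( \%alpaca, \%stop );
print "$left[0]\n";    # prints "perl"
```

The same stop list also leaves "perl" on top of the Llama list, since every word above it there is an article, pronoun, preposition, conjunction, or helper verb.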
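The first comment mentions TF-IDF, the usual way stored term counts feed into ranking. Here's a minimal sketch of one common variant, with tf as the raw count of a term in a document and idf as the natural log of N / df (N documents, df of them containing the term). Real search engines use many tweaks on this; the `tf_idf` name and the two toy documents are hypothetical.

```perl
use strict;
use warnings;

# Score every term in every document with tf-idf:
#   tf-idf(t, d) = count(t, d) * log(N / df(t))
# Takes a list of hashrefs (term => count) and returns a list of
# hashrefs (term => score) in the same order.
sub tf_idf {
    my @docs = @_;
    my $n    = @docs;

    # Document frequency: how many documents contain each term
    my %df;
    for my $doc (@docs) {
        $df{$_}++ for keys %$doc;
    }

    my @scores;
    for my $doc (@docs) {
        my %score;
        for my $term ( keys %$doc ) {
            $score{$term} = $doc->{$term} * log( $n / $df{$term} );
        }
        push @scores, \%score;
    }
    return @scores;
}

# Two hypothetical toy documents
my @scores = tf_idf(
    { the => 3, perl  => 2 },
    { the => 4, llama => 1 },
);

printf "%-6s %.3f\n", $_, $scores[0]{$_} for sort keys %{ $scores[0] };
```

Because "the" appears in every document, its idf is log(2/2) = 0 and its score vanishes no matter how often it occurs, while "perl" in the first document keeps 2 × ln 2 ≈ 1.386. That's why stop words carry so little weight in this kind of ranking.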