
Saturday October 30, 2004
05:34 PM

The right and wrong of perlfaq

[ #21609 ]

I had to edit perlfaq6's "I put a regular expression into $/ but it didn't work. What's wrong?" last night, so I did what I usually do: use a word from the question title to jump to the right place in the document. I figured the right word would be "wrong", but as I jumped from one instance of "wrong" to the next, I thought there were an awful lot of "wrong"s. I hadn't really thought about it before: what's the balance of "wrong" and "right" in the perlfaq? Who's winning?

cd /Users/brian/Dev/perlfaq
echo "    doc           wrong    right"
echo "----------------------------------"
# Count the lines in each perlfaq document that mention either word.
for doc in perlfaq[123456789].pod
do
    wrong=`grep -c -i wrong "$doc"`
    right=`grep -c -i right "$doc"`
    printf '%-12s %8d %8d\n' "$doc" "$wrong" "$right"
done

Even without tallying the totals, I can see that "right" wins out over "wrong", although perlfaq6, the doc I was editing, does have the highest number of "wrong"s.

    doc           wrong    right
perlfaq1.pod        0        4
perlfaq2.pod        0        4
perlfaq3.pod        1        8
perlfaq4.pod        4       12
perlfaq5.pod        5        3
perlfaq6.pod        6        6
perlfaq7.pod        4       11
perlfaq8.pod        2        5
perlfaq9.pod        1        3
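
For anyone who does want the totals, here is a minimal sketch that sums the two columns with awk. It works from the counts printed in the table above rather than from the real perlfaq files, and the `counts` function is only a hypothetical stand-in for that table, not part of the original script.

```shell
# Rows copied from the table in the post: doc, "wrong" count, "right" count.
# counts() is a hypothetical helper that just replays those rows.
counts() {
cat <<'EOF'
perlfaq1.pod 0 4
perlfaq2.pod 0 4
perlfaq3.pod 1 8
perlfaq4.pod 4 12
perlfaq5.pod 5 3
perlfaq6.pod 6 6
perlfaq7.pod 4 11
perlfaq8.pod 2 5
perlfaq9.pod 1 3
EOF
}

# Sum columns 2 and 3 and print both totals on one line.
totals=$(counts | awk '{ wrong += $2; right += $3 }
    END { printf "wrong=%d right=%d", wrong, right }')
echo "$totals"
```

With the numbers from the post this comes to 23 lines matching "wrong" against 56 matching "right". Those are line counts, since that is what grep -c reports.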

Curiously, the distribution of "wrong"s across the documents looks roughly like a bell curve, although not quite a symmetrical one.

|           *
|         * *
|       * * * *
|       * * * *
|       * * * * *
|     * * * * * * *
  1 2 3 4 5 6 7 8 9

This gives me a chance to play with R, a statistical package. At first I thought "R" must be a really bad name because it must be hard to find in Google, but it's the second result (the first is the stock quote for "R" (Ryder System Inc)). "R" is slick: I wish I had this when I was doing chemistry.

albook_brian[791]$ R
R : Copyright 2004, The R Foundation for Statistical Computing
Version 2.0.0  (2004-10-04), ISBN 3-900051-07-0
> freq <- c( 1,4,5,6,4,2,1 )
> mean(freq)
[1] 3.285714
> median(freq)
[1] 4
> var(freq)
[1] 3.904762
> sd(freq)
[1] 1.976047

Still, "wrong" might be the right word even if it seemed to show up a lot. I modified my shell script to check the other words too, and on a second revision, to check some words with their juxtaposed punctuation, thinking that combination would be even less frequent.

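cd /Users/brian/Dev/perlfaq
# Only perlfaq6.pod is checked this time, as the output shows.
doc=perlfaq6.pod
echo "     doc        word            count"
echo "-------------------------------------"
for word in "I" "put" "a" "regular" "expression" "into" '$/' \
    "but" "it" "didn't" "work" "work." "What's" "wrong" "wrong?"
do
    # -c counts matching lines; each word is used as a regular expression.
    count=`grep -i -c "$word" "$doc"`
    printf '%-15s %-15s %4d\n' "$doc" "$word" "$count"
done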

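If I wanted to jump right to the question, "didn't" is the word to choose, although "wrong?" gets me there in at most two hops. I shouldn't choose "I", "it", or "a". Their numbers are lower than they might be because the -c switch only counts matching lines, not every occurrence, remember. Curiously, "work" and "work." have the same count, which suggests "work" always shows up next to a full stop, although grep treats an unescaped "." as "any character", so "work." would also match "works" or "work ".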

     doc        word            count
perlfaq6.pod    I                456
perlfaq6.pod    put                7
perlfaq6.pod    a                437
perlfaq6.pod    regular           27
perlfaq6.pod    expression        26
perlfaq6.pod    into               6
perlfaq6.pod    $/                12
perlfaq6.pod    but               20
perlfaq6.pod    it               120
perlfaq6.pod    didn't             1
perlfaq6.pod    work               8
perlfaq6.pod    work.              8
perlfaq6.pod    What's             4
perlfaq6.pod    wrong              6
perlfaq6.pod    wrong?             2
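
Two details of grep affect how to read those counts: -c counts matching lines rather than occurrences, and an unescaped "." matches any character. The sketch below shows both on a made-up three-line sample file (not perlfaq6.pod). It assumes a grep that supports -o, which GNU and BSD grep do although POSIX does not require it; -F, for fixed strings, is standard.

```shell
# Hypothetical three-line sample standing in for a perlfaq document.
sample=$(mktemp)
printf '%s\n' \
    'it works when it wants to' \
    'but it did not work.' \
    'what is wrong?' > "$sample"

# -c counts lines that match at least once.
lines=$(grep -c -i 'it' "$sample")

# -o prints every match on its own line, so wc -l counts occurrences.
hits=$(grep -o -i 'it' "$sample" | wc -l | tr -d ' ')

# As a regular expression, "work." also matches "works".
regex_dot=$(grep -c 'work.' "$sample")

# -F treats the pattern as a fixed string, so the "." must be a full stop.
fixed_dot=$(grep -c -F 'work.' "$sample")

echo "lines=$lines hits=$hits regex_dot=$regex_dot fixed_dot=$fixed_dot"
rm -f "$sample"
```

On this sample, "it" matches on two lines but occurs three times, and "work." matches two lines as a regular expression but only one as a fixed string.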
