I had to edit perlfaq6's "I put a regular expression into $/ but it didn't work. What's wrong?" last night, so I did what I usually do: use a word from the question title to jump to the right place in the document. I figured the right word would be "wrong", but as I jumped from one instance of "wrong" to the next, I thought there were an awful lot of "wrong"s. I hadn't really thought about it before: what's the balance of "wrong" and "right" in the perlfaq? Who's winning?
#!/bin/sh
cd /Users/brian/Dev/perlfaq || exit 1
printf '%-12s %8s %8s\n' doc wrong right
echo "----------------------------------"
for doc in perlfaq[123456789].pod
do
    # grep -c counts matching lines, not individual occurrences
    wrong=$(grep -c -i wrong "$doc")
    right=$(grep -c -i right "$doc")
    printf '%-12s %8d %8d\n' "$doc" "$wrong" "$right"
done
Even without tallying the totals, I see that "right" wins out over "wrong", although perlfaq6, the doc I was editing, does have the highest number of "wrong"s.
doc             wrong    right
----------------------------------
perlfaq1.pod        0        4
perlfaq2.pod        0        4
perlfaq3.pod        1        8
perlfaq4.pod        4       12
perlfaq5.pod        5        3
perlfaq6.pod        6        6
perlfaq7.pod        4       11
perlfaq8.pod        2        5
perlfaq9.pod        1        3
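For anyone who does want the totals, a minimal sketch with awk follows. The rows are copied by hand from the output above rather than regenerated from the pod files, and `tally_totals` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Rows copied from the table above: document, "wrong" lines, "right" lines.
data='perlfaq1.pod 0 4
perlfaq2.pod 0 4
perlfaq3.pod 1 8
perlfaq4.pod 4 12
perlfaq5.pod 5 3
perlfaq6.pod 6 6
perlfaq7.pod 4 11
perlfaq8.pod 2 5
perlfaq9.pod 1 3'

# tally_totals is a hypothetical helper: it reads rows on stdin and
# prints one summary row in the same layout as the script's printf.
tally_totals() {
    awk '{ wrong += $2; right += $3 }
         END { printf "%-12s %8d %8d\n", "total", wrong, right }'
}

totals=$(printf '%s\n' "$data" | tally_totals)
echo "$totals"
```

With these numbers the summary row comes to 23 "wrong" lines against 56 "right" lines.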
Curiously, the distribution of "wrong"s across the documents is roughly a
bell curve, although not quite symmetrical.
 |
 |          *
 |        * *
 |      * * * *
 |      * * * *
 |      * * * * *
 |    * * * * * * *
0+------------------
  1 2 3 4 5 6 7 8 9
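A plot like this can be generated rather than typed by hand. The following is only a sketch: `draw_histogram` is a hypothetical helper, and the counts are the per-document "wrong" numbers copied from the table earlier in the post.

```shell
#!/bin/sh
# Per-document "wrong" line counts for perlfaq1 through perlfaq9,
# copied from the table earlier in this post.
counts="0 0 1 4 5 6 4 2 1"

# draw_histogram is a hypothetical helper: one column per count,
# one row per unit of height, tallest row first.
draw_histogram() {
    printf '%s\n' "$1" | awk '{
        max = 0
        for (i = 1; i <= NF; i++) if ($i + 0 > max) max = $i + 0
        for (h = max; h >= 1; h--) {
            line = " |"
            for (i = 1; i <= NF; i++) line = line (($i + 0 >= h) ? "* " : "  ")
            sub(/ +$/, "", line)      # drop trailing padding
            print line
        }
        axis = "0+"; labels = "  "
        for (i = 1; i <= NF; i++) { axis = axis "--"; labels = labels i " " }
        sub(/ +$/, "", labels)
        print axis
        print labels
    }'
}

plot=$(draw_histogram "$counts")
echo "$plot"
```

Each star stands for one matching line, so the number of stars in the plot equals the total number of "wrong" lines.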
This gives me a chance to play with R, a statistical package. At first I thought "R" must be a really bad name because it must be hard to find in Google, but it's the second result (the first is the stock quote for "R" (Ryder System Inc)). "R" is slick: I wish I had this when I was doing chemistry.
albook_brian[791]$ R
R : Copyright 2004, The R Foundation for Statistical Computing
Version 2.0.0 (2004-10-04), ISBN 3-900051-07-0
> freq <- c( 1,4,5,6,4,2,1 )
> mean(freq)
[1] 3.285714
> median(freq)
[1] 4
> var(freq)
[1] 3.904762
> sd(freq)
[1] 1.976047
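The same figures can be cross-checked without R. This is a minimal sketch in awk, assuming the same seven nonzero counts; `summarize` is a hypothetical helper, and it skips the median. It divides by n - 1, the sample denominator that R's var() and sd() use.

```shell
#!/bin/sh
# The same seven nonzero counts passed to c() in the R session.
freq="1 4 5 6 4 2 1"

# summarize is a hypothetical helper. It uses the n - 1 (sample)
# denominator for the variance, matching R's var() and sd().
summarize() {
    printf '%s\n' "$1" | awk '{
        n = NF
        for (i = 1; i <= n; i++) sum += $i
        mean = sum / n
        # second pass: sum of squared deviations from the mean
        for (i = 1; i <= n; i++) ss += ($i - mean) * ($i - mean)
        var = ss / (n - 1)
        printf "mean=%.6f var=%.6f sd=%.6f\n", mean, var, sqrt(var)
    }'
}

stats=$(summarize "$freq")
echo "$stats"
```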
Still, "wrong" might be the right word even if it seemed to show up a lot. I modified my shell script to check the other words too, and on a second revision, to check some words with their adjacent punctuation, thinking that combination would be even less frequent.
#!/bin/sh
cd /Users/brian/Dev/perlfaq || exit 1
doc=perlfaq6.pod
printf '%-15s %-15s %s\n' doc word count
echo "-------------------------------------"
for word in "I" "put" "a" "regular" "expression" "into" '$/' \
    "but" "it" "didn't" "work" "work." "What's" "wrong" "wrong?"
do
    # each word is used as a regular expression, so "." matches any character
    count=$(grep -i -c "$word" "$doc")
    printf '%-15s %-15s %4d\n' "$doc" "$word" "$count"
done
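If I wanted to jump right to the question, "didn't" is the word to choose, although "wrong?" gets me there in at most two hops. I shouldn't choose "I", "it", or "a". Their numbers understate the real number of occurrences because the -c switch only counts matching lines, remember, and with -i a bare "I" or "a" matches any line that contains the letter at all. Curiously, "work." scores the same as "work", but that probably says more about grep than about the prose: the unescaped "." matches any character, so "works" and "working" count too.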
doc             word            count
-------------------------------------
perlfaq6.pod    I                456
perlfaq6.pod    put                7
perlfaq6.pod    a                437
perlfaq6.pod    regular           27
perlfaq6.pod    expression        26
perlfaq6.pod    into               6
perlfaq6.pod    $/                12
perlfaq6.pod    but               20
perlfaq6.pod    it               120
perlfaq6.pod    didn't             1
perlfaq6.pod    work               8
perlfaq6.pod    work.              8
perlfaq6.pod    What's             4
perlfaq6.pod    wrong              6
perlfaq6.pod    wrong?             2
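Two grep behaviours shape these counts: -c counts matching lines rather than individual occurrences, and each word is treated as a regular expression, so the "." in "work." matches any character. Here is a minimal sketch of both effects on a made-up three-line file. The sample text and variable names are assumptions for illustration, and -o is a GNU and BSD grep option that POSIX does not require.

```shell
#!/bin/sh
# A tiny made-up sample file, not text from perlfaq6.
sample=$(mktemp)
cat > "$sample" <<'EOF'
It didn't work. What's wrong?
Is it wrong, wrong, wrong?
Working code works.
EOF

# Lines containing "wrong" at least once (what -c reports).
wrong_lines=$(grep -c -i 'wrong' "$sample")

# Individual occurrences: -o prints each match on its own line.
wrong_hits=$(grep -o -i 'wrong' "$sample" | wc -l | tr -d ' ')

# As a regular expression, "work." matches "work" plus any character.
regex_lines=$(grep -c 'work.' "$sample")

# With -F the pattern is a fixed string, so the "." must be a full stop.
fixed_lines=$(grep -c -F 'work.' "$sample")

rm -f "$sample"
printf 'wrong: %s lines, %s hits\n' "$wrong_lines" "$wrong_hits"
printf 'work.: %s lines as a regex, %s as a fixed string\n' "$regex_lines" "$fixed_lines"
```

On this sample, "wrong" appears four times on two lines, and "work." matches two lines as a regex ("work." and "works") but only one as a fixed string. Rerunning the word count with -F would show how often "work" really sits next to a full stop in perlfaq6.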
"/Users" dir? (Score:1)
Just curious, do you use Gobo Linux (http://www.gobolinux.org/) or is there another reason for such a name?
--
Offer Kaye
Re:"/Users" dir? (Score:2)