NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • If you need texts of any size, check out Project Gutenberg.
    • Any specific ones you'd recommend there? I wouldn't be against using a group of texts: the process takes about two minutes on a 1.3 MB text file, and I want this to be really good, so I'd be willing to put a couple of hours into it. I'd like the stats to be much more precise than usual, so I'd be all for processing ten or twelve texts. It's just that the ones I happen to have on my drive at home (forgive me) aren't normal prose. Perl docs, while reasonably grammatical and correctly spelled, aren't representative of ordinary English.
      --
      You are what you think.
      • Funny you should ask. About a month ago, one of the quizzes on the Perl Quiz of the Week mailing list concerned repeated substrings. One participant used the following text (quoted from an email):
        -----------------------
        'The Life and Opinions of Tristram Shandy, Gentleman' by Laurence Sterne, which, when downloaded, weighs in at around 1 MB (compared with 27 KB for Dan Schmidt's US Constitution).
        -----------------------

        Their locations (from another email):
        http://www.dfan.org/constitution.txt
        http://www.ibiblio.org/gutenberg/etext97/shndy10.txt

        Enjoy. That said, for frequency analysis just about any text on Project Gutenberg should work. Well, except maybe the chromosomes.
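Since the thread is about pooling frequency stats over ten or twelve texts, here is a minimal sketch of that aggregation. It assumes the "stats" are plain letter frequencies (a-z, case-folded), which the thread never actually specifies; the function name letter_frequencies and the file names in the usage comment are hypothetical. Project Gutenberg files also carry license headers and footers, so you may want to strip those before counting.

```python
from collections import Counter
from pathlib import Path


def letter_frequencies(paths):
    """Pool letter counts over several text files; return relative frequencies.

    Assumption: only ASCII letters a-z are counted (after lowercasing);
    digits, punctuation, whitespace and accented letters are ignored.
    """
    counts = Counter()
    for path in paths:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the run.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        counts.update(ch for ch in text.lower() if "a" <= ch <= "z")
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide pooled counts by the pooled total, most common letters first.
    return {letter: n / total for letter, n in counts.most_common()}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python freq.py shndy10.txt constitution.txt
    for letter, share in letter_frequencies(sys.argv[1:]).items():
        print(f"{letter} {share:.4%}")
```

Pooling the raw counts before dividing, rather than averaging each file's percentages, weights every letter by how often it actually occurs, so a long text like Tristram Shandy counts for more than the 27 KB Constitution.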