

TorgoX (1933)

TorgoX
  sburkeNO@SPAMcpan.org
http://search.cpan.org/~sburke/

"He is as handsome as the retractility of the talons of birds of prey [...] and above all, as the chance meeting on a dissecting table of a sewing machine and an umbrella!" -- Lautréamont

Journal of TorgoX (1933)

Friday October 25, 2002
04:04 PM

Corpus Colossus

[ #8612 ]
Dear Log,

I needed a corpus of HTML files for testing the HTML::Formatter classes (which I'm tidying up). So I logged into a friend's .edu account, and with a few little Unix commands (including a one-liner involving the king of them all, Perl), I made a tar file of all the reasonably-sized ~user/public_html/index.html files on the system. All 2,908 of them. Excellent!
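The entry doesn't show the actual commands, so what follows is only a rough sketch of how such a corpus could be gathered, not the original recipe. It uses plain shell in place of the Perl one-liner mentioned above. The function name `collect_index_pages`, the `ROOT/USER/public_html` layout, and the byte bounds standing in for "reasonably-sized" are all assumptions made for illustration.

```shell
# Hypothetical helper (not from the original post): gather every
# USER/public_html/index.html under a root directory whose size lies
# within [min_bytes, max_bytes], and pack the matches into one tar file.
collect_index_pages() {
    home_root=$1            # assumed layout: $home_root/USER/public_html
    out_tar=$2              # tar file to create (absolute path is safest)
    min_bytes=${3:-200}     # assumed lower bound for "reasonably sized"
    max_bytes=${4:-100000}  # assumed upper bound

    list=$(mktemp) || return 1

    # Build the list of qualifying files, as paths relative to $home_root.
    (
        cd "$home_root" || exit 1
        for f in */public_html/index.html; do
            # Skip the unexpanded glob, directories, and unreadable files.
            [ -f "$f" ] && [ -r "$f" ] || continue
            # Arithmetic expansion strips any padding wc puts around the count.
            size=$(($(wc -c < "$f")))
            if [ "$size" -ge "$min_bytes" ] && [ "$size" -le "$max_bytes" ]; then
                printf '%s\n' "$f"
            fi
        done
    ) > "$list"

    # -C makes member names relative to $home_root; -T reads names from the list.
    tar -cf "$out_tar" -C "$home_root" -T "$list"
    status=$?
    rm -f "$list"
    return "$status"
}
```

Something like `collect_index_pages /home /tmp/corpus.tar 200 100000` would then produce the tar file, and `tar -tf /tmp/corpus.tar | wc -l` would count how many pages made the cut. Paths containing newlines would confuse the list file, which seems an acceptable risk for user home directories.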

  • Dear Log,

    On the mp3-trola: Grieg, "Cradle Song"

    So I'm trying my hand at writing an HTML::FormatRTF to go along with HTML::FormatPS and HTML::FormatText in the HTML-Format(ter?) dist. While writing pod2rtf was pretty straightforward, this is proving pretty tricky -- because the block-level content-models for HTML are so much trickier than the block-level content-models for Pod. If I could wave a magic wand and make all the HTML input be XHTML Strict, that'd be really handy. But for the moment, there's