
All the Perl that's Practical to Extract and Report


Journal of cog (4665)

Friday October 15, 2004
05:08 AM

Got Corpus?

[ #21352 ]

I'm looking for reasonable quantities of text in as many languages as I can get my hands on (note: I mean "text in English", "text in French", etc. I do not mean "text with as many languages as possible inside it").

Basically, I'm looking for better training text for my Lingua::Identify project.

If anyone has a couple of pointers (or even a corpus itself, even if it covers just one language), I'd really appreciate it :-)

Oh, one other thing: by "reasonable", I think I'm aiming for something like 10M... but right now I'd just like to get my hands on any corpus at all (hey, 1M today, 1M tomorrow...)
