NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.
All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Similar code... (Score:3, Insightful)
I use my own mail parser class that doesn't hold the message in memory (it uses temp files instead), and decodes all the MIME stuff for you. Might be worth checking out in case anyone is interested.
We'll probably plug this into SA 2.41+ or SA3 (whichever comes first).
Re:Similar code... (Score:1)
By the way, a simple tokenizer tweak cut my false negatives in half. I only force a token to lowercase if at least one character is already lowercase. This has the effect of keeping separate (high) weights for "MILLION" and "EMAILS", distinct from those for "million" and "emails", which have l
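For anyone who wants to try the case rule described above, here is a minimal sketch in Python. It is not the poster's actual code: the function names and the `\w+` word pattern are assumptions made for illustration, and a real spam tokenizer would likely split on more than word characters.

```python
import re


def normalize_token(token: str) -> str:
    """Lowercase a token only if it already contains a lowercase character.

    All-caps tokens such as "MILLION" are left alone, so they can carry
    their own weight, separate from "million".
    """
    if any(ch.islower() for ch in token):
        return token.lower()
    return token


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and apply the case rule to each one.

    The \\w+ pattern is an assumption for this sketch, not the original
    tokenizer's splitting rule.
    """
    return [normalize_token(tok) for tok in re.findall(r"\w+", text)]


if __name__ == "__main__":
    print(tokenize("Get MILLION Emails now"))
```

With this rule, "Emails" and "emails" collapse to one token, while "EMAILS" stays distinct.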
Re:Similar code... (Score:2)
The upper/lower case thing didn't make one squat of a difference for me.
Re:Similar code... (Score:1)
When querying, are you going after tokens one at a time, batching up requests using IN (), or trying to get them all at once using a JOIN?
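To make the IN () option concrete, here is a rough sketch of batched token lookups using Python's built-in sqlite3 module. The table name `tokens`, its columns `spam_count` and `ham_count`, and the function name `fetch_counts` are all hypothetical; nothing in this thread says what schema or database is actually in use.

```python
import sqlite3


def fetch_counts(conn, tokens, batch_size=500):
    """Look up counts for many tokens with a few IN () queries.

    Assumes a hypothetical table tokens(token, spam_count, ham_count).
    Tokens missing from the table are simply absent from the result.
    """
    unique = list(dict.fromkeys(tokens))  # drop duplicates, keep order
    counts = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        # One "?" placeholder per token in this batch: "?,?,?"
        placeholders = ",".join("?" * len(batch))
        sql = (
            "SELECT token, spam_count, ham_count FROM tokens "
            f"WHERE token IN ({placeholders})"
        )
        for token, spam, ham in conn.execute(sql, batch):
            counts[token] = (spam, ham)
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (token TEXT PRIMARY KEY, spam_count INT, ham_count INT)"
    )
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?)",
        [("MILLION", 40, 1), ("hello", 2, 30)],
    )
    print(fetch_counts(conn, ["MILLION", "hello", "unseen"]))
```

Batching keeps the number of round trips small without building one huge statement; the batch size of 500 is an arbitrary choice for the sketch, picked to stay under typical limits on bound parameters.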
Re:Similar code... (Score:2)