All the Perl that's Practical to Extract and Report

  • You could use Ted Pedersen's WordNet::Similarity modules. They assign a numerical score to any pair of words, which can help you identify which words are related and how closely. I prefer jcn (the Jiang-Conrath measure) myself, but there are 10 different techniques on offer.

    Also, wouldn't it make sense to run a POS (part-of-speech) tagger before you strip stop words and so on? I can't recommend a Perl-based POS tagger offhand, since most of my work in this area is done in Java, but I'm pretty sure they exist. Do a POS tag first (to find the noun, verb, and adjective contexts for individual sentences), then do what you're doing now. This way you would get both the word and its part-of-speech tag. For example, "like" has at least seven different contexts in which it may be used, ranging from verb to adjective; "fling" could be either a noun or a verb, and so on. That might give you a bit more granularity to work with.
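
To make the ordering point concrete, here is a minimal Python sketch of "tag first, then drop stop words". Everything in it is an assumption for illustration: the word sets, the two context rules, and the names `toy_tag` and `tag_then_filter` are hypothetical, and a real POS tagger uses a trained model rather than hand-written rules. The only thing it tries to show is that the tagger needs the stop words as context (the determiner before "fling", the pronoun before "like"), so they have to still be there when tagging happens.

```python
import re

# Hypothetical word sets, just large enough for the two example sentences.
PRONOUNS = {"i", "you", "we", "they"}
DETERMINERS = {"a", "an", "the"}
STOP_WORDS = PRONOUNS | DETERMINERS | {"to", "it", "is", "was"}


def toy_tag(tokens):
    """Assign a coarse tag to each token, using the previous token as context.

    This is a stand-in for a real POS tagger, not an actual tagging algorithm.
    """
    tagged = []
    prev = None
    for tok in tokens:
        if tok in PRONOUNS:
            tag = "PRON"
        elif tok in DETERMINERS:
            tag = "DET"
        elif tok == "like":
            # "I like ..." is a verb; otherwise treat it as a preposition.
            tag = "VERB" if prev in PRONOUNS else "PREP"
        elif tok == "fling":
            # "the fling" is a noun; otherwise treat it as a verb.
            tag = "NOUN" if prev in DETERMINERS else "VERB"
        else:
            tag = "OTHER"
        tagged.append((tok, tag))
        prev = tok
    return tagged


def tag_then_filter(sentence):
    """Tag the full sentence first, then remove stop words from the result."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [(word, tag) for word, tag in toy_tag(tokens) if word not in STOP_WORDS]


print(tag_then_filter("I like the fling"))
print(tag_then_filter("They fling it like a ball"))
```

If the stop words were stripped before tagging, "like" and "fling" would arrive with no neighbours to disambiguate them, and each would collapse to a single tag regardless of the sentence it came from.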