
All the Perl that's Practical to Extract and Report

  • Perhaps you're looking at only one aspect of how a module like this may be used. Yes, it can be used for detecting plagiarism, should the user choose to do so. But it can also be used as a similarity metric, which has uses far beyond checking whether journalists borrowed copy or students cribbed essays.

    Related articles? Contextual matching? I can think of a few more uses for this type of module. I'd actually like to see how you do it, out of academic interest.

    • Good idea. Text::Related would be one possibility. This would be perfect for an open-source Google News. I'd love to use the code, if you ever decide to release it.
      --

      -DA [coder.com]

    • Because of the way the code is designed, I seriously doubt that it could be used for related articles or contextual matching. It's slow, but that's because of the algorithm I chose (which, surprisingly, turned out to be faster than some of the other options I was looking at). It does a sentence-by-sentence comparison to determine "how far apart" two sentences are in terms of insertions, deletions and replacements. If they're close enough (under the user-defined threshold), then a match is reported. It's th