NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • This doesn't help with addresses, but I created the Lingua-EN-MatchNames module to eliminate duplicate user records between security and groupware databases a few years ago, and it worked quite well.
    • Thanks Brian,

      Yes, I looked at it, as well as the excellent modules Lingua::EN::NameParse and Lingua::EN::AddressParse by Kim Ryan. I plan to use them once I get to the blocking window level of matches.
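
      For anyone unfamiliar with blocking, here is a minimal sketch of the idea mentioned above: group records by a cheap key, then run the expensive comparison only within each group. The sample records, the field names (`surname`, `zip`) and the key (first letter of surname plus ZIP code) are all assumptions made up for illustration, not anything from the actual dataset, and a real key would need tuning to the data.

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample records; the field names are assumptions.
      my @records = (
          { id => 1, surname => 'Smith', zip => '90210' },
          { id => 2, surname => 'Smyth', zip => '90210' },
          { id => 3, surname => 'Jones', zip => '10001' },
          { id => 4, surname => 'Smith', zip => '10001' },
      );

      # Cheap blocking key: uppercased first letter of the surname plus ZIP.
      sub blocking_key {
          my ($rec) = @_;
          return uc(substr($rec->{surname}, 0, 1)) . '|' . $rec->{zip};
      }

      # Group records into blocks keyed by blocking_key().
      sub build_blocks {
          my (@recs) = @_;
          my %blocks;
          push @{ $blocks{ blocking_key($_) } }, $_ for @recs;
          return \%blocks;
      }

      # Emit candidate pairs of ids, but only for records sharing a block.
      sub candidate_pairs {
          my ($blocks) = @_;
          my @pairs;
          for my $key (sort keys %$blocks) {
              my @members = @{ $blocks->{$key} };
              for my $i (0 .. $#members - 1) {
                  for my $j ($i + 1 .. $#members) {
                      push @pairs, [ $members[$i]{id}, $members[$j]{id} ];
                  }
              }
          }
          return @pairs;
      }

      my @pairs = candidate_pairs(build_blocks(@records));
      print join('-', @$_), "\n" for @pairs;
      ```

      With these four records, comparing everything against everything would mean six pairs; the blocks leave only the one pair that shares a key. The detailed name and address parsing would then be applied to those candidate pairs only.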

      The problem, of course, is that when you have many millions of records, the turnaround time for a really close look at each record just gets too large. So what I think I need to do is work out how to split the records in large datasets into groups that can be compared in an economical amount of time (the