Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Whiteknight (8626)

Whiteknight
  (email not shown publicly)
http://en.wikibo ... User:Whiteknight
AOL IM: wknight8111 (Add Buddy, Send Message)

My name's Andrew. I'm an open-content advocate at http://en.wikibooks.org. I'm also involved with Parrot as a semi-competent C hacker. This blog is going to be a forum where I can ramble about minutia and post information about perl-world stuff that I care about.

Journal of Whiteknight (8626)

Tuesday January 06, 2009
02:50 PM

MMD distance calculation

[ #38224 ]

I had an idea last night about how to redo the MMD distance calculations. It's easy to say that I was...affected by chromatic's post to the list awhile back about the inefficencies of the MMD system.

The current system loops over the function signature string to create a type tuple array PMC. This tuple is used with the list of possible candidates to calculate manhattan distances. Once all the distances are calculated, the list is sorted according to this distance and the "best" match (with the smallest distance) is returned.

I had an idea that using a Levinshtein distance on the signature string directly would prevent us the need to create tuples and then calculate a manhattan distance. Actualy, we could probably use a simple Hamming Distance for most cases, since we don't want to match transposed arguments or cases of incorrect argument numbers.

If I have some free tuits, I might throw together a prototype subclass of the MultiSub PMC to prove the idea works (and hopefully prove that it's a performance gain).

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • Why not use a low watermark algorithm to get the "best" match? Even if you need to keep track of all methods with the current "best" distance to break ties in the end, it seems sorting is unnecessary.

    Having never implemented a MMD myself, I am kind of curious how a string edit distance (Levenshtein or Hamming) could be used as a replacement to Manhattan distance. Could you provide a simple example?

    • The Parrot MMD system used to sort the candidate list, but apparently (according to chromatic++) we don't anymore.

      As to the ability of a Hamming distance to perform this duty, it's just a thought that I had. It may turn out to be untenable, but it's something I would like to try. As far as I understand the type tuple generation code, the tuple only contains the information that the signature string contains, with the addition of the PMC type.

      For simple situations, we can take a string like "SS->I", and c

      --
      Andrew Whitworth