Slash Boxes
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

chromatic (983)

  (email not shown publicly)

Blog Information [] Profile for chr0matic []

Journal of chromatic (983)

Monday October 27, 2008
10:37 PM

Multi-Primitive Dispatch Autoboxing Obsession

[ #37753 ]

When is an integer not an integer? When it's an int!

Parrot has four types of primitives: integers, floats, strings, and PMCs. Parrot also has four types of registers: I, N (for Numeric), S, and P. (Parrot would have an Artisan personality if we used F for float, but that's a bad pun for psychology students.) Many people have explained PMCs elsewhere, but think of them as "everything else that isn't just an integer, float, or a string." That's not entirely true, for reasons that will become obvious in a few paragraphs, but computer programming, unlike computer science, is all about finding the most appropriate lies to tell.

Parrot's calling conventions govern how you pass parameters to and receive values from Parrot functions. In particular, they dictate which registers you use. For example, you may pass two integers into a division function and get a float back. The two incoming registers may be I0 and I1, and the outgoing register may be N0 (I don't believe they are, but it's all documented, and if you write at the level of PIR or a higher-level language, you don't have to worry about that). That's all well and good for single dispatch.

Multi dispatch makes life much more complicated, and programming much easier.

Suppose you have a function called divide, which takes two arguments and returns one value. If you're used to Perl 5, you know that the division operator is already polymorphic; you can give it two SVs and get back one SV and treat the arguments as numeric or integral or however they are. Imagine that you have to worry about the type of the argument, however. If that mattered, you'd have to check the argument types as they came in and perform different kinds of comparisons based on the results of that check. If you've written much Perl, you've written this kind of code. That's one drawback of a single dispatch system.

In a multi dispatch system, you can define multiple functions with the same name but different signatures. (If that sounds like signature-based function overloading in Java or C++, it should -- but static manifest typing means that the compiler does its best to resolve those dispatches at the point of the call at compile time. If it can't do that, C++ will give you horrible error messages and Java will vomit stack traces. At least, I recall that Java has a weak stomach full of stack traces and loves to read while someone else drives it fast down narrow mountain roads full of hairpin turns.) At runtime, the types of the parameters passed to the call determine which variant function to call.

That is, you can define divide :multi(Integer, _) and divide :multi(Float, _), and Parrot's MMD system will call the first one when you write divide( 10, 2 ) and the second when you write divide( 5.0, 2.0 ). (You may ask "What about dispatching based on the expected return value?", but that's a much bigger flophouse full of monkeys.)

I mean, you can say that in Parrot now, as of a few minutes ago.

I mentioned primitives before. In PIR (Parrot's native language), the numbers I used as parameters before are actually integer and float constants. Parrot's PIR compiler stores them in I and N registers, respectively. I haven't yet explained that the type signatures on the PIR functions are tuples, nor that uppercase names represent PMC types and the underscore is the Prolog/Haskell placeholder, which means any type will do.

If you've read closely, you have realized that the arguments get passed in completely different registers than Parrot expects. Fortunately, Parrot's calling conventions account for this. We have PMC types which represent the other primitives: Integer, Float, and String. It's easy to convert from an integer primitive into an Integer PMC and so on. (The opposite isn't as easy, but that's again a different story.) This process is autoboxing, and Parrot's calling conventions handle it already automatically -- at least they do, in the single dispatch case.

The multiple dispatch case is more complex. It's not as easy as looking up a symbol in a namespace, checking to see if it's invokable, stuffing the parameters into the proper registers, and calling it. In between the "Alright, time for parameter passing!" and "Go ahead and call it!" steps, something has to say "Wow, there are a lot of variants! Which one is best, if any?"

There are several algorithms for finding the best match between multi variants and the passed parameters. Parrot uses a Manhattan distance, which calculates the number of steps between the actual parameter and the expected parameter for each possible parameter and each possible variant. How far is it to the store from my house? It's one block north, one block east, one block north, eight blocks east, and one block north. (As usual, this is a tiny lie, but pedagogical lies are acceptable in computer programming.) The multi variant tuple with the smallest distance for the given parameter tuple is the best fit.

Autoboxing isn't a problem for Parrot's calling conventions, but Parrot's Manhattan distance algorithm knew nothing about it until Parrot r32211. As far as it cared, the distance between an I register and a PMC register containing an Integer was way too far ever to think about walking. My change refines the algorithm to say "If the parameter is a primitive type -- integer, float, or string -- and if the expected argument is the corresponding PMC type, add one point of distance to account for the conversion, then continue looking at the rest of the tuple."

For as simple as that code looks, it almost had a bug where it preferred to autobox primitives to meet a variant such as foo :multi( Integer ) even when a variant foo :multi( int ) existed. This depended on the order of evaluation of compilation. I had forgotten to increase the distance of this parameter to account for autoboxing. Any variant which accepts the primitive as it is should take priority over one which requires autoboxing.

As with many of these patches I've explained, writing the explanation took longer than writing the patch -- even with that bugfix.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
More | Login | Reply
Loading... please wait.