In Double-Speed, I gave numbers for two Parrot optimizations that added up to a big improvement. Several people asked for explanations.
The most dramatic change is a refinement on stack chunk recycling, which I first explained in Refcounting Isn't All Bad. I've been thinking a lot lately about the difference between things which happen often versus things which happen rarely. Even though continuations at the VM level mean that control flow is not always a perfect singly-linked list of call frames, control flow is usually a singly-linked list of call frames. Though your call state can be a tree, in practice taking a continuation is rare.
In Parrot, taking a continuation is the only thing that will increase the reference count of a stack chunk. (I assume by now that you either understand continuations enough to know what I mean, or you can look it up for yourself. The important point is that a continuation captures a point in the call graph and keeps it alive as long as the continuation is alive. This allows the continuation to return to that point in the call graph whenever you invoke it.)
When I realized this, I removed all of the other refcount manipulations from the system. The only remaining refcount manipulation is for creating and destroying continuations. That's why Parrot revision 27567 gave a 30% improvement in the Rakudo-building benchmark.
revision 27568 is more interesting. The final diff chunks, for the
isa vtable entry, tell the story. Given a PMC and the name of a
class, is the PMC an instance of the class or one of its ancestors?
To answer this question, the function must get the name of the PMC's current
class. That's the call to the
get_string vtable entry.
The second chunk of the diff shows this function. It performs some calculations and caches the name, so there's a lazy-evaluation semantic here. Then it returns a copy of the name. (Don't worry -- it's a copy-on-write copy. I disabled COW once in Parrot for fun and, well, if there's one optimization we're keeping, it's that one. Yikes.)
Strings are mutable in Parrot. If you wanted to subclass a class at runtime, you might grab the name of the parent class, append a counter or other mangling to its name, and use the result as the name of the subclass. If you modified the same string that the parent class used in its name cache, you'd effectively change the class's name in place. (I should mention that strings are in two pieces. There's a string header and a string body. Headers are all of uniform size. Bodies are appropriate to the length of the string. This makes GC much easier in many ways and also makes COW possible.)
However, if you don't actually modify the contents of the string, you've just used up a free string header that you can't reclaim until the next GC run.
isa doesn't modify the contents of the string. It uses it only
in a string comparison. There are no problems there. (Okay, that's a fib; the
string comparison might change the encoding of the class name in place, but
that's a safe operation, and actually a benefit -- lazy evaluation, again.)
By extracting the name-calculation operation out of
make_class_name, that vtable entry became a call to the
extracted function and then a copy on its results. The semantics stayed the
isa vtable entry now calls
directly, avoiding the copy. Because
isa is a hotspot, this
reduces the amount of immediately recyclable garbage dramatically.
There is one twist, however. The PMCProxy PMC inherits its
vtable entry from the Class PMC, but stores its name in a different manner.
Thus the code needs to fall back on the
get_string call if the
SELF PMC is a PMCProxy. This is slightly ugly, and a better
solution is to give PMCProxy its own
isa entry (or fill it out
more, as it exists in part).
Though I'm no big fan of C's memory model, it does have one useful semantic in the distinction between stack memory and heap memory. The implementation of that semantic leads to plenty of bugs, but it is useful to be able to mark the expected lifetime of a piece of memory. If we could do this more reliably in Parrot, we could ease even more GC pressure. There are correctness, conciseness, and ease-of-use penalties in that idea however.