I've spent the past three days profiling Parrot and Rakudo startup times. Christoph Otto and Vasily Chekalkin did some great work on a roadmap item I added a while back -- specifically removing thousands of exported symbols from our shared library. (The more symbols you export, the longer dynamic linking takes.)
As every successful optimization changes the performance profile of your application, I've found some interesting bottlenecks. Some of them are even amusing, in the forehead-slapping "You're such a geek if you find this funny" way.
For example, my most recent trace showed that calculating the correct method resolution order in Rakudo classes created a lot of PMCs. In particular, one section of the algorithm removes an entry from an array by index. (In Parrot vtable terms, this is a delete_keyed_int operation.) The array in question is always a ResizablePMCArray (RPA), roughly akin to your standard Perl 5 array. For some reason, RPA had no specific implementation of delete_keyed_int, which takes a primitive integer value and removes the PMC at that index in the array.
Instead, the RPA fell back to the default implementation of that operation. The default takes the primitive integer and boxes it into a new Key PMC. Then it calls the original PMC's delete_keyed operation, which takes a Key PMC and removes the PMC identified by that key in the array. RPA does define that operation, and the first thing it does is extract a primitive integer value from the Key PMC. In other words, every removal by index allocated a brand-new PMC only to unbox it immediately and throw it away.
I added a local delete_keyed_int definition to RPA and rewrote delete_keyed in terms of it. Rakudo now starts up 1.34% faster and allocates 9.38% fewer PMCs. All of the avoided PMCs were short-lived temporaries that would never have survived the first garbage collection run, but avoiding even those allocations is a performance improvement.
I estimate that Rakudo starts up nearly 40% faster now than it did when I started on Sunday night. We can get it faster yet.