I've long used part of the Rakudo-building process as my benchmark for Parrot optimizations. It doesn't exercise all of Parrot by any means, but it pushes the garbage collector, the string system, and the object system pretty hard. It's also the single longest part of the build process, so improving performance there has a dramatic effect on productivity.
I've long known that it appends to and concatenates strings very frequently. The corresponding Parrot functions always show up near the top of the cost list from Callgrind. I've spent a few hours here and there trying to speed them up, but short of rethinking them entirely, I've never found a good solution.
We don't (yet) have a PIR-level profiler like Callgrind (but I'm working on it). Sometimes I can read through the code and see things that might be expensive, knowing what eventually gets called and where. On a whim, I browsed through the generated code for Rakudo to find concatenation. One idiom jumped out at me:
$S0 = concat "PIR", ':"'
This line of code concatenates two constant strings into one and assigns the result to the virtual string register $S0. It's not terribly inefficient; the two constant strings live in the bytecode's constant table, so they don't get recreated on every execution. However, the result is always going to be the same constant string.
There's no reason the optimizer couldn't rewrite that line of code into:
$S0 = 'PIR:"'
It didn't. Now it does.
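The rewrite is ordinary constant folding: when both operands of a concat are compile-time constants, perform the concatenation at compile time and emit a plain assignment instead. Here's a minimal sketch in Python with an invented tuple-based IR (Parrot's optimizer is C code working on its own structures; the names and the "registers start with $" convention below are mine, not Parrot's):

```python
# Toy IR: an op is (name, dest, lhs, rhs). By convention here,
# register names start with "$"; any other string stands in for
# an entry in the bytecode's constant table.

def is_string_const(operand):
    """True if the operand represents a string constant, not a register."""
    return isinstance(operand, str) and not operand.startswith("$")

def fold_constant_concat(op):
    """Rewrite `$S0 = concat "a", "b"` into `$S0 = "ab"` when both
    operands are constants known at compile time."""
    name, dest, lhs, rhs = op
    if name == "concat" and is_string_const(lhs) and is_string_const(rhs):
        # Both operands are constants: concatenate now, at compile
        # time, and emit a simple assignment instead.
        return ("set", dest, lhs + rhs)
    return op

print(fold_constant_concat(("concat", "$S0", "PIR", ':"')))
# -> ('set', '$S0', 'PIR:"')
```

An op with a register operand, such as `("concat", "$S0", "$S1", "x")`, passes through unchanged, since its value isn't known until run time.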
The optimizer already knew how to optimize away constant addition, subtraction, and other math operations. All I had to do was tell the opcode generator not to generate the concat_s_sc_sc op (concatenate two string constants and put the result in a string register -- the destination register always comes first in an opcode's argument list), add the concat operator to the list of rewritable three-register opcodes, and add a case to the rewriter for when the destination register is a string register.
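To make the shape of that change concrete, here's a hedged sketch of a folding pass that treats concat the same way as the already-rewritable math opcodes. Everything below is illustrative (a toy tuple IR, invented names, registers spelled with a leading "$"); it is not Parrot's actual rewriter, just the pattern it follows:

```python
# Each foldable three-register opcode maps to the compile-time
# computation that replaces it when both source operands are constants.
FOLDABLE = {
    "add": lambda a, b: a + b,     # already handled by the optimizer
    "sub": lambda a, b: a - b,     # likewise
    "concat": lambda a, b: a + b,  # the newly added string case
}

def is_const(operand):
    """Toy convention: strings not starting with "$" and bare numbers
    stand in for constant-table entries; "$..." names are registers."""
    if isinstance(operand, str):
        return not operand.startswith("$")
    return isinstance(operand, (int, float))

def fold_pass(ops):
    """Run one folding pass over a list of (name, dest, lhs, rhs) ops."""
    out = []
    for name, dest, lhs, rhs in ops:
        if name in FOLDABLE and is_const(lhs) and is_const(rhs):
            # Compute at compile time; emit a plain assignment.
            out.append(("set", dest, FOLDABLE[name](lhs, rhs)))
        else:
            out.append((name, dest, lhs, rhs))
    return out

print(fold_pass([("concat", "$S0", "PIR", ':"'),
                 ("add", "$I0", 2, 3),
                 ("concat", "$S2", "$S1", "!")]))
```

The first two ops fold to `set` assignments; the third survives untouched because one of its operands is a register whose value is unknown until run time.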
The result is a modest 5.5% speedup in my benchmark. Part of the gain is avoiding run-time concatenation, but part of it is also using fewer GCable entities. The less garbage you create, the less garbage you have to collect and the less frequently you have to collect it. 5.5% isn't a huge gain, but it's a step in the right direction. Every second we shave off of the hack-compile-test cycle is an improvement, and these optimizations help all Parrot programs as well.
(Are these entries interesting? I seem to get more feedback when I complain about various Technology Distortion Fields.)