One of the persistent bottlenecks in Parrot is that its naive, simple-to-describe stop-the-world mark-and-sweep garbage collector can be expensive. Even though it's mostly
O(2n), there's one characteristic of almost every program which makes n far larger than you might want it to be: most GCable elements have short-to-very-short lifespans. That's even true in Perl.
Cleverer GCs use tricks such as memory partitioning or generations to handle this problem. When you need more memory, you first try to reclaim the youngest GCable elements, on the theory that most of them have very short lifespans. Everything that survives that collection gets promoted to an older generation or copied to an older partition which you sweep less frequently. Mark and sweep, at least as currently implemented in Parrot, always walks the entire live set of GCable elements every time the GC runs to completion. Even if a PMC anchored in the root set has survived the last thousand GC runs, it still gets marked during every run.
We haven't fixed that yet. (Replacing a garbage collector in a running system not exactly designed for pluggability of garbage collectors is not an easy task.) Andrew Whitworth's proof of concept for Google's Summer of Code 2008 gave us a lot of good information to improve our GC, but that task is still ahead of us.
If you've programmed in C or C++, you're familiar with automatic variables. These get allocated on the stack --
int x; char y;. If you've ever returned a pointer to a stack-allocated value, you've had fun debugging the results. (Parrot r32065 fixed an amusing bug related to this in Parrot.)
Heap variables you allocate yourself with
malloc/calloc and must explicitly
free yourself. The advantage here is that you can pass pointers to them with impunity, no matter where you are in the call graph with regard to the stack location where you allocated them.
The memory characteristics of individual programs vary, but it's likely that a program uses many more stack variables than it does heap variables. (I use the terms in a metaphorical sense. What I really mean is "Variables that don't escape the current compilation unit versus variables that do", but I don't want to talk about escape analysis.)
I don't know of many garbage collectors which can distinguish between stack-style GCables and heap-style GCables -- again, not without doing clever tricks with escape analysis. This is one place where I do wish sometimes we had a tiny kernel with the VM bootstrapped atop; the semantics of C just don't let you express or analyze the memory characteristics of the program in any way that lends itself to automatic lifetime tracking.
It's also difficult to reason from the code in general about the expected lifetime of a GCable element. Every PMC or STRING returned from a C function has effectively escaped, and we can't predict what any future consumer of that API will do with the results.
Like many problems which are unsolveable in general, there are specific exceptions. Sometimes you can read the code and verify that you have what would be in C a stack variable -- even if it's a GCable entity.
I added a pair of functions in Parrot r32313 called
temporary_pmc_free. The documentation has quite a few caveats that I won't explain (figuring that they're scary enough that anyone who doesn't understand their implications should not use them). Basically, they're a way of creating a new PMC for the singular case where you know it's specific and entire lifespan.
I mention this to say that I teased a nice 5.14% speedup out of our multidispatch algorithm today by turning two PMCs into temporary PMCs. Parrot r32963 shows how easy the patch was to write, once I had identified the hotspot and verified that this was an appropriate optimization.
The ultimate solution is to improve our garbage collector so that it can recycle lots of young garbage very quickly, but I'll take a 5% optimization for now if I can spend two or three minutes coding and ten minutes reviewing profiling runs.