A lot of people complain about reference counting as a memory management strategy in Perl 5, and they have a point. Unless you're very careful, reference counting spreads its tendrils throughout your system. Unless you're very disciplined, you'll forget to increase a reference count somewhere, and you'll end up collecting things in the wrong places.
The paper A Unified Theory of Garbage Collection discusses the most popular modern GC strategies to demonstrate how reference counting and tracing garbage collection are roughly equivalent, especially as concerns about performance, correctness, and conservativeness increase.
I explain this because I added reference counting to a small part of Parrot today as an optimization -- 11.73% on my Rakudo-building benchmark.
Parrot has an internal data structure called a stack. This represents a particular environment within the call graph. If you think of a normal program flow, where function A calls function B calls function C which returns to B and returns to A, you'll have a sense of a normal call stack. Parrot's stack is similar, except that we use continuation passing style (CPS), where it's not a stack, it's a graph, and control flow doesn't have to be that linear.
There's plenty of literature on CPS (including Luke Palmer's how to get a free sandwich with continuations), but the important idea is that function A sets a bookmark inside itself and passes that bookmark along to function B. B does the same thing when it calls C. At any point, you can look up any bookmark you happen to have and magically teleport (okay, it's just a jump instruction in your processor -- someday I'll explain how very low level languages have eval and very high level languages have eval, and it's just the poor jerks in the middle who can't do anything useful without feeding the world's pointiest programming language into their AbstractFactoryFactoryFactories) to the point of that bookmark.
Ranty story short, our stack is a mostly-acyclic graph represented as a linked list, and stack entries -- or chunks -- are garbage collectable entities.
If you've ever sat next to one of my "Let's Make Parrot Go Faster!" entries on a bus somewhere, or heard it playing softly in an elevator, you know that one good way to make Parrot go faster is to use fewer garbage collectable entities. One way to make an O(2n) algorithm faster is to make it an O(log n) algorithm. Another way is to make n smaller, and that's easier for now, especially because all I have to do is watch over Andrew Whitworth, who's going to write a nice new garbage collector, and keep track of Senaka Fernando, who's going to connect the nice garbage collector from Apache Harmony into Parrot.
One nice feature of reference counting (ah, now you see the connection!) is that you can recycle dead objects as soon as you know they're dead. One sad misfeature of a conservative garbage collector is that dead bodies stack up for a while until you can mulch them. One horrible thing about a stop-the-world full mark-and-sweep garbage collector (and try to guess what Parrot has at the moment) is that if you're out of memory in one of the arenas you care about, you have to mark all of the living GCable entities in your system and sweep all of the pools in all of the arenas to find everything that's alive and to recycle everything that isn't.
If you want to go fast, you want to avoid this until it's absolutely necessary.
Here's the interesting thing about stack chunks: Parrot stores them in one place in one function (stack_push) and unstores them in one place in one function (stack_pop). Again, the names are really bad because it's a graph and not a stack, but you get the point.
I noticed this when I profiled the Rakudo-building benchmark again and saw that stack_push was one of the most expensive inclusive calls, apart from the garbage collector. It calls a function which calls a function which calls a function which asks for a new stack chunk from the GC. When there are no more free stack chunks, the GC runs again, and that eats up time.
After a few minutes of thinking, I realized that the stack_push/stack_pop pair formed a boundary around a stack chunk's lifespan. Popping a chunk off of the stack meant that nothing else needed it. That chunk is immediately identifiable as a dead object, suitable for recycling on the spot.
Rather than waiting for a full GC run to find every live object in the system and then run across this dead object and recycle it, I added a couple of lines of code to recycle it automatically then and there.
There was my immediate 11.73% speedup.
There was also a problem in the Continuation PMC tests; I noticed this only after I patted myself on the back for moving the bottleneck in my benchmark elsewhere. I'm sure you'll see it if you think for a moment, especially about how I kept writing that the stack isn't a stack, it's a graph implemented as a linked list.
The problem is that there's not a single linked list. If you take a continuation at various points in the call graph, you can have several linked lists that all share the same tail. If you exhaust one of those call chains, you may pop a stack chunk that's part of another call graph elsewhere. Your reference to it goes away, but it's still live elsewhere, so if you recycle it then and there, you may end up reusing the chunk and storing inappropriate data in it while something else expects it to stay pristine.
If you're very clever, you've figured out the solution.
I added a reference count to the stack chunk structure, incremented it on stack_push, decremented it on stack_pop, and recycled the chunk in the latter function only if the refcount reached zero.
Is this a long-term solution? Probably not; I posted my initial investigation to the parrot-porters list, and we had a nice discussion on alternate techniques. We'll have to refactor the code more heavily once we find a better approach, but for now, I'll live with this optimization.
(Seneca Cunningham found what appears to be a crash related to this, but I think I've found the culprit and fixed it by adding another reference count pair to the Continuation PMC. See? I told you reference counting made itself pervasive.)