Had a good talk with chromatic today after #parrotsketch, and I think I have enough information to get started with the garbage collector now. I've been hesitant to lay any code because there are so many options and questions and unknowns about the design, and I didn't want to waste time moving in the wrong direction. Now, I think, I know what direction to start moving in. I'm going to flesh out a few ideas in this post, so forgive me the length.
The current GC system gives each object a flag field that can be one of three colors: White (not checked, probably dead), Grey (Checked, but it's children aren't checked) and Black (Checked, all children are checked). To perform a GC run, all the memory items start out as white, except for a special "Root" set of items, which start as grey. At each step, we turn the children of the nodes to grey, and turn the nodes themselves to black. We continue in this way, following pointers, until There are no more grey nodes, and hence there are no unchecked children. All the nodes that are black are to be kept, and all the nodes that are still white can be freed. This is a pretty standard GC scheme.
However, we run into performance problems because each node contains a flag, and we must travel to each individual node and manipulate each individual flag in order to mark them as white. Then, when we want to re-run the collector, we need to traverse the list again, setting all the "black" nodes to "white" again. This is very inefficient.
Before talking any further, let me introduce the idea of a node "header". A header is a linked list node with a pointer to a data object. Headers are used for a number of reasons: To allow data objects to be moved in memory, to enable Copy-on-write and to help the GC keep track of items as they are marked and traversed. When we create a new object, like a new PMC, we create a header node for it.
Chromatic suggested the idea of using a card-marking scheme where we keep the flags in an array separate from the nodes themselves. Instead of marking a flag on the node itself, we mark bits in a bitmap in this array. With three possible "colors" for each node, that's two bits per flag, and 4 flags per byte and 16 flags in an ordinary 32-bit integer. We know the "order" of our memory chunks because we know the order that the headers appear in. The first flag corresponds to the first header in the list, the second to the second, and so on. This scheme is all well and good until you consider that one of the nodes in the middle of the bitmap is dead, and the rest are not. This is going to involve a complex operation where we have to "remove" one two-bit flag from the middle of an integer, and shift the remainder forward. This could be, potentially, a huge number of bit-wise logic and shift operations, unless I get creative in the way they are arranged. Even though these instructions are arithmetic and therefore fast, stringing together enough of them will cause a noticable slowdown. Using a linked list for the bitmaps instead of an array would help with the insertions and deletions, using a secondary bitmap to mark which bits in the primary bitmap are "active" is another interesting possibility. Either way, we would be required, basically, to play conservatively. If a certain bitmap contained more then a certain X% black objects, we would blindly declare all nodes in the bitmap to be good, even if a couple of them aren't. Combined with a generational scheme (where the older nodes appear at the front of the list), we call this the "dense prefix", and ignore it. The small amount of memory we can reclaim by killing white objects in the dense prefix is outweighed by the performance penalty to find them and remove them from the bitmap. This is an interesting scheme, and if I can work out some of the details, it could turn out to be very efficient.
I had an idea where we scrap the flags entirely, and instead create three linked lists: a white, a grey, and a black. Instead of checking and manipulating flags, we determine object status by checking the location of that object's header. If a header is in the black list, that object is black, etc. Before we start a new GC run, instead of having to mark each item individually as "white" to start, we would simply move the entire white list to the black list (2 pointer updates). Headers form a linked list, and if we want to allow the headers to be movable, they need to be doubly-linked lists (each node has a forward and a backward link). Moving headers from one list to another is where we incur our performance penalties: Worst case scenario, a move would require 6 pointer updates, only 5 if we always appended to the end of the target list. However, if we followed the idea of using a dense prefix, we could move nodes in groups, and then only need to update the pointers at the ends of the group, not the pointers on each individual node. We could do this by identifying runs of nodes with the same status, or by picking a certain number of nodes in a row Y and moving the whole group if a certain percentage of them are still black. Without flags, of course, it's going to be difficult to determine what the status of the object is. Plus, when we traverse the objects to determine status, we must do it through the object pointer tree, and not through the header list. This means it will be very very difficult to identify runs or groups of consecutive header nodes without relying on some kind of implicit scheme using complex recursion. In the end, it might be completely impractical to do anything in groups, and we would be stuck working with one node at a time (which, as we have seen, can be expensive).
These are just two schemes that we talked about today. Of the two, I think chromatic's is the best by virtue of having the fewest potentially-expensive operations. Of course, even that is going to require me to get creative to avoid some of the worst-case pitfalls. I'm going to try my hand at implementing both schemes, I think, and if I manage to create both successfully, benchmark them against each other to find a winner. Likely, I'll come up with some kind of hybrid approach (card marking with different header lists to mark generations) that takes ideas from both for a maximally efficient algorithm.
I'm going to lay out some basic data structures tonight and tomorrow, and send them to chromatic for approval. With that, I am going to build my initial allocators and deallocators (which, it turns out, are spread all over creation). i'm scheduled to have those done by friday, I'm hoping to get them set up sooner then that.