With a lot of help (Robrt, clkao, svk: many thanks) the branch got merged back to trunk and is now released as Parrot 0.3.0. The official announcement should make it to the front page soon.
I've now compared the old calling conventions with the new ones by checking bytecode size, the number of executed opcodes, and the timing of one of the benchmarks.
I've used the Sudoku solver and the oo5 benchmark. The latter does 1 million attribute accesses via method calls, a typical example of OO code.
The numbers were obtained by running
and by $
Here are the results:
Benchmark  Measure   Option  Old CC     New CC
Sudoku     codesize          5764       3442
Sudoku     ops       -p      911973     793967
oo5        ops       -p      22501832   10503606
oo5        time      -C      7.5s       4.9s
The new calling code reduces code size and the number of executed opcodes significantly, by up to a factor of two for the oo5 benchmark. There is also a non-trivial speedup, although no attempt has been made yet to optimize the new code for speed.
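To make "up to a factor of two" concrete, here is a small Python sketch that divides each old figure by the new one. The numbers are copied from the measurements above; the names RESULTS and improvement are illustrative only.

```python
# Old vs. new calling-convention measurements, copied from the results above.
RESULTS = {
    ("Sudoku", "codesize"): (5764, 3442),
    ("Sudoku", "ops"): (911973, 793967),
    ("oo5", "ops"): (22501832, 10503606),
    ("oo5", "time_s"): (7.5, 4.9),
}

def improvement(old, new):
    """Return old/new, i.e. how many times smaller the new figure is."""
    return old / new

for (bench, metric), (old, new) in RESULTS.items():
    print(f"{bench:7} {metric:9} {improvement(old, new):.2f}x")
```

This gives roughly 1.67x for Sudoku code size, 1.15x for Sudoku ops, 2.14x for oo5 ops and 1.53x for the oo5 run time.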
Two GC bugs (one in my branch, one in both trunk and the branch) slowed down progress on finishing the calling convention changes. Both are fixed now, and Coke can finish hunting his own in partcl.
As a reward for endless debugging sessions, I compared oo5.imc and oo6.imc from examples/benchmarks. These (preliminary) results are promising: around half of the opcodes saved and more than 20% speedup. These two benchmarks test attribute accessor functions, where the call overhead is considerable.
A correction first
A remark WRT the paragraph mentioning type checking in my first ewaa journal: it was caused by my misinterpreting the type conversion specs in pdd03. We are now doing full type conversion, which makes argument passing strictly positional. That is very likely what our users expect anyway. In the old scheme, calling a subroutine as:
foo(pmc b, str a) # pseudo-syntax denoting the argument types
just worked, even though the sub was defined as:
Looks weird, but it was really used, probably more by accident than for some good reason. Now the first argument gets converted to a string and the second is wrapped in a new String PMC.
Continuations and return context
As Dan describes in his blog, there is not much difference between a subroutine call and the invocation of a continuation. Both can return results to the caller. A nice example provided by Piers from Parrot's tests (yeah, number 13) contains something like:
x = choose([1, 3, 5])
y = choose([1, 5, 9])
if (x * y != 15)
    fail()
The above is vastly simplified: the choose function calls capture their continuations and call a try closure that either backtracks via the saved fail hook or delivers the next item from the passed-in choices array. Anyway, the interesting thing (for a proper implementation of continuations) is that there are multiple continuations that want to return either the next x or the next y when invoked indirectly via the fail function. These continuations not only return to different program locations, they also return different results.
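To illustrate why several continuations returning different values matter, here is a minimal continuation-passing sketch in Python. It is not Parrot's implementation or the original test: closures stand in for captured continuations, and choose, solve and the fail hooks are hypothetical names. Each fail hook resumes a different choice point with a different value, which is exactly what a proper continuation implementation has to support.

```python
def choose(choices, k, fail):
    """Pass each choice in turn to the continuation k. k also receives a
    fail hook that resumes this choice point with the next item."""
    def try_from(i):
        if i == len(choices):
            return fail()                      # exhausted: backtrack further
        return k(choices[i], lambda: try_from(i + 1))
    return try_from(0)

def solve():
    def no_solution():
        raise LookupError("no solution")

    def with_x(x, fail_x):
        def with_y(y, fail_y):
            if x * y != 15:
                return fail_y()                # the fail() of the example
            return (x, y)
        # When all y choices fail, control goes back into the x choice point.
        return choose([1, 5, 9], with_y, fail_x)

    return choose([1, 3, 5], with_x, no_solution)

print(solve())  # (3, 5)
```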
In the source there is this line:
our_cc($P3) # $P3 being one of the next choices
where the next result of choose is eventually returned to the main program. It looks like a plain subroutine call, but it invokes the passed-in continuation and ends up where one of the choose calls left off.
This means that creating a full continuation still requires copying the context structure (which also contains the location for the current return results). A return continuation can just hold a pointer to the referenced context.
Now that all this is implemented, the rather complicated tests (p6rules, streams) that use continuations as well as coroutines pass in my branch. (I haven't looked at the other failing tests yet, but converting explicit register usage in call setup to the new conventions should fix almost all of them.) These tests of course pass in Parrot svn trunk too, but now, for the most common case of invoking a return continuation, no context is copied at all: it's just a matter of putting the pointer to the captured context into the interpreter.
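The difference between the two kinds of continuation can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Parrot's C code: Context, Interp, ReturnContinuation and FullContinuation are hypothetical names, and a shallow copy stands in for copying the context structure.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Context:
    """Hypothetical stand-in for a Parrot call context."""
    pc: int
    results: list = field(default_factory=list)  # where return values go

class Interp:
    def __init__(self, ctx):
        self.ctx = ctx

class ReturnContinuation:
    """Used to return to the caller: a reference to the context is enough."""
    def __init__(self, ctx):
        self.ctx = ctx                     # no copy at capture time
    def invoke(self, interp):
        interp.ctx = self.ctx              # just swap the context pointer

class FullContinuation:
    """May be invoked later, possibly many times: snapshot the context."""
    def __init__(self, ctx):
        self.ctx = copy.copy(ctx)          # copy at capture time
    def invoke(self, interp):
        interp.ctx = copy.copy(self.ctx)   # keep the snapshot reusable

caller = Context(pc=10)
interp = Interp(caller)
rc = ReturnContinuation(caller)
fc = FullContinuation(caller)
caller.pc = 42                             # the caller moves on after capture
rc.invoke(interp)
print(interp.ctx.pc)                       # 42: shares the live context
fc.invoke(interp)
print(interp.ctx.pc)                       # 10: state at capture time
```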
The official way to do call/cc is still unchanged, it's:
cc = interpinfo
This usually means that you need a helper subroutine in which you capture the continuation (and pass it along somewhere for later use); OTOH, that is a well-defined usage of continuations in e.g. Scheme. Once things settle, I'm sure we can provide some shortcuts for capturing continuations inside the same sub (or context) too.
Implementing want or something similar is of course simple now: the get_results opcode is emitted before the subroutine invocation, so it is available for plain function returns as well as for continuations that return a result. Both can return what the caller wants, because the information is present in the context.
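A small Python sketch of the idea, assuming a simplified model in which the caller's context just stores how many results it expects. CallContext, get_results, deliver and minmax are hypothetical helpers, not Parrot APIs; the point is only that the expectation is recorded before the call, so any return path can consult it.

```python
class CallContext:
    """Hypothetical stand-in for the caller's context."""
    def __init__(self):
        self.wanted = None   # number of results the caller asked for

def get_results(ctx, n):
    """Modelled on get_results: runs *before* the call and records the
    caller's expectation in its context."""
    ctx.wanted = n

def deliver(ctx, values):
    """Usable by a plain return and by a continuation returning a result:
    hand back only as many values as the caller's context asked for."""
    return tuple(values[: ctx.wanted])

def minmax(ctx, values):
    # The callee consults the caller's context ("want") when returning.
    return deliver(ctx, (min(values), max(values)))

ctx = CallContext()
get_results(ctx, 1)
print(minmax(ctx, [3, 1, 2]))   # (1,)
get_results(ctx, 2)
print(minmax(ctx, [3, 1, 2]))   # (1, 3)
```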
Well, the svn diff has grown a bit, and things are now consolidated far enough that I've committed it to a new branch, branches/leo-ctx5, so that @all can investigate it and start helping to get it finished soon.
Well, my diffs against Parrot svn HEAD are now 160 Kbytes. And still no chance to check them in without blocking $others, because too much is still failing.
r = foo(a, b) now translates to the new call scheme opcodes, and so does the
.returns() stuff. Just tossed 460 lines in imcc/pcc.c and added 20 new ones. Works fine.
Overall, calling PASM from PASM looks far simpler now, but the PASM/C (NCI) and C/PASM (runops_fromc_*) interfaces use slightly more lines of code. OTOH we now get strict type checking and conversions to and from PMCs, which the old scheme lacked. I can very well imagine that we don't want to call C code without argument verification anyway.
Wrote a second context allocation strategy today. Just malloc/free backed by a free list. Works fine, except that it seems to reveal a very likely unrelated GC bug, which I'm still chasing.
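Here is a minimal sketch of what "malloc/free backed by a free list" means, written in Python for brevity. ContextPool and its methods are hypothetical names, not Parrot functions; the make callable stands in for malloc, and release pushes onto the free list instead of calling free. A real allocator would also have to reinitialize a recycled context.

```python
class ContextPool:
    """Sketch of malloc/free backed by a free list: released contexts are
    kept and handed out again instead of being freed."""

    def __init__(self, make):
        self._make = make        # stands in for malloc()
        self._free = []          # the free list
        self.allocated = 0       # how often we actually had to "malloc"

    def alloc(self):
        if self._free:
            return self._free.pop()    # reuse a released context
        self.allocated += 1
        return self._make()

    def release(self, ctx):
        self._free.append(ctx)         # instead of free()

pool = ContextPool(dict)
a = pool.alloc()
pool.release(a)
b = pool.alloc()
print(b is a, pool.allocated)   # True 1
```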
Yesterday was "Somehow Olympia" (German text) here in Herrnbaumgarten. We had a lot of fun for example watching "towel throwing for politicians".
The original design was not only rather heavyweight, it was also incomplete and a bit blurry. It lacked support for MMD argument signatures, return value context, type checking, and so on.
I'll try to describe some of the reasons why we changed the calling conventions to a new abstract scheme. The major change is that all argument passing is now done by dedicated opcodes, which allows the implementation under the hood to be changed and adapted later without changing the ABI of the Parrot VM.
A function call:
was translated to these argument-related opcodes (function lookup and call opcodes omitted for brevity):
set I0, 1 # prototyped call
set I1, 1 # 1 INT argument
set I2, 0 # 0 STR arguments
set I3, 1 # 1 PMC argument
set I4, 0 # 0 NUM arguments
set P5, Px # pass PMC argument in P5
set I5, Iy # pass INT argument in I5
set S0, "foo" # function name
That's a lot of opcodes to dispatch, but they take almost no time with the JIT runtime. Fine so far. Another function call:
produced exactly the same register setup, so these two function calls cannot be told apart when it comes to multi-method dispatch.
Now the first argument setup just translates to:
set_args "(0b10,0)", Px, Iy
and the second is:
set_args "(0,0b10)", Ix, Py
Thus we not only save 7 opcode dispatches per function call, we also get clear type information about the caller's arguments, including the argument order. You don't have to set these type bits yourself; the assembler fills them in according to the passed arguments, so just writing:
set_args "(0,0)", Ix, Py
is sufficient.
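The bit patterns in these examples suggest INT = 0, STR = 0b1, PMC = 0b10 and NUM = 0b11. The following Python sketch shows how an assembler could derive the signature string from register names alone; the mapping is inferred from the examples in this post, and TYPE_BITS, fmt_bits and signature are hypothetical names, not imcc's code.

```python
# Type codes as they appear in the examples: INT 0, STR 0b1, PMC 0b10, NUM 0b11.
TYPE_BITS = {"I": 0b00, "S": 0b01, "P": 0b10, "N": 0b11}

def fmt_bits(bits):
    """Format a type code the way the examples write it."""
    return "0" if bits == 0 else bin(bits)

def signature(*regs):
    """Build the signature string for set_args/get_params from register
    names such as 'Px', 'I5' or 'S0' (only the first letter matters)."""
    return "(" + ",".join(fmt_bits(TYPE_BITS[r[0]]) for r in regs) + ")"

print(signature("Px", "Iy"))   # (0b10,0)
print(signature("Ix", "Py"))   # (0,0b10)
print(signature("Nx", "Sy"))   # (0b11,0b1)
```

Because argument order is part of the string, the two calls that the old scheme could not distinguish now get different signatures, which is what MMD needs.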
In the old scheme you could happily call a function with:
which was defined as:
set Nx, N5 # get 1st NUM param into n
set Sy, S5 # get 1st STR param into s
Because arguments were passed in fixed registers, the function would have picked up whatever happened to be in registers N5 and S5 and would have run, though probably not for long. A possible "solution" would have been to force all compilers and Parrot hackers to emit code that first verifies the passed arguments. That's of course another bunch of opcodes, bulky and error-prone. Now the function declares precisely what it expects:
get_params "(0b11,0b1)", Nx, Sy
Again the type bits are filled in by the assembler. But during the call sequence, the argument passing code can verify the types (and counts) of arguments and parameters. Conversions to and from PMC parameters are specified and done automatically. Mismatches are reported by an exception.
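The check-and-convert step of the call sequence can be sketched in Python as follows. This is a simplified model, not Parrot's argument-passing code: PMC, ArgumentError and pass_args are hypothetical names, and the sketch only converts to and from PMCs and raises on any other mismatch. Parrot's actual conversion rules may be more permissive.

```python
class PMC:
    """Hypothetical box standing in for a Parrot PMC."""
    def __init__(self, value):
        self.value = value

class ArgumentError(Exception):
    """Raised on count or type mismatches between arguments and params."""

UNBOX = {"I": int, "S": str, "N": float}   # PMC -> native conversion

def pass_args(arg_types, args, param_types):
    """Check count and types of the arguments against the parameters and
    convert to/from PMCs where needed. Types are 'I', 'S', 'N', 'P'."""
    if len(args) != len(param_types):
        raise ArgumentError(f"expected {len(param_types)} args, got {len(args)}")
    out = []
    for a_type, value, p_type in zip(arg_types, args, param_types):
        if a_type == p_type:
            out.append(value)
        elif p_type == "P":
            out.append(PMC(value))                   # box into a new PMC
        elif a_type == "P":
            out.append(UNBOX[p_type](value.value))   # unbox and convert
        else:
            raise ArgumentError(f"cannot pass {a_type} as {p_type}")
    return out

# The foo(pmc b, str a) case: the callee was declared as (str, pmc).
params = pass_args("PS", [PMC(42), "hello"], "SP")
print(params[0])         # 42 (PMC converted to a string)
print(params[1].value)   # hello (string wrapped in a new PMC)
```

With this kind of check, calling a (NUM, STR) function with a single INT argument raises an error instead of silently reading stale registers.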
Implicit register usage
The central mechanism of a function call in the old scheme was just the plain argumentless opcode:
It would pick up whatever happened to be in P1 and use it as the continuation of the call. P2 was defined to be the invocant if it's a method call, and so on; and it calls whatever is in P0. That's fine per se, if all code writers and compilers strictly follow this convention and don't forget to NULLify registers that shouldn't be used for the call, but it's a major PITA for the assembler, which ought to track the control flow for proper register allocation: is the invoke a function call, a yield, or a return from a function? Well, it's not defined; it could be any of them. Quite a few lines inside imcc try to track down the usage of invoke opcodes to do the right thing. You can imagine that this does not contribute to clear code.
The old call scheme demanded that the invocant be passed out-of-band in P2. It was also only accessible, via a special interpinfo call, in functions declared as methods. This doesn't really match our major target languages, where the invocant just happens to be the first param of a method.
Calling a function as a method or vice versa would have needed to shift PMC arguments down or up to get everything into the registers that the callee expects.
Return value context
The old scheme had no provision for specifying what return results, and how many, the caller expects. Now the get_results opcode is emitted before the actual function call, so that a function return has a chance to return what the caller wants.
Future and optimization
In the old scheme the lower 16 registers of each kind were volatile (any function return could set these registers). This implies that you usually have to move values from the preserved area into the lower half; during the call sequence, registers are moved into the callee's lower half, from where another round through all parameters would have placed everything in the callee's preserved area. That's three passes over all arguments, which are hard to avoid in the general case.
The old call scheme reserved 4*16 registers just for function calls and returns. On a 32-bit machine this accounts for 320 bytes that have to be allocated per call, e.g. just to pass a single one-word argument to a one-liner function or an attribute accessor method.
OK, we are not doing optimization now, and that's fine. But the old calling scheme would have prevented all the future optimizations that will be needed. You can't do any optimizations later when the call scheme is carved in stone and reserves half of the register resources for itself.
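The 320-byte figure can be reproduced with a quick calculation. The per-register sizes below are an assumption for a typical 32-bit build (4-byte INTVAL, 4-byte STRING and PMC pointers, 8-byte FLOATVAL as a C double); REG_SIZE and reserved_bytes are illustrative names.

```python
# Assumed register sizes on a 32-bit machine, in bytes.
REG_SIZE = {"I": 4, "S": 4, "P": 4, "N": 8}  # INTVAL, STRING*, PMC*, FLOATVAL
RESERVED_PER_KIND = 16                        # call-related registers per kind

def reserved_bytes(sizes, n=RESERVED_PER_KIND):
    """Bytes reserved per call for n registers of every kind."""
    return sum(n * size for size in sizes.values())

print(reserved_bytes(REG_SIZE))   # 320 = 3 * 16 * 4 + 16 * 8
```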