This is going to be my last post to this use.perl journal. From now on, all my Parrot- and Perl-related posting will happen on my Blogger blog instead. There are a number of reasons for this:
1) My wife and I are going to do all our personal family-related blogging on a dedicated blog, so my current personal blog can be devoted entirely to technical topics.
2) I maintain 4 other blogs at Blogger on various topics, so it makes more sense to keep everything together.
3) The interface there is a lot better than the interface here, and I'm pretty sick of having to write out HTML tags by hand.
4) Since I'm going to the Blogger website daily anyway, I'm more likely to remember to write regular posts there than I am here.
All my Parrot- and Perl-related blogs will be labeled with "Parrot" or "Perl", so people interested in subscribing to the crap I have to talk about can filter out the unrelated stuff. So, without further ado, I bid use.perl goodbye forever.
The JIT system in Parrot is a bit of a mess, and one that has been mostly ignored for a while now, except for the increasingly frequent occasions when it breaks.
As far back as 2004, people in Parrot were talking about libJIT but decided it made more sense to use a custom-made JIT backend that was targeted to the idiosyncrasies of Parrot "from day 1". Several years later, with the release of 1.0.0 slowly fading into the distance behind us, our custom-built system is not only no better than libJIT or LLVM for our specific needs, but will likely need to be completely scrapped and rewritten to be useful at all. Plus, JIT is only implemented on i386, which is only a small portion of our target systems.
Parrot's JIT system is a complicated one. Parrot hands over the current bytecode stream to the JIT engine, which in turn compiles the bytecode into native machine code and executes it. It does this by maintaining a large list of Regex-based definitions for opcodes, and writing the glue code necessary to pass between them and the other parts of Parrot (the PMC system vtables, for instance). This system requires that we write, from scratch and with minimal abstractable overlap, a new JIT engine for each system we want to support.
An engine like LLVM or libJIT (or even GNU Lightning, although I don't know as much about that project) immediately simplifies everything. We can write the solution once and have it just work on all platforms where the JIT backend is supported (which, at the moment, is nearly a superset of the platforms that Parrot supports, for both engines). Plus, for both engines, we suddenly get all the benefits while only having to write one interface: automatic machine code generation and execution, cross-platform stability, and code optimization. It's a win-win for us!
Tewk++ submitted a very interesting application to GSOC to implement an LLVM-based JIT system for Parrot. I don't know when winning projects get announced, but if he's in that group I'm sure I'll post a short hallelujah here. I've also been doing some looking into libJIT myself, and might dabble with that concurrently. Having a good JIT engine in Parrot would be very good, but having a sane interface that would support multiple JIT cores would be great.
In the meantime, my opinion is that our current JIT system should be ripped out wholesale. We don't really have a working system currently, just the illusion of one. It doesn't work on all our target systems, and where it does work it is fragile, messy, and unreliable. It's better to admit to ourselves that we don't currently have an acceptable JIT solution. Hopefully, that realization will galvanize us into implementing a real JIT posthaste.
A few miscellaneous updates:
1) Matrixy now has an assortment of TAP-related testing functions: plan(), ok(), nok(), is(). We also have functions start_todo() and end_todo() to mark blocks of tests as TODO tests. This morning I went through the test suite and updated most of our tests to use these routines. It isn't a perfect TAP implementation, and not all the expected functionality exists, but it's a start.
Blair has been doing more great work on the various library functions, and I'm much closer now to getting BLAS and LAPACK working normally on my system. (He's on Win32, I'm on Ubuntu x86_64, so it's been hard to get all the NCI stuff working properly on my machine.)
Our test suite is really growing by leaps and bounds, which helps to put into perspective how much work we have done (a lot) and how much we still need to do (also a lot). It's a testament to Parrot and PCT that this project is progressing so quickly.
2) The Perl 6 Wikibook is still developing, albeit more slowly than it had been. Matrixy has been taking up a lot of my attention recently, and I am trying to strike while the iron is hot. I've been adding snippets of material here and there. I'm planning to get a fresh Rakudo release soon to facilitate some testing of concepts that I am not too familiar with yet. I'm hoping to get a good amount of work done this week, and I would appreciate any feedback that people have.
Continuing with my series about M, today I'm going to talk about defining functions. I discussed how to call and use functions last time; today we are going to learn how to make our own.
Functions are defined simply with the "function" keyword, and are terminated with the "endfunction" keyword. Here's an example:
function hello()
printf("hello world\n");
endfunction
It's worth noting here that semicolons should probably be used to terminate all statements inside a function, because otherwise the values of those statements will still be printed to the console.
Parameters can be defined as expected:
function foo(a, b, c)
printf("values: %d, %d, %d", a, b, c);
endfunction
Where things get a little different from what we (Perl programmers mostly) are used to is in return values. The return values of a function are defined in the function signature itself. The "return" keyword, while still useful for exiting a function prematurely, does not take a value itself.
function s = sum(a, b)
s = a + b;
endfunction
Here is almost the exact definition of the pi() function in the Matrixy repo:
function p = pi()
p = 3.141592653589;
endfunction
M allows multiple return values too:
function [a, b, c] = bar()
a = 1;
b = 2;
c = 3;
endfunction
All input and output parameters in M are optional. Even if we define 3 named parameters, a caller could pass only one or two (or 4 or more!) without causing a problem. It's the function's job to count the number of values it has received and modify its behavior accordingly. This is done simply through the "nargin" and "nargout" keywords:
function [a, b] = baz(c, d)
  if nargin == 1
    d = 0;
  elseif nargin > 2
    error("too many args passed!");
  endif
  if nargout >= 1
    a = c + d;
  endif
  if nargout >= 2
    b = c - d;
  endif
  if nargout > 2
    error("too many output args!");
  endif
endfunction
There's no such thing as multidispatch in M; if you want a function to do different things with different numbers and compositions of arguments, you have to code the dispatching logic yourself.
If you want to account for argument lists of indeterminate length, you can use the slurpy "varargin" and "varargout" keywords:
function [c, varargout] = bazooka(a, b, varargin)
varargin and varargout are special data types called "cell arrays" that we haven't discussed yet, so we won't see them used here.
That's a brief introduction to functions in M. We don't have all of this implemented in Matrixy yet, but we are making some pretty amazing progress. Come check it out.
Continuing my short tutorial about M, today I'm going to talk about variables and functions in M. This is, simultaneously, one of the hardest parts of the Matrixy compiler for us to get right, and we've just made our second attempt at it.
Functions in M are named and called very similarly to how functions are called in other languages like C or Perl:
foo(1, 2, 3);
Again, as we covered last time, omitting the trailing semicolon will print the return value of the function to the console, if any. As we'll see later, functions always take variadic argument lists by default, and it's up to the function itself to keep track of its own input arguments. Functions can also return multiple values:
[a, b, c] = foo(1, 2, 3);
And again, it's up to the function to recognize how many return values are expected and alter its behavior appropriately. There is no multiple dispatch in M; the function internally implements its own logic to determine the configuration of input and output parameters and adjusts its behavior accordingly.
Matrices, as we saw previously, are very important to M. It originated as a linear algebra mathematics pack, after all. We index into a matrix variable in the same way that we call a function:
x = [1 2 3 4];
x(1) % ans = 1
This causes obvious problems with our parser at compile time because we have to determine whether "foo(1)" is a function call or a variable index. To make things a little more difficult, a bare identifier could also be either a variable or a function call with no arguments:
help; % Calls the "help" function with no args
x; % the variable x; store its value in ans
So, as you can see, there is a lot of ambiguity that needs to be resolved at runtime when the parser finds an identifier bar:
1) If a variable bar has been defined locally, treat it like a variable and return its value. Variables of the same name always shadow functions.
2) If we have a user-defined function bar in the current scope, call that.
3) If there is a builtin function bar, call it.
4) If all else fails, search the path for a file named "bar.m". This file should contain either a definition for function bar, or it should be a "script file" which takes no parameters but is executed directly.
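To make rule #1 concrete, here's an illustrative session of my own (standard Octave semantics, not anything Matrixy-specific) showing a variable shadowing a builtin function:

```matlab
sin(1)            % no variable "sin" exists, so this calls the builtin sin function
sin = [10 20 30]; % now "sin" is a local variable...
sin(2)            % ...so the same syntax indexes into it: ans = 20
clear sin         % remove the variable...
sin(2)            % ...and the same syntax calls the builtin again
```

The calling syntax never changes; only the runtime lookup decides what actually happens.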
This is all made more difficult because M allows function handles to be stored in variables and called as functions:
y = @foo; % handle to function foo
y(1, 2); % call function foo(1, 2)
or even anonymous functions to be defined:
x = @(r) 2 * pi * r;
x(3); % 2 * pi * 3;
So even if baz is a variable as determined by rule #1 in the dispatch algorithm above, it could still cause the invocation of a function if it's a function handle.
Functions can also be executed using bare word arguments, so the call
foo bar baz
is the same as the call
foo('bar', 'baz')
The idea behind the ambiguity between variable and function dispatching, at least as I've heard it, is that it facilitates the interchange of pure functions and lookup tables. Either can be implemented, and the caller doesn't have to worry about which type it is.
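A sketch of that interchange (my own illustrative example, assuming standard M semantics): a caller writes square(3) either way, whether square is a pure function or a precomputed table:

```matlab
% version 1: a pure function, defined in square.m on the path
function y = square(x)
  y = x * x;
endfunction

% version 2: a lookup table, indexed with exactly the same syntax
square = [1 4 9 16];
square(3)   % ans = 9, indexing the table rather than calling the function
```

Callers of square(3) don't need to know or care which version is in effect.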
Another point to be aware of is that M traditionally doesn't have namespaces, so all functions are visible by default at all times. This results in a large amount of namespace pollution, but it also appears to be a driving force behind the multistage dispatch algorithm I showed above: there are far too many identifiers defined by default to prohibit the user from overriding any of them locally.
One last thing to mention is that if we absolutely need to call a function instead of indexing into a variable of the same name, we can use the feval() function. feval() takes the name of a function and a variadic list of arguments and calls that function, never indexing into a variable:
x = [1 2 3];
feval("x", 2); % function x, not variable x
So this has been a brief introduction to function and variable use in M. It should demonstrate that there is some significant difficulty on the part of the compiler-designer, but it creates an interesting situation for end users to interchange functions and lookup tables, and to silently override the huge library of builtin functions with their own versions of them if needed.
I promised to write a short series about programming in M, to try and show some of its cool features. For those who don't know, M is the scripting language used by Matlab (proprietary) and Octave (free). M is designed for linear algebra and engineering simulations.
x = 1;
x = sin(y) + 1;
Statements are mostly terminated with semicolons, as above, but the semicolons are optional. If you leave the semicolon off the end, the value of the statement is printed to the terminal:
Write this: x = (3 + 5) * 2
And get this: x = 16
If you don't assign the value to a variable, the result is stored in the default variable "ans":
Write this: 3 + (5 * 2)
Get this: ans = 13
M is based on matrices, so dealing with them is very easy:
x = [1 2 3; 4 5 6]
In the matrix constructor, the semicolons separate rows in the matrix. Elements in the row can be separated by commas or whitespace.
Matrices are used for working with strings too. Strings in a row are concatenated together:
x = ["hello ", "world"]
x = hello world
A matrix can be treated either like a string or like a regular numerical matrix. 49 is the ASCII code for the character "1", so writing this:
x = ["ok ", 49]
x = ok 1
We've actually used something very similar to that in the Matrixy test suite.
Combining matrices together is also very easy:
x = [1, 2, 3];
y = [4, 5, 6];
z = [x y]
z = 1 2 3 4 5 6
z = [x;y]
z =
1 2 3
4 5 6
One caveat about matrices is that they must be uniform rectangular matrices. You can't have rows that are different lengths:
x = [1 2 3]
y = [4 5 6 7]
z = [x;y] % error!
This caveat is actually relaxed when it comes to strings; you can have strings of different lengths on different rows.
So that's a quick introduction to statements in M. Throughout the week I'll post more information about other aspects of it.
I haven't been doing much work on Parrot or even the Perl 6 book this week, instead focusing most of my efforts on Matrixy. Matrixy, which I introduced in my last post, is an M (Matlab/Octave) compiler for Parrot.
Progress has been going well, and just this morning I got a few more tricky features working. "working" is, of course, open to multiple definitions. This is especially true for such an idiomatic language as M.
I got nargin and nargout mostly working, along with varargin. These are the tools for handling variadic argument lists in M. The implementation still isn't perfect, but at least it's a start. I'm still fighting with lots of issues when it comes to function/variable dispatching. I finally have a fix ready that does what we need it to do, but it relies on the use of lexical variables, which badly breaks interactive mode because of issues with PCT.
Blair has been doing some great work on the libraries, and he's got some NCI stubs in place for interfacing with the BLAS and CLAPACK libraries, and converting our ResizablePMCArray-based matrices into the forms needed for use with these libraries. Linear algebra power is coming to Parrot!
I realize that a lot of people aren't familiar with M; it's a pretty specialized language that's used primarily for engineering modeling and scientific research. Maybe in my next few posts here I'll give a quick tutorial on the language and try to show some of the weird/cool idiosyncrasies that simultaneously make it an interesting language to work with and a difficult language to write a compiler for.
I've finally gotten my M interpreter project off the ground. M, for those not familiar with it, is the scripting language used by Matlab and Octave, and is oriented towards linear algebra and mathematical modeling. I had started idly working on it back when I was still in school as a way to pass the time on long train rides. I gave it up to focus on Parrot internals work for GSOC, and never went back to it because it felt to me like PCT was changing and evolving too rapidly for me to keep up with it.
Well, things are more stable now, and I've found another interested participant for the project. Things are going very well now, and we're getting far more work done on this than I was ever able to get done by myself. Blair is working on NCI bindings for the BLAS and LAPACK libraries, which will bring a lot of mathematical muscle to our little compiler and to the Parrot ecosystem as a whole. I've been focusing my attention lately on getting some of the core syntax and idiosyncratic semantics of M implemented. Some things I've gotten to work now:
1) The ';' is used as a statement terminator like it is in Perl or C. However, it's optional. If ';' is omitted from the end of a statement, the value of that statement is printed to the console. So writing "x = 5" will print "x = 5". A bare expression "5 + 6" will print "ans = 11". Writing "5 + 6;" will set the value of the default variable "ans" to 11, but won't print anything to the console. This mostly works.
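The cases above can be sketched as a short illustrative session (standard M semantics):

```matlab
x = 5       % prints "x = 5"
y = 7;      % semicolon: nothing is printed
5 + 6       % prints "ans = 11"
5 + 6;      % silent, but ans is still set to 11
```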
2) Matrix indexing and function dispatching both use parentheses. So "x(1)" is treated as the first element in array x if x is a variable, or is treated as calling the function x with argument 1 otherwise. This is confusing as hell in the parser, although it's an interesting design decision for the M language. You can interchange a function with a lookup table in M without having to change any calling syntax at all. This is partly implemented, although a few thorns are still standing in my way of getting this right.
3) Functions are typically all defined in their own files, so the function foo() will be defined in "libpath/foo.m". I've written a basic function dispatcher that handles this lookup, if a builtin function hasn't been found. Blair also did some great work refactoring my search path handling to be more correct and less hackish.
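As a sketch of that convention (my own illustrative file, assuming the usual one-function-per-file layout), a call to foo(2) makes the dispatcher search the path for a file like this:

```matlab
% file: libpath/foo.m -- one function per file, named after the file
function y = foo(x)
  y = x + 1;
endfunction
```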
4) I've got basic matrix definition implemented. You declare a matrix like this: x = [1, 2; 3, 4] and if you leave the trailing semicolon off, this will be printed:
x =
1 2
3 4
So that's looking good. Matrices are constructed by nesting ResizablePMCArrays, which isn't an ideal solution (especially as the dimensions of the matrix increase above 2, and we try to keep things uniform length).
So that's how things are progressing with Matrixy. I'd like to invite any other interested participants to check out the source code and see what progress we've made so far. There's a lot of work left to do and, especially as pertains to the linear algebra and mathematics work, a lot of features for Matrixy that could benefit the rest of Parrot world.
All the magic of handling parameters in Parrot (the slurpy params, flat args, named args, and optional params) is sorted out in the function Parrot_process_args. This function is called most often from parrot_pass_args and parrot_pass_args_fromc. These, in turn, are called from some very interesting places:
1) From the get_params opcode
2) From the set_returns opcode
3) From inside the generated code of C-defined methods
4) From a handful of other places, such as exception handlers invoked from certain spots
Parrot_process_args depends on certain values from the caller's context, such as Keys (which I discussed last time) that can store references to registers instead of storing their values directly. This can cause problems because the various invocation paths handle contexts in different ways:
1) Parrot_PCCINVOKE and Parrot_pcc_invoke_* functions create a new context to store passed-in params, and then invoke their sub objects.
2) The invoke vtable of a Sub creates a new context for itself
3) The generated C code of a C-defined method creates a new context for itself (NCI PMCs do not create one when invoked, so this is the workaround for that)
So if we are going through a Parrot_pcc_invoke_* call, we're going to be creating two contexts for almost every call: one context created in Parrot_pcc_invoke_* to hold the passed-in params, and one context created in the Sub or METHOD itself. Now when we finally do call Parrot_process_args, it can be either one or two levels below the context of the caller, which causes all sorts of problems when it comes to register references. I've conceived of an idea to store the caller's context somewhere (possibly inside the CallSignature PMC) to handle these corner cases, but that might be a job for much later in the refactoring process.
I've tried to resolve these references in Parrot_pcc_build_sig_object_from_varargs, but every time I do it seems to create hangs or crashes that I don't quite understand yet. I plan to get to the bottom of that soon.
I've been keeping busy lately with two projects. The first, a new one, is finally starting to implement the Matlab/Octave-on-parrot compiler. The second, an old one, is my continuing work on the calling_conventions refactor stuff.
I'll talk about the Matlab/Octave compiler, which is now called "Matrixy", later. For now you can check it out at its Google Code homepage.
My current cc work involves swapping out all the calls to Parrot_PCCINVOKE with calls to Parrot_pcc_invoke_method_from_c_args (or related variants). I had been trying to update the calls in the file src/io/api.c, but a few weird recursion situations were causing intermittent segfaults or coredumps. I worked on this for a few days, was getting pretty frustrated by it, and decided to do something else. So I picked up a Trac ticket where a segfault was causing problems involving overriding the init vtable method.
vtable overrides have been handled by calls to Parrot_run_meth_fromc_args and related variants. While tracing through the segfault, I found that these functions are not very robust in their handling of contexts, and were creating poisonous situations when used inside an invocation that used one of the other families of functions. I replaced the call to Parrot_run_meth_fromc_args with Parrot_pcc_invoke_from_sig_object, and the segfault disappeared.
So, thinking back to some of the errors I was having with my cc work, I realized that Parrot_run_meth_fromc_args was poisoning recursive calls to Parrot_pcc_invoke_method_from_c_args, and I needed to destroy the former before I could make the switch over to the latter. Working last night and this morning, I finally made the switch. The switchover wasn't perfect, because now I'm getting some weird errors in some of the tests, but it's going better than it was. I need to do some tests now to see if the functions in src/io/api.c can be updated without causing more errors like before.
The calling_conventions refactor is large and sweeping, and is being done in multiple parts. Allison is working on another branch now to try to factor out common code from generated methods. I'm trying to unify all the various invocation functions to use common code. Once we get all the common code into one place, we're going to optimize the hell out of it (I already have a lot of ideas for that!). The end result is that the system should be both more elegant and faster.
I don't think all of this work will land prior to 1.0, and in fact none of it might if we can't make the individual updates stable enough. Post-1.0 however, a lot is going to change under the hood, and all for the better.