Had a great time at the GPW. Met lots of interesting folks, saw some good talks (although I probably didn't understand as much of them as I should have, given my crappy German skills), and had fun hanging out and meeting some of the people who I only know through the internet.
My talk went pretty well. It could probably be improved some, but considering it was the first time I've presented to an audience of that size (about 80-100 people, I guess), I think it went just fine.
Annoyingly, there are still some bugs in the Perl5 regex engine, so a few things that should have worked didn't, but well, that's life.
I probably shouldn't have drunk so much on the second day. I passed the point of being able to say no to more alcohol, and just might have made a fool out of myself blabbing on about my days as a courier in London...
Anyway, I'm really looking forward to the next perl gathering and hanging out with folks some more.
Cheers to the GPW organizers for doing a great job, and to everybody that participated.
I've been thinking a lot about source control lately. I've been having issues with my development process (using svn/TortoiseSVN) and with working with p5p that really make me think that the current scheme is not all that cool.
Currently Perl5 uses a Perforce repository hosted by ActiveState. FWICT you are only allowed read access if you have a commit bit, and for obvious reasons the pool of people with commit bits is small, and because of attrition in the developer community,
Now, most of the work I do is new features, and accordingly I'm not that fussed about not having a commit bit. Forcing peer review and a sober second thought on new features is a good plan, and many a bug has been caught by the two-man process. Although, annoyingly, a couple of doozies have passed oversight for some time without any notice at all. (Which I suppose is to be expected in any development project of this size and complexity.)
However, what really annoys me is that I can't easily synchronize my local version with the ongoing patch, and when I upload my changes it's via a snapshot. My local history is lost, and when the patch is merged in it's shown as being from the person who committed it, and not me (although the patch comment usually shows more details). On at least one occasion this has led to someone blaming a bug on the committer and not me, the actual responsible party.
So I want a system where I can maintain my local tree, easily sync it with others, and so on. It looks to me like Git is the right choice. While svk also seems interesting, I don't like the fact that it's based on SVN, which I'm slowly coming to dislike on an architectural basis.
Now, I'm not a Unix person in general, and it appears Git support on Windows is weak, but since it appears to work fine under Cygwin I would happily change to using it. And frankly I'm getting more and more inclined to do a total permanent switchover to Linux anyway.
I'm wondering how many Linux people will adopt a penguin for Christmas.
I wonder if the Penguin conservation groups have already figured out how much money is floating around under a penguin logo....
Considering they don't have a paypal account I guess not.
With patch Change #29430 the regex engine should now be sufficiently abstracted that it is reasonably feasible to write a plugin to use a different regex engine in perl. The documentation for most of the interface is contained in perlreguts, although one needs to understand the flags in regexp.h in order to make it work.
The basic idea is that the existing perl engine and data structures have been ripped in half. The structures with which the perl core must interact are now well defined, and the parts that are "private" to a given regex engine implementation are now isolated from the core.
A big issue with the regex engine pluggability interface in Perl 5.8 and earlier was that it was intended mostly for swapping in a DEBUG build of the real engine, with the expectation that any engine in use could cope with the data structures generated by any other engine. Additionally, some of the management routines for regexps were hard-coded into the core, with the expectation that the core routine could handle any engine's data. All of these design issues meant that basically you couldn't plug an arbitrary engine into perl. You would have to apply core patches to do even the meanest implementation.
In the new scheme, what Perl needs to know about a pattern is defined by the struct regexp. What Perl needs to know about a regex engine is defined by the struct regexp_engine. All a plugin needs to do is create a regexp_engine that contains the appropriate callbacks and populate %^H appropriately when the module is use()d. Perl will then use the compiler routine specified by the engine to create a regexp struct, which it will then use as though it were its own. The regexp structure contains a pointer to the regexp_engine structure which created it, so the regexp is completely encapsulated. You can compile a regexp using one engine and pass it through to a routine expecting a normal regexp and everything will work out.
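To make the shape of that arrangement concrete, here is a minimal sketch in C. It is not the real interface: the names demo_regexp, demo_regexp_engine, demo_core_match and demo_core_free are made up for illustration, the two toy "engines" only do substring matching, and the real struct regexp and struct regexp_engine in regexp.h carry far more than this. The only point is that each compiled pattern carries a pointer to the engine that built it, so the "core" routines can dispatch without knowing which engine they are talking to.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct demo_regexp;

/* Hypothetical: what the "core" knows about an engine is only its callbacks. */
typedef struct demo_regexp_engine {
    const char *name;
    struct demo_regexp *(*comp)(const struct demo_regexp_engine *eng,
                                const char *pattern);
    int  (*exec)(const struct demo_regexp *rx, const char *subject);
    void (*free_private)(struct demo_regexp *rx);
} demo_regexp_engine;

/* Hypothetical: what the "core" knows about a compiled pattern. */
typedef struct demo_regexp {
    const demo_regexp_engine *engine; /* the engine that compiled this pattern */
    void *priv;                       /* engine-private data, opaque to the core */
} demo_regexp;

/* Copy a string, optionally folding it to lower case. */
static char *demo_copy(const char *s, int fold)
{
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    assert(copy != NULL);
    for (size_t i = 0; i <= n; i++)
        copy[i] = fold ? (char)tolower((unsigned char)s[i]) : s[i];
    return copy;
}

/* Allocate the public structure and attach the engine and its private data. */
static demo_regexp *demo_new(const demo_regexp_engine *eng, char *priv)
{
    demo_regexp *rx = malloc(sizeof *rx);
    assert(rx != NULL);
    rx->engine = eng;
    rx->priv = priv;
    return rx;
}

/* Toy engine 1: case-sensitive substring match. */
demo_regexp *literal_comp(const demo_regexp_engine *eng, const char *pattern)
{
    return demo_new(eng, demo_copy(pattern, 0));
}

int literal_exec(const demo_regexp *rx, const char *subject)
{
    return strstr(subject, (const char *)rx->priv) != NULL;
}

/* Toy engine 2: case-insensitive substring match. */
demo_regexp *nocase_comp(const demo_regexp_engine *eng, const char *pattern)
{
    return demo_new(eng, demo_copy(pattern, 1));
}

int nocase_exec(const demo_regexp *rx, const char *subject)
{
    char *folded = demo_copy(subject, 1);
    int found = strstr(folded, (const char *)rx->priv) != NULL;
    free(folded);
    return found;
}

/* Both toy engines keep a malloc'd string as their private data. */
void demo_free_private(demo_regexp *rx)
{
    free(rx->priv);
    rx->priv = NULL;
}

const demo_regexp_engine literal_engine =
    { "literal", literal_comp, literal_exec, demo_free_private };
const demo_regexp_engine nocase_engine =
    { "nocase", nocase_comp, nocase_exec, demo_free_private };

/* "Core" routines: they only ever go through the engine pointer. */
int demo_core_match(const demo_regexp *rx, const char *subject)
{
    return rx->engine->exec(rx, subject);
}

void demo_core_free(demo_regexp *rx)
{
    rx->engine->free_private(rx);
    free(rx);
}
```

A pattern built with nocase_engine.comp(&nocase_engine, "Perl") can be handed to demo_core_match() exactly like one built by literal_engine, which is the encapsulation property described above, just in miniature.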
So now all we need to do is get some intrepid hacker to actually put all this to good use and write a plugin for another engine.
And you know what?
The first person that publishes a proper regexp engine plugin for BleadPerl (soon to be Perl 5.10) will get a METRE OF BEER for doing so....
I'll give a bonus prize (to be determined) if the engine ported is the latest Tcl engine.
So let's see how long it takes...
Note: I'm going to publish this challenge elsewhere over the next few days. Thanks to AdamK for the idea of the prize.
Welp, I'm back at the office for my last week. While I was away they moved my desk to a different floor, and when I came back I didn't even know where I was supposed to be! Anyway, it's only a week of handover activities and then I can get back to hacking perl...
Speaking of hacking perl, my latest plan is to get the regex engine truly pluggable. I'd like to make it possible, and preferably easy, to use alternate regex engines such as PCRE inside of perl.
My motivation for this is mostly to enable lexically scoped and run time control over debugging regexes, and partly so that I can do performance comparisons with other engines.
The possibilities are kinda crazy tho. If done properly one could easily implement the Perl 6 grammar engine as a perl 5 regex engine implementation, which itself uses perl 5's regex engine internally. Yes, this means one could hypothetically write their own regex engine in Perl.
I have to say, I'm really eager to get cracking on this. I'd love to benchmark PCRE against Perl's native engine, and, well, lexically scoped debug mode regexes would be a real asset to debugging unexpected regex behaviour.
(Trying to post more often to my journal.)
I recently left my job and have been decompressing from a long period of continuous employment by enjoying some time off with some nice weather, bike riding, late mornings, breakfast at the cafe and lots of hacking.
I've managed to close off most of my regex engine todo list: ANYOF and jump tries, Aho-Corasick startclass matching, positive-look(ahead|behind) optimisation (there are some rough edges to deal with on this one yet), charname support in the parser, and about 75% of what needs to happen to the debug output to make it "non-regex engine hacker" friendly.
All this combined with my earlier efforts (single-char-ANYOF to EXACT, simple TRIEs) means that perl now comfortably outperforms python in the so-called "rebench" tests. (bleadperl average time per test: 28 usecs, python average time per test: 297 usecs) The use of regex preprocessors to do "trie" like optimisation will no longer be necessary, and in fact will often slow things down, as it will result in duplicating things that are already happening internally.
I need to revisit perlreguts and update it with what I've learned, and in some cases what has changed.
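For anyone wondering what the "trie" business buys you, here is a small sketch in C of the general technique. It is emphatically not the code in regcomp.c or regexec.c: the names trie_node, trie_add, trie_match_at and trie_search are invented for the example, it only handles plain literal words, and it reports the longest word matching at a position, whereas Perl's alternation has its own rules about which alternative wins. What it does show is why an alternation of literals can be matched in one pass over the subject per start position, instead of retrying each alternative in turn.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical trie node: one child slot per possible byte value. */
typedef struct trie_node {
    struct trie_node *next[256];
    int accepting;              /* non-zero if a word ends at this node */
} trie_node;

trie_node *trie_new(void)
{
    trie_node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    return n;
}

/* Add one literal alternative, sharing common prefixes with earlier words. */
void trie_add(trie_node *root, const char *word)
{
    trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)word; *p; p++) {
        if (n->next[*p] == NULL)
            n->next[*p] = trie_new();
        n = n->next[*p];
    }
    n->accepting = 1;
}

/* Length of the longest stored word that matches at the start of subject,
 * or 0 if none does. Each subject byte is examined at most once. */
size_t trie_match_at(const trie_node *root, const char *subject)
{
    const trie_node *n = root;
    size_t longest = 0, len = 0;
    for (const unsigned char *p = (const unsigned char *)subject; *p; p++) {
        n = n->next[*p];
        if (n == NULL)
            break;
        len++;
        if (n->accepting)
            longest = len;
    }
    return longest;
}

/* Offset of the first position where some stored word matches, or -1. */
long trie_search(const trie_node *root, const char *subject)
{
    for (size_t i = 0; subject[i] != '\0'; i++)
        if (trie_match_at(root, subject + i) > 0)
            return (long)i;
    return -1;
}

void trie_free(trie_node *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < 256; i++)
        trie_free(n->next[i]);
    free(n);
}
```

With "foo", "foobar" and "bar" added, trie_match_at() on "foobarbaz" walks six bytes and stops, having tried all three alternatives at once. A preprocessor that rewrites the alternation into nested prefixes by hand is doing the same job from the outside, which is why it becomes redundant once the engine does it internally.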
I need to update my journal more often.
Things still to do: Clean up/reorganize the re Debug stuff to be easier to use. Look into using reghop4() for things. Look into the MAGIC hack for making things pluggable. Look into migrating code in sv.c re_dup over to regcomp.c. Maybe split regcomp into several pieces to make it smaller and easier to manage.
Released the new version of DDS yesterday. Diotalevi from perlmonks worked with me to make it handle a couple of special cases in a much better way.
The main new features are that weakrefs, overloaded objects and closures are all now dumped properly. In particular, the lexicals that are bound to a closure are dumped along with it. AFAIK this is the first tool ever to have this feature.
Hacking the core has got to be one of the most interesting things I've done. Not being much of a C programmer (I'm one of those Pascal weenies), it's been a non-stop rollercoaster ride of learning, head banging and even more learning.
Anyway, I finally was able to achieve a dream of mine and add Trie matching to Perl's regex engine. Raphael applied it to 5.9.2 as patch 24044 on March 18, 2005. A moment I doubt I will ever forget.
Since then I've been plugging away at the second phase of my regex plans, which is to add Aho-Corasick matching support. I released a patch for it just the other day, but so far it hasn't worked out as well as the plain Trie patch. It seems to have problems building on some folks' machines, and seems to add an unacceptable overhead to some regexes that involve normal Tries and not Aho-Corasick enhanced ones.
I have to say the slowdown is at this point totally inexplicable, as I would expect the code from the second patch to be in fact slightly more efficient. There's weirdness afoot that I really don't understand.
Anyway, it feels good to have contributed this, especially as it should eventually result in performance improvements in things like SpamAssassin which would probably be a boon to many folks out there. And doing a little bit to fight spam makes me feel good.
Anyway, I've come to the conclusion that (and 'a data structure' means any arbitrary data structure)
Also in the course of developing Data::BFDump I have discovered bugs in
This accounts for some of the most well known and highly regarded authors in Perldom. And you know what? Of the bugs I have reported, NONE of the authors has responded. Not so much as a peep. Even on the perl5porters list I got only one reply. Frankly, the impression I get of that place is that if you can't write C/C++ code in your sleep then they won't even speak to you. A pretty disappointing reaction considering the kind of community that I've come to love at places like Perlmonks.
Don't get me wrong, I appreciate that the perl porters work hard and do a good job. I appreciate that many of the authors above are very busy people with lots on their plate. But a lousy email saying 'ACK' to a bug report really doesn't seem like that much to ask. Hell, at least I went out of my way to find and inform them of the bug.
a slightly annoyed demerphq
For instance, Data::Dump will go into fits (infinite loop) when trying to dump the following
And a bug in dumper can be seen from
$VAR1 = \\'Foo';
Problem is that this is wrong (doesn't look it though, does it?)
# this is fine...
# this is an illegal attempt to alter a read only variable.
I'm looking forward to releasing my new dumper in the next few days. It won't have either of these problems and represents a new approach to dumper modules... Look for it on CPAN under the name Data::BFDumper in the next few days....