You can find most of my Open Source Perl software in my CPAN directory [cpan.org].
Writing code that modifies code is a difficult task. Writing code that modifies Perl code is a horrible task. Thankfully, writing Perl code that modifies Perl code is not quite as horrible as it could be, thanks to Adam Kennedy's PPI.
One stated goal of the Padre project is to provide refactoring tools for Perl code as well as reasonably possible. So far, there are shortcuts for replacing a variable in its lexical scope, finding a variable's declaration (be it a lexical, file-scoped (our), or package variable declared with 'use vars'), finding unmatched braces, and aligning code blocks on operators. These features are all useful, but they're only a subset of what more mature projects like Eclipse provide. A recent post on perlmonks discusses some examples of refactoring tools (or strategies) and their applicability to different languages. One of these is the Introduce Explaining Variable pattern. It's now implemented in Padre trunk. It's really quite simple. Let me explain with an example:
The following code implements the derivative of the atan2 function. The code is from the Math::Symbolic::Derivative module. (I wrote it, so I'm complaining about my own cruft.) This basically implements the equation that is shown in the highlighted comment.
sub _derive_atan2 {
    my ( $tree, $var, $cloned, $d_sub ) = @_;
    # d/df atan(y/x) = x^2/(x^2+y^2) * (d/df y/x)
    my ($op1, $op2) = @{$tree->{operands}};
    my $inner = $d_sub->( $op1->new()/$op2->new(), $var, 0 );
    # templates
    my $two = Math::Symbolic::Constant->new(2);
    my $op = Math::Symbolic::Operator->new('+', $two, $two);
    my $result = $op->new('*',
        $op->new('/',
            $op->new('^', $op2->new(), $two->new()),
            $op->new(
                '+', $op->new('^', $op2->new(), $two->new()),
                $op->new('^', $op1->new(), $two->new())
            )
        ),
        $inner
    );
    return $result;
}
Now, this is pretty hard to read. The $op1 and $op2 variables correspond to the function operands y and x respectively. $d_sub is a closure that can derive recursively. The two templates are simply a shorthand so I didn't have to write someclass->new(...) repeatedly. To make x and y more apparent and to name $d_sub more fitting to its purpose, I open up the file in Padre, right-click each of those variables, select Lexically Replace Variable from the context menu, and provide the new names. Similarly, I replace $inner. This yields:
sub _derive_atan2 {
    my ( $tree, $var, $cloned, $derive ) = @_;
    # d/df atan(y/x) = x^2/(x^2+y^2) * (d/df y/x)
    my ($y, $x) = @{$tree->{operands}};
    my $inner_derivative = $derive->( $y->new()/$x->new(), $var, 0 );
    # templates
    my $two = Math::Symbolic::Constant->new(2);
    my $op = Math::Symbolic::Operator->new('+', $two, $two);
    my $result = $op->new('*',
        $op->new('/',
            $op->new('^', $x->new(), $two->new()),
            $op->new(
                '+', $op->new('^', $x->new(), $two->new()),
                $op->new('^', $y->new(), $two->new())
            )
        ),
        $inner_derivative
    );
    return $result;
}
Of course, that leaves the giant expression intact which actually calculates the result. It makes sense to add a few more temporary variables with descriptive names. I select $op->new('^', $x->new(), $two->new()) in the above version of the code, right-click, and select Insert Temporary Variable. Then I type the name of the new variable, $x_square. Padre finds the beginning of the current statement for me and inserts a temporary variable declaration for $x_square at that point. It also replaces the selected text with $x_square. I manually replace another occurrence of the new temporary and then select $op->new('^', $y->new(), $two->new()) and have it replaced with $y_square accordingly. There's more that can be cleaned up, but this handful of clicks and practically no typing has improved the code's readability considerably:
sub _derive_atan2 {
    my ( $tree, $var, $cloned, $derive ) = @_;
    # d/df atan(y/x) = x^2/(x^2+y^2) * (d/df y/x)
    my ($y, $x) = @{$tree->{operands}};
    my $inner_derivative = $derive->( $y->new()/$x->new(), $var, 0 );
    # templates
    my $two = Math::Symbolic::Constant->new(2);
    my $op = Math::Symbolic::Operator->new('+', $two, $two);
    my $x_square = $op->new('^', $x->new(), $two->new());
    my $y_square = $op->new('^', $y->new(), $two->new());
    my $result = $op->new('*',
        $op->new(
            '/', $x_square, $op->new('+', $x_square, $y_square)
        ),
        $inner_derivative
    );
    return $result;
}
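For the curious, the mechanics of Insert Temporary Variable can be sketched in a few lines of plain Perl. This is a naive, text-based sketch and not Padre's implementation, which works on the parsed PPI document. The helper name introduce_temp_variable, the "statement start" heuristic, and the sample snippet are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Padre's code: move the first occurrence of
# $expr into "my $name = $expr;" placed in front of the statement that
# contains it, and put $name where the expression used to be.
sub introduce_temp_variable {
    my ($source, $expr, $name) = @_;

    my $pos = index($source, $expr);
    return $source if $pos < 0;    # selection not found: leave text alone

    # Naive assumption: a statement starts on the line after the last
    # line that ends in ';', '{' or '}' before the selection.
    my $before     = substr($source, 0, $pos);
    my $stmt_start = 0;
    if ($before =~ /.*[;{}][ \t]*\n/s) {
        $stmt_start = $+[0];       # offset just past that newline
    }

    # Replace the selection first (it sits later in the string), then
    # insert the declaration, so $pos stays valid.
    substr($source, $pos, length($expr), $name);
    substr($source, $stmt_start, 0, "my $name = $expr;\n");
    return $source;
}

my $code = <<'PERL';
my $two = 2;
my $result = f(
    g($x, $two),
    g($x, $two) + g($y, $two)
);
PERL

print introduce_temp_variable($code, 'g($x, $two)', '$x_square');
```

Like the menu item described above, the sketch only replaces the selected occurrence. The second g($x, $two) is left for a manual replacement.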
Thus Padre helps me refactor crufty code with ease. Many more of these tiny helpers are planned. Stay tuned!
PS: If this didn't convince you, maybe you should just give it a shot. I had to wrestle use.perl for hours to get it to add the highlighting in the example code. If I could add screenshots of the real thing...
A few years ago, when I started studying physics, I wrote a set of modules for representing and dealing with algebraic expressions in Perl: Math::Symbolic. It's not a beauty, but it can be quite useful.
Occasionally, I'm getting mail from people who want it to perform the tasks of a full computer algebra system such as Mathematica. The short answer is: It's not even close to such a thing, it never will be, and was, in fact, never meant to be. One of the most frequent questions I get is a variation of:
"How can I expand this product of sums into a sum of products using Math::Symbolic?"
Here again, the answer is: it can't do that out of the box. But since I've been asked so many times, I wrote two implementations, which you'll find below. This is to prevent anyone asking me ever again.
The first implementation is really simple and I'd almost call it elegant. It is, however, also quite slow.
use strict;
use warnings;
use Math::Symbolic qw/:all/;
use Math::Symbolic::Custom::Transformation qw/:all/;
my $function = parse_from_string(<<'HERE');
(a + b)*(d + e + f)
HERE
#(b + c + d + e + f)*(a + b)*(d + e + f)*(a + b + c + d)*(a + b + c + d + e)
print "Before: $function\n";
my $pattern = Math::Symbolic::Custom::Pattern->new(
    parse_from_string('(TREE_x+TREE_y) * TREE_z'),
    commutation => 1,
);
my $expand = new_trafo(
    $pattern => 'TREE_x*TREE_z + TREE_y*TREE_z',
);
while (1) {
    my $result = $expand->apply_recursive($function);
    last if not defined $result;
    $function = $result;
}
print "After: $function\n";
It uses the Math::Symbolic syntax itself to define the logic. Most of the work is actually done by the pattern matching and transformation modules Math::Symbolic::Custom::Pattern and Math::Symbolic::Custom::Transformation. The Pattern class defines search rules that can be matched against the expression's tree. The Transformation specifies rules to replace it with. Kind of like regular expressions. Just not as good (or fast).
The second implementation is likely much more useful and certainly a lot faster (but not optimized). It implements almost all of the logic manually and is based somewhat on Mark Jason Dominus's wonderful iterator from Higher Order Perl. (Go buy the book if you haven't. It's an utterly enjoyable read.)
use strict;
use warnings;
use Math::Symbolic qw/:all/;
my $function = parse_from_string(<<'HERE');
(a + b)*(d + e + f)
HERE
#(b + c + d + e + f)*(a + b)*(d + e + f)*(a + b + c + d)*(a + b + c + d + e)
# First, split the product into sums
my @sums = split_formula( B_PRODUCT, $function );
#print "$_\n" foreach @sums;
# Split each sum into its sub-terms
my @terms = map {
    [ split_formula( B_SUM, $_ ) ]
} @sums;
my $n_terms = 1;
$n_terms *= @$_ for @terms;
print "Calculating all $n_terms terms...\n";
print "@$_\n" foreach @terms;
# This calculates the full formula in memory and stores it in $function
# $function = multiply(\@terms);
# print $function, "\n";
# This calculates each term and then prints it to STDOUT, but doesn't
# store it because memory is scarce
multiply_print(\@terms);
# We have to keep in mind that the formula is really a tree.
sub split_formula {
    my $optype = shift;
    my @formulas = @_;
    my @split;
    while (@formulas) {
        my $f = shift @formulas;
        if ($f->term_type == T_OPERATOR and $f->type == $optype) {
            push @formulas, @{ $f->{operands} };
        }
        else {
            push @split, $f;
        }
    }
    return @split;
}
# all of the following is based on the iterator
# pattern of Mark Jason Dominus' "Higher Order Perl", p. 128ff
sub multiply {
    my $terms = shift;
    my ($max, $count) = make_pattern($terms);
    my $func = make_product($terms, $count);
    return $func unless increment($max, $count);
    while (1) {
        my $prod = make_product($terms, $count);
        $func += $prod;
        last unless increment($max, $count);
    }
    return $func;
}
sub multiply_print {
    my $terms = shift;
    my $iter = make_term_iterator($terms);
    my $first = 1;
    while (1) {
        my $prod = $iter->();
        last if not defined $prod;
        if ($first) {
            $first = 0;
            print $prod;
        } else {
            print " + " . $prod;
        }
    }
    print "\n";
    return;
}
sub make_term_iterator {
    my $terms = shift;
    my ($max, $count) = make_pattern($terms);
    my $empty = 0;
    return sub {
        return() if $empty;
        my $func = make_product($terms, $count);
        $empty = !increment($max, $count);
        return $func;
    };
}
sub make_product {
    my $terms = shift;
    my $count = shift;
    # Note: One *could* save some CPU cycles by not cloning here (new).
    # BUT that may lead to fun debugging and interesting memory cycles
    # if you intend to actually use the tree.
    my $prod = $terms->[0][ $count->[0] ]->new;
    foreach my $i (1..$#$terms) {
        $prod *= $terms->[$i][ $count->[$i] ]->new;
    }
    return $prod;
}
sub increment {
    my $max = shift;
    my $count = shift;
    my $i = $#$count;
    while (1) {
        if ($count->[$i] < $max->[$i]) {
            $count->[$i]++;
            return 1;
        }
        else {
            $count->[$i] = 0;
            $i--;
        }
        if ($i < 0) {
            return();
        }
    }
}
sub make_pattern {
    my $terms = shift;
    my @max;
    my @pattern;
    foreach my $set (@$terms) {
        push @max, $#$set;
        push @pattern, 0;
    }
    return \@max, \@pattern;
}
I bet you can see why that second implementation doesn't give me as much of a warm, fuzzy feeling.
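The counting trick at the heart of the second implementation is easier to see without the Math::Symbolic trees. Here is a minimal sketch of the same odometer-style iterator over plain strings. It is an illustration only: the name make_string_term_iterator and the string representation of terms are assumptions of this sketch, not part of the code above.

```perl
use strict;
use warnings;

# Each inner array holds the summands of one factor: (a + b)*(d + e + f)
my @factors = ( [qw(a b)], [qw(d e f)] );

# Returns a closure that yields one product term per call, then undef.
sub make_string_term_iterator {
    my ($factors) = @_;
    my @max   = map { $#$_ } @$factors;     # highest index per factor
    my @count = (0) x @$factors;            # current index per factor
    my $done  = grep { !@$_ } @$factors;    # an empty factor: no terms

    return sub {
        return if $done;
        my $term = join '*',
            map { $factors->[$_][ $count[$_] ] } 0 .. $#count;

        # Advance like an odometer: bump the last digit, carry leftwards
        my $i = $#count;
        while ($i >= 0) {
            if ($count[$i] < $max[$i]) {
                $count[$i]++;
                last;
            }
            $count[$i] = 0;
            $i--;
        }
        $done = 1 if $i < 0;    # carried past the first digit: exhausted
        return $term;
    };
}

my $next = make_string_term_iterator(\@factors);
my @expanded;
while (defined(my $term = $next->())) {
    push @expanded, $term;
}
print join(' + ', @expanded), "\n";
# prints: a*d + a*e + a*f + b*d + b*e + b*f
```

Only the index vector is kept between calls, so the full list of terms never has to exist in memory at once. That is the property multiply_print relies on above.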
Cheers,
Steffen
On the week-end, I finally implemented a few simple usability improvements for Padre, the Perl IDE.
With the recently released version 0.35, Padre supports a context (right click) menu that's actually context sensitive! (No shit!) If you right-click on a variable, Padre will offer additional options in the context menu that let you jump to the declaration of the variable or replace all occurrences of a lexical(!) variable.
Additionally, we stole a nice feature from Eclipse/Perl: If you hold Ctrl while left-clicking on a variable or subroutine name, the focus will jump to the respective definition. All this only works in the current file, unfortunately, but eventually, this will be a project-wide feature.
Today, a ticket in the bug tracker from Peter Makholm prompted me to implement an API for plugin authors. Your plugin can now provide different context menus depending on the document type, code at the cursor, additional modifier keys, or moon phase.
Update: Here's a simple example that you can copy&paste into your plugin to extend the context menu with a simple item:
sub event_on_context_menu {
    my ($self, $doc, $editor, $menu, $event) = @_;
    $menu->AppendSeparator();
    Wx::Event::EVT_MENU(
        $editor,
        $menu->Append( -1, Wx::gettext("Fun") ),
        sub { warn "FUN!" },
    );
    return();
}
Cheers,
Steffen
Every moderately proficient Perl programmer will eventually be faced with the horror that is old code written by people who still thought golf was good programming style. Very recently, I had my worst such experience so far, with code that had the friendly warning "use 5.002;" in the first line. As if that wasn't enough to scare the hell out of me, I was told the code had been written only in 2006 or so. Not just that: It had been devised by somebody I highly respect and know to be extremely intelligent, but who simply hadn't known Perl when he wrote the program.
Here's one of the biggest downsides of the language in action. Somebody who isn't proficient but smart and creative will be able to craft complicated programs that (kind of) serve their complicated purpose and won't be readable by anyone but their inventor. Hackers who know the language well could do the same, but they know better ways to solve the problem at hand than resorting to unnecessary cleverness.
At this point, you already know the program in question doesn't use strictures.
Instead, it does interesting stuff like using the file handles (GLOBs!) of
literal name "0" to N to process N+1 files synchronously or using $_ implicitly
in a scope that spans way over 100 lines. Variables are named appropriately
as $a1, $a2, $a3, $a4 and $aa1, $aa2, $aa3, $aa4. But I mustn't forget my favorite: $hwww!
If you've ever had to deal with a complicated program that uses only globals, you will most certainly agree that the first step to understanding it is to declare those variables lexically in the tightest scope possible. That isolation of contexts makes it a damn sight easier to grok what's happening.
But I digress. This is about how Padre helped me fix this. I'd love to say that I
simply opened the document in the editor, positioned my cursor on one of those
pesky pseudo-globals and hit the "convert this global to a lexical in the tightest
sensible scope" action in the Perl-specific-features menu. You know, it
would walk the scope tree from the occurrence of the variable I highlighted and
find the tightest scope that contains all occurrences of the variable and declare
it there. Furthermore, if it's used as a loop variable a la for($i=0;$i<...;++$i),
it'd detect that its value is likely not needed outside the loop and declare
it there for me, too. But I haven't had the time to actually write that feature yet*.
So I still had to figure out the scope of each variable manually. But
once I had declared a variable *somewhere*, I could simply hit "replace this
lexical variable" in the aforementioned Perl menu and have all occurrences
(including "${aa1}" in strings) replaced with a less meaningless name.
This was particularly useful for loop variables which tend to be reused in different
scopes and thus meanings. A normal search/replace would require user interaction
to stop it after the current section of the code. One distraction less
while trying to understand some complicated piece of code.
But this isn't really how Padre saved my day. It's that when this heavy use of the lexical replacement feature triggered a couple of bugs in it, I was able to dive into the implementation head-first and simply fix them. It's just Perl and most of it is actually quite accessible! That's how Padre made my day less miserable. It helped me fix that ugly code and gave me the warm, fuzzy feeling of being in full control of my tools and particularly being able to improve them when I need to.
* The key here is: I could! So could you or any other Perl programmer.
My previous journal entry was about the PAR::Repository auto-upgrading feature. That, however, was just a precursor to the big news. Here's (approximately) what I posted to the PAR mailing list a few days ago:
Let me provide some context. Ever since I wrote PAR::Repository::*, people mistook it for a PAR-based PPM replacement. It was never intended to be a package manager/installer like PPM but instead a sort of application server that could be comfortably and centrally managed, maintained, and upgraded. Even having separate staging and production repositories is quite simple, as a PAR repository is just a directory on an ordinary web server or file system. Heck, you can even import one into git and switch branches as your heart desires. Since the clients simply fetch the most current packages for their specific needs, they are always up to date when launching a new application from the repository.
After I gave a talk about PAR and the repository concept at YAPC::EU 2008 in Copenhagen, people again asked whether they could use a PAR repository in place of PPM. I said they couldn't and that the fundamental difference is that PAR::Repository finds dependencies dynamically, recursively, at run-time, whenever a module is required, as opposed to PPM's static dependency information. But at the time, I already had a secret scheme for adding static dependency information to PAR repositories. Since the work on PAR is done purely in my not so copious spare time, I didn't spill the beans just yet in case I'd never get around to finishing it. Seems I was lucky.
Since a couple of days ago, there are development releases of PAR, PAR::Dist, PAR::Repository, PAR::Indexer and PAR::Repository::Client[1] that sport support for static dependency extraction from
Getting to this point required a bit of Yak Shaving.
All involved modules have new releases on CPAN. They are mostly developer releases, since there are bound to be serious bugs.
Thanks for reading!
Best regards,
Steffen
[1] To give the new releases a whirl, you can simply install PAR::Repository (for the server side) or PAR::Repository::Client (for the client, doh). No need to manually install all the distributions, they'll be picked up as dependencies.
When I think about telling people about PAR internals, a reply from a colleague readily comes to mind, when he was asked about an icky detail of his analysis:
You don't want to know how sausages are made!
But then I can't resist grossing out people with some details anyway...
Two years ago, I wrote PAR::Repository::Client as an interface for loading PARs and thus arbitrary modules from a remote server. If the client is installed, all you need to do to auto-load missing modules from the server is:
use PAR { repository => 'https://foo.com/myapp' };
use Foo; # will be loaded from remote if necessary
But since this may become expensive, and caching the binaries only removes part of that, the "install" option was part of the interface almost from the start:
use PAR { repository => 'https://foo.com/myapp', install => 1 };
use Foo; # will be loaded AND INSTALLED if necessary
Back then, I also added most of the code necessary for an "upgrade" option.
use PAR { repository => 'https://foo.com/myapp', upgrade => 1 };
use Foo; # will be loaded AND INSTALLED OR UPGRADED if necessary
Unfortunately, it was missing a few critical details until today. The repository client is normally only invoked when all other sources fail. But that's a problem if you're trying to check for upgrades. Thus, repositories in upgrade-mode are now checked early in the module-loading process.
The real bummer was that in order to check for upgrades, the locally installed version has to be determined. Since this is hard to do reliably without loading the module, that's what PAR has to do. But that means require()ing module X from within an early @INC hook that ran due to a "require X;". There are so many things wrong with that idea, it's not even funny. It seems that creating an infinite recursion in an @INC hook segfaults perl 5.8.9. Regardless, it can be (and was) made to work:
my $line = 1;
return \*I_AM_NOT_HERE, sub { $line ? ($_="1;",$line=0,return(1)) : ($_="",return(0)) };
Even disregarding the slight obfuscation, can you figure out how this works?
One obscure feature of @INC and the module loading is the return value(s) of a subroutine @INC hook. It normally simply returns a file handle that the module code is then read from. But if it returns a code ref as its second return value, that code ref is called repeatedly until it returns false. After each invocation, $_ is assumed to contain the next line of the module code. If the first return value was a file handle nonetheless, $_ is initialized to a new line from the file handle before calling the subroutine.
The motivation here is mostly that we want to set the file contents to "1;". Unfortunately, passing undef as the file handle resulted in the subroutine not being called. This smells like a bug in perl to me, but I'll have to check that more closely with blead. Furthermore, it's not wise to load any unnecessary modules in PAR.pm as they would have to be included verbatim in an uncompressed part of PAR::Packer-created executables. Therefore, instead of simply passing an IO::Handle->new(), I'm supplying an arbitrary GLOB ref.
Finally, the subroutine itself simply sets $_ to "1;" in the first invocation and returns zero on the second to stop the evaluation, thus essentially short-circuiting require()'s loop through @INC.
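The generator mechanism can be tried in isolation. The following is a minimal sketch, assuming a reasonably recent perl in which, as documented for require in perlfunc, an @INC hook may return just a code ref that produces the module source line by line (no file handle needed). The module name Local::Generated and its answer() function are made up for this example. This is not PAR's code.

```perl
use strict;
use warnings;

# An @INC hook that serves a module entirely from memory.
unshift @INC, sub {
    my ($self, $file) = @_;

    # Decline everything except our made-up module, so that the
    # normal @INC search continues for all other requires.
    return unless $file eq 'Local/Generated.pm';

    my @lines = (
        "package Local::Generated;\n",
        "sub answer { 42 }\n",
        "1;\n",
    );

    # Generator: put the next source line into $_ and return true,
    # or return false once there is nothing left.
    return sub {
        if (@lines) {
            $_ = shift @lines;
            return 1;
        }
        $_ = '';
        return 0;
    };
};

require Local::Generated;
print Local::Generated::answer(), "\n";
```

The one-liner above is the degenerate case of this: its generator hands out the single line "1;" and then stops, so the require succeeds without loading any real code.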
After going through this considerable pain, I got the auto-upgrading feature of PAR::Repository::Client to work. There are probably still bugs, and testing it as part of the test suite is no fun (but still feasible).
Stay tuned for a new release of the involved modules.
Cheers,
Steffen
Padre version 0.22 has just been uploaded to the PAUSE. That means it will propagate to the CPAN mirrors within a few hours. Like the previous release, the list of changes is quite long, but one particular achievement is support for highlighting Perl6 code and checking its syntax if Padre::Plugin::Perl6 is installed and enabled. Christmas is close.
Once the distribution has reached your CPAN mirror, you will be able to access the full Change Log here.
The next Padre release is very preliminarily scheduled for December 28th or 29th and we're still looking for a new release engineer.
Best regards,
Steffen
Today, I read an interesting interview with Bjarne Stroustrup, the father of C++, on DevX from August of this year. It's a good read, so if you're a C++ user, you should have a look. But even if you never touched any C++ code, there's a very interesting bit of information:
On page 6, the interviewer, Danny Kalev, asks Stroustrup:
Is C++ usage really declining, as some biased analysts and journalists have been trying to convince us for years (often not without ulterior motives), or is this complete nonsense?
Does that ring a bell? The "Perl is dead" crap that's been splashing down the gutters of the interweb waste disposal system, anyone? I urge you to read the full answer. Much of it applies to Perl as well. Please note that Stroustrup doesn't simply dismiss the issue. Here's an excerpt from his reply:
[...] C++ use appears to be declining in some areas and appears to be on an upswing in other areas. [...] Most of the popular measures basically measures noise and ought to report their findings in decibel rather than "popularity." Many of the major uses are in infrastructure (telecommunications, banking, embedded systems, etc.) where programmers don't go to conferences or describe their code in public. Many of the most interesting and important C++ applications are not noticed, they are not for sale to the public as programming products, and their implementation language is never mentioned. [...]
It's a really big world "out there" and the increase in the number of users of one language does not imply the decrease in the numbers of another. [...]
One simple thing that confuses many discussions of language use/popularity is the distinction between relative and absolute measures. For example, I say that C++ use is growing when I see user population grow by 200,000 programmers from 3.1M to 3.3M. However, somebody else may claim that "C++ is dying" because it's "popularity" has dropped from 16 percent to 10 percent of the total number of users. Both claims could be simultaneously true as the number of programmers continues to grow and especially as what is considered to be programming continues to change. [...]
Most of the popularity measures seem to measure buzz/noise, which is basically counting mentions on the web. That's potentially very misleading. Ten people learning a scripting language will make much more noise than a thousand full time programmers using C++, especially if the thousand C++ programmers are working on a project crucial to a company—such programmers typically don't post and are often not allowed to. My worry is that such measures may actually measure the number of novices and thus be an indication of a worsening shortage of C++ programmers. Worse, managers and academics may incautiously take such figures seriously (as a measure of quality) and become part of a vicious circle.
I know first hand about large C++ systems that don't produce the slightest bit of publicity for the language they're implemented in. It's what I deal with every day. Ditto for large Perl code bases. Stroustrup hits the nail on the head about this issue (and C++). It's exactly what I think about Perl in the same context. There may be an issue with not generating as much noise as others (not enough new blood), but it's by no means an indication of stagnation or decline. People simply use it to do what they always did. People also use it to do new stuff. But they don't blather about it all day. They earn their salary and at the end of the day, they go home to their families and spend their spare time on more interesting things than blogging about their favourite new toy language*. You have to realize: This applies to easily more than 95% of all professional programmers.
* That reminds me of something... got to go.
I'm quite proud to announce the release of Padre 0.21.
It features the biggest list of changes of any Padre release so far. The menu code has received a major overhaul, the editor has become multi-threaded, and we see more "advanced" Perl-specific features like reliably finding the location of a variable declaration or the experimental feature of replacing a lexical variable. The full list of changes can be found in the Changes file of the distribution.
The list of developers has grown to thirteen, but that list is probably not even complete. The program has been translated to nine languages at this point.
With this release, we are starting to rotate the release duty so the weekly or bi-weekly releases don't block on the availability of Gabor at those rare points in time when no major refactoring or feature implementation is going on.
You can find more information, including the Padre mailing list, IRC information, bug and issue tracking, etc., on the Padre site as usual.
Cheers,
Steffen
Since Class::XSAccessor now supports constructors (see previous journal entry), there's everything in place to implement Object::Tiny in XS. Hence you can find Object::Tiny::XS 1.01 on CPAN very soon. It's been uploaded to PAUSE.
The following benchmarks can be found in the distribution. They're comparing O::Tiny::XS to O::Tiny and Class::Accessor::Fast.
Benchmarking constructor plus accessors...
             Rate accessor     tiny  tiny_xs
accessor 107325/s       --     -40%     -57%
tiny     177837/s      66%       --     -29%
tiny_xs  248931/s     132%      40%       --
Benchmarking constructor alone...
             Rate accessor     tiny  tiny_xs
accessor 168659/s       --     -49%     -57%
tiny     330831/s      96%       --     -17%
tiny_xs  396844/s     135%      20%       --
Benchmarking accessors alone...
             Rate accessor     tiny  tiny_xs
accessor    331/s       --     -16%     -54%
tiny        395/s      19%       --     -45%
tiny_xs     715/s     116%      81%       --