I took over this module from Simon Cozens nearly 5 years ago when he gave away his modules, but never got around to any of my ideas for it. What was I thinking? In hindsight, I don’t get around to my own ideas for comparable lengths of time; I should never have applied for it.
I did start a refactor at some point that’s still sitting in my projects directory, but in all honesty I didn’t get very far before losing steam. I even delayed writing this note, thinking I’d get around to making my changes available, but they’re insubstantial and unfinished and it really doesn’t matter any more. There’s a patch in the RT queue that’s been sitting there for years. I didn’t think the patch too great, but it would have been better to apply it and release than leave it in limbo waiting for a reformulation that I’d never get around to.
It is time to admit defeat. I’ve been a bad steward.
Is there anyone reading this who cares about the module? Want to take it over?
The following should all do (almost!) the same:
pipe( 'zcat' ) | [ tar => x => '-O' ]
Pipe->new( 'zcat' ) | Pipe->new( tar => x => '-O' )
Pipe->new( 'zcat' )->std( Pipe->new( tar => x => '-O' ) )
Pipe->new( 'zcat' )->stdout( stdin => Pipe->new( tar => x => '-O' ) )
Pipe->new( 'zcat' )->pair( stdout => stdin => Pipe->new( tar => x => '-O' ) )
Thoughts? (Aside from module naming.)
(Rationale: setting up complex pipes manually with IO::Pipe/socketpair + fork results in brain-melting code. Shell shows, however, that pipes can easily be described declaratively. Why is there, 20 years after the inception of Perl, no module that provides a piping API congruent with shell syntax?)
Anyone who has read Jeff Friedl’s book can write a decent basic word splitter regex off the cuff:
m{ \G [ ]* (
" [^"\\]* (?: \\. [^"\\]* )* "
|
[^ ]+
) }gsx
I have written and used this many times. What has always bugged me about it, however, is that it captures the delimiters along with the content, so afterwards you have to do something like this to the captured value:
s!\A"(.*)"\z!$1!;
This is… not pretty.
Of course, you could use two captures:
m{ \G [ ]* (?:
" ( [^"\\]* (?: \\. [^"\\]* )* ) "
|
( [^ ]+ )
) }gsx
But then you need to check which of the two captures has the value – is it in $1 or $2? So this is still inelegant. The pattern has already done all the work of examining the string – why can’t it provide its results in an invariant form?
The problem is that the presence of the trailing quote must be dependent on the presence of a leading quote, so you must keep the quotes inside the alternation, so there is no way to avoid having either two distinct captures that exclude the quotes or a single broad capture that includes them.
Except, of course, you don’t have to and there is. True enough: when you rely on the matcher to pick an alternation implicitly, the quotes must be included in the alternation. But by using an extended regular expression feature (that has been marked experimental for a decade – what’s up with that?), namely conditional matches, you can make the match of the trailing quote conditional on the leading quote independently of an alternation.
m{ \G [ ]*
(")?
( (?(1)
[^"\\]* (?: \\. [^"\\]* )*
|
[^ ]+
) )
(?(1)")
}gsx
And of course you can (and must, in this case) use a conditional match in the middle to explicitly specify which of the cases to pick, depending on the presence of the leading quote.
This way, you can match surrounding delimiters in a captured alternation for some of its cases but not others, without having to include the delimiters in the capture.
Note that the interesting capture is now $2, not $1 – we need the first capture for the quote, since conditional matches can only use captured groups as conditionals. However, the matched word is always in $2, regardless of whether quotes were involved. Furthermore, $1 has now turned into a true boolean flag, whereas previously this information had to be inferred (however easily).
This pleases me.
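To see it in action, here is a minimal sketch that wraps the conditional pattern in a loop. The helper name split_words and the sample string are made up for illustration; the pattern is the one shown above. Note that backslash escapes inside quotes are matched but left in the captured value as they are.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into words, honouring double quotes.
sub split_words {
    my ($line) = @_;
    my @words;
    while ( $line =~ m{ \G [ ]*
        (")?
        ( (?(1)
            [^"\\]* (?: \\. [^"\\]* )*
        |
            [^ ]+
        ) )
        (?(1)")
    }gsx ) {
        # The word is always in $2, quoted or not; $1 only says
        # whether a leading quote was seen.
        push @words, $2;
    }
    return @words;
}

my @words = split_words( 'foo "bar baz" qux' );
print "$_\n" for @words;
```

No post-processing of the captured value is needed, and there is no guessing about which capture holds the word.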
Also, I reject the argument that responds_to? checking is buggy because some people write method_missing magic that breaks it. I reject the argument because I reject as buggy any code such that object o responds to method m but o.responds_to?(:m) => false. If you implement your own method_missing for a class, you should almost always implement your own responds_to? as well.
Exceptions are also used to implement the break and return control flow constructs
Wildly hit-or-miss weblog Perl Buzz has had to mention xkcd again. Why can’t you people leave the guy alone?
(SCNR.)
For many years, a very simple issue about writing an unfold in Perl 5 has stymied me.
(An unfold is the opposite of fold, of course. The latter is better known to Perl programmers as reduce from List::Util, which is a function that takes a block and a list and returns a single value by repeatedly applying the block to successive values. An unfold does the opposite: it takes a block and a single value and then repeatedly applies the block to the value, accumulating the return values to eventually return them as a list.)
A good unfold as I want it will allow the same thing as map does: for any iteration to return any number of elements, including zero. Also, the elements returned from an iteration must be passed out of the block as its return value for that iteration – otherwise you might as well write a conventional loop instead.
But then you have a problem: how do you know when the block is finished generating output? There is no possible return value you could use for this. The Python implementations of this function that I’ve seen invariably expect an exception to be thrown to signal the end of the iteration, which struck me as profoundly un-Perlish. But how else do you signal the termination condition out of band?
I could think of no sensible solution given the combination of constraints that I adopted. Until an hour ago, that is, when a flash of inspiration struck me.
I’ll let the code speak for itself.
sub induce (&$) {
my ( $c, $v ) = @_;
my @r;
for ( $v ) { push @r, $c->() while defined }
@r;
}
In this version, the initial scalar value is made available to the block in $_ (somewhat cleverly done via for, which is necessary because local $_ has lots of subtle pitfalls), and the block is called repeatedly as long as $_ is defined. The block therefore signals the detection of its exit condition by undefining $_.
It’s that simple. It’s so simple I cannot believe it took me so many years to think of it, but there it is.
Here is a silly example:
my @reversed = induce { @$_ ? pop @$_ : undef $_ } [ 1 .. 10 ];
Of course, the eagle-eyed will immediately notice a problem: this produces 11 return values, the last one being the undef that got returned from undef $_. The fact that induce will collect the value of the final iteration rather than throw it away was a conscious design decision: there are two cases for the return value of the final iteration, either it is useful or not. What happens if it’s not useful, but would be collected? Then you have to suppress it, which is easy. What happens if it’s useful, but would be thrown away? Then you have to arrange for the block to remember state so it can return the useful value first and then signal the termination condition on its next invocation. Since it is so much easier to suppress useless values that would be kept than to retain useful values that would be dropped, I chose to have induce keep the final iteration’s value.
There are various ways to arrange the suppression in the above example. One of them is to check more eagerly for whether another iteration will be necessary:
my @reversed = induce { my @l = @$_ ? pop @$_ : (); @$_ or undef $_; @l } [ 1 .. 10 ];
That is clearly extremely awkward. A simpler (and insignificantly less efficient) approach is to suppress the value returned by the undef function:
my @reversed = induce { @$_ ? pop @$_ : ( undef $_, return ) } [ 1 .. 10 ];
That’s much better, but still not pretty. In particular, that return can be awkward to place due to precedence rules. In the above example it requires those annoying parens. (Try it: the code compiles but breaks without them.) Instead, we’ll take a page from JavaScript and write another function:
sub void {}
No, seriously. The point of this function is, well, to take any arguments you pass it, throw them away, do nothing, and return nothing – most importantly, to return an empty list in list context.
I admit that writing this, err, function greatly amused me. But the end result is quite satisfying:
my @reversed = induce { @$_ ? pop @$_ : void undef $_ } [ 1 .. 10 ];
So these two functions, induce (named like this instead of unfold, of course, to contrast with List::Util’s reduce) and void, will probably start appearing in my small scripts from now on.
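For anyone who wants to play with it, here are both functions together as one small script. The first call is the reversal from above. The second, a Collatz sequence, is a made-up illustration (not something I use anywhere) of the other case: the value of the final iteration is useful, so keeping it is exactly what you want.

```perl
use strict;
use warnings;

# induce: call the block repeatedly with the seed aliased to $_,
# collecting everything it returns, until the block undefines $_.
sub induce (&$) {
    my ( $c, $v ) = @_;
    my @r;
    for ( $v ) { push @r, $c->() while defined }
    @r;
}

# void: swallow any arguments and return an empty list in list context.
sub void {}

# Final iteration's value is useless here, so void suppresses it.
my @reversed = induce { @$_ ? pop @$_ : void undef $_ } [ 1 .. 10 ];
print "@reversed\n";

# Hypothetical second example: here the final value (the 1) is wanted,
# so the block returns it in the same call that undefines $_.
my @collatz = induce {
    my $n = $_;
    if ( $n == 1 ) { undef $_ }
    else           { $_ = $n % 2 ? 3 * $n + 1 : $n / 2 }
    $n;
} 6;
print "@collatz\n";
```

Both functions have to be defined before the first call, since the calls rely on the prototype of induce and on void being known as a sub.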
I just started getting onto the one-perl-per-app train. By default, Configure wants to set up an installation with a deeply nested directory layout so as much as possible can be shared across installs. I don’t care about that – having completely separate installs is quite affordable these days. So all that hierarchy is merely annoying and serves no useful purpose, and I would prefer to simply have all modules in ./lib and all XS components in ./archlib below the root directory of the installation, without any further nesting for different Perl versions, system architectures and packaging authorities.
But figuring out exactly how to get Perl’s Configure to give me what I want took almost two hours of fiddling and research (and the final hint came from a rather tangential archived mailing list post).
So I thought I would jot the recipe down here:
PREFIX=$HOME/perl/5.10.0 # pick any root directory you like
sh Configure -des \
-Dprefix=$PREFIX \
-Dinc_version_list=none \
-Dprivlib=$PREFIX/lib \
-Darchlib=$PREFIX/archlib \
-Dsitearch=$PREFIX/archlib \
-Dsitelib=$PREFIX/lib
The maddening part was to figure out that inc_version_list must be none, otherwise the sitearch and sitelib settings will be ignored and Configure will generate the default deeply nested layout for them.
I have to say that Perl requires rather a lot of work to beat it into submission to my preferences…
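One way to check that the settings took is to ask the freshly built perl what it was configured with. This is only a sketch: the script name is whatever you like, and it assumes you run it with the perl from the new installation (e.g. $PREFIX/bin/perl) rather than the system one.

```perl
use strict;
use warnings;
use Config;

# The four library directories this perl was configured with.
my %dirs = map { $_ => $Config{$_} } qw( privlib archlib sitelib sitearch );
printf "%-9s %s\n", $_, $dirs{$_} for sort keys %dirs;

# And the module search path that results from them.
print "\@INC:\n", map { "  $_\n" } @INC;
```

With the recipe above, privlib and sitelib should both point at the lib directory under the prefix, and archlib and sitearch at archlib, with no version or architecture subdirectories.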
As such we have decided to go with “\” as the new namespace separator [in PHP] instead of the current “::”
I’m sure that makes a lot of sense when the other options you seriously considered include “:)” and “:>”.
[Update: I wrote this before seeing Ovid’s posting about the same matter.]