All the Perl that's Practical to Extract and Report

use Perl

ChrisDolan
  (email not shown publicly)
http://www.chrisdolan.net/

Journal of ChrisDolan (2855)

Friday May 16, 2008
12:47 AM

A REALLY deep corner case

Try out this test program in a Perl prior to 5.8.8:

use Test::More tests => 3;
my $line = "\x{4E00}();" . ' ';
is(length substr($line, 1, 1), 1);
is(length substr($line, 1, 4), 4);
is(length substr($line, 1, 1), 1);

You'd expect that substrings of length 1 are always length 1, right? On my Mac (perl5.8.6) it produces:

1..3
ok 1
ok 2
not ok 3
#   Failed test at utf8_substr.t line 5.
#          got: '4'
#     expected: '1'
# Looks like you failed 1 test of 3.

This should surprise you, unless perhaps you were aware of the UTF-8 length caching bug(s) that haunted much of the 5.8.x series.
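
For reference, here is a small sketch of what the same string looks like on a Perl without the bug. It only restates the test above in terms of characters versus bytes; the variable names are mine, and the use of Encode to count bytes is my addition, not part of the original test.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same string as in the test: one CJK character followed by "();" and a space.
my $line = "\x{4E00}();" . ' ';

# length() counts characters; encoding to UTF-8 first counts bytes.
my $char_count = length $line;
my $byte_count = length encode('UTF-8', $line);

# substr() offsets and lengths are in characters, so these never depend
# on how many bytes U+4E00 occupies.
my $first  = substr($line, 1, 1);
my $second = substr($line, 1, 4);

printf "%d characters, %d bytes\n", $char_count, $byte_count;
printf "substr(1, 1) is %d character(s) long\n", length $first;
printf "substr(1, 4) is %d character(s) long\n", length $second;
```

The gap between the character count and the byte count is the kind of bookkeeping a cached length has to get right.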

The program above is a minimal reduction of a failure in the PPI test suite (see RT#35917 - charsets.t eats all available VM). The bug is triggered only when all of the following hold:

  • Perl 5.8.6 (and maybe 5.8.7?)
  • PPI above 1.201
  • Source code which uses Unicode in a bareword on the last line of the file, but not within the last 3 bytes of the end.
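
If you want a test script to notice that it is running on a possibly affected Perl, a version check along these lines might do. This is only a sketch: predates_588 is a hypothetical helper name, and the 5.8.8 cutoff is taken from the description above rather than from any deeper knowledge of which releases are affected.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a version in $] format (for example
# 5.008006 for perl 5.8.6) is older than 5.8.8.
sub predates_588 {
    my ($version) = @_;
    return $version < 5.008008 ? 1 : 0;
}

# $] holds the running interpreter's version in the same numeric format.
if (predates_588($])) {
    warn "This perl ($]) may have the UTF-8 length caching bug\n";
}
```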


We would probably have never noticed, except 5.8.6 is the default Perl for Mac OS X 10.4 (i.e., a popular point release) and a PPI side effect of the bug was an infinite loop with a memory leak.

I'm VERY grateful that the core Perl developers include people smart enough to find and fix subtle bugs in the Unicode implementation like this one.

Thursday May 01, 2008
02:24 AM

Yacc shaving

I've been playing around with the Parrot Compiler Toolkit lately. I started with the superb tutorial that Klaas-Jan Stol wrote. Then I made a bad choice and started a port of a too-complex language (bash).

I quickly got bored with hand-translating bash's parse.y to PGE. So, I set that aside and started writing a parser for Yacc syntax, with the intention of outputting a rough Perl6 grammar and a stubbed out actions.pm file. I love Perl6 regex syntax -- it makes it almost easy to compose a big grammar.

I built the grammar and started on the actions for my Yacc parser. However, when I run it, it just spins and spins. Hmm, I must have some rule that's recursive or too slow. So, I've been adding "{*} #= open" and "{*} #= close" in several places and logging completed steps in the grammar. I guess I need to learn to use Parrot's debugger...
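
One thing worth ruling out when a recursive-descent grammar spins is a left-recursive rule, since that style of parser generally cannot handle a rule that calls itself as its first step. This is a guess at a common cause, not a diagnosis of my grammar. Below is a rough Perl 5 sketch (not PGE) that scans simplified Yacc-style rules for direct left recursion. The helper names parse_yacc_rules and left_recursive_rules are hypothetical, and the input format deliberately ignores actions, quoted literals and % directives.

```perl
use strict;
use warnings;

# Parse rules of the simplified form "name : sym sym | sym ;" into
# a hash of rule name => list of alternatives (each a list of symbols).
sub parse_yacc_rules {
    my ($text) = @_;
    my %rules;
    while ($text =~ /(\w+)\s*:\s*(.*?)\s*;/gs) {
        my ($name, $body) = ($1, $2);
        $rules{$name} = [ map { [ split ' ', $_ ] } split /\|/, $body ];
    }
    return \%rules;
}

# Return the sorted names of rules with an alternative that starts
# with the rule's own name (direct left recursion only).
sub left_recursive_rules {
    my ($rules) = @_;
    my @hits;
    for my $name (keys %$rules) {
        my $count = grep { @$_ && $_->[0] eq $name } @{ $rules->{$name} };
        push @hits, $name if $count;
    }
    return sort @hits;
}

my $rules = parse_yacc_rules("list : list item | item ;\nitem : WORD ;");
my @suspects = left_recursive_rules($rules);
print "left-recursive: @suspects\n";
```

Indirect left recursion (a calls b, b calls a) would need a proper graph walk, which this sketch does not attempt.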

Sunday April 27, 2008
11:38 PM

[non-technical] Flamewars

I mistakenly got entangled in a flamewar on the Perl advocacy mailing list. The details aren't important except that I wrote a message that I did not intend to be insulting, but could be (and was) interpreted as dismissive of a significant body of work. I quickly apologized and clarified my intentions, but my apology was rebuffed. What do you do in a scenario where an apology is not considered enough? Try again or just walk away?

I strive to maintain a strictly positive, constructive tone in all public communications. I don't always succeed, but I think my contributions to the open-source community have been almost always entropy-decreasing. This incident is really bothering me...

12:39 AM

YAPC talk competition

I'm grateful that my YAPC talk is not up against any Parrot talks. According to the schedule I'm up against "Understanding Malware" and "Perl on Fedora". Of the three, mine is likely the most frivolous. :-)

Thursday April 17, 2008
12:52 AM

Presentation on testing to Madison.pm

I made a presentation to Madison.pm on software testing in general and Perl testing in particular. It was intended to be introductory since many Madmongers are casual Perl users and most do not have the testing religion. After discussing reasons for and types of testing, I delved into Test::More, Devel::Cover and Test::Class with some simple examples. Throughout the talk, I emphasized that the primary reason for testing was to gain confidence in a body of code.

Slides: http://chrisdolan.net/madmongers/testing.html
(the S5-based slides are optimized for Safari in 1024x768, but should be readable elsewhere)

Interesting reactions:
  • People were appalled that Test::Class invokes methods in alphabetical order instead of lexical (source) order
  • Nobody but me was using Devel::Cover :-(
  • Test::Exception was criticized for poor interaction with Class::Exception (IIRC...)
  • My casual mention of Test::WWW::Mechanize generated a lot of interest. Some had used WWW::Mech already

Tuesday March 11, 2008
09:28 PM

Feed reading

I switched feed readers this week. This is a big deal to me! In years past, I was a contributor to the Planet project (my only open-source Python work to date; I overhauled the regexp filtering) so I've been using Planet (and later Venus) to aggregate my favorite RSS and Atom news sources for my personal consumption.

I preferred Planet over other readers because it is online. I run the aggregator via cron on my home machine and upload the gzipped results to my webserver. Then I can view the chronological results from any computer in the world. I stop reading when I hit something I remember reading before.

I rejected several competent readers which store results on just one computer (NetNewsWire, Sage, Safari, etc) since that doesn't support my mobile habits. I tried Plagger for a while, but it was so much harder to hack than Planet that I gave up and switched back. But lately, bugs in Planet have gotten the best of me. Planet loses posts left and right and has trouble retaining chronological order (Sam Ruby and I disagree on the latter point: I think it should sort on received date and he thinks it should sort on post date).

Lacking time to delve back into Plagger, I decided to try Google Reader. So far I like it. It works well across multiple computers because the read/unread state is stored online. The interface is attractive and easy to use. Its JIT content loading makes for quick startup.

And now the complaints:
  • It doesn't de-dupe! If the same story appears in Planet Perl, Planet Parrot, and use.Perl.org Journals, then it appears three times in Reader. Lame.
  • Google hijacked the "/" keystroke to focus their own search box, taking away Firefox's killer search-as-you-type feature (popularized by Emacs). Lame.
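
On the de-duping complaint: the simplest form of it could key on each entry's link. A minimal sketch, assuming entries are hash references with a link field. The data layout, the example URLs and the dedupe_entries name are all hypothetical, and this says nothing about how Reader or Planet store entries internally.

```perl
use strict;
use warnings;

# Keep only the first entry seen for each link, preserving input order.
sub dedupe_entries {
    my (@entries) = @_;
    my %seen;
    return grep { !$seen{ $_->{link} }++ } @entries;
}

# The same story arriving through two aggregated feeds, plus one other.
my @entries = (
    { feed => 'Planet Perl',       link => 'http://example.org/post/1' },
    { feed => 'Planet Parrot',     link => 'http://example.org/post/1' },
    { feed => 'use Perl Journals', link => 'http://example.org/post/2' },
);

my @unique = dedupe_entries(@entries);
printf "%d of %d entries kept\n", scalar @unique, scalar @entries;
```

A real reader would also have to cope with feeds that rewrite links, which is where it stops being simple.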


What feed readers do you like, dear journal reader?

Thursday January 31, 2008
01:00 AM

[Idea Bin] Sharesware

I use the "[Idea Bin]" flag to indicate ideas that I don't have time to implement myself. If you choose to use one of these ideas, you have my permission and best wishes.

There are a lot of successful business models surrounding open source software, but most of them make their money from something other than the software itself -- support, ads, documentation, extra features, customizations, etc. A few years ago I spent some time trying to think up a model that emphasized the software itself. The trick, of course, is how to get someone to pay you for the software when they can get it from someone else for free (because the freedom that comes with most open source licenses includes the ability to give it away for free).

Well, what if you can encourage your users to join your business rather than subverting it? It's like shareware for an engaged community, so I call it sharesware.

The project founder starts with 100% of the shares. Every person who purchases the software gets 1 temporary share (new shares are created magically; they do not come out of the founder's pool). Everyone who contributes improvements to the project (including the founder) gets N permanent shares. Big features get lots of shares, bug fixes get a few shares, documentation may get a lot or just a few shares (depending on how badly the documentation is needed). Support in the forum may be worth shares too.

How do you decide how many shares to grant per task? The founder decides this at first, but later shareholders may get to vote once per [time unit] to change the pay scale. Stakeholders are encouraged to be generous with shares since that is the currency that keeps the community alive.

What are shares worth? If you have at least one share, you are guaranteed access to the current source code. Otherwise you have to buy your way in with cash or labor, or make do with an older free release. Furthermore, some projects may distribute cash income proportional to shares. Some may grant you access to a finite resource (imagine if the software in question is a MMORPG; the resource is the server). Some may simply grant you power over the future of the project.
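
To make the "cash income proportional to shares" case concrete, here is a worked sketch. All the numbers are invented for illustration (a founder with 100 shares, one contributor with 20, one purchaser with 1 temporary share, and 1210 units of income), and payouts is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split an income amount across holders in proportion to their shares.
sub payouts {
    my ($income, %shares) = @_;
    my $total = 0;
    $total += $_ for values %shares;
    return () unless $total;    # nobody holds shares, nothing to split
    return map { ($_ => $income * $shares{$_} / $total) } keys %shares;
}

# Invented example holdings: 121 shares in total.
my %shares = (founder => 100, contributor => 20, purchaser => 1);
my %paid   = payouts(1210, %shares);

printf "%-12s %8.2f\n", $_, $paid{$_} for sort keys %paid;
```

With these made-up figures each share is worth 10 units, so every purchase or contribution that mints new shares dilutes the founder a little.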

Could it work? I have no idea. But I can say that it won't work if 1) the project is not churning out new desirable features all the time or 2) the founder is stingy with shares. My favorite feature of this model is the inherent value of labor vs. cash, which means that charging money for the product does not necessarily cut off less wealthy people from using it.

Is it too complicated? Probably, especially given tax liability. But I'd be happy to be proven wrong.

Wednesday January 30, 2008
01:04 AM

[Idea Bin] Port bash to Parrot

I'm going to use the "[Idea Bin]" flag to indicate ideas or projects that I would love to develop if I had more time, but realistically admit that I'll never finish. If you choose to use one of these ideas, you have my permission and best wishes.

An implementation of bash on Parrot would be a great test of Parrot's I/O capabilities. The source code for bash is fairly readable considering how arcane some of its supported operations are. Bash has a rather healthy test suite that could be reused during development.

I spent many commutes pondering possible names for such a port, but never came up with a great one. I wanted "posh" to work, but never came up with the right words for the acronym. "bashup" = bash-under-parrot felt too forced.

If such a Parrot implementation let you call out to other languages via namespace prefixes on functions, shell could be a better glue language than ever. You could incrementally port hairy shell scripts to more expressive programming languages.

Then you could switch all of your /etc/rc scripts to run under parrot by just changing the sharp-bang.

Furthermore, I've been a csh/tcsh user for 15 years and have always envied the power of bash, but never bothered learning more than the rudiments. This would be an excuse to learn it deeply.

Thursday December 20, 2007
09:52 PM

Fuse, PDF and PAR

Last week I presented a fun topic to the Madison Perl Mongers: I created a user-mode filesystem that treats PDF files as disk images, and I packaged it with PAR.

For the demonstration, I wrote a quick-and-dirty Mac GUI front end. It probably works only on 10.4 (I didn't test 10.5) and isn't as powerful as the Fuse-PDF command line, but it's more accessible.

Sunday December 02, 2007
11:54 PM

Recommended: krugle.org code search

I recently had a Solaris failure report from CPAN testers. The failure report had errno.h numbers that I could not interpret, since I lack a Solaris 2.9 box. I knew that Solaris was now open sourced, so I tried to look for errno.h via Google code search. Blech, my google-fu was too weak.

So I instead tried krugle.org. I quickly found the OpenSolaris project, browsed the source code repository and found sys/errno.h. Wow.

It didn't solve my problem (alas, instead I need a new CPAN release with better failure diagnostics) but I was very impressed with the ease of use.
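
On the diagnostics point, one way a test could report errno values symbolically instead of as bare numbers is to ask the running Perl for the name and message. A minimal sketch using the core Errno module and the %! hash; describe_errno is a hypothetical helper name. Note that it resolves numbers against the errno.h of the machine it runs on, so it only helps if it runs on the failing box itself.

```perl
use strict;
use warnings;
use Errno;

# Return the symbolic name(s) and the message text for an errno number,
# as defined on the machine running this code.
sub describe_errno {
    my ($num) = @_;
    local $! = $num;
    # $!{NAME} is true only for names whose value matches the current $!.
    my @names = grep { $!{$_} } keys %!;
    my $name  = @names ? join('/', sort @names) : 'unknown';
    return ($name, "$!");
}

my ($name, $message) = describe_errno(2);
print "errno 2 is $name: $message\n";
```

A failure report that printed the name alongside the number would have saved me the trip through someone else's errno.h.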

I first learned about Krugle via the 2006 Chicago hackathon. Ken Krugler bought us pizza and talked about getting better Perl/CPAN support into his product.