Journal of ChrisDolan (2855)

Tuesday October 07, 2008
12:11 AM

Win32 non-cp1252 filenames

There was a P5P thread recently about encoding filenames on Win32. A while back, I wrote an app that had to support Shift-JIS and other filesystem encodings transparently, and I came up with the following unpleasant but successful hack.

Whenever I want to pass a filename to any system function (open, opendir, unlink, -f, etc.), I wrap the filename string in a localfile() call, like so: unlink localfile("foo.txt"). The localfile() function is defined as follows:

use Encode;
use English qw(-no_match_vars);

my $encoding;   # cached Encode name; q{} means "no conversion needed"

sub localfile {
    my ($filename) = @_;
    if (!defined $encoding) {
        # Look up the filesystem encoding once, on first use.
        $encoding = q{};
        if ($OSNAME eq 'MSWin32') {
            require Win32::Codepage;
            # Map the active Windows codepage to an Encode encoding name.
            $encoding = Win32::Codepage::get_encoding() || q{};
            $encoding &&= Encode::resolve_alias($encoding) || q{};
        }
    }
    # Convert from Perl's internal form to the local filesystem encoding.
    return $encoding ? encode($encoding, $filename) : $filename;
}
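
At call sites, the wrapper looks like this (a minimal sketch; the directory and file names are hypothetical):

my $dir = "\x{30c6}\x{30b9}\x{30c8}";   # hypothetical Japanese directory name
mkdir localfile($dir) or die "mkdir failed: $OS_ERROR";
open my $fh, '>', localfile("$dir/foo.txt") or die "open failed: $OS_ERROR";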

This solution is obnoxious because you have to wrap EVERY filename in your entire program. I tested it by setting my working directory to something non-ASCII on a Shift-JIS Windows installation and looking for test failures.

It's the only solution that worked reliably for me, though, on arbitrarily-encoded filesystems. Just using UTF-8 filesystems is so much easier... well, aside from normalization issues, that is.

Monday October 06, 2008
11:49 PM

CGI-Compress-Gzip v1.00: 1-day-old bug fixed!

I was totally wrong yesterday. I blamed the spurious test failures on taint mode, but it was really autoflush that caused problems. Maybe taint was a problem too, but it was not the core problem. CGI::Compress::Gzip disables itself if autoflush mode is on, because that implies that the programmer wants HTML sent to the user NOW, not buffered and sent later via gzip compression.
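
In other words, with autoflush on, the module quietly falls back to plain CGI behavior (a minimal sketch, assuming the usual drop-in-for-CGI usage):

use CGI::Compress::Gzip;

$| = 1;                  # autoflush on: the module disables itself
my $cgi = CGI::Compress::Gzip->new;
print $cgi->header;      # output goes out uncompressed, right away
print "<html>...</html>\n";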

I rolled out a 1.00 release this evening which I hope will work around the test failures. My thanks go out to Slaven Rezic (and others) for prompt smoke testing of new releases!

Sunday October 05, 2008
11:30 PM

CGI-Compress-Gzip v0.23: 5-year-old bug fixed!

For over five years, I've been getting smoke failure reports for CGI::Compress::Gzip. I've tried many times to solve the problem but could never reproduce it.

I finally figured it out today!

The problem was in the test code. The tests simulate a CGI environment by setting an envvar (HTTP_ACCEPT_ENCODING=gzip) and calling an external, very simple CGI program via backticks. It turns out that some smoke systems don't pass envvars to child processes (probably due to taint), so the CGI always ran in non-gzip mode. The fix was simple.
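
The setup looked roughly like this (a sketch; the helper script name is hypothetical):

$ENV{HTTP_ACCEPT_ENCODING} = 'gzip';   # pretend the client accepts gzip
my $out = `$^X t/testhelper.cgi`;      # run the CGI via backticks
# On the failing smokers the child process never saw the envvar,
# so the CGI replied uncompressed and the comparison failed.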

Even though this test failure was spurious and was never reported by anyone except a smoke tester, I'm relieved to have fixed it finally. As always, I'm grateful for the patience and persistence of the smoke testing community! Perl would be nothing without its CPAN support community.

If this new release is good with the smoke testers, I'm going to push out a 1.00 release at long last.

Tuesday September 23, 2008
09:50 PM

CAM::PDF v1.50: Better late than never

Back in PDF v1.5 (which corresponds to Acrobat 6, in 2003), Adobe added a new feature where nearly all of the document metadata could be serialized in compressed blocks. It was the first completely incompatible feature that Adobe added to the document format since PDF v1.0, so adoption was slow even though it can save about 20-30% of the document size.

Despite reading large swaths of the PDF v1.5 spec and fielding questions from about a hundred CAM::PDF users over the years, I never heard about this feature. I overlooked it in the 952-page spec and never came across such a PDF in the wild...

...Until a month ago, that is. Suddenly, people were emailing me left and right about support for this feature. I'm not sure what changed; perhaps a recent Acrobat release made the compressed syntax the default for new documents.

Now CAM::PDF v1.50 supports reading compressed streams. It still only supports writing the older PDF v1.4 style streams, so as a side effect it's a useful tool for downgrading your PDFs for broader compatibility. Along the way I fixed a serious bug in the PNG decompressor in my code. Wow, I can't believe nobody hit that one before.
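
That makes the downgrade a simple read-and-rewrite (a minimal sketch; file names are illustrative):

use CAM::PDF;

my $doc = CAM::PDF->new('compressed-v1.5.pdf') || die "$CAM::PDF::errstr\n";
$doc->cleanoutput('downgraded-v1.4.pdf');   # written with the older stream style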

It works very well (pretty good unit tests) but just, uh, don't look too closely at the source code. I took some complex, 2002-era, barely-object-oriented code and added another layer of complexity on top of it. Man, if I had the time to refactor this, I would try to merge CAM::PDF's rich low-level feature set and speed with PDF::API2's saner API, and the Perl PDF world would be much happier. Maybe for Rakudo 1.0...

Thursday September 18, 2008
11:17 PM

How ghostscript parses PDF files

Did you know that PDF was created to replace PostScript?

Did you know that PostScript is a Turing-complete language?

Did you know that ghostscript's PDF parser is written in PostScript?

/I /it love

Wednesday September 03, 2008
01:12 AM

SVN copying

I've had to learn this twice over the last 2 years, so I'm going to document it here for eternity. :-)

I have a Subversion repository and I want to split off a piece of it (one subdir) into a new repository on another server.

# On the old server: dump the repository, keep only the subproject,
# and compress the result.
ssh my-old-server
  svnadmin dump repositories/myproject | svndumpfilter include subproject \
      | bzip2 -9 > subproject_at_rev_4753.bz2
  scp subproject_at_rev_4753.bz2 my-new-server:.
# On the new server: create an empty repository and load the filtered dump.
ssh my-new-server
  svnadmin create repos/newproject
  bzcat subproject_at_rev_4753.bz2 | svnadmin load repos/newproject

I don't use the --drop-empty-revs option on svndumpfilter because that seems to confuse svnmerge.py (I'm using SVN 1.4 still, not 1.5 yet).

Then in existing workspaces, I do:

  svn switch --relocate http://my-old-server/myproject/subproject \
      http://my-new-server/newproject/subproject

Tuesday July 29, 2008
12:26 AM

Attribute::Handlers in 5.8 vs. 5.10

If you use attributes with multiple arguments like so:
        sub foo : myattr(one, two) { }

then it's important to realize that the attribute arguments are parsed differently under Perl 5.8 vs. Perl 5.10. In 5.8, you get a string like "one, two" passed to your :ATTR sub. Under 5.10, you instead get an arrayref like ['one', 'two'].

I had some 5.8 code that parsed the attribute args like so:
        my @args = split /\s*,\s*/, $args;
which resulted in @args containing 'ARRAY(0x123456)' under 5.10! My workaround, compatible with both 5.8 and 5.10, is:
        my @args = ref $args ? @{$args} : split /\s*,\s*/, $args;
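
In context, the handler looks something like this (a minimal sketch; the package and attribute names are illustrative):

package MyAttrs;
use Attribute::Handlers;

sub myattr : ATTR(CODE) {
    my ($package, $symbol, $referent, $attr, $data, $phase) = @_;
    # 5.8 passes the raw string "one, two"; 5.10 passes ['one', 'two'].
    my @args = ref $data ? @{$data} : split /\s*,\s*/, $data;
    # ... act on @args ...
}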

If anyone sees flaws in this workaround, or has a better explanation, please comment.

Sunday July 27, 2008
08:13 PM

BarCamp

I attended BarCamp-Madison 2008 this weekend. It was the first time I'd been to a BarCamp, and it was significantly better than I expected.

Madison is a medium-sized city, but it is the seat of both Wisconsin's state government and its flagship university (with lots of UW research spinoff companies). So there were a large number of techies in attendance, especially those representing the non-profit sector.

I presented my Fuse+PDF talk again (I'm into recycling :-)) to a small but very curious audience.

One very interesting discussion was about how to aggregate all of the local user groups in the area (Perl, Linux, Python, Java, Rails, PHP, etc.) into a tighter network. One product of that idea is the nascent web608.org (608 is the regional telephone area code), which hopes to provide a central place to host mailing lists, meeting calendars, and the like. The group also hopes to identify a shared meeting space so that the various user groups will have something tangible in common. I'm very excited about this plan. We also discussed the challenge of balancing evening user group meetings with children; people suggested a babysitting co-op, or a kid-friendly room at or near the meeting space. Hmm...

The biggest bummer for me was when I described my involvement with Perl::Critic as "helping Perl developers write more readable code". A Python user laughed and said "Good luck with that!" :-(

Wednesday July 23, 2008
01:02 AM

More Yacc to Parrot translation

I mentioned a while back that I was playing with a PCT-based grammar to parse Yacc files and transcode the Yacc grammar to PCT.

My project took two big detours.

First, I re-learned that Yacc is a parser for a pre-tokenized stream and does not include lexing or scanning, unlike PCT. So it is infeasible to do a full, automated translation. I suspected that would be the case when I started, but I didn't realize how far from complete my translation would be. Basically, I generate a lot of mostly-useful PGE "rule {}" constructs, but then a lot of placeholder "token {}" constructs that need to be addressed by a human.

The second detour was that I was trying to learn Yacc and PCT/PGE at the same time, which was too much. So, I dropped down to Perl5 and wrote a parser based on the m/\G.../cgxms construct. The good news is that I finished, and the parser is blazing fast, if verbose. I can parse the whole Yacc vocabulary (well, Bison v2.1 really). My testcases are perl/perly.y, bash/parse.y, cola.y, lua51.y and even bison/parse-gram.y. I am generating PCT grammar.pg and actions.pm files that actually compile, but they are far from functional -- they're just starting points, really, but they're better than a blank page, I think.
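
For those who haven't used the idiom, the core loop looks roughly like this (a toy sketch, not the actual grammar):

# \G anchors each match where the previous one stopped; /c keeps pos()
# intact on failure so leftover garbage can be reported.
my $src = '%token NUM %% expr: expr PLUS NUM ;';
my @tokens;
while ($src =~ m/\G \s* ( %% | %\w+ | \w+ | [:;|] )/cgxms) {
    push @tokens, $1;
}
print "@tokens\n";   # %token NUM %% expr : expr PLUS NUM ;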

So, opinions are welcome:

  • Should I continue working on the Perl5 parser to get it to make better PCT output?
  • Should I work on porting the parser itself to PCT?
  • Should I put this aside and start using it to make real Parrot code from existing Yacc grammars?
  • Or should I work on the RT/CPANTesters bugs that have been accumulating against my existing packages? Sigh...

Saturday June 14, 2008
12:21 AM

Polymorphic database tables?

[I started asking this question on IRC, but it got too complicated... It seems like something basic that most DBAs should know, but I'm not a DBA and I couldn't find a good solution after some searching.]

What's the best way to represent polymorphism in a collection of database tables?

Consider a website where students answer surveys administered by faculty or departments. Start with three database tables: survey, faculty, and department. How do I indicate that each survey is owned by exactly one faculty member or one department? I like the strong-typing guarantees of foreign keys, so I really want to avoid un-keyed solutions.

I've thought about the following solutions, but I'm unhappy with all of them:

One null field:
    Put "faculty_id" and "department_id" foreign keys in the survey table and insist that exactly one is not null (sketched below). This is awkward in code due to the pervasive conditionals, and problematic as I consider more things that both faculty and departments can own (e.g. student rosters).

Single owner table, two-to-one:
    The survey table has an owner_id which points to an owner table, which in turn has faculty_id and department_id fields, exactly one of them non-null. This is easier to code than the above because everything gets exactly one "owner".

Single owner table, two-to-many:
    Ownership is not represented in the survey table; instead, the owner table has faculty_id, department_id and survey_id fields. This seems to have no advantage over the "One null field" option.

Multiple owner tables:
    Create faculty_survey and department_survey one-to-many tables. How do I ensure that each survey is represented exactly once across those two tables?

Multiple survey tables:
    Partition the surveys into two tables, one for faculty surveys and one for department surveys. This is very painful as I add more things that can be owned.
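
For concreteness, here is roughly what the "One null field" option looks like (a sketch via a hypothetical DBI handle $dbh; the SQL flavor and column names are illustrative):

# The CHECK constraint enforces that exactly one owner key is set.
$dbh->do(<<'SQL');
CREATE TABLE survey (
    survey_id     INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    faculty_id    INTEGER REFERENCES faculty(faculty_id),
    department_id INTEGER REFERENCES department(department_id),
    CHECK (   (faculty_id IS NOT NULL AND department_id IS NULL)
           OR (faculty_id IS NULL     AND department_id IS NOT NULL))
)
SQL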

Am I missing something obvious? What happens when I add another type that can be an owner?