
All the Perl that's Practical to Extract and Report


schwern (1528)

schwern
  (email not shown publicly)
http://schwern.net/
AOL IM: MichaelSchwern (Add Buddy, Send Message)
Jabber: schwern@gmail.com

Schwern can destroy CPAN at his whim.

Journal of schwern (1528)

Sunday May 02, 2010
12:17 PM

Object::ID - A unique object identifier for any object

Something Perl's OO has been missing is a reliable way to identify an object. Is $this the same as $that? Not asking if it contains the same information, but does it refer to the same object? Have we seen it before? When I alter $this will I also be changing $that?

package Foo;
 
use Object::ID;
 
...write the class however you want...

Really, HOWEVER YOU WANT! Inside out, outside in, code refs, regexes, globs, Moose, Mouse... Call the constructor whatever you like, add in a DESTROY method. Doesn't matter, it'll work.

my $id   = $obj->object_id;
my $uuid = $obj->object_uuid;

object_id() is a cheap, process-specific identifier. object_uuid() is a bit more expensive on first call (it has to generate the UUID, about 30% slower) but it should be universally unique across machines and processes.

That's great for YOUR objects, but what about everyone else? You can either inject the Object::ID role one class at a time...

package DateTime;
use Object::ID;
 
my $date = DateTime->now;
say $date->object_id;

or you can load UNIVERSAL::Object::ID and every object has it. EVERY OBJECT! Even things you don't realize are objects.

use UNIVERSAL::Object::ID;
 
# Regexes are objects
say qr/foo/->object_id;
 
# Loading IO::Handle turns all filehandles into objects
use IO::Handle;
open my $fh, "foo/bar";
say $fh->object_id;

But OH GOD UNIVERSAL! Well, use at your own risk. It's handy to use in your own programs and private libraries. Or you can use Method::Lexical and apply it lexically.

Why not just use the object's reference address? Well, as people implementing inside-out objects discovered, they're not unique. They're not thread safe, and worse they're not even unique for the life of the process. Perl will reuse the reference of a destroyed object. Observe:

{
    package Foo;
 
    sub new {
        my $class = shift;
        return bless {}, $class;
    }
}
 
for(1..3) {
    my $obj = Foo->new;
    print "Object's reference is $obj\n";
}

Run that and you should get the same reference, three times, for three different objects.

And then there's the problem of string overloaded objects. You have to be careful to always use Scalar::Util::refaddr or overload::StrVal.
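To make the overloading problem concrete, here is a minimal sketch using only core modules. The Temperature class is hypothetical and exists only for illustration. It shows two distinct objects that look identical as strings, while refaddr() and overload::StrVal() still tell them apart.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use overload ();

{
    package Temperature;    # hypothetical class, for illustration only

    # Stringify to something readable instead of "Temperature=HASH(0x...)"
    use overload '""' => sub { $_[0]{degrees} . "C" };

    sub new {
        my($class, $degrees) = @_;
        return bless { degrees => $degrees }, $class;
    }
}

my $this = Temperature->new(20);
my $that = Temperature->new(20);

# Two different objects look identical as strings...
print "As strings: $this and $that\n";

# ...but refaddr() and overload::StrVal() bypass the overloading.
printf "refaddr: %d and %d\n", refaddr($this), refaddr($that);
printf "StrVal:  %s and %s\n", overload::StrVal($this), overload::StrVal($that);
```

Both objects are alive at the same time here, so their addresses differ. The address reuse problem described above still applies once one of them is destroyed.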

It turns out inside-out objects have nearly the same problem, and 5.10.0 introduced field hashes to solve that. rjbs explains the pain of all this at slide 120 in his excellent 5.10 For People Who Aren't Totally Insane. You can read the gory details of field hashes but it comes down to this: in 5.10 you can A) get a process unique, thread safe identifier for an object and B) store it in a hash such that it gets destroyed when the object is destroyed. Perfect!

Because of this, if you look inside Object::ID you'll see there's not a lot to it. It makes a field hash to store the IDs in, a state variable to hold an ID counter, and then just accesses the field hash.

use 5.010;    # for state and //=
use Hash::Util::FieldHash qw(fieldhash);
fieldhash(my %IDs);
 
sub object_id {
    my $self = shift;
 
    state $last_id = "a";
 
    return $IDs{$self} //= ++$last_id;
}

No scary black magic (beyond what's inside fieldhash). It's so simple, which is why it works with everything.
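As a rough check of the two properties claimed for field hashes, the sketch below reruns the earlier three-objects loop with a counter-backed ID. It is an illustration under the same assumptions as object_id() above, not the module's actual source. The function name id_for is made up for this example.

```perl
use strict;
use warnings;
use 5.010;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash(my %IDs);
my $last_id = "a";

# Same idea as object_id() above, written as a plain function so it
# can be called on any reference. id_for() is a hypothetical name.
sub id_for {
    my $obj = shift;
    return $IDs{$obj} //= ++$last_id;
}

my @seen;
for (1..3) {
    my $obj = bless {}, "Foo";
    push @seen, id_for($obj);
    # $obj is destroyed at the end of each iteration and its entry
    # in %IDs goes with it
}

say "IDs handed out: @seen";
say "Entries left in the field hash: ", scalar keys %IDs;
```

Even if Perl hands out the same reference address each time around the loop, every object gets its own ID, and nothing is left behind in the hash afterwards.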

Now, I didn't come up with this implementation. I just laid out the requirements and Vincent Pit filled in the blanks. I was only vaguely aware of field hashes, Vincent made the connection. Thank you VPIT!

Practical applications? Honestly, I'm not sure. I needed it as a shortcut for expensive object equality checks in perl5i. Maybe some of the OO theorists out there can fill this part in. Let me know what you might use it for.

Possible extensions? Well... with some tweaking Object::ID can be used as a universal object registry. Not only can you ask "does the object associated with this ID still exist" but field hashes provide the ability to get the object associated with an ID. It would only work on objects that have had their ID asked of them, and thus registered with the field hash, but how else would you have the ID? Is this useful? Is this a security hole? I dunno, but it would be easy.
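One way the registry idea might look, as a sketch only and not anything Object::ID is known to ship: Hash::Util::FieldHash exports id(), id_2obj() and register(), which map between a registered object and its reference address. The %ADDRESS_FOR table and the object_for_id() function are hypothetical names introduced for this example.

```perl
use strict;
use warnings;
use 5.010;
use Hash::Util::FieldHash qw(fieldhash id id_2obj register);

fieldhash(my %IDs);     # object => our ID, cleaned up when the object dies
my %ADDRESS_FOR;        # our ID => reference address (hypothetical lookup table)
my $last_id = "a";

sub object_id {
    my $obj = shift;
    return $IDs{$obj} if exists $IDs{$obj};

    register($obj);                     # make the object findable via id_2obj()
    my $new_id = ++$last_id;
    $ADDRESS_FOR{$new_id} = id($obj);
    return $IDs{$obj} = $new_id;
}

# Hypothetical helper: returns the live object for an ID, or nothing.
sub object_for_id {
    my $wanted = shift;

    my $address = $ADDRESS_FOR{$wanted};
    return unless defined $address;

    my $obj = id_2obj($address);

    # The address may have been reused by a newer object, so check that
    # the object found there really carries the ID we were asked about.
    return unless defined $obj and ($IDs{$obj} // '') eq $wanted;
    return $obj;
}
```

Note that %ADDRESS_FOR is an ordinary hash, so its entries outlive the objects. A real implementation would want to prune it, and would need to think about the security question raised above.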

Thursday April 01, 2010
05:51 PM

OMG UNICORNIFY::URL!!! <3

OMG! A UNICORN PONY FOR MEEEE!!!

You can get 1 2 wth Unicornify::URL!! Here's a program to do it at the command line.

use Acme::Pony;
               bUf
              Fybuf
              FyBuFFYbU
             FfyBUFfYbUff
            YBuffybuFfyBuF
           fYbUffYBUfFYbUff
          YbUFfYBuffYBuFFYBu
         FFybUffYBUffYBUfFYb
         UffYbUFfyBUffYBuFfyB
           UFFybUfFYBuffYbUFF
             ybUfFyBuFfyBufFy
             BuffYBufFyBUfFYB                uffY
            bUffybuffyBUFfyBuf              FYBuFfy
            BuFFybUFFyBUffyBuFFYbuffybUf   fYbUfFYBu
    fFYB   uFFyBufFyBUfFYbufFYbUFFYbUFfyBufFYBufFYBu
   FFyBufFyBUffYBufFYbUffYBUFfYBUFFyBuFfYbUFFybUffYBU
  ffyBU fFYbuffYbUffybuffYbuFfYbuFFyBuFFyBUfFybufFYbUf
fYbUF     fybUFfYBuffybuFfyBuFFYBuffYBUFFybuffybUffYBu
  fFYBuf    fyBuFFyBufFyBUffYBufFYbufFyBUFfybUFfYbuffyb
   uFfyB         UffYBUfFybUfFYbuFfYBUFfYbUffYBuffybuFf
                   yBuFFY     BuffYBUFFybuffybUffYbufFYb
                   ufFyb            UFfybufFYBuffybufFyb
                   UffYb              UffybUFfY buffybuF
                   fybUf                fyBuffy BUFfYbuF
                   FYbUF                fyBuffYbuFFyBUFf
                   ybUf               Fybu  ffYb UffybuF
                   fYbu               ffYB  UFFy  BuFFYB
                   UfFy               BuF   fybU   ffYB
                   Uff                       Ybu
                 fFyb                        uFf
                                          YBUF
                                            F
 
yBuFFYBUfFy

<3 k thx bye!!1!11!!!

Thursday March 25, 2010
10:45 PM

Some Facts About Schwern

Open Source Bridge requested that I improve my bio, so I decided to share some facts about myself...

Schwern has a copy of Perl 6, he lets Larry Wall borrow it and take notes.

Schwern once sneezed into a microphone and the text-to-speech conversion was a regex that turns crap into gold.

Damian Conway and Schwern once had an arm wrestling contest. The superposition still hasn’t collapsed.

Schwern was the keynote speaker at the first YAPC::Mars.

When Schwern runs a smoke test, the fire department is notified.

Dan Brown analyzed a JAPH Schwern wrote and discovered it contained the Bible.

Schwern writes Perl code that writes Makefiles that write shell scripts on VMS.

Schwern does not commit to master, master commits to Schwern.

SETI broadcast some of Schwern’s Perl code into space. 8 years later they got a reply thanking them for the improved hyper drive plans.

Schwern once accidentally typed “git pull --hard” and dragged Github’s server room 10 miles.

There are no free namespaces on CPAN, there are just modules Schwern has not written yet.

Perl's threads are implemented with a single strand of Schwern's hair.

"Schwern" cmp "Chuck Norris" will cause Perl to segfault rather than try to compare them.

Schwern’s tears are said to cure cancer, unfortunately his Perl code gives it right back.

Monday March 22, 2010
06:57 PM

The Basic Unit of Bug Report Frustration

I submitted a proposal to OSCON called "How To Report A Bug" about the social issues involved in reporting and accepting bug reports. It's still pending, but it's caused me to do a little writing for it. I came up with this introduction which I feel sums up the problem well. I'm also tickled that one measures bug report frustration in bags of shit.

Developers often treat bug reports like someone dumping a bag of shit on your doorstep, ringing the bell and telling you to clean it up. That's not what they are. A bug report is someone pointing out that there's some shit on your doorstep, they stepped in it, and maybe it should be cleaned up.

Either way, nobody likes stepping in shit. And nobody likes cleaning up shit. So the whole interaction starts off on the wrong foot, perhaps the one covered in shit. Your job, as developer or as reporter, is to deliberately steer it back to being a positive one where the developer wants to fix shit and the reporter wants to continue to report shit.

Saturday January 23, 2010
09:00 PM

The return of perl5i!

After an extended period in hibernation, perl5i returns to CPAN with a rack of new changes since the last CPAN release. Thanks to Bruno Vecchi, Chas Owens, Darian Patrick, Jeff Lavallee, Michael Greb, rjbs, benh and chromatic for contributing!

Why was it deleted from CPAN? Version numbers. I'd used my usual ISO date integer style versioning but realized pretty quick that I need full X.Y.Z versioning to indicate incompatible changes. Yes, perl5i is planning to be incompatible but without breaking your code!

How? By allowing you to declare what major version of perl5i you rely on. At this point it's most likely going to be "use perl5i::2" which has the nice benefit of also allowing you to declare a dependency on perl5i::2. Easy peasy.

The question remains, what should "use perl5i" do? Should it A) load the currently installed version of perl5i? Or B) die with instructions on what to do? A is convenient, but means you'll get (probably unknowingly) walloped by an incompatibility later. B is a little inconvenient, but it makes the planned incompatibilities explicit and visible. The existence of a perl5i command line program (which also works on the #! line) makes A less of a concern if you want to go that route. So likely in short order "use perl5i" won't work any more. But I'd still like your thoughts on the matter.
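Option B can be sketched with nothing more than an import() method that refuses to load. The package names My::Pragma and My::Pragma::2 are hypothetical stand-ins, and this is an illustration of the idea rather than what perl5i actually does.

```perl
use strict;
use warnings;

{
    package My::Pragma;       # hypothetical stand-in for the unversioned module

    # Option B: refuse to load and explain what to write instead.
    sub import {
        my $class = shift;
        require Carp;
        Carp::croak("$class changes incompatibly between major versions. "
                  . "Say which one you want, for example 'use ${class}::2'");
    }
}

{
    package My::Pragma::2;    # hypothetical major version 2

    # A versioned module does its real work on import.
    sub import {
        strict->import;
        warnings->import;
    }
}

# Calling import() directly mimics what a "use" line would do.
my $ok    = eval { My::Pragma::2->import; 1 };
my $error = eval { My::Pragma->import; 1 } ? "" : $@;

print "Versioned import worked\n" if $ok;
print "Unversioned import died: $error";
```

The failure happens at compile time when written as a real "use" line, so the incompatibility is visible the moment the code is loaded rather than later.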

Monday January 11, 2010
05:59 PM

Great Perl Code

I was in a bar the other day talking with somebody about Perl. He asked, "what is some great Perl code I could read?" He was looking for a non-trivial amount of production Perl 5 code that elegantly solves a problem, and is beautiful to read.

I couldn't answer that. The code I see is either beautiful to read but fairly boring in what it does OR is an elegant solution but is terrifying to read (for example, autodie). I'm a biased observer, I work mostly with the plumbing so I see mostly the scary stuff that implements the elegant solutions.

What code would you show a non-Perl programmer that is both beautiful to read and interesting?

Wednesday January 06, 2010
10:05 PM

gitPAN now updating

After a month's break I've fixed up the gitPAN importer so it can update distributions in place fairly efficiently. This involved a major overhaul of Parse::BACKPAN::Packages into the new SQLite and DBIx::Class backed BackPAN::Index. Faster, leaner, far more flexible.

It's grinding through an update now, should be done before tomorrow morning. Still running off my laptop, hosting is in the works and then daily updates can happen.

UPDATE: The update is complete. Hit a few snags. 1) I wasn't always sorting releases by date, so I might have gotten some out of order in this update. Eventually there will be a full sweep and rebuild. 2) Some parts of github are case sensitive, some parts are not, and this caused some issues when distribution names subtly change case (like WebService vs Webservice). BackPAN::Index treats this as two different distributions, github treats it as one repository, hilarity ensues.
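A minimal sketch of how such case collisions could be spotted ahead of time, by grouping names on their lowercased form. The distribution names are made up for illustration, and this is not taken from the gitPAN importer.

```perl
use strict;
use warnings;

# Hypothetical distribution names, for illustration only.
my @distributions = qw(WebService-Foo Webservice-Foo Path-Class SQL-Statement);

# Group names by their lowercased form.
my %by_folded_name;
push @{ $by_folded_name{lc $_} }, $_ for @distributions;

# Any group with more than one member would collide in a
# case-insensitive namespace.
my @collisions = grep { @$_ > 1 }
                 map  { $by_folded_name{$_} }
                 sort keys %by_folded_name;

print "Collision: @$_\n" for @collisions;
```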

So that's the first 90% complete. Given that took about a month and the last 10% takes 90% of the time... see you in 2011.

Saturday January 02, 2010
11:10 PM

CPAN's Greatest Hits - Path::Class

Once upon a time, Perl's file and directory manipulation was considered very convenient. Compared to languages like C and Java, which consider I/O as some sort of distasteful act that should best be done behind at least 9 layers of abstraction, it's positively enlightened. But once you use something like Ruby's File and Dir objects, Perl starts to look a touch out of date.

In Perl, reading a file or directory is a three step process. Safe path manipulation requires the brilliant but cumbersome File::Spec. Even something like deleting a file requires special code to be safe. Want to reliably delete or create a directory? That's another module, File::Path. Copying a file? File::Copy. And so on.
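For a sense of how many modules that adds up to, here is a small sketch that builds a path, creates a directory, writes, copies and deletes a file, then removes the directory, all with core modules. The file names are arbitrary and everything happens inside a temporary directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path remove_tree);
use File::Copy qw(copy);
use File::Temp qw(tempdir);

my $root = tempdir(CLEANUP => 1);

# File::Spec: build paths portably
my $dir  = File::Spec->catdir($root, "some", "path");
my $file = File::Spec->catfile($dir, "foo.txt");
my $copy = File::Spec->catfile($dir, "bar.txt");

# File::Path: create nested directories
make_path($dir);

# Plain Perl: open, write, close
open my $out, ">", $file or die "Can't write $file: $!";
print {$out} "hello\n";
close $out or die "Can't close $file: $!";

# File::Copy: copy a file
copy($file, $copy) or die "Can't copy $file to $copy: $!";
my $copied = -e $copy ? 1 : 0;

# Plain Perl: delete a file
unlink $file or die "Can't delete $file: $!";

# File::Path: remove a directory and everything in it
remove_tree($dir);

print "Copied: $copied\n";
```

Four modules and a handful of built-ins, each with its own calling convention and its own way of reporting failure.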

The root of the problem is that Perl represents paths as just strings. And while that's enough to uniquely identify them, it's not nearly enough to actually do anything with them. You want an object, files and directories which know how to do it all. Path::Class provides.

Created by Ken Williams, Path::Class is it. Short, convenient constructors, string overloading and providing just about everything you'd want to do with a path. Since it's just sugar on top of all the pre-existing and well-built File modules, it's extremely robust.

Here is why it is awesome.

Slurp a file.

# Perl
open my $fh, "<", $file;
my $content = do { local $/; <$fh> };
close $fh;
 
# Path::Class
my $contents = file($file)->slurp;

Iterate over every file in a directory.

# Perl
opendir my $dh, $dir;
for my $thing (grep { $_ ne '.' and $_ ne '..' } readdir $dh) {
    ...
}
closedir $dh;
 
# Path::Class
for my $thing (dir($dir)->children) {
    ...
}

Change the subdir and file on a path (ie. from /some/path/foo/bar.txt to /some/path/baz/biff.txt)

# Perl
my($vol, $dir, $file) = File::Spec->splitpath($path);
my @dirs = File::Spec->splitdir(File::Spec->canonpath($dir));  # canonpath drops the trailing slash
pop @dirs;
my $newpath = File::Spec->catpath($vol, File::Spec->catdir(@dirs, $newdir), $newfile);
 
# Path::Class
my $newpath = file($path)->parent->parent->subdir($newdir)->file($newfile);

That last example starts to demonstrate what happens once Path::Class objects become ubiquitous in your code. Rather than instantiating them when needed, they're just there and can be chained together for rapid manipulation. Since they're string overloaded there's no reason not to use them.

I neglected error handling in the examples above. Path::Class was written back when library functions calling die() was considered impolite. Before pjf hammered home (cleaved with a Bat'leth?) the point that exceptions are awesome. So you still have to do all the "or die ..." junk with Path::Class, you don't even get the convenience of autodie. Fortunately I hope to do something about that.
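To show the difference in plain core Perl (not Path::Class itself), here is a sketch of the slurp example written both ways: once checking every call by hand, once letting autodie throw the exceptions. The function names are made up for this example.

```perl
use strict;
use warnings;

# Checking every call by hand
sub slurp_checked {
    my $file = shift;
    open my $fh, "<", $file or die "Can't open $file: $!";
    my $contents = do { local $/; <$fh> };
    close $fh or die "Can't close $file: $!";
    return $contents;
}

# Letting autodie throw the exceptions. The pragma is lexical, so it
# only affects open and close inside this sub.
sub slurp_autodie {
    my $file = shift;
    use autodie qw(open close);
    open my $fh, "<", $file;
    my $contents = do { local $/; <$fh> };
    close $fh;
    return $contents;
}
```

Both versions die on a missing file. The second just doesn't make you say so on every line.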

The next time you find yourself writing "use File::Spec" give Path::Class a shot.

Monday December 28, 2009
08:43 PM

Numbered test file abuse

I hate numbered test files. Not because it isn't useful to force the ordering of some tests, but because it's then not necessary to force the ordering of EVERY test. In the worst case you're back to BASIC. Observe the test suite from SQL::Statement.

00error.t
01prepare.t
02executeDirect.t
03executeDBD.t
04names.t
05create.t
06group.t
07case.t
08join.t
09ops.t
10limit.t
11functions.t
12eval.t
13call.t
14allcols.t
15naturaljoins.t
16morejoins.t
17quoting.t
18bigjoin.t
19idents.t
20pod.t
21pod_coverage.t

Now, how much of that is really trying to order the tests and how much of it is just the order it happened to be written? Does the limit test really have to go before the functions test? Why is there a quoting test in the middle of three join tests? I see three that make any kind of sense, 00error.t (since most of the tests use RaiseError), 20pod.t and 21pod_coverage.t, which maybe go last, though it's honestly not important that they do.

What's the harm? It cripples command line completion. And you've got the old BASIC problem of renumbering. I want to add a new test, where does it go? Do I have to puzzle out the implied dependencies? Do I stick it at the end? But then the POD tests aren't last any more. Do I renumber everything? Do I use a duplicate number? Do I say the hell with it and cram it into an existing test file?

Not worth it.

Realistically most test dependencies really want to express two things: Run this first and run this last. For that you have 00foo.t and zz-bar.t. 00compile.t, 00setup.t, zz-teardown.t, zz-pod.t, etc... Anything else, just write it without the number. Or if you really do have a fixed order use Test::Manifest.

If you do have tests that would do better running in order, then instead of smashing them together into one arbitrary numbering system, group them. That is, stick them into a common subdirectory and then apply the first/last numbering again. A clear candidate in SQL-Statement would be t/join to contain naturaljoins.t, morejoins.t and bigjoin.t. This has the advantage of easily letting you run all one group's tests in one shot. "prove -lr t/join".

I don't mean to pick on SQL::Statement, it's just what I'm patching right now. Lots of distributions obsessively number their test files to no real advantage.

Monday December 14, 2009
05:27 PM

gitPAN is complete!

The gitPAN import is complete.

From BackPAN
------------
118,752 files
10,440,348,937 bytes (measured by adding individual file size)
21,987 distributions (I skipped perl, parrot and parrot-cfg)

To git
------
21,766 repositories
4,495,204 bytes (measured by total disk usage after git gc with no checkout)
150 gigs on github (they have to index it)
12 days (lots of starts and stops)
1 laptop (1st gen Macbook)

I had to do it on a disk image because OS X's case-insensitive filesystem

I've written up a small FAQ. gitpan is reasonably stable, but you may have to rebase in the future.

Next, I take a break.

Then begins the second pass, mostly improving and adding tags. Here's the list of planned features. The second pass will be a rolling reimport of each distribution to bring everything up to the same standard, there were a lot of incremental improvements during the first pass. I expect this to be changes to commit logs and tags with very little content change.

The issue of PAUSE ownership I'm going to punt on. It's ugly and can be done entirely in parallel. If someone else makes available a historical distribution ownership database, gitPAN will use it.