schwern
http://www.pobox.com/~schwern/

Journal of schwern (1528)

Monday April 28, 2008
03:08 PM

This is a Perl blog

I am continually horrified at the junk which shows up when you search for perl blog on Google. The first use.perl.org journal is 14 links down. perlbuzz isn't even on the page.

Searching for perl blogs does much better, since some SEO has been done on that by Andy and Skud and the Planet Perl folks -- honestly, Google should figure out the blog/blogs singular/plural thing -- but again use.perl.org is #7, below hotscripts.com! The only reason it's on the list at all is that Skud happened to mention the word "blog" in a post.

Whether you prefer the term "journal" or "blog" or just don't care, you can have no effect on the world if people can't find you. People search for "blog", not "journal"; that's simply how the vocabulary has swung. Refusing to acknowledge that, by not having the word blog anywhere on use.perl.org, not even in the otherwise unseen meta, is not a stand against what some people perceive to be stupid Internet lingo, but a self-inflicted gag. If people can't find you, people can't find out what you're saying. We have rendered ourselves irrelevant to anyone not in the know.

People say "Perl is dead" because, as outsiders, they don't see anything going on. And instead of putting ourselves where people are looking, we think we can change where they look. The reality is if nobody is watching us we can have no influence over anyone but ourselves.

This is a blog. It is also a diary, a journal, a record and an account. It is not somehow harmful to use these synonyms; it is useful, drawing more people in because different people think differently. The content and community of use.perl are strong enough to weather the addition of a synonym.

In order to route around this, I will simply start using the word "blog" in my "journal" and see if Google notices. I encourage others to do the same. And I hope it will start appearing in the site's meta information where it can hurt no one and only help our journals be found and read.

Wednesday April 23, 2008
07:59 PM

Embedding tests in a script

From time to time I'll work on a code base that's basically a pile of individual scripts. The process of converting it to a modularized system can take some time, technically as well as socially. Meanwhile, I have to get work done. And for me getting work done requires writing tests.

But if it's a pile of scripts, where do you put them? And with no build structure, how do you run them? Rather than having to decide between using a single file OR writing tests, I decided to embed the tests in the scripts themselves. Observe.

sub selftest {
    my @test_functions = get_test_functions();
 
    for my $function (sort { lc $a cmp lc $b } @test_functions) {
        no strict 'refs';
        print "# Running $function\n";
        &{$function};
    }
}
 
sub get_test_functions {
    my $package = shift || __PACKAGE__;
 
    # Load the test functions after __END__
    eval join '', <DATA>;

    # Don't silently swallow compile errors in the embedded tests
    die $@ if $@;
 
    no strict 'refs';
 
    return
      # Select only those which are subroutines
      grep { defined &{$_} }
 
      # Find the ones named test_*
      grep /^test_/,
 
      # Get all the symbols in the package
      keys %{$package."::"};
}
 
use Getopt::Long;
 
sub main {
    my %options;
    GetOptions(
        \%options,
        "test",
    );
 
    if( $options{test} ) {
        selftest();
        exit;
    }
 
    # ... rest of the code here ...
}
 
main();
 
__END__
# These tests will be compiled and run when --test is given
 
use strict;
use warnings;
 
use Test::More 'no_plan';
 
sub test_the_tests {
    pass("The tests run!");
}

Giving --test compiles the __END__ code (via selftest()), finds all the test_* functions, runs them and exits.

By embedding the tests into the scripts you can introduce unit testing to single-file scripters without having to simultaneously introduce the concept of a multi-file project. Because the tests sit after the __END__ marker, nobody can make the excuse that your test functions are wasting memory in production.

I'm sure I'm not the first to come up with this, but I don't know that I've seen it modularized. So before I go and do that, is this already on CPAN?

Monday March 31, 2008
03:49 PM

London, Oslo, Edinburgh

Tomorrow afternoon I'll be in London for a few days doing nothing in particular. I mean to hit the Imperial War Museum (maybe they'll have a Star Destroyer on display), see the robot fish at the Aquarium, have tea, scones and fancy cheese on toast (rarebit) at Fortnum & Mason. I think the London Perl Mongers will have roving bands of thugs ready to accost me with pints and dim sum.

Then Friday afternoon I'll be in Oslo for the QA hackathon and Go Open conference.

The following Friday I'm going over to Edinburgh to visit a friend and back to Portland on Tuesday the 15th.

Saturday March 22, 2008
03:14 AM

New Module: XS::Writer

I have written a Perl module to write XS code which writes C code to wrap up more C code so that I can call it from Perl.

Follow?

Such is XS. I have a project to wrap up a big pile of undocumented, stinking C code in Perl in order to better test it. Because it's a ginormous pile and because the header files have all sorts of circular dependencies and because you need a big hairy autoconf generated pile of switches to compile it, h2xs goes into convulsions trying to deal with it. So I have to write the XS manually.

The basic XS for the basic functions wasn't that hard, after doing some puzzling out with "Extending and Embedding Perl" and the perlxs man page. Mostly it's just a matter of informing XS of the subroutine signature.

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
 
/* Some magic from Devel::PPPort */
#define NEED_sv_2pv_flags
#include "ppport.h"
 
MODULE = My::Thing                 PACKAGE = My::Thing
 
int check_file(char *file_name);
 
int isoption(char *option, int form);

Yes, you have to tell the routine that checks whether something is an option whether it is going to be in the short (s) or long (stupid) form. It doesn't just figure that out for itself with a strlen(), probably the rationale being that it would be a huge waste of resources! Wacky C programmers.

Did I mention the code does its own argument processing? Did I mention that's a huge (and totally inconsequential) part of what it does?

Anyhow, the problem comes when you hit things that take structs. Like this struct to hold options.

typedef struct options_struct_t
{
  char *config_file_name;
  char *input_file_name;
  char *time_str;
  int test_mode;             /* flag set from command line or config value */
  int mail_input;            /* flag set from command line */
  int debug;                 /* flag set from command line */
  int save;                  /* flag set from command line */
  int debug_level;           /* output level for debug */
  /* ...and so on for about 20 lines... */
} options_struct_t;

My first approach was to try to map this to a hash, but that required translating it back and forth from hash to struct on every function call, which seemed inelegant. EEP suggested mapping it to an object, but its examples were simplistic. Tom Heady showed me an approach which worked and allowed me to have an accessor for each element of the struct.

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
 
#define NEED_sv_2pv_flags
#include "ppport.h"
 
#include "option.h"
 
/* xsubpp maps the XS type My::Option to the C name My__Option */
typedef options_struct_t *              My__Option;
 
MODULE = My::Option        PACKAGE = My::Option       PREFIX=MY__Option_
 
My::Option
My__Option_new(char* CLASS)
    CODE:
        /* my_option initializes an options_struct_t */
        RETVAL = my_option();
        if( RETVAL == NULL ) {
            warn( "unable to create new My::Option" );
        }
    OUTPUT:
        RETVAL
 
int
My__Option_test_mode( My::Option self, ... )
    CODE:
        if( items > 1 )
            self->test_mode = SvIV(ST(1));
        RETVAL = self->test_mode;
    OUTPUT:
        RETVAL

Maybe not the best code, but it gives me a way to construct an object around the struct and access its elements. Trouble is I have to write an accessor for each element of the struct. There's no way to automate it. #define doesn't work in XS to make a macro. I hate cut & paste code, and there are plenty more structs to wrap.
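
For contrast, here is a minimal sketch of how the same per-field boilerplate could be stamped out in plain C, where the preprocessor is available. The trimmed-down options_struct_t, the OPTION_INT_ACCESSOR macro and the option_* function names are all made up for illustration. None of this is the real code base or what XS::Writer generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down version of the options struct */
typedef struct options_struct_t {
    int test_mode;
    int debug;
    int debug_level;
} options_struct_t;

/* Generate a combined getter/setter for one int field.
 * Passing NULL for new_value only reads the field, mirroring the
 * "items > 1" check in the XS accessor. */
#define OPTION_INT_ACCESSOR(field)                                  \
    int option_##field(options_struct_t *self, const int *new_value) \
    {                                                               \
        assert(self != NULL);                                       \
        if (new_value != NULL)                                      \
            self->field = *new_value;                               \
        return self->field;                                         \
    }

/* One line per struct member instead of one hand-written function */
OPTION_INT_ACCESSOR(test_mode)
OPTION_INT_ACCESSOR(debug)
OPTION_INT_ACCESSOR(debug_level)
```

Calling option_test_mode(&opts, NULL) reads the field; passing a pointer to an int sets it first and then returns the new value. XS has no equivalent of that one-line-per-field expansion, so the repetition has to be generated some other way.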

I looked around for anything on CPAN that might make this easier. Inline::Struct looked promising but I couldn't get it to work and it has no facilities to deal with non-standard types. This code likes to use the Gnome lib types, GList and GHashTable. ExtUtils::XSBuilder looks really powerful and just what I'd need, except it looks just as complicated as XS itself.

So of course I wrote a module to write the code for me. XS::Writer is my first attempt. It will do some elementary parsing of a struct and write the XS code to wrap it up in an object. You can then INCLUDE: that XS file in another and add your own custom functions. It also allows you to write accessors to non-standard types.

And why keep it simple? It uses both Moose and autobox, neither of which I've used in production before! And Module::Build to put it all together, so now I can see how it does with XS. Hey, when you're learning new things why not learn a whole pile of them?

You still need to know XS, but at least some of the drudge work can be taken care of. I don't know what else I'm going to put in other than structs, but I'm sure more is coming.

Next up, how to best deal with GList and GHashTable.

Sunday March 02, 2008
06:34 AM

Patch for Dreamhost

As threatened earlier, I've written up a patch to fix Cwd.pm in perl so it will build in a Dreamhost account. Now you can upgrade to 5.8.8.

The issue is that Perl's own version of abs_path() didn't know how to deal with a parent directory it didn't have read permissions on. On Dreamhost, you can't read /home. The patch will eventually go into PathTools.

Saturday February 23, 2008
11:12 PM

That damned Test::More threading test

I did it. I finally killed that damned threading test in Test::More. is_deeply_fail.t will no longer run unless AUTHOR_TESTING is set.

For those who don't know, that test would tickle intermittent threading bugs in certain vendor supplied versions of perl. The 5.8.6 that OS X ships with and the perls in most Redhat derivatives are vulnerable, but it would only fail about one in a hundred times. I don't know what the problem is; I suspect it's from vendors pulling in bleadperl patches, but I haven't confirmed. Nobody's put the time into it to find out.

Since there's nothing I can do about it, and it was just holding up CPAN installations, I turned the test off. Done.

Friday February 08, 2008
09:06 PM

Critical and Significant Dates

Speaking of 2038, J R Stockton wrote up an enormous and comprehensive list of Critical and Significant Dates. The original site has, alas, been knocked out but the Wayback Machine remembers all.

It's a fascinating look back at historical date and time related crises as well as possible future ones. 2038 is, of course, on there but there are hundreds more between now and then. Not just computing problems, but multiple ends of the world ranging from Mayan predictions to near-misses by asteroids. The last crisis point to pass was 2008-01-19 when 30 year look aheads fail. Next up is an unusually early Easter, 2008-03-23.

Not so exciting, but what about 2010, when a lot of not particularly robust Y2K fixes will fail, and 2019, when yet more will fail? Various dates used as magic marker values are coming up too: 2009-09-09 (09/09/09) and 2011-11-11 (11/11/11). 2008-12-31 is coming up, the 366th day of the year, which has caused major failures in the past. 2015-09-05 is when Apollo/HP 32 bit machines run out of time.

After 2025 the pace of systems failing accelerates. Quickbooks starts to die in 2025. 2028 overflows systems that store the year as 1900 + signed byte. More Y2K fixes break in 2028 and 2029. MSDOS file dates start to fail in 2030. Palm Pilots die in 2031. Microsoft's Y2K compliance ends at 2035...

Looking far, far into the future, around the year 300 billion, 64 bit time_t runs out. Finally, in the year 2^1E80, it becomes impossible to express the date, as the size of the year in binary is larger than the number of particles in the universe.
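
A few of these overflow years are simple arithmetic and can be checked with a short C sketch. The function names and the use of an average Gregorian year of 365.2425 days are assumptions made for illustration; nothing here comes from Stockton's list itself.

```c
#include <assert.h>
#include <stdint.h>

/* Average Gregorian year: 365.2425 days * 86400 seconds per day */
#define SECONDS_PER_YEAR INT64_C(31556952)

/* Approximate last calendar year that a signed count of seconds
 * since 1970 can reach, given the counter's largest positive value. */
int64_t last_year_for_epoch_max(int64_t max_seconds)
{
    assert(max_seconds >= 0);
    return 1970 + max_seconds / SECONDS_PER_YEAR;
}

/* Last year representable when the year is stored as
 * 1900 plus a signed byte (largest value 127). */
int last_year_for_signed_byte(void)
{
    return 1900 + INT8_MAX;
}
```

Passing INT32_MAX to the first function gives 2038. Passing INT64_MAX gives a year of roughly 292 billion, which rounds to the 300 billion quoted above. 1900 plus a signed byte tops out at 2027, so 2028 is the first year to overflow.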

But by then I think we can safely assume we'll all have migrated to a 1e160 particle universe.

Thursday February 07, 2008
09:14 AM

Perl is now Y2038 safe

They said it couldn't be done. They said it SHOULDN'T be done! But I have here a working 64 bit localtime_r() on a machine with just 32 bits of time_t. Time zones, daylight savings time... it all works.

$ ./miniperl -wle 'print scalar localtime(2**35)'
Mon Oct 25 20:46:08 3058

Perl will be Y2038 safe. And yes, I'm going to get it backported to 5.10.

The underlying C functions are solid, but I just sort of rammed them down perl's throat because it's 5am. Here's the patch for the intrepid.

The underlying C implementation is a heavily modified version of the code from 2038bug.com written by Paul Sheer. He came up with the same basic algorithm I did:

1) Write a 64 bit clean gmtime(), that's a SMOP and Paul did that.
2) Run your time through this new gmtime_64().
3) Change the year to a year between 2012 and 2037.
4) Run it through the 32 bit system localtime() to get time zone stuff.
5) Move the year back to the original.

The trick is using a 32 bit safe year that has the same properties as the real year. This means same leap year status and same calendar layout. I had to do some tricky Gregorian calendar math, aided by Graham, Nick and Abigail, who realized there's a 28 year cycle within the larger 400 year Gregorian cycle, and Mark Mielke, who provided a big table of 64 bit localtime() results to test against. There are also some edge cases around New Year's, but they're all taken care of.
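
The year substitution in step 3 can be sketched in C as follows. This is not the code from the patch: the function names are invented here, it picks the safe year by brute-force search rather than with the 28 year cycle math, it only handles years from 1 AD onward, and it ignores the New Year's edge cases.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if the year is a leap year in the Gregorian calendar, else 0 */
int is_leap_year(int64_t year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Day of the week of January 1st: 0 = Sunday ... 6 = Saturday.
 * Counts days since 0001-01-01, which is a Monday in the
 * proleptic Gregorian calendar. */
int jan1_weekday(int64_t year)
{
    int64_t y = year - 1;
    int64_t days = y * 365 + y / 4 - y / 100 + y / 400;
    return (int)((days + 1) % 7);
}

/* Find a year in 2012..2037 with the same leap year status and the
 * same calendar layout (same weekday for January 1st) as the year
 * we actually want. */
int safe_year(int64_t year)
{
    int candidate;

    assert(year >= 1);
    for (candidate = 2012; candidate <= 2037; candidate++) {
        if (is_leap_year(candidate) == is_leap_year(year) &&
            jan1_weekday(candidate) == jan1_weekday(year))
            return candidate;
    }
    return -1;  /* not reached: 2012..2037 contains all 14 layouts */
}
```

For example, 3058 is not a leap year and starts on a Friday, so it maps to 2021. The system localtime() is then asked about the same month, day and time in 2021, and the year is swapped back afterwards.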

This approach will break when the daylight savings time rules change, but I'd rather be off by an hour than 137 years. The full Perl patch will always prefer the system's 64 bit localtime() if one is available. As more machines upgrade time_t this hack will be used less.

The nice part is this code isn't specific to Perl and can be used to fix up any other C-based Unix program.

Thursday January 31, 2008
04:15 PM

Arc, the "Hundred-Year Language", is obsolete by design.

Ovid reacted to the release of Arc, Paul Graham's attempt at the "100 Year Language", with a resounding yawn. I must say I agree. From reading the release announcement, not only is Arc not the programming language for the next 100 years, Arc is a resounding step backwards.

I read the announcement that "Arc only supports Ascii". What?! Just ASCII?! In 2008?! First release and he's already prevented about 80% of the world's population from using it. Maybe it's just a temporary thing, but no, he dismisses his choice of ASCII-only as simply being "unPC" and those who disagree are merely "offended" rather than, I don't know, needing those characters in order to WRITE THEIR NATIVE LANGUAGE! It's like he thinks people use Unicode just to write funny programs in Klingon.

This is supposed to be the 100 year language?! With so much of programming becoming internationalized in the last decade, with almost every modern application now speaking it, how can you dismiss Unicode in 2008? Hell, how could you dismiss Unicode if you were designing a language in 1998? Or 2003, when Arc started.

His rationale for not including Unicode is also spurious: it's because Python had a devil of a time dealing with it while also maintaining backwards compatibility. Arc is a new language, what backwards compatibility? He also says "I don't want to spend even one day dealing with character sets". The language designer not wanting to deal with something means he pushes the problem onto all the users: false laziness. Oddly enough, he could have resolved the whole issue by saying "it's all Unicode" and throwing out all the historical character sets. I realize Unicode isn't quite that simple, but man, give it a little thought. He even admits it would only take "a few days" to figure it out. Sorry all you Asian programmers, you're not worth a few days' effort. Sorry Europe, umlauts are not worth Paul's time.

This is what happens when one guy (or a homogeneous group) goes into a room and tries to design anything. You come out with something reflecting the biases of that guy. Paul doesn't need or grok Unicode so Arc won't have it. Makes me wonder what other blind spots are in the language? How else won't Arc let the programmer think? Contrast with Parrot and Perl 6. Diverse, international programming team. Unicode is a top priority, along with all sorts of other advanced concepts that one person or another might not find worth putting in "a few days" effort on that will pay off for the programmers in the long run.

A programming language should not be a reflection of how the designer thinks.

There are plenty of other inanities and absurd historical Lisp-isms that he's dragged forward. cons? car? cdr? These only make sense to a seasoned Lisp programmer. If you're starting a brand new language why not give things names meaningful to a human rather than ones based on ancient hardware?

Anyhow, Ovid does a fine job on all that. I'm just... man, no Unicode? Astonishing.

12:48 PM

Let 1000 Perl Web Sites Bloom!

All the time I hear folks who have ideas for projects but then they say "yeah, but I don't have a server to put it on" or "I don't have a domain name for it" or "is this going to be the official site for X" or "I'd have to get a perl.org domain for it". And then that's where the idea dies for simple lack of hosting.

I am now taking away that excuse. Dreamhost has a promotion system where any member can provide a deep discount for friends to sign up. I have made such a promo. You can have a hosted account with them for $69.40 a year if you click this link and use the promo code "1001PERLSBLOOM". Pay with a credit card and you have three months to decide you want a refund. It also comes with two free domains and a unique IP address.

$69 for the first year, three months to decide if it's a good idea, two domain registrations, mail, wikis, a subversion repository, unique IP and a shell account. Their support is fantastic. Just look at all this crap you get! It's all the tools you need to put together your neat Perl idea for the cost of going out to eat.

"But... but... but..." yes, yes, yes it's not the absolute perfect system. Suck it up. This is the best you'll find at this price. If you wait around until you build your own you'll never do it. And you can always move it later. The important thing is "I don't have a server" is no longer an excuse for not doing your awesome Perl idea.

Go forth and let one thousand Perl web sites bloom!


(The one thing that is missing is mod_perl; they use FastCGI, which is good enough for most things. But if you know of an equivalent service with mod_perl, please post it.)

(For those conspiracy theorists out there, no I'm not making money off of this)