Talk #3 down, one more to go. "Generating Test Data With The Sims".
In the interest of full disclosure, this talk isn't about actually using The Sims game to generate test data. That might make it 98% less awesome, but it probably makes it 98% more practical.
The talk is about a technique for handling the sort of system that needs a lot of interdependent data before you can get anything done. Rather than relying on a static testing database, maybe a copy of the production database (I hope it's a copy), it shows an easy way to generate interesting data on the fly, specific to each test. It's a really simple technique, with no complicated code.
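The talk isn't transcribed here, so this is only a minimal sketch of the general idea under my own assumptions. `sim_user()` and `sim_order()` are hypothetical generators: each builds a random but valid record, creates whatever records it depends on, and lets the test pin down only the fields it actually cares about.

```perl
use strict;
use warnings;

# Hypothetical sample data; a real generator would draw from richer lists.
my @first_names = qw(Alice Bob Carol Dave Eve);

sub rand_pick { return $_[ rand @_ ] }

# A random but valid user. Caller-supplied fields override the defaults.
sub sim_user {
    my %args = @_;
    return {
        name => rand_pick(@first_names),
        age  => 18 + int( rand(60) ),
        %args,
    };
}

# An order depends on a user, so make one unless the test supplied its own.
sub sim_order {
    my %args = @_;
    my $user = delete( $args{user} ) // sim_user();
    return {
        user     => $user,
        quantity => 1 + int( rand(10) ),
        %args,
    };
}

# In a test, only the field that matters is spelled out:
my $order = sim_order( quantity => 3 );
```

Everything the test doesn't mention is random, which keeps the test honest about what it really depends on.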
This morning, at 8:15am when only the dead should be awake, I gave my "Perl Is unDead" talk. I did it before at YAPC::Asia but this time it took half the time and I felt the presentation was better. Maybe because I've done it before. Maybe because I wasn't feverish. Maybe because the microphone didn't work and I got to shout the talk. It really lends itself well to shouting.
I did my "Skimmable Code" talk yesterday and from the response it seems to have gone over very well. One thing that caught me entirely off guard was the response to the suggestion that end-of-block comments are a red flag for skimmability and should be removed once the block has been refactored to be shorter.
while( $something > 42 ) {
    ...some code...
    ...and stuff...
} # while $something > 42
Usually any stylistic assertion gets at least one unhappy response from the audience. This one had its own cheering section. I was a bit thrown by that and didn't have a good response. The argument was that sometimes you can't see the condition, no matter how short the block is; maybe you're looking at the code just after it. These comments put information in front of your face that you'd otherwise have to page up and down to see. And that's what skimmable code is about: putting all the relevant information in front of your face so you don't have to remember it.
End-of-block comments surely are a red flag. Here's an example from WWW::Mechanize which uses end-of-block comments habitually.
for my $value ( @_ ) {
    # Handle type/value pairs an arrayref
    if ( ref $value eq 'ARRAY' ) {
        my ( $type, $value ) = @$value;
        while ( my $input = shift @inputs ) {
            next if $input->type eq 'hidden';
            if ( $input->type eq $type ) {
                $input->value( $value );
                $num_set++;
                last;
            }
        } # while
    }
    # by default, it's a value
    else {
        while ( my $input = shift @inputs ) {
            next if $input->type eq 'hidden';
            $input->value( $value );
            $num_set++;
            last;
        } # while
    }
} # for
They're not used here because the blocks are too long. They're used because nested structures, especially nested loops, are confusing, even with good indentation habits. That creates a need to comment which block is being closed, and these comments don't even show the condition. They're a symptom of complexity. The comments should say to you "maybe this can be simpler".
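Taking that hint, here is one way the loop above might be made simpler. It's my own sketch, not WWW::Mechanize's actual code: the `Input` class, `next_visible_input()` and `set_visible_values()` are hypothetical names, and I'm assuming inputs have `type()` and `value()` methods as in the excerpt. Moving the inner search into a named function leaves one shallow loop with nothing worth labeling at its closing brace.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a form input: just a type and a value.
    package Input;
    sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub type  { $_[0]{type} }
    sub value { my $self = shift; $self->{value} = shift if @_; $self->{value} }
}

# Shift inputs off @$inputs until one is visible and, if $type is
# given, of that type. Skipped inputs are consumed, as in the original.
sub next_visible_input {
    my ( $inputs, $type ) = @_;
    while ( my $input = shift @$inputs ) {
        next if $input->type eq 'hidden';
        return $input if !defined $type or $input->type eq $type;
    }
    return;
}

# Set each value on the next suitable input; returns how many were set.
sub set_visible_values {
    my ( $inputs, @values ) = @_;
    my $num_set = 0;
    for my $value (@values) {
        # A [type => value] pair restricts which input may take the value.
        my ( $type, $val ) = ref $value eq 'ARRAY' ? @$value : ( undef, $value );
        my $input = next_visible_input( $inputs, $type ) or next;
        $input->value($val);
        $num_set++;
    }
    return $num_set;
}
```

Using a separate `$val` also avoids the shadowed `$value` in the original.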
I still recommend deleting them once the code has been refactored to be more skimmable. Why? It's more to filter out while skimming. It's a comment about *what* the code is doing, repeating what the code already says, which makes it redundant. It's something you can ignore, so it just clutters up the screen. A good comment adds information, saying why the code is there or explaining a complicated bit.
Straying into stylistic opinion, the comment makes it harder for me to see the closing brace. Rather than having a clean line with one character on it, there's a fuzz of comments breaking up the visual.
End-of-block comments violate the DRY principle; you're repeating code. That makes them a maintenance hassle. The comment is likely not to get updated as the condition changes, especially if it repeats the whole condition, and you wind up with a lying comment. It's a bad comment; it shouldn't be there.
But, the original desire is valid. It's nice to be able to see the condition from the bottom of a scope. If it's rote code inspection, monkey work, let your monkey do it. This is something your editor should be doing for you. It's 2008. If I can't have my jetpack I can at least have an editor that can show me where a block starts. Emacs does this when you type the closing brace. I'm not sure how to get it to do it on-hover, but I'm sure someone will respond with the appropriate magic. Then, 5 minutes later some clever vi code will be forthcoming.
This is my first day out in Tokyo despite having arrived on Tuesday. I've spent the whole time either working on my talks for YAPC::Asia or being sick, or both. I gave my keynote with what was likely a decent fever, which made a fine excuse for any flub. All of Saturday was spent asleep and feverish. Today the fever broke and I got to see a little bit of Electric Town. Now I'm at the hackathon for a little while, "helping" Yuval remove Sub::Name from Moose by studiously avoiding showing him EVE.
Alas, I fly out tomorrow.
Tomorrow I'm going to Tokyo for YAPC::Asia. I'm not entirely prepared for this trip. A couple of days ago I couldn't remember whether I'd arranged a place to stay (I had). Fortunately, Karen and Dan and Miyagawa are far more on the ball than I am. Thank you all for dealing with my momentary panic. I'm all packed. Hopefully my tree booze will pass customs inspection.
I'll be giving a talk Friday morning, "You're Doing OO Wrong", in which I'll attempt, in 40 minutes, to undo 10 years of inheritance getting in the way of teaching good OO.
Then Friday afternoon is a keynote, "Perl Is unDead", in which I'll embrace the "Perl Is Dead" perception. It will involve zombies or something. I'm as eager to know what I'm going to say as you all are, I'm sure.
I'm working on a contract that's been edumicating me a lot in C and XS... and mostly making me realize that I didn't charge enough to have to touch them.
One of the tasks is to write an XS wrapper around a C program so it can be called as a function from Perl. They don't want the overhead of starting it up over and over again. Easy enough: scoop the contents of main() out into their own function (as I suspected, doing an XS wrapper around main() itself has complications), change all the exit()s to return()s and write a thin XS wrapper around that.
Now, how do you test it? This program prints to STDOUT and STDERR... in C. None of the normal Perl output capturing methods work. Can't tie it. Can't reopen it. Can't redirect it. C is talking to file descriptors 1 and 2, and that's that. Perl generally protects us from file descriptors by hiding them behind file handles, but fortunately you can still get at them.
A little bit of experimentation dug up the obscure ">>&=" open mode, which does the equivalent of fdopen() in C: the handle is opened on the same file descriptor rather than a dup() of it. Now I can change where the STDOUT and STDERR file descriptors go, and C will see the change.
Here's what I did.
use File::Temp qw(:seekable);
use Fcntl qw(:seek);
# Make self-cleaning temp files each for STDOUT and STDERR
# redirection.
# We need real files so they have real file descriptors.
my $tmp_stdout = File::Temp->new(
    UNLINK   => 1,
    TEMPLATE => "test_stdout_XXXXXX"
);
my $tmp_stderr = File::Temp->new(
    UNLINK   => 1,
    TEMPLATE => "test_stderr_XXXXXX"
);
# Store a copy of STDOUT and STDERR.
open my $save_stdout, ">&", \*STDOUT or die "Can't save STDOUT: $!";
open my $save_stderr, ">&", \*STDERR or die "Can't save STDERR: $!";
# Point STDOUT and STDERR at my temp file descriptors.
open STDOUT, ">>&=", $tmp_stdout or die "Can't dup STDOUT: $!";
open STDERR, ">>&=", $tmp_stderr or die "Can't dup STDERR: $!";
# Run the C function in question.
my $exit = wrapped_c_function($command);
# Restore STDOUT and STDERR
open STDOUT, ">&", $save_stdout or die "Can't restore STDOUT: $!";
open STDERR, ">&", $save_stderr or die "Can't restore STDERR: $!";
# Seek back to the beginning of the temp files
$tmp_stdout->seek(0, SEEK_SET);
$tmp_stderr->seek(0, SEEK_SET);
# Read their contents.
my @stdout = <$tmp_stdout>;
my @stderr = <$tmp_stderr>;
Voila. Messy, but easily wrapped up in a module.
Because this technique works at the file descriptor level, it works for system() and `` without having to get into shell redirection or IPC::Open3.
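Since the recipe is easy to wrap up, here's a minimal sketch of what such a wrapper might look like. `capture_fds()` is a hypothetical name, not an existing CPAN function. It uses the same save, ">>&=", restore sequence as above, adds an eval so the handles are restored even if the code dies, and lets File::Temp pick files in the system temp directory. I've only reasoned about Perl's own print and system() here; C code writing through stdio may also need to fflush() before returning, since with exit() turned into return() nothing flushes its buffers before the handles are restored.

```perl
use strict;
use warnings;
use File::Temp ();
use Fcntl qw(SEEK_SET);

# Run $code with file descriptors 1 and 2 pointed at temp files and
# return ($stdout, $stderr) as strings.
sub capture_fds {
    my ($code) = @_;

    # Real files, so they have real file descriptors.
    my $tmp_out = File::Temp->new;
    my $tmp_err = File::Temp->new;

    # Save copies of the current STDOUT and STDERR.
    open my $save_out, ">&", \*STDOUT or die "Can't save STDOUT: $!";
    open my $save_err, ">&", \*STDERR or die "Can't save STDERR: $!";

    # Point STDOUT and STDERR at the temp files' descriptors.
    open STDOUT, ">>&=", $tmp_out or die "Can't dup STDOUT: $!";
    open STDERR, ">>&=", $tmp_err or die "Can't dup STDERR: $!";

    # Run the code, remembering any exception until the handles are back.
    my $ok    = eval { $code->(); 1 };
    my $error = $@;

    # Restore STDOUT and STDERR (this also flushes Perl's buffers).
    open STDOUT, ">&", $save_out or die "Can't restore STDOUT: $!";
    open STDERR, ">&", $save_err or die "Can't restore STDERR: $!";
    die $error unless $ok;

    # Rewind each temp file and slurp it.
    my @captured;
    for my $fh ( $tmp_out, $tmp_err ) {
        $fh->seek( 0, SEEK_SET );
        local $/;
        my $content = <$fh>;
        push @captured, $content // '';
    }
    return @captured;
}
```

In a test this becomes something like `my ($out, $err) = capture_fds(sub { wrapped_c_function($command) });`, where `wrapped_c_function` is the XS function from the snippet above.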
I am continually horrified at the junk that shows up when you search Google for "perl blog". The first use.perl.org journal is 14 links down. perlbuzz isn't even on the page.
A search for "perl blogs" does much better, since Andy and Skud and the Planet Perl folks have done some SEO on that (honestly, Google should figure out the blog/blogs singular/plural thing). But again, use.perl.org is #7, below hotscripts.com! The only reason it's on the list at all is that Skud happened to mention the word "blog" in a post.
Whether you prefer the term "journal" or "blog" or just don't care, you can have no effect on the world if people can't find you. People search for "blog", not "journal"; that's simply how the vocabulary has swung. Refusing to acknowledge that, by not having the word "blog" anywhere on use.perl.org, not even in the otherwise unseen meta information, is not a stand against what some people perceive to be stupid Internet lingo but a self-inflicted gag. If people can't find you, they can't find out what you're saying. We have rendered ourselves irrelevant to anyone not in the know.
People say "Perl is dead" because outsiders don't see anything going on. And instead of putting ourselves where people are looking, we think we can change where they look. The reality is that if nobody is watching us, we can have no influence over anyone but ourselves.
This is a blog. It is also a diary, a journal, a record and an account. It is not somehow harmful to use these synonyms, it is useful to draw more people in because different people think differently. The content and community of use.perl is strong enough to weather the addition of a synonym.
In order to route around this, I will simply start using the word "blog" in my "journal" and see if Google notices. I encourage others to do the same. And I hope it will start appearing in the site's meta information where it can hurt no one and only help our journals be found and read.
From time to time I'll work on a code base that's basically a pile of individual scripts. The process of converting it to a modularized system can take some time, technically as well as socially. Meanwhile, I have to get work done. And for me getting work done requires writing tests.
But if it's a pile of scripts, where do you put them? And with no build structure, how do you run them? Rather than having to decide between using a single file OR writing tests, I decided to embed the tests in the scripts themselves. Observe.
sub selftest {
    my @test_functions = get_test_functions();
    for my $function (sort { lc $a cmp lc $b } @test_functions) {
        no strict 'refs';
        print "# Running $function\n";
        &{$function}();
    }
}
sub get_test_functions {
    my $package = shift || __PACKAGE__;
    # Load the test functions after __END__, and don't hide compile errors
    eval join '', <DATA>;
    die $@ if $@;
    no strict 'refs';
    return
        # Select only those which are subroutines in $package
        grep { defined &{"${package}::$_"} }
        # Find the ones named test_*
        grep { /^test_/ }
        # Get all the symbols in the package
        keys %{$package."::"};
}
use Getopt::Long;
sub main {
    my %options;
    GetOptions(
        \%options,
        "test",
    );
    if( $options{test} ) {
        selftest();
        exit;
    }
    # ... rest of the code here ...
}
main();
__END__
# These tests will be compiled and run when --test is given
use strict;
use warnings;
use Test::More 'no_plan';
sub test_the_tests {
    pass("The tests run!");
}
Passing --test makes selftest() compile the code after __END__, find all the test_* functions, run them and exit.
By embedding the tests in the scripts, you can introduce unit testing to single-file scripters without simultaneously having to introduce the concept of a multi-file project. And because the tests sit after __END__ and are only compiled when --test is given, nobody can make the excuse that your test functions are wasting memory in production.
I'm sure I'm not the first to come up with this, but I don't know that I've seen it modularized. So before I go and do that, is this already on CPAN?
Tomorrow afternoon I'll be in London for a few days doing nothing in particular. I mean to hit the Imperial War Museum (maybe they'll have a Star Destroyer on display), see the robot fish at the Aquarium, have tea, scones and fancy cheese on toast (rarebit) at Fortnum & Mason. I think the London Perl Mongers will have roving bands of thugs ready to accost me with pints and dim sum.
Then Friday afternoon I'll be in Oslo for the QA hackathon and Go Open conference.
The following Friday I'm going over to Edinburgh to visit a friend and back to Portland on Tuesday the 15th.
I have written a Perl module to write XS code which writes C code to wrap up more C code so that I can call it from Perl.
Follow?
Such is XS. I have a project to wrap up a big pile of undocumented, stinking C code in Perl in order to better test it. Because it's a ginormous pile and because the header files have all sorts of circular dependencies and because you need a big hairy autoconf generated pile of switches to compile it, h2xs goes into convulsions trying to deal with it. So I have to write the XS manually.
The basic XS for the basic functions wasn't that hard, after doing some puzzling out with "Extending and Embedding Perl" and the perlxs man page. Mostly it's just a matter of informing XS of the subroutine signature.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
/* Some magic from Devel::PPPort */
#define NEED_sv_2pv_flags
#include "ppport.h"

MODULE = My::Thing PACKAGE = My::Thing

int check_file(char *file_name);

int isoption(char *option, int form);
Yes, you have to tell the routine that checks whether something is an option whether that option is in the short (s) or long (stupid) form. It doesn't just figure that out for itself with a strlen(), the rationale probably being that it would be a huge waste of resources! Wacky C programmers.
Did I mention the code does its own argument processing? Did I mention that's a huge (and totally inconsequential) part of what it does?
Anyhow, the problem comes when you hit things that take structs. Like this struct to hold options.
typedef struct options_struct_t
{
    char *config_file_name;
    char *input_file_name;
    char *time_str;
    int test_mode;   /* flag set from command line or config value */
    int mail_input;  /* flag set from command line */
    int debug;       /* flag set from command line */
    int save;        /* flag set from command line */
    int debug_level; /* output level for debug */
    /* ...and so on for about 20 lines... */
} options_struct_t;
My first approach was to try to map this to a hash, but that required translating it back and forth between hash and struct on every function call, which seemed inelegant. EEP ("Extending and Embedding Perl") suggested mapping it to an object, but its examples were simplistic. Tom Heady showed me an approach that worked and gave me an accessor for each element of the struct.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#define NEED_sv_2pv_flags
#include "ppport.h"
#include "option.h"
/* xsubpp writes the My::Option type as My__Option in the generated C */
typedef options_struct_t * My__Option;

MODULE = My::Option PACKAGE = My::Option PREFIX = My__Option_

My::Option
My__Option_new(char* CLASS)
    CODE:
        /* my_option initializes an options_struct_t */
        RETVAL = my_option();
        if( RETVAL == NULL ) {
            warn( "unable to create new My::Option" );
        }
    OUTPUT:
        RETVAL

int
My__Option_test_mode( My::Option self, ... )
    CODE:
        if( items > 1 )
            self->test_mode = SvIV(ST(1));
        RETVAL = self->test_mode;
    OUTPUT:
        RETVAL
Maybe not the best code, but it gives me a way to construct an object around the struct and access its elements. The trouble is I have to write an accessor for each element of the struct, and there's no way to automate it from within XS: #define doesn't work in XS to make a macro. I hate cut & code, and there are plenty more structs to wrap.
I looked around for anything on CPAN that might make this easier. Inline::Struct looked promising but I couldn't get it to work and it has no facilities to deal with non-standard types. This code likes to use the Gnome lib types, GList and GHashTable. ExtUtils::XSBuilder looks really powerful and just what I'd need, except it looks just as complicated as XS itself.
So of course I wrote a module to write the code for me. XS::Writer is my first attempt. It will do some elementary parsing of a struct and write the XS code to wrap it up in an object. You can then INCLUDE: that XS file in another and add your own custom functions. It also allows you to write accessors to non-standard types.
And why keep it simple? It uses both Moose and autobox, neither of which I've used in production before! And Module::Build to put it all together, now I can see how it does with XS. Hey, when you're learning new things why not learn a whole pile of them?
You still need to know XS, but at least some of the drudge work can be taken care of. I don't know what else I'm going to put in other than structs, but I'm sure more is coming.
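XS::Writer's own interface isn't shown here, so the following is only a minimal sketch of the core idea under my own assumptions: a regex "parser" that handles one member per line of the form `type name;` or `type *name;`, a hypothetical `accessors_for_struct()` function, and a small table mapping C types to the SV conversion macros used when setting a value. It emits accessors in the same style as the hand-written `My__Option_test_mode` above. Anything it can't convert, including `char*` members and types like GList, is skipped with an XS comment, since storing a pointer into a Perl string's buffer would leave the struct pointing at memory Perl may later free.

```perl
use strict;
use warnings;

# C member type => macro that pulls the new value out of an SV.
my %sv_to_c = ( int => 'SvIV', long => 'SvIV', double => 'SvNV' );

# Return [type, name] pairs for "type name;" and "type *name;" members.
sub struct_members {
    my ($src) = @_;
    ( my $code = $src ) =~ s{/\*.*?\*/}{}gs;    # drop C comments
    my @members;
    while ( $code =~ /^[ \t]*(\w+)([ \t]+|[ \t]*\*[ \t]*)(\w+)[ \t]*;/mg ) {
        my ( $base, $sep, $name ) = ( $1, $2, $3 );
        my $type = $base . ( index( $sep, '*' ) >= 0 ? '*' : '' );
        push @members, [ $type, $name ];
    }
    return @members;
}

# One XS accessor; with PREFIX = My__Option_ it becomes a plain method.
sub xs_accessor {
    my ( $class, $type, $name ) = @_;
    ( my $prefix = $class ) =~ s/::/__/g;
    my $conv = $sv_to_c{$type}
        or return "# skipped $name: no conversion for $type\n\n";
    return <<"XS";
$type
${prefix}_$name( $class self, ... )
    CODE:
        if( items > 1 )
            self->$name = $conv(ST(1));
        RETVAL = self->$name;
    OUTPUT:
        RETVAL

XS
}

sub accessors_for_struct {
    my ( $class, $src ) = @_;
    return join '', map { xs_accessor( $class, @$_ ) } struct_members($src);
}

my $struct = <<'C';
typedef struct options_struct_t
{
    char *config_file_name;
    int test_mode;   /* flag set from command line or config value */
    int debug_level; /* output level for debug */
} options_struct_t;
C

print accessors_for_struct( 'My::Option', $struct );
```

The generated text is meant to be saved as its own .xs fragment and pulled in with INCLUDE:, leaving hand-written functions such as the constructor in the main XS file.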
Next up, how to best deal with GList and GHashTable.