I am preparing a Lightning Talk for YAPC::EU::2008 in Copenhagen about CPAN Pearls. I am not sure if it will be accepted or not, but I am having fun preparing it, and thus, I'll continue.
I am not looking for useful or brilliant modules. I am looking for code snippets: snippets of really stupid code (it might work, it might be useful, but it is stupid) and snippets of brilliant code (code that is intelligent or elegant).
So, if you think you know some code like this, please point me to a module, version, file and block of lines. I would really appreciate it.
Thanks!
I am working on storing word n-grams for big texts (big meaning text files larger than 3GB). I want 2-word, 3-word and 4-word tuples, and their respective occurrence counts.
When processing these texts (on a cluster) I do not have access to any RDBMS. Well, I have SQLite, Berkeley DB, GDBM and probably other similar tools that I am forgetting about.
As you might guess, the main problem with this is populating the database. For each word in the corpus I need to check whether it (together with its neighbourhood) already exists in the database. If it does, I increment the counter. If not, I add a new entry.
Given that I am working on a cluster, I can easily split the job into chunks so that each node processes a different part of the text. At the end I just need to glue the final databases together.
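To make the counting and gluing steps concrete, here is a minimal sketch that uses plain in-memory hashes instead of a database. The function names (count_ngrams, merge_counts) and the sample sentences are made up for illustration. The same increment-or-insert line would also work on a hash tied to a DBM file, and tuples that span a chunk boundary are simply ignored here.

```perl
use strict;
use warnings;

# Count every n-word tuple (n from $min_n to $max_n) found in @$words.
# $counts->{$key}++ covers both cases: a new key starts at 1,
# an existing key is incremented.
sub count_ngrams {
    my ($words, $counts, $min_n, $max_n) = @_;
    for my $i (0 .. $#$words) {
        for my $n ($min_n .. $max_n) {
            last if $i + $n > @$words;    # not enough words left
            my $key = join ' ', @{$words}[$i .. $i + $n - 1];
            $counts->{$key}++;
        }
    }
    return $counts;
}

# Glue step: add the counts of one partial result into the total.
sub merge_counts {
    my ($total, $partial) = @_;
    $total->{$_} += $partial->{$_} for keys %$partial;
    return $total;
}

# Two hypothetical chunks, as if processed on two different nodes.
my @chunk_a = split ' ', 'the cat sat on the mat';
my @chunk_b = split ' ', 'the cat sat on the sofa';

my (%node_a, %node_b, %total);
count_ngrams(\@chunk_a, \%node_a, 2, 4);
count_ngrams(\@chunk_b, \%node_b, 2, 4);
merge_counts(\%total, $_) for (\%node_a, \%node_b);

printf "%-20s %d\n", $_, $total{$_} for sort keys %total;
```

For a 3GB corpus the hashes would not fit in memory, so this only shows the logic, not a solution to the storage question.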
In my experiments SQLite seems to be the fastest tool for this task. But I may be wrong.
So, what would you use for that?
(I know that for questions PerlMonks might be better, but I just think that site is completely unusable.)
Last week we hosted the first Portuguese Perl Workshop. It was a four-day event. The first two days were training sessions by brian d foy, and the last two days were the real workshop.
The workshop was attended by well-known Perl mongers (like me -- kidding) such as brian d foy, Marty Pauley, Yuval Kogman, Daniel Ruoso and José Castro (aka cog), and by some not so well-known Perl mongers like Nuno Carvalho (aka Smash) or Pedro Melo.
While not everything ran as expected (we expected a bigger audience), the organization is quite happy with the results of this event, and is already preparing the next one (next event, not next PPW... yet).
BEGIN {
    @EXPORT = (qw/document p/);
    sub _def {
        my ($name, $sub) = @_;
        no strict 'refs';
        my $x = "LaTeX::Writer::Simple::$name";
        *$x = $sub;
        push @EXPORT, $name;
    }
    my @nl_commands = (qw/part chapter section
                          subsection subsubsection caption/);
    for my $c (@nl_commands) {
        _def($c, sub { _newcommand($c, @_)."\n" });
    }
    ...
It is just wonderful. And what I really liked was that Test::Pod::Coverage detects those methods, and complains about their lack of coverage. Wonderful!
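For readers who have not seen the trick, here is a small self-contained sketch of the same idea: installing generated subs into a package through glob assignment. The package name My::Tags and the helper _wrap are hypothetical stand-ins, not part of LaTeX::Writer::Simple.

```perl
use strict;
use warnings;

package My::Tags;    # hypothetical package, for illustration only

# Hypothetical helper: builds \name{content}.
sub _wrap {
    my ($name, @content) = @_;
    return '\\' . $name . '{' . join('', @content) . '}';
}

BEGIN {
    for my $name (qw(section subsection caption)) {
        no strict 'refs';    # needed to use a string as a glob name
        # Each closure captures its own copy of $name.
        *{"My::Tags::$name"} = sub { _wrap($name, @_) . "\n" };
    }
}

package main;

print My::Tags::section('Introduction');
print My::Tags::caption('A figure');
```

Because the subs are real entries in the package symbol table, tools that inspect the symbol table see them like any hand-written sub.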
brian d foy wrote a post about his keynote at the Portuguese Perl Workshop. As one of the main organizers, I think I should set the example. So, here I go.
The person who introduced me to Perl showed me that... it was concise and similar to C.
I first started using Perl to... build a simple digital library on the web.
I kept using Perl because... it was used on some computer science classes for Natural Language Processing.
I can't stop thinking about Perl... because I like all that underlying magic.
I'm still using Perl because... there isn't any other language with such a wonderful and lazy community (and CPAN). Also, because Perl lets you get things done!
I get other people to use Perl by... teaching Perl during Natural Language Processing classes, and by showing them how easily things can be written in Perl.
I also program in C, and I can't say whether I like Perl or C more. Different languages for different things. Sorry, brian!
Please take some time to read the proposals carefully and give some feedback on their relevance.
Archive::Any is a small and nice module by Clint Moore to manage ZIP and TGZ files. It also includes some kind of plugin system to be able to open other archive types as well.
However, Clint has not released a new version since November 2006, and a critical bug has been open since January 2007: a simple bug in a test that makes CPAN fail to install Archive::Any if Test::Pod::Coverage is not installed.
Given this, I am willing to fix this module and release it. But before doing that, I would like to ask if anybody knows Clint, so I can contact him first.
This is strange... use.perl doesn't have Perl as a Journal Topic. Anyway, I think I wrote about this previously, but I have now performed some more tests, so here are some new results. The idea is to compare the new say function to the print function with a newline at the end of the string. To test this, I used the Benchmark module and two groups of functions: functions that print a plain string, and functions that print a string with interpolated variables (a scalar and an array).
The four benchmarked functions were:
our $var1 = "!";
our @var2 = qw!Hello World!;
sub print_hello { print "Hello World!\n"; }
sub say_hello { say "Hello World!"; }
sub print_hello_vars { print "@var2$var1\n"; }
sub say_hello_vars { say "@var2$var1"; }
The number of iterations was 10,000,000. Given that all these functions print to the standard output, I redirected the output to a temporary file. Also, and to raise the quality of the test, I ran this benchmark three times.
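A harness along these lines could reproduce the setup. This is a sketch under assumptions, not the exact script used for the numbers in this post: the labels, the default iteration count (much lower than 10,000,000, to keep it quick) and the use of select with a temporary file instead of shell redirection are all choices made for the example.

```perl
use strict;
use warnings;
use feature 'say';
use Benchmark qw(timethese cmpthese);
use File::Temp qw(tempfile);

our $var1 = "!";
our @var2 = qw!Hello World!;

sub print_hello      { print "Hello World!\n"; }
sub say_hello        { say "Hello World!"; }
sub print_hello_vars { print "@var2$var1\n"; }
sub say_hello_vars   { say "@var2$var1"; }

# Assumed default; pass 10000000 on the command line to match the post.
my $iterations = $ARGV[0] // 200_000;

# Send the benchmarked output to a temporary file by changing the
# default output handle, so it does not flood the terminal.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
my $old_fh = select($tmp_fh);

# Style 'none' keeps timethese itself silent.
my $results = timethese($iterations, {
    print            => \&print_hello,
    say              => \&say_hello,
    printInterpolate => \&print_hello_vars,
    sayInterpolate   => \&say_hello_vars,
}, 'none');

select($old_fh);
close $tmp_fh;

# Print the comparison chart to the real standard output.
cmpthese($results);
```

Running it several times, as done here, gives an idea of how much of the difference is just noise.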
Now, on to the results. Do you have any idea of the ordering? Well, first, the results were not always the same: say and print sometimes swap positions. In any case, interpolating in a say seems to be faster. Check the three test results for yourself:
                      Rate printI   sayI  print    say
printInterpolate 1587302/s     --   -18%   -67%   -70%
sayInterpolate   1945525/s    23%     --   -60%   -63%
print            4807692/s   203%   147%     --    -8%
say              5208333/s   228%   168%     8%     --

                      Rate printI   sayI    say  print
printInterpolate 1647446/s     --   -10%   -66%   -68%
sayInterpolate   1828154/s    11%     --   -62%   -64%
say              4830918/s   193%   164%     --    -6%
print            5128205/s   211%   181%     6%     --

                      Rate printI   sayI    say  print
printInterpolate 1652893/s     --   -10%   -67%   -68%
sayInterpolate   1831502/s    11%     --   -64%   -64%
say              5076142/s   207%   177%     --    -1%
print            5102041/s   209%   179%     1%     --
The Perl Foundation is looking to award some grants ranging from $500 to $3000 in May 2008.
In the past, we've supported Adam Kennedy's PPI and Strawberry Perl, Nicholas Clark's work on Perl internals, Jouke Visser's pVoice, Chris Dolan on Perl::Critic and many others (just check http://www.perlfoundation.org/grants for more references).
You don't have to have a large, complex, or lengthy project. You don't even have to be a Perl master or guru. If you have a good idea and the means and ability to accomplish it, we want to hear from you!
Do you have something that could benefit the Perl community but just need that little extra help? Submit a grant proposal by April 30.
As a general rule, a properly formatted grant proposal is more likely to be approved if it meets the following criteria:
To submit a proposal see the guidelines at http://www.perlfoundation.org/how_to_write_a_proposal and TPF rules of operation at http://www.perlfoundation.org/rules_of_operation. Then send your proposal to tpf-proposals@perl-foundation.org.
On May 1st, submitters will be contacted individually to ask whether they will allow their proposal details to be made available for public discussion, as public viewing of grant proposals is likely to become standard practice in the future.