NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report


Journal of ambs (3914)

Thursday June 19, 2008
02:36 PM

CPAN Pearls

I am preparing a Lightning Talk about CPAN Pearls for YAPC::EU::2008 in Copenhagen. I am not sure whether it will be accepted, but I am having fun preparing it, so I'll keep going.

I am not looking for useful or brilliant modules. I am looking for code snippets: snippets of really stupid code (it might work, it might even be useful, but it is stupid) and snippets of brilliant code (code that is intelligent or elegant).

So, if you know some code like this, please point me to the module, version, file and block of lines. I would really appreciate it.

Thanks!

Thursday June 12, 2008
11:57 AM

Storing word-grams

I am working on storing word n-grams for big texts (big meaning text files of more than 3GB). I want 2-word, 3-word and 4-word tuples, together with their occurrence counts.

When processing these texts (on a cluster) I do not have access to any RDBMS. Well, I have SQLite, Berkeley DB, GDBM and probably other similar tools that I am forgetting about.

As you might guess, the main problem is populating the database. For each word in the corpus I need to check whether it (together with its neighbourhood) already exists in the database. If it does, I increment its counter; if not, I add a new entry.
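In Perl, the check-then-insert-or-increment step collapses into a single hash increment, because incrementing a missing key starts it from zero. The following is a minimal in-memory sketch, not the author's code: `count_ngrams` is a hypothetical helper that streams words from a filehandle through a sliding window of at most four words and counts every 2-, 3- and 4-gram that ends at the current word. For multi-gigabyte inputs the hash may not fit in RAM, so each node would probably have to count a chunk at a time and flush the counts to SQLite or Berkeley DB (not shown here).

```perl
use strict;
use warnings;

# Hypothetical helper: count all word n-grams with $min <= n <= $max
# read from $fh. N-grams may span line breaks. Keys are the words of
# each n-gram joined by a single space.
sub count_ngrams {
    my ($fh, $min, $max) = @_;
    my (%count, @window);
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            push @window, $word;
            shift @window if @window > $max;    # keep only the last $max words
            # Count each n-gram ending at the current word; a missing key
            # behaves as 0 before being incremented.
            for my $n ($min .. $max) {
                last if $n > @window;
                $count{ join ' ', @window[-$n .. -1] }++;
            }
        }
    }
    return \%count;
}

# Example: read from an in-memory string instead of a real corpus file.
my $text = "the cat sat on the cat sat\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $counts = count_ngrams($fh, 2, 4);
close $fh;

# Dump as "n-gram<TAB>count" lines, one possible format for merging later.
for my $gram (sort keys %$counts) {
    print "$gram\t$counts->{$gram}\n";
}
```

Because every n-gram is counted exactly once, at its last word, the window never needs to look ahead, which keeps the loop a single pass over the text.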

Given that I am working on a cluster, I can easily split the job into chunks, so that each node processes a different part of the text. At the end I just need to merge the resulting databases.
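Merging the per-node results then amounts to summing the counts for each key. A minimal sketch, assuming (hypothetically) that each node dumps its counts as `n-gram<TAB>count` lines; `merge_counts` is an illustrative name, not an existing API. Like the counting step, this keeps the merged table in memory; for very large outputs, sorting each file by n-gram and merging the sorted streams would avoid that.

```perl
use strict;
use warnings;

# Hypothetical helper: sum "n-gram<TAB>count" lines from several filehandles.
sub merge_counts {
    my @fhs = @_;
    my %total;
    for my $fh (@fhs) {
        while (my $line = <$fh>) {
            chomp $line;
            next unless length $line;              # skip blank lines
            my ($gram, $n) = split /\t/, $line, 2;
            $total{$gram} += $n;
        }
    }
    return \%total;
}

# Example: two in-memory "node outputs".
my $node1 = "the cat\t2\ncat sat\t1\n";
my $node2 = "the cat\t3\nsat on\t1\n";
open my $fh1, '<', \$node1 or die "Cannot open node1: $!";
open my $fh2, '<', \$node2 or die "Cannot open node2: $!";
my $merged = merge_counts($fh1, $fh2);

print "$_\t$merged->{$_}\n" for sort keys %$merged;
```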

In my experience, SQLite seems to be the fastest tool for this task. But I may be wrong.

So, what would you use for that?

(I know that PerlMonks might be a better place for questions, but I just find that site completely unusable :( )

Wednesday June 11, 2008
01:37 PM

Portuguese Perl Workshop, The First

Last week we hosted the first Portuguese Perl Workshop. It was a four-day event: the first two days were training sessions given by brian d foy, and the last two days were the workshop proper.

The workshop was attended by well-known Perl Mongers (like me -- kidding), such as brian d foy, Marty Pauley, Yuval Kogman, Daniel Ruoso and José Castro (aka cog), and by some not so well-known Perl Mongers like Nuno Carvalho (aka Smash) and Pedro Melo.

Although not everything went as expected (we hoped for a larger audience), the organizers are quite happy with the results of this event and are already preparing the next one (the next event, not the next PPW... yet).

Monday June 09, 2008
03:52 PM

Generating methods

I wrote a new module, LaTeX::Writer::Simple. I am not yet sure how useful it is, but it is already on CPAN. I am also not sure whether its current interface will stay as it is. But what I would like to write about is something else: generating methods. It is cool to define methods on the fly instead of writing each one by hand:

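BEGIN {
    # Functions written by hand and exported by default.
    @EXPORT = (qw/document p/);

    # Install $sub as LaTeX::Writer::Simple::$name and add it to @EXPORT.
    sub _def {
        my ($name, $sub) = @_;

        no strict 'refs';    # allow the symbolic glob assignment below
        my $x = "LaTeX::Writer::Simple::$name";
        *$x = $sub;
        push @EXPORT, $name;
    }

    # Each generated function returns the command string built by
    # _newcommand followed by a newline.
    my @nl_commands = (qw/part chapter section
                          subsection subsubsection caption/);
    for my $c (@nl_commands) {
        _def($c, sub { _newcommand($c, @_)."\n" });
    }

    ...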

It is just wonderful. And what I really liked was that Test::Pod::Coverage detects those methods and complains that they lack POD coverage. Wonderful!
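For readers who want to try the technique in isolation, here is a self-contained sketch of the same glob-assignment idea. The package name `My::LaTeX` and the body of `_newcommand` (which simply wraps its arguments as `\name{...}`) are hypothetical stand-ins; the real LaTeX::Writer::Simple implementation may differ. Here the loop runs when the file is loaded, whereas the module above places it in a BEGIN block.

```perl
package My::LaTeX;    # hypothetical package name for this sketch
use strict;
use warnings;
use Exporter 'import';

our @EXPORT;

# Hypothetical stand-in for _newcommand: wrap the joined arguments
# as \name{...}.
sub _newcommand {
    my ($name, @args) = @_;
    return '\\' . $name . '{' . join('', @args) . '}';
}

# Install $sub under $name in this package and add it to @EXPORT.
sub _def {
    my ($name, $sub) = @_;
    no strict 'refs';    # symbolic glob assignment
    *{ __PACKAGE__ . "::$name" } = $sub;
    push @EXPORT, $name;
}

# Each iteration gets a fresh lexical $c, so every closure
# remembers its own command name.
for my $c (qw/part chapter section subsection subsubsection caption/) {
    _def($c, sub { _newcommand($c, @_) . "\n" });
}

package main;

print My::LaTeX::section("Introduction");
```

The `for my $c` loop is what makes this work: a closure over a shared variable would make every generated function emit the last command name instead.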

Thursday May 08, 2008
12:30 PM

Why am I passionate about Perl?

brian d foy wrote a post about his keynote at the Portuguese Perl Workshop. As one of the main organizers, I think I should lead by example. So, here goes.

The person who introduced me to Perl showed me that... it was concise and similar to C.

I first started using Perl to... build a simple digital library on the web.

I kept using Perl because... it was used in some computer science classes on Natural Language Processing.

I can't stop thinking about Perl... because I like all that underlying magic.

I'm still using Perl because... there isn't any other language with such a wonderful and lazy community (and CPAN). Also, because Perl lets you get things done!

I get other people to use Perl by... teaching Perl in Natural Language Processing classes, and by showing them how easily things can be written in Perl.

I also program in C, and I can't say I prefer Perl to C. Different languages for different things. Sorry, brian!

Thursday May 01, 2008
03:57 PM

2008Q2 Grant Proposals

On the TPF weblog (check the links below) there is a set of posts with the proposals received by the Perl Foundation Grants Committee (GC) during the second call for grant proposals of 2008. Although this is not the usual practice, the GC rules are changing and we hope to make it the norm: proposals are accepted for one month, and after that period they are posted for public discussion on the Internet. This is important to make the GC more aware of the community's interest in each project, and to help open up the grant-awarding process.

During the month of April we received the following grant proposals:

Please take some time to read the proposals carefully and give some feedback on their relevance.

Tuesday April 22, 2008
12:51 PM

Fixing Archive::Any

Archive::Any is a small, nice module by Clint Moore for handling ZIP and TGZ files. It also includes a kind of plugin system so that other archive types can be opened as well.

However, Clint has not released a new version since November 2006, and a critical bug has been reported since January 2007: a simple bug in a test that makes CPAN fail to install Archive::Any when Test::Pod::Coverage is not installed.

Given this, I am willing to fix the module and release it. But first, I would like to ask whether anybody knows Clint, so that I can contact him.

Thursday April 10, 2008
04:45 PM

Benchmarking Say

This is strange... use.perl doesn't have Perl as a Journal Topic. Anyway, I think I have written about this before, but I have now run some more tests, so here are some new results. The idea is to compare the new say function (added in Perl 5.10) with print and a newline at the end of the string. To test this, I used the Benchmark module and two groups of functions: functions that print a plain string, and functions that print a string with interpolated variables (a scalar and an array).

The four benchmarked functions were:

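use strict;
use warnings;
use feature 'say';    # say must be enabled (via this pragma or use 5.010)

our $var1 = "!";
our @var2 = qw!Hello World!;

# Plain string: print with an explicit newline vs. say
sub print_hello { print "Hello World!\n"; }

sub say_hello { say "Hello World!"; }

# Interpolated array and scalar: "@var2$var1" is "Hello World!"
sub print_hello_vars { print "@var2$var1\n"; }

sub say_hello_vars { say "@var2$var1"; }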

Each function was run for 10,000,000 iterations. Since all these functions print to standard output, I redirected the output to a temporary file. Also, to make the results more reliable, I ran the benchmark three times.
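The post does not show the benchmarking harness itself. One plausible reconstruction, a sketch rather than the original script, uses Benchmark's `timethese` with the `'none'` style, sends the printed text to a temporary file by `select`-ing its handle (instead of shell redirection), and then prints the comparison chart with `cmpthese`. The labels match the row names in the results; the iteration count is reduced so the example finishes quickly.

```perl
use strict;
use warnings;
use feature 'say';
use Benchmark qw(timethese cmpthese);
use File::Temp qw(tempfile);

our $var1 = "!";
our @var2 = qw!Hello World!;

sub print_hello      { print "Hello World!\n"; }
sub say_hello        { say "Hello World!"; }
sub print_hello_vars { print "@var2$var1\n"; }
sub say_hello_vars   { say "@var2$var1"; }

# The post used 10_000_000 iterations; a smaller count keeps this quick.
my $count = 200_000;

# print/say without a filehandle write to the currently selected handle,
# so select a temporary file while the benchmark runs.
my ($tmp) = tempfile(UNLINK => 1);
my $old = select $tmp;
my $results = timethese($count, {
    print            => \&print_hello,
    say              => \&say_hello,
    printInterpolate => \&print_hello_vars,
    sayInterpolate   => \&say_hello_vars,
}, 'none');
select $old;

# Print the rate comparison chart to the real standard output.
my $rows = cmpthese($results);
```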

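Now for the results. Can you guess the ordering? Well, first, the results were not always the same: say and print sometimes swap positions. In any case, interpolating with say seems to be faster than interpolating with print (by 11% to 23% in these runs). Check the three runs for yourself (printI and sayI stand for printInterpolate and sayInterpolate; the columns follow the row order of each run):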

Run 1:
                      Rate    printI     sayI    print      say
printInterpolate 1587302/s        --     -18%     -67%     -70%
sayInterpolate   1945525/s       23%       --     -60%     -63%
print            4807692/s      203%     147%       --      -8%
say              5208333/s      228%     168%       8%       --

Run 2:
                      Rate    printI     sayI      say    print
printInterpolate 1647446/s        --     -10%     -66%     -68%
sayInterpolate   1828154/s       11%       --     -62%     -64%
say              4830918/s      193%     164%       --      -6%
print            5128205/s      211%     181%       6%       --

Run 3:
                      Rate    printI     sayI      say    print
printInterpolate 1652893/s        --     -10%     -67%     -68%
sayInterpolate   1831502/s       11%       --     -64%     -64%
say              5076142/s      207%     177%       --      -1%
print            5102041/s      209%     179%       1%       --

Thursday April 03, 2008
01:19 PM

2008Q2 Call for Grants Proposals

The Perl Foundation is looking to award grants ranging from $500 to $3000 in May 2008.

In the past, we've supported Adam Kennedy's PPI and Strawberry Perl, Nicholas Clark's work on Perl internals, Jouke Visser's pVoice, Chris Dolan on Perl::Critic and many others (just check http://www.perlfoundation.org/grants for more references).

You don't have to have a large, complex, or lengthy project. You don't even have to be a Perl master or guru. If you have a good idea and the means and ability to accomplish it, we want to hear from you!

Do you have something that could benefit the Perl community but just need that little extra help? Submit a grant proposal by April 30.

As a general rule, a properly formatted grant proposal is more likely to be approved if it meets the following criteria:

  • It has widespread benefit to the Perl community or a large segment of it.
  • We have reasons to believe that you can accomplish your goals.
  • We can afford it.

To submit a proposal, see the guidelines at http://www.perlfoundation.org/how_to_write_a_proposal and the TPF rules of operation at http://www.perlfoundation.org/rules_of_operation. Then send your proposal to tpf-proposals@perl-foundation.org.

On May 1st, submitters will be contacted individually and asked whether they agree to make their proposal details available for public discussion, as public review of grant proposals is likely to become standard practice in the future.

Tuesday March 25, 2008
03:52 PM

Arch Linux finally with 5.10.0 :(

Today my Arch Linux updated its Perl package to 5.10.0 (finally). Unfortunately, it is configured in a strange way. Why put binaries under /usr/bin/perlbin/vendor and /usr/bin/perlbin/site?

I mean, I know that the Configure script has that option. But the default options (the ones the experts consider most adequate) put them all together under /usr/bin.
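To see where a particular perl build installs its programs, one can query the core Config module (from the command line, `perl -V:installsitebin` prints a single value). A small sketch that lists the relevant install directories; which of them are set, and to what, depends entirely on how that perl was configured.

```perl
use strict;
use warnings;
use Config;

# Install locations for programs from the core distribution, from
# vendor (OS package) installs and from site (locally installed) modules.
my @keys = qw(installbin installscript
              installvendorbin installvendorscript
              installsitebin installsitescript);

for my $key (@keys) {
    my $dir = $Config{$key};
    $dir = '(not set)' unless defined $dir && length $dir;
    printf "%-20s %s\n", $key, $dir;
}
```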

Now I am compiling Perl by hand (not that I have never done it before). I hate doing this, as it breaks the way package updates work.