Journal of chromatic (983)

Wednesday February 06, 2008
07:49 PM

Bioinformatics Benchmarks

[ #35596 ]

I just caught a link to a programming language benchmark for bioinformatics. Unsurprisingly, the Perl is grotty and the C and C++ and Java implementations beat it handily.

Are there any PDLlers who'd like to bring some sanity to the results? (My eyes are going on strike from the C-style nested loops. Make sure you catch the use of bitwise AND as a control-flow operator. That has to work only by accident.)

  • This will be what kills it. Looking for single chars in a string with perl is painful. There's no nice way around it, except perhaps to do a pre-split into an array, but then look at what you've got - an array of SVs - the overhead is huge.

    You *might* be able to make it faster with a regexp. But basically doing anything character by character in perl is very slow.

    (it's also one of the main reasons that XML::SAX::PurePerl is so slow)
    • "(it's also one of the main reasons that XML::SAX::PurePerl is so slow)"
      Ditto with PPI...

      I managed to compensate by applying a regex if the character I see suggests I can read ahead a fair way.

      You might be able to abuse the regex engine for this though...

      s/(.)/something;$1/ge
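To make the options in this thread concrete, here is a minimal sketch of three ways to visit each character of a string: substr, a pre-split array, and a regexp with /g. The sub names and the sample string are made up for illustration and are not taken from the benchmark code; which variant is fastest would need to be measured on your own perl.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the benchmark): each one counts how
# often the single character $want occurs in $str.

# 1. Index into the string with substr, one character at a time.
sub count_substr {
    my ($str, $want) = @_;
    my $n = 0;
    for my $i (0 .. length($str) - 1) {
        $n++ if substr($str, $i, 1) eq $want;
    }
    return $n;
}

# 2. Pre-split into an array: one scalar (SV) is created per character.
sub count_split {
    my ($str, $want) = @_;
    my $n = 0;
    for my $ch (split //, $str) {
        $n++ if $ch eq $want;
    }
    return $n;
}

# 3. Let the regex engine do the scanning; /g in scalar context
#    resumes after the previous match on each iteration.
sub count_regex {
    my ($str, $want) = @_;
    my $n = 0;
    $n++ while $str =~ /\Q$want\E/g;
    return $n;
}

# Each call returns 3 for this input.
print count_substr('GATTACA', 'A'), "\n";
print count_split('GATTACA', 'A'), "\n";
print count_regex('GATTACA', 'A'), "\n";
```

The regexp version only touches Perl-level code once per match rather than once per character, which is roughly the "read ahead with a regex" idea mentioned above.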
  • In the alignment.pl code, nearly all the time is spent in the 'compute f matrix' loop. Pre-splitting the strings into arrays saved a few seconds (and took nearly no time itself). Using @_ directly in the score and max subroutines instead of assigning to lexicals, and using the ?: operator to write them as one-line functions, saved a few more seconds. (The Python didn't seem quite fair to compare, since it has named parameters and so skips the assignment.)

    There was also a lot of array indexing, so pre-assigning the first leve

    • And I don't know if it saved any time, but replacing the C-style for loops with perly 1..$n style ones made at least that part more readable.
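The changes described in this thread might look something like the sketch below. It is not the modified alignment.pl: the sub names, the +1/-1 scoring values, and the sample strings are assumptions chosen only to show the shape of the rewrite (reading @_ directly, one-line ?: subs, pre-split arrays, and a 0..$n range loop).

```perl
use strict;
use warnings;

# Conventional style: copy the arguments into lexicals first.
sub max_copy {
    my ($x, $y) = @_;
    if ($x > $y) { return $x }
    return $y;
}

# One-line style: read @_ directly and use ?: (no lexical assignment).
sub max_direct   { $_[0] > $_[1] ? $_[0] : $_[1] }

# Scoring values (+1 match, -1 mismatch) are assumed for illustration.
sub score_direct { $_[0] eq $_[1] ? 1 : -1 }

# Sum the per-position scores of two equal-length strings.
sub total_score {
    my ($s, $t) = @_;
    my @s = split //, $s;    # pre-split once, outside the loop
    my @t = split //, $t;
    my $total = 0;
    for my $i (0 .. $#s) {   # range loop instead of for (;;)
        $total += score_direct($s[$i], $t[$i]);
    }
    return $total;
}

print total_score('GATTACA', 'GCATGCA'), "\n";   # 4 matches, 3 mismatches: prints 1
```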
  • Interestingly, 97% of the alignment.pl time is spent in the creation of the f-matrix.

    At the expense of some readability, I sped alignment.pl up by a factor of four (96 sec to 23 sec on my iMac G5 with perl5.8.6). You can view my modified code [chrisdolan.net] at your peril. The substr was not in fact the biggest cost -- I was surprised that changing to m/\G(.)/cg didn't save any time. The biggest win (about 40% time decrease) was unrolling the subroutines, of course, which is what some of the other languages may be doing
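As a rough illustration of the two techniques mentioned in this comment, the sketch below shows a helper called inside a loop next to the same computation with the helper's body written out in place, plus a character walk using m/\G(.)/cg. It is not the linked code: max3, the loop bodies, and the sample data are hypothetical, and any speed difference is something to measure (for example with the core Benchmark module) rather than assume.

```perl
use strict;
use warnings;

# Hypothetical helper, called once per row in the first version.
sub max3 {
    my ($x, $y, $z) = @_;
    my $m = $x > $y ? $x : $y;
    return $m > $z ? $m : $z;
}

# Version with a subroutine call inside the loop.
sub sum_of_maxes_call {
    my ($rows) = @_;
    my $total = 0;
    for my $r (@$rows) {
        $total += max3(@$r);
    }
    return $total;
}

# Same computation with the body of max3 written out in place,
# which avoids the per-iteration call overhead.
sub sum_of_maxes_inline {
    my ($rows) = @_;
    my $total = 0;
    for my $r (@$rows) {
        my ($x, $y, $z) = @$r;
        my $m = $x > $y ? $x : $y;
        $total += $m > $z ? $m : $z;
    }
    return $total;
}

# Walk a string one character at a time with \G and /cg.
# Note that . does not match a newline without the /s modifier.
sub chars_via_G {
    my ($str) = @_;
    my @out;
    push @out, $1 while $str =~ m/\G(.)/cg;
    return @out;
}

my $rows = [ [1, 2, 3], [5, 4, 0], [-1, -2, -3] ];
print sum_of_maxes_call($rows), "\n";     # 3 + 5 + -1: prints 7
print sum_of_maxes_inline($rows), "\n";   # prints 7
print join('', chars_via_G('ACGT')), "\n";
```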