
All the Perl that's Practical to Extract and Report


http://wgz.org/chromatic/


Journal of chromatic (983)

Wednesday February 06, 2008
06:49 PM

Bioinformatics Benchmarks

[ #35596 ]

I just caught a link to a programming language benchmark for bioinformatics. Unsurprisingly, the Perl is grotty and the C and C++ and Java implementations beat it handily.

Are there any PDLlers who'd like to bring some sanity to the results? (My eyes are going on strike from the C-style nested loops. Make sure you catch the use of bitwise and as a control flow operator. That has to work only by accident.)

  • This will be what kills it. Looking for single chars in a string with perl is painful. There's no nice way around it, except perhaps to do a pre-split into an array, but then look at what you've got - an array of SVs - the overhead is huge.

    You *might* be able to make it faster with a regexp. But basically doing anything character by character in perl is very slow.

    (it's also one of the main reasons that XML::SAX::PurePerl is so slow)
    • (it's also one of the main reasons that XML::SAX::PurePerl is so slow)
      Ditto with PPI...

      I managed to compensate by applying a regex if the character I see suggests I can read ahead a fair way.

      You might be able to abuse the regex engine for this though...

      s/(.)/something; $1/ge
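For reference, the character-walking strategies mentioned in this thread can be sketched roughly as follows. This is a minimal illustration, not code from the benchmark or from PPI: the strings `GATTACA` and `abc123def` and all variable names are made up for the example, and the read-ahead loop is only a guess at the kind of trick the reply describes.

```perl
use strict;
use warnings;

# Hypothetical input; the benchmark's real sequences are not shown in the thread.
my $seq = 'GATTACA';

# 1. substr: one function call per character.
my @by_substr;
for my $i (0 .. length($seq) - 1) {
    push @by_substr, substr($seq, $i, 1);
}

# 2. Pre-split: pay once to build an array of one-character scalars.
my @by_split = split //, $seq;

# 3. Regex walk: \G anchors each match where the previous one ended,
#    and /c keeps pos() in place when a match finally fails.
my @by_regex;
while ($seq =~ /\G(.)/gcs) {
    push @by_regex, $1;
}

# 4. Read ahead: when a character starts a run, take the whole run in one match.
my $text = 'abc123def';
my @tokens;
pos($text) = 0;
while (pos($text) < length($text)) {
    if    ($text =~ /\G(\d+)/gc) { push @tokens, $1 }   # whole run of digits
    elsif ($text =~ /\G(.)/gcs)  { push @tokens, $1 }   # single character
}

print "@by_substr\n@by_split\n@by_regex\n@tokens\n";
```

Which variant is fastest depends on the Perl version and the data, so it is worth timing them (for instance with the core Benchmark module) before committing to one.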
  • In the alignment.pl code, nearly all the time is spent in the 'compute f matrix' loop. Pre-splitting the strings into arrays saved a few seconds (and took almost no time itself). Using @_ directly in the score and max subroutines instead of assigning to lexicals, and using the ?: operator to write them as one-line functions, saved a few more seconds. (The Python didn't seem quite fair to compare against, since it has named parameters and so avoids the assignment.)

    There was also a lot of array indexing, so pre-assigning the first leve

    • And I don't know if it saved any time, but replacing the C-style for loops with perly 1..$n style ones made at least that part more readable.
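The micro-optimizations described in this comment might look something like the sketch below. It is not the modified alignment.pl: the subroutine names, the match/mismatch scores of 1 and -1, and the two sample strings are assumptions chosen only to make the example runnable.

```perl
use strict;
use warnings;

# Conventional style: copy the arguments into lexicals first.
sub max_copy {
    my ($x, $y) = @_;
    if ($x > $y) { return $x }
    return $y;
}

# Index @_ directly and use ?: for a one-line body, skipping the copy.
sub max_direct { $_[0] > $_[1] ? $_[0] : $_[1] }

# Hypothetical scoring rule (match = 1, mismatch = -1); the thread does not
# give the benchmark's actual values.
sub score { $_[0] eq $_[1] ? 1 : -1 }

# Pre-split both strings once, outside the hot loop.
my @a = split //, 'GATTACA';
my @b = split //, 'GATTTCA';

# Range loop instead of: for (my $i = 0; $i < @a; $i++) { ... }
my $total = 0;
for my $i (0 .. $#a) {
    $total += score($a[$i], $b[$i]);
}

print "total: $total\n";
print "max: ", max_direct(3, 7), "\n";
```

Skipping the copy makes the body depend on argument position rather than on names, so it trades some readability for speed and is probably only worth it in subroutines called from the innermost loop.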
  • Interestingly, 97% of the alignment.pl time is spent in the creation of the f-matrix.

    At the expense of some readability, I sped alignment.pl up by a factor of four (96 sec to 23 sec on my iMac G5 with perl5.8.6). You can view my modified code [chrisdolan.net] at your peril. The substr was not in fact the biggest cost -- I was surprised that changing to m/\G(.)/cg didn't save any time. The biggest win (about 40% time decrease) was unrolling the subroutines, of course, which is what some of the other languages may be doing