  • What exactly are folk worried about?

    I've used Perl for:

    Data munging

    Real-time, high-availability telecomms apps

    Many and various one-off parsers

    Simulations & customer demos

    Web services & clients

    & so on

    Perl is admirably fit for any software development task you have in mind, subject to performance constraints in certain situations. It may not be a *perfect* fit in every context, but expecting otherwise is probably not reasonable.

    As for supposedly not being fit for

    • The Strawberry team is going to bundle BioPerl in the default install of Strawberry Professional, which will also come with Padre pre-installed.

      So we can offer them a zero-effort bootstrap into a BioPerl environment.

      • Speaking from experience in both academia and bioinformatics, BioPerl is the opposite of a selling point; it's over-engineered and half-implemented, almost always more trouble than it's worth. Perl is attractive for its text processing and system scripting. If you want to make Perl more attractive to biologists, make it easier to interface with C and R (e.g. via Inline::).

        • Which Perl-wrapped C libraries would be interesting to biologists?
          • SVM-Light and the NCBI tools come to mind.

            • There is a Perl wrapper for SVM-Light:


              There are also Perl and Python wrappers for the NCBI tools:


              Given the extensible nature of Padre, I'd imagine that a customised version which ships these and presents dialogue boxes, tree views, etc. to interface with them wouldn't be an insurmountable challenge.

              • That's completely useless for large datasets:

                    $s->add_instance
                        (attributes => {foo => 1, bar => 1, baz => 3},
                          label => 1);

                C is a good least common denominator, so it helps to make a scripting language's interface to C as painless as possible. XS is hardly "painless," so there's room for someone to create such an interface.

                • Perhaps for large datasets you could load them from a file:



                  What other features would you like to see in the interface? It may well be possible to wrap/sub-class Algorithm::SVMLight to tweak the interface.

                  If you have anything specific in mind I'd guess the author of the module would be happy to have a look.

                  • A function to read data from a TSV/CSV file (in C, without going through Perl) would be extremely useful; ideally, $file would be a file handle rather than a name, so I could pipe it from standard input. A function to operate on a double** generated by some other C library would also be useful.
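A pure-Perl stand-in for the first request, not the C-level reader asked for: a minimal sketch that takes a file handle (so STDIN or a pipe works) and streams TSV rows into SVMLight's example-file layout. The input layout assumed here (label in the first column, dense numeric features after it) and the name `tsv_to_svmlight` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert TSV rows read from $in into SVMLight
# example-file lines written to $out. Assumes column 0 is the label
# and the remaining columns are dense numeric features.
sub tsv_to_svmlight {
    my ($in, $out) = @_;
    my $rows = 0;
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip blank lines and comments
        my ($label, @values) = split /\t/, $line;
        # Feature indices start at 1 and stay in ascending order;
        # zero-valued features are omitted (sparse format).
        my @pairs = map  { ($_ + 1) . ':' . $values[$_] }
                    grep { $values[$_] != 0 } 0 .. $#values;
        print {$out} join(' ', $label, @pairs), "\n";
        $rows++;
    }
    return $rows;
}

# Demo on in-memory handles; \*STDIN and \*STDOUT work the same way.
my $tsv = "1\t0.5\t0\t2\n-1\t0\t1.5\t0\n";
my $svm_text = '';
open(my $in,  '<', \$tsv)      or die "cannot open in-memory input: $!";
open(my $out, '>', \$svm_text) or die "cannot open in-memory output: $!";
my $rows = tsv_to_svmlight($in, $out);
close($out);
print $svm_text;
```

Because the function only sees handles, the same code can sit at the end of a shell pipeline by passing `\*STDIN` and `\*STDOUT` instead of the in-memory handles.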

                    More generally, my point is that biologists need to be able to interface with many, many programs, and you can't expect canned interfaces to all of them to be available on CPAN. These programs are often UNIX command

                    • I am not sure I get your point entirely. I think there are 2 cases:

                      1) Where you'd like to interact with other unix processes via the pipe mechanism

                      2) Where you'd like a quick way to wrap C-libs which provide certain features.

                      1) Is something Perl excels at. I'd recommend you have a look at the Perl Cookbook, recipe 16.4 "Reading or Writing to Another Program". Or Gabor Szabo's Pipe module might do the trick:


                      2) Is more problematic since to be able to
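Case 1 above can be sketched with Perl's list-form piped open, which runs a command without going through a shell and reads its standard output line by line. The command used here is just the running Perl interpreter printing three lines, a stand-in for whichever external program is needed; `read_from_command` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output lines.
sub read_from_command {
    my @cmd = @_;
    # '-|' means "read from this command's stdout"; the list form
    # passes arguments directly, so no shell quoting is involved.
    open(my $pipe, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my @lines;
    while (my $line = <$pipe>) {
        chomp $line;
        push @lines, $line;
    }
    # close() on a pipe waits for the child and sets $? to its status.
    close($pipe) or die "$cmd[0] failed: status $?";
    return @lines;
}

# $^X is the path of the perl binary running this script.
my @lines = read_from_command($^X, '-e', 'print "$_\n" for 1 .. 3');
print "got: $_\n" for @lines;
```

Writing to another program's standard input works the same way with the mode `'|-'` instead of `'-|'`.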

                    • My point is that a non-lousy interface to SVM-Light needs to handle large datasets. Algorithm::SVMLight was clearly written by someone who never used SVMLight on a decent-sized dataset. Such a dataset will almost always contain hundreds of megabytes of data, and come from either (1) a text file you download or (2) a C or FORTRAN function you call.

                      I don't think specific cases will help here. Here's the general problem: I have one million labeled data points generated by some program, and I want to use th

                    • This is what the man page for Algorithm::SVMLight says:


                      "An alternative to calling add_instance_i() for each instance is to organize a collection of training data into SVMLight's standard "example_file" format, then call this read_instances() method to import the data. Under the hood, this calls SVMLight's read_documents() C function. When it's convenient for you to organize the data in this manner, you may see speed improvements."

                      The way I read this is that assuming your data ca

                    • by educated_foo (3106) on 2010.02.28 22:05 (#71736) Journal

                      How did I miss that? (Probably because the synopsis only used add_instance(), and I skimmed the rest too fast.) SVMLight format is pretty simple, so it's not too hard to dump your data in that format and then call read_instances(). So one minor suggestion -- adding instances in bulk, particularly for training, is far more common than adding them individually, so it should be in the synopsis.
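The dump step might look like the sketch below. It assumes the usual SVMLight example-file layout (`<label> <index>:<value> ...`, with integer indices in ascending order) and a hypothetical in-memory list of `{ label, attributes }` hashes; `write_svmlight_file` and its name-to-index map are illustrative and not part of Algorithm::SVMLight.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write instances to $path in SVMLight example-file
# format and return the attribute-name => feature-index map it built.
sub write_svmlight_file {
    my ($path, $instances) = @_;
    my %index_of;          # attribute name => 1-based feature index
    my $next_index = 1;
    open(my $out, '>', $path) or die "cannot write $path: $!";
    for my $inst (@$instances) {
        my $attrs = $inst->{attributes};
        # Give each new attribute name the next free index
        # (names sorted so the numbering is deterministic).
        $index_of{$_} //= $next_index++ for sort keys %$attrs;
        # Emit features in ascending index order on each line.
        my @names = sort { $index_of{$a} <=> $index_of{$b} } keys %$attrs;
        my @pairs = map { "$index_of{$_}:$attrs->{$_}" } @names;
        print {$out} join(' ', $inst->{label}, @pairs), "\n";
    }
    close($out) or die "cannot close $path: $!";
    return \%index_of;
}

my @instances = (
    { label =>  1, attributes => { foo => 1, bar => 1, baz => 3 } },
    { label => -1, attributes => { foo => 2, quux => 5 } },
);
my ($fh, $path) = tempfile(UNLINK => 1);
close($fh);
my $index_of = write_svmlight_file($path, \@instances);
```

The resulting file is what would then be handed to read_instances(), and the returned map is needed later to translate attribute names for prediction.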

                      FWIW, when I wrote an Octave binding to SVM-Light some years back, I used direct calls to the SVM-Light C interface (init_doc(), custom_kernel, etc.) to add a whole batch of instances. It was more work, but way more efficient (and flexible!) than serializing and going through the file system.

                    • To be frank, I'd have thought so too. Useful SVMs based on a few instances can't be all that common.

                      I've a theory that one reason the re-use revolution promised by the OO evangelists never happened is that the effort and ingenuity involved to work out what s/w you can re-use can make it risky effort-wise to even try. I wouldn't say this is anyone's fault in particular - there's just too much information to wade through.

                      If I understand you correctly you had Octave generate batches of instances in the requ

                    • the effort and ingenuity involved to work out what s/w you can re-use can make it risky effort-wise to even try

                      I had much more hope for what people called "component-oriented programming" in the 90s -- large pieces of functionality with very simple interfaces. Small-grained objects are a symphony of fail.

                      If I understand you correctly you had Octave generate batches of instances in the required format and then pumped them directly into the SVM-Light engine?

                      Actually, I allowed 2-argument Octave functions a