Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Re: Why Ruby is prettier and Padre changes the Per (Score:1)
I've used perl for:
Data munging
GUI's
Real-time, high availability, telecomms apps
One-off parsers many & various
Simulations & customer demos
Web-services & clients
Code-generation
Fun
& so on
Perl is admirably fit for any soft-dev task you have in mind, subject to performance constraints in certain situations. It may not be a *perfect* fit in all contexts, but then again expecting otherwise is probably not reasonable.
As for supposedly not being fit for
Re: (Score:2)
The Strawberry team is going to bundle BioPerl in the default install of Strawberry Professional, which will also come with Padre pre-installed.
So we can offer them a zero-effort bootstrap into a BioPerl environment.
FYI (Score:1)
Speaking from experience in both academia and bioinformatics, BioPerl is the opposite of a selling point; it's over-engineered and half-implemented, almost always more trouble than it's worth. Perl is attractive for its text processing and system scripting. If you want to make Perl more attractive to biologists, make it easier to interface with C and R (e.g. via Inline::).
Re: (Score:1)
Re: (Score:1)
SVM-Light and the NCBI tools come to mind.
Re: (Score:1)
http://search.cpan.org/~kwilliams/Algorithm-SVMLight/lib/Algorithm/SVMLight.pm
There are also perl & python wrappers for the NCBI tools:
http://www.bioinformatics.org/forums/forum.php?forum_id=6735
Given the extensible nature of Padre, I'd imagine a customised version that ships these & presents dialogue boxes, tree views etc to interface with them wouldn't present an insurmountable challenge.
Re: (Score:1)
That's completely useless for large datasets:
C is a good least common denominator, so it helps to make a scripting language's interface to C as painless as possible. XS is hardly "painless," so there's room for someone to create such an interface.
Re: (Score:1)
read_instances($file)
?
What other features would you like to see in the interface? It may well be possible to wrap/sub-class Algorithm::SVMLight to tweak the interface.
If you have anything specific in mind I'd guess the author of the module would be happy to have a look.
Interface (Score:1)
A function to read data from a TSV/CSV file (in C, without going through Perl) would be extremely useful; ideally, $file would be a file handle rather than a name, so I could pipe it from standard input. A function to operate on a double** generated by some other C library would also be useful.
More generally, my point is that biologists need to be able to interface with many, many programs, and you can't expect canned interfaces to all of them to be available on CPAN. These programs are often UNIX command
Re: (Score:1)
1) Where you'd like to interact with other unix processes via the pipe mechanism
2) Where you'd like a quick way to wrap C-libs which provide certain features.
1) Is something Perl excels at. I'd recommend you have a look at the Perl Cookbook, recipe 16.4 "Reading or Writing to Another Program". Or Gabor Szabo's Pipe module might do the trick:
http://search.cpan.org/~szabgab/Pipe-0.03/lib/Pipe.pm
2) Is more problematic since to be able to
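For point 1), a minimal sketch of the plain-Perl pipe approach, using only the built-in open(). The helper name read_from_command is made up for illustration, and printf is just a stand-in for whichever UNIX tool actually produces your data:

```perl
use strict;
use warnings;

# Run an external command and collect its output line by line.
# The list form of open() bypasses the shell, so arguments need no quoting.
sub read_from_command {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run @cmd: $!";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    # close() on a pipe returns false if the child exited with an error.
    close($fh) or die "Command @cmd failed (status $?)";
    return @lines;
}

# Stand-in producer: printf emits two tab-separated records.
my @rows = read_from_command('printf', '1\t0.5\n-1\t0.25\n');
print scalar(@rows), " rows\n";
```

Because the data is consumed one line at a time, the same loop works for output far larger than memory as long as you process each line instead of pushing it onto an array.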
Re: (Score:1)
My point is that a non-lousy interface to SVM-Light needs to handle large datasets. Algorithm::SVMLight was clearly written by someone who never used SVMLight on a decent-sized dataset. Such a dataset will almost always contain hundreds of megabytes of data, and come from either (1) a text file you download or (2) a C or FORTRAN function you call.
I don't think specific cases will help here. Here's the general problem: I have one million labeled data points generated by some program, and I want to use th
Re: (Score:1)
This is what the man page for Algorithm::SVMLight says:
"read_instances($file)
An alternative to calling add_instance_i() for each instance is to organize a collection of training data into SVMLight's standard "example_file" format, then call this read_instances() method to import the data. Under the hood, this calls SVMLight's read_documents() C function. When it's convenient for you to organize the data in this manner, you may see speed improvements. "
The way I read this is that assuming your data ca
*facepalm* (Score:1)
How did I miss that? (Probably because the synopsis only used add_instance(), and I skimmed the rest too fast.) SVMLight format is pretty simple, so it's not too hard to dump your data in that format and then call read_instances(). So one minor suggestion -- adding instances in bulk, particularly for training, is far more common than adding them individually, so it should be in the synopsis.
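A rough sketch of the dump step, assuming the usual SVMLight example_file layout (a label followed by index:value pairs, with feature indices in increasing order). The function names svmlight_line and write_svmlight and the sample data are invented for illustration; they are not part of Algorithm::SVMLight:

```perl
use strict;
use warnings;

# Format one labelled instance as an SVMLight example_file line:
#   <label> <index>:<value> <index>:<value> ...
# Feature indices are sorted numerically so they appear in increasing order.
sub svmlight_line {
    my ($label, $features) = @_;    # $features: { index => value }
    my @pairs = map { "$_:$features->{$_}" }
                sort { $a <=> $b } keys %$features;
    return join(' ', $label, @pairs);
}

# Write every instance to a file handle (a file, a pipe, or STDOUT).
sub write_svmlight {
    my ($fh, $instances) = @_;      # $instances: [ [label, {features}], ... ]
    print {$fh} svmlight_line(@$_), "\n" for @$instances;
}

# Made-up sample data: two instances with sparse features.
my @data = (
    [  1, { 3 => 0.5, 1 => 2 } ],
    [ -1, { 2 => 1.25 } ],
);
write_svmlight(\*STDOUT, \@data);
```

Redirect the output to a file and pass that file's name to read_instances(), as the man page quoted above describes.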
FWIW, when I wrote an Octave binding to SVM-Light some years back, I used direct calls to the SVM-Light C interface
Re: (Score:1)
To be frank I'd have thought so too. Useful SVMs based on a few instances can't be all that common.
I've a theory that one reason the re-use revolution promised by the OO evangelists never happened is that the effort and ingenuity needed to work out what s/w you can re-use can make it risky even to try. I wouldn't say this is anyone's fault in particular - there's just too much information to wade through.
If I understand you correctly you had Octave generate batches of instances in the requ
Re: (Score:1)
I had much more hope for what people called "component-oriented programming" in the 90s -- large pieces of functionality with very simple interfaces. Small-grained objects are a symphony of fail.
Actually, I allowed 2-argument Octave functions a