Journal of gnat (29)

Monday August 13, 2001
04:31 PM

Bioinformatics

[ #659 ]
I'm really jazzed about this. I never really took to biology in school (make stinkbombs in chemistry, lightning in physics, and ... ooze in biology?) so dropped it as soon as I could. But Jon Orwant has got me interested in it, and now I can't stop reading about it. I'm somewhat hampered because the great textbook I ordered from half.com still isn't here, so I'm just reading Stuff On The Net. But it's still interesting.

It's interesting because it's Real Science (discovering how things work) and detective work in one. Biologists make assumptions about what's important and what's not, they gather data on the important stuff, then try to make sense of it. At every turn there are unanswered questions and lots of data to turn into information.

They really need programmers. Lots of programmers. I mean, they have programmers, some good ones too--experts in distributed computation, string searching, and so on. But almost every biology lab generates a crapload of data and has to match it against known sequences, making allowances for known variations. They have huge, overlapping, dirty databases, varying in quality and data format, and the results of any search require a lot of interpretation. An unsolved problem in the field is merging annotations (comments like "this causes cancer" and "mmm, cheese for lunch today") across databases.
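
To make the matching problem concrete, here is a minimal sketch of searching a known sequence for a fragment while tolerating a few mismatches. It is an illustration only, not any lab's actual pipeline: the function names `hamming` and `find_approximate`, the `max_mismatches` parameter, and the toy sequences are all made up for this example, and real tools also have to cope with insertions, deletions, and far larger data.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_approximate(read, reference, max_mismatches=1):
    """Slide `read` along `reference` and report near matches.

    Returns a list of (offset, mismatches) pairs for every window of
    `reference` that differs from `read` in at most `max_mismatches`
    positions. Substitutions only; gaps are not handled.
    """
    hits = []
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        distance = hamming(read, window)
        if distance <= max_mismatches:
            hits.append((offset, distance))
    return hits


if __name__ == "__main__":
    # Toy data: one exact occurrence and one with a single substitution.
    print(find_approximate("ACGT", "TTACGTTTAGGT", max_mismatches=1))
```

Even this brute-force version hints at why the field wants string-searching experts: it compares the fragment against every position, which does not scale to genome-sized references.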

Everywhere I turn, I think "I could do that!" I haven't been this interested in something for years. And the best part is: They Use Perl!

But if you want to get into this, you're going to have to learn a lot of molecular biology. It's like being asked to write a payroll system--you'd have to learn a lot about company financials. If you want to write a protein sequencer, you will have to learn a lot about proteins. For instance, one of the biggest unsolved problems in the field is determining the 3D structure of a protein (a long molecule made up of discrete components, amino acids) given the sequence of the amino acids in it. It sounds easy, but you wouldn't believe how hard it gets when you talk about atomic charge and ionic interactions and all that bollocks. This is important, by the way, because the 3D shape of a protein is thought to determine what it does--what it bonds with, what it repels, and so on.

So I'm learning some molecular biology. I have friends I bug, and I'm planning a trip around the local university's lab. But mostly I'm reading things I've found on the net. I'm also discovering lots of opportunities for web sites that I'm relaying to the O'Reilly Network people. Years ago I would have started the website myself, but I don't have hours in the day for my family now, let alone new web sites. Ugh.

And I'm also getting lots of ideas for our Bioinformatics Conference in February. I want there to be at least one "All you ever needed to know about molecular biology to work here" type of class. I want that, because I need it! I'm not officially one of the conference organizers (after seeing me combust at OScon, they're trying to limit my workload) so now I'm just all opinions and no responsibility! Woo!

This is, of course, my way of avoiding the harder work on my plate--editing. I've got two chapters of a new Perl book (sorry, can't say what it is) that are really well written, but they're just not in the O'Reilly style. They read like Microsoft Press or SAMS. I'm trying to work out the transformation that will turn them into O'Reilly chapters, but I can't put my finger on the missing ingredient. I think I may tap a more experienced editor for some advice here. That's the way, Nat, punt the harder stuff to someone else! :-)

Until later,

--Nat