I guess my biggest question is: in what way can programmers outside that specifi...

hirenj · on July 27, 2011

Bioinformatician here. First of all, I think it's got to be clear that a lone bioinformatician, or even a group of them, isn't going to go about changing the world. Essentially, when you sign up for this, you're still a cog in a machine, albeit a slightly more altruistic machine.

Here's my pet peeve in bioinformatics: if there's one thing that science is poorly suited to, it's building computational infrastructure. We're talking basic stuff like databases, tools etc. Sure, anyone can knock out a bit of code for a basic database, but the big problem is that there's no incentive to write decent code, or to maintain it so that it outlasts the person's time in the lab or their funding. So what would be great is if existing resources were cleaned up: data normalised and pulled out so that it is actually accessible for doing some kind of analysis on it.

If you want to do bigger work, something actually novel or with any biological relevance, there's no getting around collecting your own data (e.g. sequencing the crap out of a bunch of things). I'm in the process of trying to get funding for a project of mine to make that very leap.

I'm sure someone working on next gen sequencing (the new hotness) can pipe up with the big problems to be solved there.

enjalot · on July 27, 2011

I know a professor who has a big grant to sequence a whole bunch of animals, and will also do a high-res CT scan of each specimen. He plans to make a comprehensive site where scientists and school children can access the data they are interested in and learn more.

He even has money for a dev position for 4 years. I'm just worried that he'll get someone who slaps together a proprietary and incompatible site, when this would be a perfect chance to experiment with implementing some standard data access APIs.

I've heard bioinformatics people complain about the lack of standards and fragmented nature that comes from various small groups of scientists doing it on their own.

If anyone is interested, PM me and I'll put you in touch. He is in South Carolina, so he can't offer the salary and other perks of the Bay, but it's a real chance to put good development energy into science.

polyfractal · on July 29, 2011

This is a problem that is pretty endemic to academia in general. The revolving door of most academic labs means a lot of knowledge and tools are lost or not maintained. It is frustrating to see this happen in every single lab, but short of fixing the labor-mill mentality of academia, this will never go away.

nkassis · on July 27, 2011

You don't always have to get new data to make a splash. I particularly like this project (it's in neuroscience, but similar things probably exist in biology):

http://neurosynth.org/

They are doing meta-analysis of the neuroscience literature.

nkassis · on July 27, 2011

I fit the mold you are talking about. I started working for a neuroimaging lab about a year and a half ago. Basically I'm not a scientist at all. I did math as an undergrad, but almost no stats or any of that type of voodoo stuff (topology, abstract algebra etc., real math stuff, is what I did ;p).

So basically I had no clue what neuroimaging meant or did, other than people get shoved in a scanner, huge magnets turn and they see inside you ;p

But what I found is that there are plenty of computer science problems in the field of neuroimaging (and neuroscience) that a programmer can help with (processing, image analysis, data mining, storage, visualization, whatever). Most labs don't really have people whose primary job is programming. Thus there are lots of tools that are just long-forgotten hack jobs that no one is maintaining but everyone depends on. Those things can be helped by good programming practice and with real programmers behind them. What sucks is getting funding for these people, but open source can help here by pooling multiple people from many labs into common projects.

Also, if you do work with scientists, most of them will talk to you for hours about the science of what they do. You can usually ask the most stupid question and they will be happy to answer it. I've found that most people I work with are open, and even more so if the work you do helps them achieve their scientific goals. So in the end, if you need to learn some science stuff, they will usually be helpful.

(BTW my project is in my profile, will be open source soon, waiting for some political approval process).

davi · on July 27, 2011

There is a new field of extracting wiring diagrams from brains at the level of individual neurons. This is what I work on.

http://openconnectomeproject.org has some introductory material. (This is not my site, it is by some people at JHU who picked up our data set and are working on it.) Massive image data sets, lots of need to develop workflow. You can browse the image data here: http://openconnectomeproject.org/catmaid/?pid=4&zp=40635...

Concretely, look at the plugins being developed by the Fiji project and pitch in, especially on the electron-microscopy-centric plugins: http://pacific.mpi-cbg.de/wiki/index.php/Category:Plugins

edit: also think about contributing to the CATMAID project, which is the software serving the browsable data set above, and perhaps will someday enable crowd-sourced markup: http://fly.mpi-cbg.de/~saalfeld/catmaid/

tom_b · on July 27, 2011

My gut feeling is that there is a large need for good query/visualization tools of the datasets the sequencers produce. At least, I think if researchers could "play" interactively with data they would be pretty excited. But I am most definitely non-expert in this area, so take that with a grain of salt. I sometimes think about whether or not tools developed with column stores (e.g., the programming language J or something like KDB+) would actually be cool for data exploration.
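
To make the column-store idea a bit more concrete, here is a minimal sketch in plain Python (not J or KDB+). It only illustrates the layout: each field is stored as its own list, so an interactive query touches just the columns it needs. The records, field names (`chrom`, `pos`, `mapq`), and helper functions are all made up for illustration and do not come from any real sequencer output or tool.

```python
from collections import defaultdict

# Hypothetical alignment-like records; the fields and values are invented.
rows = [
    {"chrom": "chr1", "pos": 100, "mapq": 60},
    {"chrom": "chr1", "pos": 250, "mapq": 12},
    {"chrom": "chr2", "pos": 75, "mapq": 40},
    {"chrom": "chr2", "pos": 300, "mapq": 55},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (column layout)."""
    columns = defaultdict(list)
    for record in records:
        for field, value in record.items():
            columns[field].append(value)
    return dict(columns)


def mean_by(columns, key, value, min_value=0):
    """Average `value` per distinct `key`, keeping entries >= min_value.

    Only the two named columns are scanned; the others are never read.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [running sum, count]
    for k, v in zip(columns[key], columns[value]):
        if v >= min_value:
            totals[k][0] += v
            totals[k][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


cols = to_columns(rows)
print(mean_by(cols, "chrom", "mapq"))
print(mean_by(cols, "chrom", "mapq", min_value=30))
```

A real column store adds compression, typed arrays, and vectorised operators on top of this, but the "play with it interactively" appeal comes from the same layout.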

For visualization, check out:

http://genome.ucsc.edu/

Also, a huge list of projects is at:

http://en.wikipedia.org/wiki/Genome_browser

In the genomics analysis space, the three tools I hear mentioned for sequence alignment are TopHat, BWA, and MapSplice.

http://bio-bwa.sourceforge.net/

http://www.netlab.uky.edu/p/bioinfo/MapSplice

http://tophat.cbcb.umd.edu/

These are actively maintained projects that I think are mostly developed inside of various academic research groups.

There is also The Cancer Genome Atlas project at:

http://cancergenome.nih.gov/

You can probably find research groups via TCGA that might appreciate some one-off development or support, but it might not be exciting from a tech viewpoint.

There is a ton of EMR (electronic medical record) data out there in free text. If you have skills or interest in things like Lucene/Solr, I would bet that almost any research hospital might appreciate your time and skills. And, if you talk to the right group, they might want to hire you . . .
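
For anyone who hasn't worked with Lucene/Solr, the core idea behind searching free text is an inverted index: a map from each term to the documents that contain it. Below is a toy sketch in plain Python, not the Lucene or Solr API. The notes, the tokenizer, and the function names are all hypothetical, and real clinical text would need far more care (negation, abbreviations, de-identification).

```python
import re
from collections import defaultdict

# Invented example notes keyed by a made-up document id.
notes = {
    1: "Patient reports chest pain and shortness of breath.",
    2: "No chest pain. Mild headache reported.",
    3: "Shortness of breath after exercise.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(notes)
print(search(index, "chest pain"))
print(search(index, "shortness breath"))
```

Note that the query "chest pain" also matches the note that says "No chest pain", which is a small taste of why free-text medical data is hard and why people with search skills are useful there.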