Successfully collaborating with computational biologists (stactivist.com)
54 points by azuajef on Aug 13, 2016 | 22 comments



Interestingly, some of the first interdisciplinary collaborations on Overleaf[1] were between computational biologists (who tended to write up their papers in LaTeX) working with non-computational biologists (who tended to use Word), helped partly by the Rich Text mode we built[2] which hides (most of) the LaTeX code for those who prefer to edit in a more WYSIWYG-style environment.

We continue to see strong use in this area today, and indeed there seems to be a growing trend for interdisciplinary collaborations (I'm one of the founders, and we started it very much as a side project between mathematicians needing to collaborate in LaTeX!).

It's also great to see that some of the newest innovations in publishing platforms are in the (computational) life sciences area too, e.g. F1000Research[3] (which offers open, post-publication peer review) and PeerJ[4] (which has a membership model but no fee to publish).

[1] https://www.overleaf.com/

[2] https://www.overleaf.com/blog/81

[3] http://f1000research.com/

[4] https://peerj.com/


What is the most impactful way to contribute to biological research from a distance, as a programmer with some basic university-level biology education? Which open source projects in the area are both frequently used and lacking in some important respect?

I noticed there were some bioinformatics projects in the latest Summer of Code, but they seemed to me more like "let's think how to teach our undergrads work ethics" than "please, we really need this".


These guys need all the help they can get

https://summerofcode.withgoogle.com/organizations/5111396454...

It's a Java backend with a massive D3 frontend and import tools in Python. There are a lot of open issues at https://github.com/cBioPortal/cbioportal

And contributing is really smooth: the core developers reply fast and are easy-going. Your GitHub looks like you're interested in testing, and that's one area where cBioPortal is quite lacking. Might be worth a shot.

EDIT: Get in touch if you need help ;)


And if you want to work on cBioPortal and make it your job, consider joining us at The Hyve (thehyve.nl), or one of the other institutions if you're not in the Netherlands.


Galaxy could use the same.



Why from a distance?

We're building a framework for analyzing massive genetic datasets on Spark:

https://github.com/broadinstitute/hail

I'm a software engineer/mathematician, but we're embedded in a world-class genetics research lab. The first paper using Hail was put out recently:

http://biorxiv.org/content/early/2016/06/06/050195

with more in the pipeline. Hail is being used to analyze some of the largest genetic datasets out there (hundreds of thousands of exomes and tens of thousands of whole genomes). There's tons to do. Jump in or email me (see my profile) if you'd like to get involved.


It depends how you want to contribute.

If you want to offer your services as a programmer to the biology community, there are several open source projects out there to develop LIMS (lab information systems), user friendly interfaces to many common analysis tools, and web portals for various research communities to share biological data. I wouldn't consider any of this to be 'computational biology', however.

It's a common misconception that computational biology (or bioinformatics, or whatever you want to call it) = biology + software development. While it's true that biological research needs more software developers to build the infrastructure necessary to manage and access a growing amount of biological data, that's not the role of computational biologists.

The computational biologist is just a biologist who uses computational methods as opposed to traditional laboratory methods. Some may not even write that much code, and the code they do write may not even equate to 'software'. The skill set involves algorithms, statistics, and AI much more than software engineering.


Image analysis [1] [2]. Statistical packages. Rewriting old but powerful algorithms in modern languages. APIs to access databases such as Entrez, Uniprot, etc. Stupid parsers to convert still-used, god-awful ancient data formats into standard JSON/XML (looking at you, genbank and pdb).

Here are a few widely used areas of biological research software that could always use some help, if you find a project that is active and to your liking:

- https://github.com/search?utf8=%E2%9C%93&q=ImageJ

- https://github.com/search?utf8=%E2%9C%93&q=Entrez&type=Repos...

- https://github.com/search?utf8=%E2%9C%93&q=Uniprot&type=Repo...

- https://github.com/search?utf8=%E2%9C%93&q=particle+tracking...

- https://github.com/search?utf8=%E2%9C%93&q=genbank&type=Repo...

[1] http://fiji.sc/

[2] http://bigwww.epfl.ch/sage/software.html?topic=bio
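To give a feel for the "stupid parser" kind of task mentioned above, here is a minimal sketch in Python. It only handles a simplified subset of a GenBank-style header (top-level keywords in column 1 plus indented continuation lines) and stops at the feature table. The function name `parse_genbank_header` and the sample record are made up for illustration; a real project would more likely build on an existing library such as Biopython than on something like this.

```python
import json


def parse_genbank_header(text):
    """Collect top-level header fields of a GenBank-style record into a dict.

    Simplified on purpose: a keyword starts in column 1, indented lines are
    treated as continuations of the previous keyword, and parsing stops at
    the FEATURES or ORIGIN section. Indented sub-keywords are NOT split out.
    """
    record = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break
        if not line[0].isspace():
            # New keyword line, e.g. "ACCESSION   TOY0001"
            keyword, _, value = line.partition(" ")
            key = keyword.lower()
            record[key] = value.strip()
        elif key is not None:
            # Continuation of the previous field
            record[key] += " " + line.strip()
    return record


# Hypothetical toy record, not taken from any real database entry.
SAMPLE = """\
LOCUS       TOY0001                 12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record used only for
            this illustration.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
ORIGIN
        1 acgtacgtac gt
//
"""

record = parse_genbank_header(SAMPLE)
print(json.dumps(record, indent=2))
```

The feature table and the sequence itself are where the format gets genuinely painful, which is why this sketch deliberately stops before them.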


I despise the computational/biologist split, where biologists know next to nothing and can't even sort a spreadsheet. These days the bar to being a "computational biologist" is super-low. (1) Can you login to a unix machine and write a bash script? (2) Do you know what 'variance' is? You're good to go! Because 99% of biologists don't know either of these things and don't care to.

The number one thing that most biologists need (that most scientists need) is a basic understanding of statistics when doing experimental design. Sure, you can get this by having a computational biologist on staff, always timidly raising their voice to get you to add replicates, for fuck's sake. But - how about learning this stuff yourself, so your experiments (which rely deeply on statistical reasoning) aren't shit to begin with?
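A rough illustration of why replicates matter, as a minimal sketch using only Python's standard `statistics` module. The measurements are invented for the example, and `standard_error` is a helper defined here, not something from a particular analysis package. The point is only that a single measurement gives no estimate of spread at all, while even three replicates give a sample variance and a standard error to reason with.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical measurements, made up for illustration only.
single = [10.2]
triplicate = [10.2, 9.1, 11.0]

try:
    statistics.variance(single)
    single_ok = True
except statistics.StatisticsError:
    # With n = 1 the sample variance is undefined: no estimate of spread.
    single_ok = False

print("variance available from one replicate:", single_ok)
print("sample variance (n=3):", statistics.variance(triplicate))
print("standard error of the mean (n=3):", standard_error(triplicate))
```

How many replicates an experiment actually needs depends on effect size and noise, which this sketch does not attempt to address.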

My goal as a computational biologist isn't to do work for or correct biologists who don't understand statistics, it's to get biologists tools to do this kind of work easily on their own and bridge the gap in understanding. At the end of the day, computational biologists can't have the important domain-specific insights; biologists need to start understanding the datasets better themselves so they can apply their domain knowledge to it, and computational biologists need to be making tools to help them get there. Interdisciplinary work succeeds best by constructing interdisciplinarians who have all of the skills.


I agree with your overall point and particularly the first part. Computational biologists and laboratory biologists are both biologists, just with different tool sets.

I disagree, however, with your last paragraph, in particular the notion of computational biologists as tool builders. You're absolutely right that lab biologists need to get with the times and become more computer-literate and data-literate. It's not the responsibility of computational biologists to shepherd them, though; they have their own research to do. I think the divide that you point to exists largely because the field is still predominantly occupied by either old-school wet-bench biologists who won't learn new things (and, more disturbingly, don't think their grad students need to learn any either), or computer scientists / statisticians who migrated into the field from elsewhere some decades ago and still retain a sense of being outsiders. The next generations of biologists, however, should increasingly resemble a true merging of these two groups.

No tool can compensate for an inability to understand statistically valid experimental design or how to manipulate genomic data, and a laboratory biologist who generates this data without being able to do these things isn't much of a biologist. Likewise a computational biologist that doesn't have those important domain-specific insights might not actually be much of a biologist either.


I think making tools is a great thing to do as a computational biologist... But I feel the main goal of a computational biologist is not to just make tools but to make discoveries in ways that biologists could not have even dreamed of. I hope one day this field of computational biology can stop calling themselves computational biologists and simply call themselves biologists.

I also find sometimes the inverse is true: computational biologists who don't know any biology. I think there is work on both sides to bridge the gap.


> I hope one day this field of computational biology can stop calling themselves computational biologists and simply call themselves biologists.

Absolutely.


Agreed. You'd probably enjoy this piece by Sean Eddy: http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fj...


Why do you want to invest that much energy into trying to help a field that doesn't care to understand your work and ignores its warnings/conclusions?


It is my field; my training is as a biologist, and I love the science. Plus my frustration indicates a problem worth solving.


Depends on where you are. At my company you generally need a PhD in something like CS, Math or Physics to even be considered.


You make it sound like that's a difficult thing to acquire.

But, as my dissertation adviser pointed out, "[apathy], you have to remember that Really Stupid People get PhDs all the time. Don't let me down."


Also, I had a discussion with a theoretical physicist recently where we spent almost half an hour bemoaning the myriad ways in which experimental design was ignored in our (hard science) undergrad programs, as well as many if not most graduate programs. It's sad.


It's above my intellect level. You miss the point, however: I was responding to the claim that biologists were finding their way into comp bio roles without computational skills.


It's not above your intellect. That's the whole point. Credentials are mostly bullshit unless accompanied by primary evidence. Think for yourself first. If necessary, and only if necessary, feel free to use credentials as potential tiebreakers.

Having students (particularly mathematicians) has reminded me that only derivation and demonstration really count in science. All the rest is artifice.


Is there any collaboration for which these insightful realizations don't hold true? This is great advice for the joining of any two fields or parts of a process (in particular I'm thinking Design and Engineering are quite similar in analogy).



