I would say the number of biologists who actually *understand* programming is ex...

gundmc · on Aug 6, 2020

To be fair, linear code is often totally sufficient for most types of data analysis. Biologists don't really need to understand design patterns or polymorphism, they just need to not make computational mistakes when transforming the data.

shpongled · on Aug 6, 2020

Absolutely. My point was more than you can't expect comp. biologists to actually be "good" programmers when compared to SWE or even web devs.

Most of the code I write to do biological data analysis is fairly linear. However, I also generally use a static type system and modularity to help ensure correctness.

I've perused a lot of code written by scientists, and they could certainly learn to use functions, descriptively name variables, use type systems and just aspire to write better code. I just saw a paper published in Science had to issue a revision because they found a bug in their analysis code after publication that changed most of their downstream analysis.

oneplane · on Aug 6, 2020

It does get rather problematic when you have large quantities of...stuff. You can't run linear stuff in parallel so now you're bound to whatever single CPU core you have lying around.

I'd say that getting some basic data science computing skills should be more important than the silly SPSS courses they hand out. Once you have at least baseline Jupyter (or Databricks) skills you suddenly have the possibility to do actual high performance work instead of grinding for gruntwork. But at that point the question becomes: do the people involved even want that.

sumtechguy · on Aug 6, 2020

I write 'one off' programs all the time. Most of what I write I throw away, and I program for a living. Those are usually fairly linear. Which is fine. If I am writing something that will be re-used in 6 different ways and on a 5 person team. That is when you get out the programming methodologies. It is usually fairly obvious to me when you need to do it. For someone who does not do it all the time. They may not know 'hey stop you have a code mess'.

It one of the reasons why people end up with spreadsheets. Most of their data is giant tables of data. Excel does very well at that. It has a built in programming language that is not great but not totally terrible either. Sometimes all you need is a graph of a particular type. Paste the data in, highlight what you want, use the built in graph tools. No real coding needed. It is also a tool that is easy to mismanage if you do not know the quirks of its math.