HPC administrator, researcher and scientific software developer here. Inertia is...

denisrosset · on April 7, 2022

> The culture and dynamics around scientific and academic programming is different from both FOSS and commercial software. This needs to be taken into account.

I'm currently teaching modern scientific Python programming, if that's a thing.

You raise an excellent point, the software development life cycle is quite different in academia. Do you have resources in mind on that topic?

I was thinking of dividing academic code in 3 "Ps":

- Playground: throw-away scripts for data analysis, fast moving code where any rigidity slows the process. Internal use by a single researcher usually.

- Prototype: code that serves as a foundation to the above, or is reused frequently, but is otherwise not shared outside a research group.

- Product: code, often open source, that is shared with the community.

Most tutorial about "good coding practices" do not make distinctions between these three stages.

For Python, there are practices that work at any level: auto-formatting code and sorting imports, gradual typing. But things like tests, API documentation, dependency management only apply to steps 2 and/or 3.

yiyus · on April 7, 2022

Stage 3 does not always happen. There are a few remarkable counterexamples, but most research software does not usually finish taking the form a product.

In my experience, what happens is that some prototype works quite well and gets used in the next project. But the next project is building a new prototype, not improving the previous one. After a few projects, the person who wrote that first prototype already left the department, but there are people who depend on it to get results. If enough time passes, you will have some piece of legacy code that very few people can understand.

The problem with academia is that, while the prototype is an excellent topic for an article, the real product, a maintained (and maintainable) software project, is not. And getting funding in academia to do something of no academic interest (ie. that cannot be turned into a paper) is very difficult. If the software is of no interest to some industrial partner willing to spend money, that code will just keep moving between the USB drives of PhD students, probably branching into slightly different versions, until someone is brave enough to start working in a new prototype and the cycle starts again.

sam_bristow · on April 7, 2022

Maybe add a fourth category.

- Petrified: code that's set in stone and everyone's to scared to touch in because "it works, but God knows how".

zozbot234 · on April 7, 2022

With the increasing expectation that academic research results should be reproducible, there's really no such thing as throwaway "playground" or "prototype" code. Everything is potentially a "product".

NGRhodes · on April 7, 2022

Not only are software development life cycles different, but also the attitude to development. Which is not a surprise when in most cases, the output is not the software (or system), but research or learning; the code is merely a side effect.

hilbert42 · on April 7, 2022

Very perceptive comment, especially the last paragraph.

noobermin · on April 7, 2022

May be you don't realize it but still, the hidden hypothesis in your argument is once again "new/current is better." You say "mindset[s] change," again, that just comes with developing software in general. Your mindset changes when you make a web app, a system tool (like a driver), or scientific code. I have a different mindset when I knead the 30 some odd year old PIC code I have to manipulate every now and then than the few times I played around in js for personal projects and it's a different mindset when I write 6502 asm for another. The thing is you say a "different mindset" that is different from what is familiar to someone new to a codebase is an argument for why "inertia is an issue." There is an obvious and different solution to "rewrite the code" and it is "re-learn the code/old design." The only reason you opt for "rewrite the code" here I can imagine is some hidden bias towards a modern mindset.

I understand how code bases change and warp but it really is a push and pull for when you should abandon something vs. keeping it around. Moreover, the other alternative, learning to actually use old code and understand it in the mindset it was developed avoids the frankenstein-ization you refer to becase if people actually understood the old code, they can add to it in a way that meshes well with the existing code rather than it being a bolt-on. That said, I can understand if you inherit something that already has the bolt-ons such that you're not really responsible for that, and that can be harry, but really I don't really feel like that is something unique to computational science in the abstract. Bolt-ons are common across CS I feel like.

The main thing I am railing against and have been doing for a long time is the tendency for developers to have more on an emphasis on writing code as opposed to reading code that already works. In particular, taking time to understand so-called legacy code, learning to think in the way it was written, then modifying said code in a way idiosyncratic to it. Unironically, we focus way too much on creativity in CS. That sounds a bit funny but the fact it does already demonstrates the reality of that mindset's (ironically) strangehold on software in general. It's really funny because creativity is actually not that important for the majority of users of computers but is very much valued by developers in general because they develop computers for a living. On the other hand, something that works and is stable is actually something people don't even know they want but even better (or worse), they rely on or at least grow accustomed to given they bitch and moan once the familiar is broken often to fill some need of some developer to chase the new and shiny.

This is a long comment, but there is one last thing I'll touch on: one of the places where I do agree somewhat is new technology. For example, for people in modeling laser-plasmas (where I hail), people have still not really adopted GPUs even though that's all the rage (and has been for years already actually), because the main tools (PIC and MHD) do not map well to GPUs and the algorithms were developed with a large memory space accessible across a node assumed. There are efforts being made now but it's still considered hot shit for the most part. So, there is one place I'll grant you that it does require some willingness to "write more" so to speak, to be able to take advantage of new technologies. That said, "writing more" in this case still requires rather deep knowledge of the old codes and particularly why they made the choices they did in order to save yourself a few years of recreating the same mistakes they did (which btw, is literally what I see whenever people do attempt that sort of thing today).