The rise of Python in computational science (walkingrandomly.com)
53 points by TriinT on Dec 19, 2009 | hide | past | favorite | 31 comments


> Can we stop promoting imperative-ish languages and focus on where lambda can take us?

Nope, and why would we? For many applications, including the majority of scientific research, FP is a non-starter. You need a simple, robust, procedural language that works the way most of our brains are wired. There is no reason or payoff in wrapping your mind around an arcane style of programming when the software project is not going to need the benefits of that style of development.


How can you say FP is a non-starter? Just because some aging environments were designed by Fortran hackers? FP is better suited to parallelization. And where would you put R?


(disclaimer, I am a Numpy/scipy dev)

I think the language is just part of it. I would go as far as saying that from a purely "technical" POV, lisp and ML (e.g. OCaml) are superior languages to python in almost every way. Nevertheless, I think those are, at least today, not as good a choice as python for scientific research, for at least two reasons.

First, I strongly believe scientific work in general needs to be more open, both publication-wise and implementation-wise. I see programming languages as a communication tool as much as an implementation tool for science, and I think python fills this role very well, because it is very readable to the casual programmer. LISP is too foreign for this audience. P. Norvig mentioned LISP's relative failure among non-CS scientists in his talk at scipy09 (http://www.archive.org/details/scipy09_day1_03-Peter_Norvig) - and I think you can count him as a LISP fan. To quote his talk, at some point, "you have to stop fighting reality".

Secondly, there is something about the LISP, haskell and OCaml communities which does not compare well to python's. Those communities are very CS-savvy, really oriented toward programming - which is fine. Different goals, different tools and all that.

Also, do not forget that in science, projects generally last much longer than in most other areas. That's one of the reasons why Fortran is still so pervasive, after all. Code is also maintained by different people (grad students, etc...), involving several generations in some cases. Having a relatively mainstream language is a requirement - python is already too weird in many cases...

Concerning parallelization: even if your assertion were true, that's a concern for only a tiny proportion of what people do. Speed really does not matter most of the time (though it is true that when it does, it often matters in a significant way - high energy physics, climate modeling, etc...). I would be cautious about claiming FP is better for parallelization in scientific work, though: a lot of tasks can be solved using MPI, etc... and correct me if I am wrong, but I don't believe FP brings a lot of advantages there.

Recently, a perspective from W. Stein, who started the SAGE project, was mentioned on LtU, and the article provides more insights, coming from a quite different background: http://lambda-the-ultimate.org/node/3712


It's interesting that in Norvig's talk you linked, while praising python, he also mentions that he is a supporter of the functional programming style for scientific work, both because it more transparently mirrors the mathematical formulation of the algorithm, and because it should make the use of massively parallel computations easier. I concede that there are currently few implementations of massively parallel, functional code (MapReduce and possibly Data Parallel Haskell come to mind), and it's true that the MPI-based Single Program, Multiple Data paradigm is dominant for now, with the functional paradigm being only a very promising newcomer.
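A tiny sketch of the "mirrors the mathematical formulation" point, in plain Python (names are mine, not from the talk): the functional version of a dot product reads almost directly as the formula sum_i a_i * b_i, while the imperative version buries it in loop bookkeeping.

```python
from functools import reduce
import operator

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Imperative version: explicit loop index and accumulator.
dot_imp = 0.0
for i in range(len(a)):
    dot_imp += a[i] * b[i]

# Functional version: pointwise multiply, then fold with addition,
# which is close to the mathematical notation.
dot_fp = reduce(operator.add, map(operator.mul, a, b), 0.0)

assert dot_imp == dot_fp == 32.0
```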

Given that, and the fact that languages such as Lisp, Haskell and OCaml are more aimed at the functional paradigm, wouldn't you agree that there is possibly a general shift going on in scientific programming in general, from Fortran/C(++) imperative style, via python, towards finally any of Haskell, Lisp and/or OCaml?

I don't really agree with your implication that the (at first) obscure way of doing things functionally makes the work less open, and that this makes it less desirable. If everyone worked in exactly the same way, communication would be easier, but there would never be any disruptive change. I think letting everyone figure out for themselves what language works best for his/her application has more merit than trying to enforce a standard, be it python + numpy & scipy, or anything else. If there turns out to be one clear winner, people will gravitate to it automatically, but you shouldn't argue for harmonization for its own sake.


> and because it should make the use of massively parallel computations easier. I concede that there are currently few implementations of massively parallel, functional code (MapReduce and possibly Data Parallel Haskell come to mind)

Not only are there few, their numbers are growing much more slowly than traditional C/Fortran/MPI environments. The "massive" in massively parallel gets more massive every year. Running computations at full speed at these scales (100K+ cores in multiple levels of hierarchy: same die, same board, same blade, same switch, etc.) is a huge, mundane architectural problem. Not only is there no architecture besides MPI with nearly as much effort invested in this problem, none are currently making the investment, so none have a chance to catch up within the next decade. The only way you will see FP doing massive parallelism at speeds even close to the current best is if it wraps the C MPI interface, and I have yet to see any functional message-passing APIs which aren't tarted-up imperative environments. (If anyone knows of any, please share.) At its core, message-passing programming involves managing data, deciding where it resides and where it needs to go, which is kind of antithetical to FP.

You mention MapReduce, which I think is more of a data-center thing, subtly but fundamentally different from HPC, involving less computation and more I/O. FP has a good shot there (cf. Erlang) but not much chance of cracking the HPC market.


> The only way you will see FP doing massive parallelism at speeds even close to current best is if they wrap the C MPI interface, and I have yet to see any functional message-passing APIs which aren't tarted-up imperative environments

True, and I would not be surprised if it evolved that way, but to my (modest) knowledge, it is possible for imperative routines to have a purely functional interface. The extreme case, of course, is that any functional high-level code needs to be translated to assembly code to run at all, but I don't see why that boundary can't sit at a higher level. As long as the developer programs functionally (and derives the benefits from that), does it matter that his code is first translated into imperative MPI-ified C, and then into assembly?
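To illustrate the "purely functional interface over imperative internals" idea with a toy Python sketch (the function name is made up): from the caller's point of view `moving_average` is pure, i.e. same input always gives the same output with no visible side effects, even though the body is an imperative loop mutating a local accumulator.

```python
def moving_average(xs, window):
    """Pure from the outside: deterministic, no visible side effects.
    Inside, it mutates purely local state for efficiency."""
    out = []      # local accumulator, never escapes or aliases the input
    acc = 0.0
    for i, x in enumerate(xs):
        acc += x
        if i >= window:
            acc -= xs[i - window]   # drop the element leaving the window
        if i >= window - 1:
            out.append(acc / window)
    return out

print(moving_average([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```

The same boundary could, in principle, sit around an MPI-backed routine: imperative message passing inside, a referentially transparent function outside.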

This is all provided, of course, that you can actually leverage the parallelism-related benefits of FP this way, which I guess is not known as of yet. You mentioned "functional message-passing APIs which [are] tarted-up imperative environments", care to give an example? I'm intrigued...

As for MapReduce, of course it's different from what the average HPC machine is used for, but being an embarrassingly parallel problem, it's one of the first problems where the FP approach has been shown to work. In other (eg typical HPC) applications, there's lots of work ahead in rethinking the algorithms so they can be expressed functionally, before you can even begin contemplating how to use the hardware.
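For readers unfamiliar with the pattern, here is a minimal word-count sketch in plain Python (no framework, names are mine) showing why MapReduce is embarrassingly parallel: the map step touches each document independently, and the reduce step is an associative merge.

```python
from functools import reduce
from collections import Counter

docs = ["the cat sat", "the cat ran", "a dog ran"]

# Map: each document independently becomes a bag of word counts.
# Every element could be processed on a different machine.
mapped = [Counter(doc.split()) for doc in docs]

# Reduce: merge the partial counts. Counter addition is associative,
# so this fold could itself be done as a parallel tree of merges.
totals = reduce(lambda a, b: a + b, mapped, Counter())

assert totals["the"] == 2
assert totals["ran"] == 2
assert totals["dog"] == 1
```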


I have always wanted:

A. an easy way to call into Haskell or some other fp from python.

B. the ability to embed the above so that I can have equations in a nice mathy-looking form but the rest in python.

C. all of the above w/ R too.


This post is very nice confirmation of my feeling that FP is usually not used because of the hassle of getting data into the program in a functional language.

I mean, c'mon. Who was not confused by Haskell's IO?

And now, compare this with "foo = input()", or "bar = scanf("%d");". Bam. All the benefit of nice functional notation for the algorithms is superseded by the hassle of getting data from the outside into the program (or, relatively speaking, by the very small hassle of doing so in imperative languages).
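The imperative side of the comparison, sketched in Python (`read_ints` is a made-up helper, and `io.StringIO` stands in for stdin so the example is self-contained): I/O is just ordinary statements, no monads required.

```python
import io

def read_ints(stream):
    # Plain imperative I/O: read lines, accumulate into a list.
    values = []
    for line in stream:
        line = line.strip()
        if line:
            values.append(int(line))
    return values

print(read_ints(io.StringIO("1\n2\n3\n")))  # [1, 2, 3]
```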


I hate to say this, but Python == VB for science.


That would be funny if it wasn't so insulting for modern VB. Python is the oldschool BASIC for science.


Why is python preferred over scheme or any other lisp, like clojure? Can we stop promoting imperative-ish languages and focus on where lambda can take us?


I'm currently in a computational biology lab where people are allowed to use whatever suits them to tackle a problem. I'd say that there is an equal mix of octave, R, lisp, fortran, C++ and python.

I'm the heaviest user of python and this is because it allows me to get things done quickly (I need to concentrate on the science and not the joy of programming during office hours). Python is a great scripting language and that's how I use it. I find that I have to think too much when using Lisp on simple programming tasks and I'm being challenged enough by the maths or logic of the problem I'm working on.

That said, all genetic programming is done in Lisp because no other language comes close. The numerical performance of Lisp is also very good -- I'd say about 1/2 that of C++ when using SBCL. That is fast enough, and impressive for a dynamically typed language.


All GP is done in Lisp?? Sorry, but that is far from true! In the EC community, C/C++ and Java are the most commonly used languages. Just take a look at the popular EC libs out there (EO, ECJ, OpenBeagle, TinyGP, etc). Even Python and Matlab are far more accepted as programming languages than any kind of Lisp. It's sad but it's true.

You don't know what a "fight" it is to use Lisp in this field/community. The few times I use Lisp are for solo work, because every time I do collaborative work people simply refuse to even consider it. The myths against Lisp are pretty strong and people just refuse to change their minds, which I think is kind of ironic, since in a science/research environment people should be more open-minded. Oh well...


I think papaf means all GP done in his/her computational biology lab, specifically.


On a second reading, you're right. But the truth is that in the EC community, and GP in particular, Lisp is not used. Half the speed of C++ is just considered too slow. IMHO, speed is just the easy excuse.


Because Python mixes:

* Easy to learn (including a REPL),

* Multi-paradigm programming (doesn't shove a One True Way To Program(TM) down your throat), and, most important of all,

* Solid, battle-tested libraries for the domain (SciPy/NumPy/Sage/etc.) and an easy-to-use interface to/from C.
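The "multi-paradigm" bullet is easy to show concretely. A small sketch (class name is made up) of the same task, summing the squares of the even numbers in a list, written three ways in one language:

```python
data = [1, 2, 3, 4, 5, 6]

# Procedural style: explicit loop and accumulator.
total = 0
for x in data:
    if x % 2 == 0:
        total += x * x

# Functional style: map/filter/lambda, no mutation.
functional_total = sum(map(lambda x: x * x,
                           filter(lambda x: x % 2 == 0, data)))

# Object-oriented style: state and behavior bundled in a class.
class SquareSummer:
    def __init__(self, values):
        self.values = values

    def sum_even_squares(self):
        return sum(x * x for x in self.values if x % 2 == 0)

assert total == functional_total == SquareSummer(data).sum_even_squares() == 56
```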


Also, the code is extremely readable. I can't think of another language that communicates the programmer's intent better than Python. People often write off code aesthetics, but it really matters. When your audience includes not just fellow programmers, but also potential collaborators, who may or may not know Python, as well as your adviser, who knows only FORTRAN, having a language that self-documents is helpful.


The French call it déformation professionnelle, which works wonders when it comes to perceiving the weirdest stuff as perfectly normal.


Yeah, I forgot - scheme does not have a REPL... Come on! Multi-paradigm? Scheme is multi-paradigm but encourages an FP style, such that side effects are reduced to a bare minimum. And most lisps are multi-paradigm. Even better, the object systems are many and modular with lisp. How many object systems are there for python?

The problem with scheme is not with DSLs either - that is where most LISPs shine like a rockstar! Building domain-specific abstractions is trivial with scheme as long as you take a bottom-up approach. Scheme DOES have an FFI - not only that, scheme can compile to C. Can python do that? NO! And it's not because of the prefix notation either. Sure, it takes a little while to get used to applicative order of evaluation, but that's only because some idiot taught you the imperative way first. I used python before I became aware of scheme, and believe you me, I got a lot more educated on how to create applications when I watched SICP and read The Little Schemer books. Also, let's not forget how much time you spend doing memory management in C/C++. It's painful, and a waste of good time and money.

Python has matured faster than scheme because it has focused on building libraries rather than on being a standard. Also, there is only one python implementation, whereas there are many schemes (most of them half-assed) and lisps. This makes it difficult to choose and land on a lisp that can do and be all, and as such does not provide any traction for the language. There is one modern lisp that could change all this, and that is clojure. If clojure can provide all the necessary libs that the working programmer needs to have at hand, we could see functional programming hit the mainstream of enterprise development, research and whatever other domain has a need for computation.


> there is only one python implementation

...apart from IronPython, Stackless Python, PyPy, Jython, Unladen Swallow, and maybe a few more I've missed.


The more important point is that if you're starting out with Python, which one to use is very clear. There's Stackless and so on, but the main page at Python.org just points to CPython. (It does mention that Python is available for the JVM and the .NET VM, but those are not the primary links.)

It's a sharp contrast to when people have to decide which Lisp or Scheme to learn with, but don't have enough experience to choose. (Some probably decide on Python instead.)


Exactly my point. Unfortunately for us schemers, we don't have such a mainstream implementation. Also, it would be cool if linux distributions shipped with a minimal scheme. Modules could then be downloaded & installed as we need them.


Sure, but none of them are shipped with the operating systems that I use (linux and osx).


Neither is scheme, AFAIK, at least on OS X.

But to be fair, none of those implementations are usable for most scientific work, as numpy is only available on CPython, at least today.


Actually, lisp matches all of those as well. It might not be the easiest to learn coming from C, but otherwise it's not bad.


Lisp is significantly harder to learn for most people--especially those people who are programming only as a means to an end and not for fun.

Lisp goes significantly farther towards shoving FP down your throat than Python does any individual paradigm.

Lisp does not (to the best of my knowledge) have anything to match scipy/numpy.

Now, this doesn't mean that I wouldn't prefer if the scientific community were dominated by FP instead of Fortran and Python, but I think it's very easy for people like us to go and say "FP is easy" without realizing that for most people it's really not.


No, it's not 'harder' to learn. You just have to understand the concepts of applicative evaluation and prefix notation, then recursion and higher-order functions. That is four things. (Macros can be considered advanced.) In the end, every function evaluates to a value; thus functions can be treated as data. This value (data) is then passed on to the next function as an argument, and so on. Anonymous functions are also something that kicks ass, and you miss such features when programming in C. This is one reason why C programs should be kept as simple as possible, with as little abstraction as possible. (Fortran? Come on!)
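Two of those four concepts carry over directly to Python, so here is a minimal sketch of them in the thread's lingua franca (the lisp-specific two, applicative order and prefix notation, don't show up here):

```python
from functools import reduce

# Recursion: a function defined in terms of itself.
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

# Higher-order functions take functions as arguments;
# lambda gives the anonymous functions the parent misses in C.
squares = list(map(lambda x: x * x, range(5)))
total = reduce(lambda a, b: a + b, squares)

assert factorial(5) == 120
assert squares == [0, 1, 4, 9, 16]
assert total == 30
```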


Recursion is very hard for most people. Lisp is quite simple, yes, but FP is an entirely different style of programming that takes a lot of people time to learn and get used to.


>Lisp goes significantly farther towards shoving FP down your throat than Python does any individual paradigm.

This is simply not the case. It has the best OO system in existence (CLOS). It has the loop macro (a mini-language). In actual practice, I think lispers tend not to program so functionally because they're instinctively afraid of consing.


There is actually a version of lisp geared towards numerics/science: http://lush.sourceforge.net/index.html

However, looking at the examples it seems to be used imperatively anyway (it even lets you include C code inline): http://lush.sourceforge.net/screenshots.html


Python is very easy to learn, very intuitive. For someone who has done numerical computation with MATLAB, Python is a very natural language to use. By contrast, LISP is hard to learn, unintuitive, and unnatural.

In the real world people care about obtaining results as fast and painlessly as possible. The language is just a tool, not a goal.



