I only read the abstract of the most recent linked article [1], but it sounds like the results are statistically correct rather than strictly “accurate” in the sense of velocity/acceleration computed from finite differences. For this field, that makes a lot of sense (see [2] for the basics on this integration method / domain).
Edit (to make my comment clearer): they came up with a method to calculate the velocities they need for solving these equations with a new closed-form solution that doesn’t depend on the timestep size. This lets the author combine the new calculation with previous work to take big steps, yet still converge.
They're adjusting the integration method (used to calculate positions given forces) to take into account the properties of the thermostat (used to maintain a roughly-constant temperature in the simulation). This allows bigger timesteps.
In particular, they studied the Langevin thermostat, in which all the atoms are subject to small random forces that smooth out their average temperature. By adding a term to the integrator that includes the magnitude of the random forces, it is possible to widen the stability bounds of the integration.
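For readers who haven't seen one, here is roughly what a Langevin integrator looks like: a minimal BAOAB-style sketch in Python with made-up parameters. This is a standard textbook splitting, not the new scheme from the paper.

```python
import numpy as np

def baoab_step(x, v, force, dt, gamma=1.0, kT=1.0, m=1.0, rng=np.random.default_rng()):
    """One BAOAB Langevin step (a generic textbook scheme, not the paper's method)."""
    v = v + 0.5 * dt * force(x) / m      # B: half kick from the potential
    x = x + 0.5 * dt * v                 # A: half drift
    c1 = np.exp(-gamma * dt)             # O: friction + random force, solved exactly over dt
    c2 = np.sqrt((1.0 - c1**2) * kT / m) #    fluctuation-dissipation sets the noise magnitude
    v = c1 * v + c2 * rng.standard_normal(np.shape(v))
    x = x + 0.5 * dt * v                 # A: half drift
    v = v + 0.5 * dt * force(x) / m      # B: half kick
    return x, v

# Toy usage: a single particle in a harmonic well (F = -k x with k = 1, made up).
force = lambda x: -x
x, v = 1.0, 0.0
for _ in range(1000):
    x, v = baoab_step(x, v, force, dt=0.5)
```

The exp(-gamma*dt) bookkeeping in the "O" step is roughly the flavor of what's being described: the statistics of the random forces are baked into the integrator itself rather than finite-differenced.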
The caveat is that the proof is only for simple potential energy surfaces. They haven't given a lot of evidence, even empirically, that this works for the potential energy surfaces we really care about, like protein binding calculations. We already have many empirical tricks for increasing timesteps for these simulations, like freezing bond lengths for hydrogen. Any new method has to beat these empirical methods, not just beat a naive approach.
Yeah, I've worked a lot with this and was also curious, since the thermostat distribution is just one part: you also have a lot of short-range forces that explode if you increase the timestep by too much, and I didn't see any tricks around that (except the usual ones, like freezing the fastest degrees of freedom, completely removing hydrogens with virtual sites, etc.).
This is a great example of why someone other than the authors should write the high-level description of a work. Without the baggage of what was hard, fun, easy, or interesting (and the desire to sell/market), an outsider can focus on the importance and utility of the work.
The article doesn't seem to say what achieved the speedup. If they take a larger time step, they must have used a special feature of molecular dynamics or some other assumption to achieve that ('no free lunch' or some such).
It doesn't say anything concrete because its purpose isn't to inform the reader. This is a PR marketing press release by the university advertising their impact.
I work in a related area. We have algorithms that run in O(N^9) time or worse, so we can only use molecules with 5-10 atoms. For each atom you must consider about 5 orbitals, so the run time is more like O((5N)^9), if you allow me to abuse the notation. We have other algorithms: some run in O(N^12), some in O(N^4), some ... well, you can trade physical accuracy for computational complexity. Once we got a 20x speedup and we were very happy. For us, it means 1 or 2 more atoms in the best case.
Back to their work: they are probably using some algorithm that runs in O(N), because the images have a lot of atoms and anything that is not linear is too slow. In some cases the O(N) algorithm reaches a good enough result, but sometimes it has a few hidden approximations and restrictions. It's very difficult to do better than O(N) in these systems, so you can't change the asymptotic complexity; you can only change the big constant that is hidden by the O. A 3x speedup is nice, it means 3 times more atoms.
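Back-of-the-envelope arithmetic for why constant speedups buy so little at high polynomial scaling (ignoring the orbitals-per-atom prefactor; the exponents just mirror the ones above):

```python
# If the cost scales as O(N^k), a constant speedup s lets N grow by about s**(1/k).
s = 20
for k in (1, 4, 6, 9, 12):
    growth = s ** (1 / k)
    print(f"O(N^{k}): a {s}x speedup lets N grow by ~{growth:.2f}x "
          f"(a 5-atom system becomes ~{5 * growth:.1f} atoms)")
```

For O(N^9) that works out to the "1 or 2 more atoms" above, while for O(N) a 3x speedup really is 3x more atoms.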
Molecular dynamics simulations are usually O(N log(N)), because that's the time complexity of the Fast Fourier Transform, which allows computation of the N^2 charge interactions in less than N^2 time (e.g. Ewald Summation https://en.wikipedia.org/wiki/Ewald_summation).
For short-range interactions like Van der Waals forces, MD is O(N) like you said, because you only have to evaluate a finite radius around each atom. But charge interactions, decaying as 1/r, don't converge in any finite radius.
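As a rough illustration of why the short-range part is O(N), here is a minimal cell-list sketch in Python. It assumes a cubic periodic box at least 3 cutoffs wide with coordinates already wrapped into [0, box); real MD codes use far more optimized neighbor lists, and the long-range Coulomb part still needs Ewald/PME.

```python
import numpy as np

def short_range_pairs(positions, box, cutoff):
    """Find all pairs closer than `cutoff` in O(N) expected time using a cell list.
    positions: (N, 3) coordinates in a cubic periodic box of side `box`.
    Assumes box >= 3 * cutoff, so the 27 neighbouring cells of any cell are distinct."""
    n_cells = int(box // cutoff)
    cell_size = box / n_cells

    # Bin each atom into its cell.
    cells = {}
    for a, c in enumerate(map(tuple, (positions // cell_size).astype(int) % n_cells)):
        cells.setdefault(c, []).append(a)

    pairs = []
    for (cx, cy, cz), atoms in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = ((cx + dx) % n_cells, (cy + dy) % n_cells, (cz + dz) % n_cells)
                    for a in atoms:
                        for b in cells.get(nb, []):
                            if b <= a:
                                continue
                            d = positions[a] - positions[b]
                            d -= box * np.round(d / box)   # minimum-image convention
                            if d @ d < cutoff**2:
                                pairs.append((a, b))
    return pairs

# Each atom only ever looks at the ~27 cells around it, so the work grows
# linearly with N. A bare 1/r Coulomb sum can't be truncated this way.
```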
I've heard similar things from friends in the academic chemistry world about the resources and memory necessary to perform calculations. It's insane just how much is necessary to simulate such a small amount of matter.
The short answer is in the other comment. It's an exponential problem that, with some simplifications, can be quite well approximated by a polynomial problem.
About our calculation, with a lot of simplifications ...
You want to know the probability density of finding an electron at any point in the space near a set of a few atoms.
Instead of calculating the density as a function of (x,y,z), you can calculate it as a sum over the orbitals associated with each atom. Assume 5 orbitals per atom like Carbon or Oxygen, and 1 orbital per Hydrogen.
The orbitals may be combined. When you do it by hand, the most common case is called "hybridization", but the computer can choose stranger combinations. It is just an orthonormal basis of L^2.
One way to describe the density is to use an NxN matrix (or 5Nx5N, or whatever the actual number of orbitals is). The diagonal terms d_i^i are the probability that an electron is in orbital number i. The rest of the matrix is more difficult to define with handwaving, but they are the correct numbers that allow you to change the basis of orbitals and get the correct densities in the new basis using the standard matrix formula C M C^-1.
The numbers may be complex, but with some standard tricks you can assume they are real. They may be positive, zero, or negative anyway, except on the diagonal, because on the diagonal they are probabilities.
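A toy numpy version of that one-electron matrix (the numbers are invented just to show the bookkeeping, not a real calculation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_orb, n_elec = 4, 2

# Pretend the first two columns of a random orthonormal basis are the occupied orbitals.
basis, _ = np.linalg.qr(rng.normal(size=(n_orb, n_orb)))
C_occ = basis[:, :n_elec]

D = C_occ @ C_occ.T              # one-electron density matrix d_i^j
print(np.diag(D))                # diagonal: occupation/probability of each orbital
print(np.trace(D))               # sums to the number of electrons (2.0)

# Changing to another orthonormal basis with the C M C^-1 rule keeps the physics:
U, _ = np.linalg.qr(rng.normal(size=(n_orb, n_orb)))
D_new = U @ D @ U.T              # U is orthogonal, so U^-1 = U.T
print(np.trace(D_new))           # still 2.0
```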
This matrix doesn't have all the information. It's better to have a matrix about pairs of electrons: an N^2xN^2 = N^4 matrix (or (5N)^2x(5N)^2, or whatever), where the diagonal items D_ij^ij are the probabilities that you can find an electron in orbital i and an electron in orbital j.
You could assume that the matrix for two electrons is just the product of two one-electron matrices, D_ij^ij = d_i^i * d_j^j, but that is not true. It's a good approximation, and some methods use it; they are very fast but less accurate.
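A purely classical toy of why the product approximation fails when the electrons are correlated (no real quantum mechanics here, just probabilities; the configuration list is invented):

```python
import numpy as np

# 2 electrons, 4 orbitals. Say the pair is always found either in orbitals
# (0, 2) or in orbitals (1, 3), each configuration with probability 1/2.
configs = [((0, 2), 0.5), ((1, 3), 0.5)]

d = np.zeros(4)                  # single-orbital occupations d_i
D = np.zeros((4, 4))             # pair occupations D_ij
for (i, j), p in configs:
    d[i] += p
    d[j] += p
    D[i, j] += p
    D[j, i] += p

print(d)                         # every orbital is occupied half the time
print(D[0, 2], d[0] * d[2])      # true 0.5 vs product estimate 0.25
print(D[0, 1], d[0] * d[1])      # true 0.0 vs product estimate 0.25
```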
That N^2xN^2 = N^4 pair matrix is actually a tensor, something like a matrix with 4 indices. The operations to change the basis are more difficult, but don't worry about that. Sometimes we use it as a matrix and sometimes as a tensor. Anyway, the diagonal has the probabilities, and the other coefficients have the correct numbers to make the calculation (somewhat) independent of the basis of orbitals you have chosen.
You can define an equivalent matrix for 3, 4, 5, ... electrons; they are bigger and provide more information. We (most of the time) use just the version for 2 electrons.
Now you have to operate with a matrix of size N^2xN^2 (N^4 entries). Just a naïve matrix multiplication is O(N^6). We do more complicated things that involve lots of multiplications (there are many ways to multiply them, because there are many indices to choose from). Imagine something like a diagonalization or an inverse. It's very different, but you don't want to do it by hand.
I hope this is enough to justify an O(N^6) cost. The description has a lot of simplifications.
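To see where the O(N^6) comes from, here is a sketch with random tensors standing in for the real two-electron quantities:

```python
import numpy as np

n = 20                                    # number of orbitals (toy size)
A = np.random.rand(n, n, n, n)            # made-up 4-index tensors, n^4 entries each
B = np.random.rand(n, n, n, n)

# Viewed as n^2 x n^2 matrices, a single matrix product already costs (n^2)^3 = n^6:
C_mat = A.reshape(n*n, n*n) @ B.reshape(n*n, n*n)

# The same contraction written with explicit indices:
C = np.einsum('ijmn,mnkl->ijkl', A, B)
assert np.allclose(C.reshape(n*n, n*n), C_mat)
```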
Notes:
I actually forgot about spin, so you have 2x5xN orbitals for Carbon and Oxygen instead of 5xN. But the (10N)^4 matrix for 2 electrons has blocks, so it's better to operate with the blocks instead of using a big matrix with a lot of zeros.
You can get more precision using more orbitals per atom, but as you can imagine, the calculation is slower.
We use a few different methods, so it's more complicated.
At each step, add "some other research group does that, and gets good results in some cases".
At the bottom, they're simulating quantum physics on a classical computer. The complexity should be exponential (in time, mainly), yet they've gotten it down to "linear".
From a computer science perspective, a constant factor reduction isn't that interesting, but it appears that what happened here is that the bottleneck part of the simulation can now be computed independent of the time step (which is huge, algorithmically speaking).
When working on the cutting edge of computing science problems, constant factor reductions can matter a lot. It can mean the difference between being able to do an experiment or not. If an experiment costs $10,000 of computation and your budget is $5,000, finding a 2x improvement is the difference between night and day.
That said, there are many situations where you need to do much better to make the impractical practical. However, problems are often chosen because they are just on the brink of tractability, so constant factor improvements matter more than you'd think, especially in fields where the problems are chosen for, e.g., biological interestingness rather than computational interestingness.
In theoretical computer science, you're somewhat right. Constant reductions aren't as interesting as complexity reductions, but they're still worth publishing papers on, of course. In mathematics, it's common to get 10^9 constant speedups just by applying a clever method.
In practical computer science/engineering like HPC, people are typically excited to get 10-20% speedups, because in a project with a million-dollar compute budget, that'll pay their salary for the year. A 3-4x speedup is almost unheard of for already-optimized codes like LAMMPS (although these numbers are a guess at the practical maximum), so it's probably newsworthy.
You see this all over the place, e.g. cache-aware, SIMD-friendly approaches often win out over asymptotically better algorithms on real hardware for small to medium problem sizes.
At large problem sizes a lot of effort has already been put into using the optimal algorithm. If simulations take days you're grateful for any constant factor improvements.
Asymptotic computational complexity does matter, but it only tells you how far you can push the problem. It tells you such things as "cracking this cipher is practically infeasible even if you turned the whole universe into a computer", "problem can only be approximated", "sometimes will not get a result at all" or "so ridiculously simple that we can run this on terabytes of data within minutes". But when you have taken all those categorical things into account and do an actual implementation then constant factors do matter in the same way that serving a website at 1Mbit/s vs. 100Mbit/s matters, even though it's only a constant factor reduction in load times.
To some degree it breaks down when writing for the hardware. For example, using an accumulation buffer for vector graphic filling is O(n²), while traditional methods are O(n). However, the accumulation buffer saves an order of magnitude of time (even at large N's!) as it takes advantage of prefetching, cachelines, and branch prediction.
Nooo, please don't give your research away for free. Get a patent first; then you can make it free for fellow researchers while heavily taxing any commercial company that uses it. I am sick and tired of seeing research published free for everybody to use, only for a micro-increment of that research to be made a few months later by a big pharma company, who of course will now patent it, and boom, nobody will be able to freely build upon it, because "hey, we're a big company and we want our mighty $$$ in our pockets". And when a new application of said simulation finally hits the market, it will cost an arm and a leg; see current insulin prices, for example.
If the initial research is not patented and the big company does a tiny bit of research on top of it just to have a small innovation, then they can patent the new research any way they want, and in the process they will effectively lock up the old research too. Just look at the state of patent litigation right now: it's full of trolls who will patent anything they can think of and later use it against real work.
[1] https://www.tandfonline.com/doi/figure/10.1080/00268976.2019... which is actually also https://arxiv.org/abs/1909.04380
[2] https://en.m.wikipedia.org/wiki/Verlet_integration