
Take a step back. TPUs accelerate any linear algebra so they can accelerate many interesting simulations, not just neural nets. That is much more exciting even if it is tangential to your question. And to be honest, frequently neural net surrogates are just worse than a straightforward "real physics" simulation. There is some cool "Neural ODE" and "physics-informed neural net" research that bridges the gap, though.



> TPUs accelerate any linear algebra

Do they actually? Or do they just accelerate matrix multiplications of a very specific size?

I am not really a domain expert for scientific computing, but for example in finite element analysis, the tetrahedral stiffness matrix is 12x12. Presumably the matrices in fluid flow simulations, climate modeling, etc. are also modestly sized, and the challenge has more to do with just how many total multiplications there are.

It is not at all clear to me that an accelerated 128x128 multiplication is helpful in these contexts.


What matters for TPUs today is whether the problem is sparse. TPUs are good at dense matrix multiplication (btw, the 128x128 is just the unit size; larger systems still work through the obvious blockwise composition of multiplications), with a modest amount of ALU and vector work and lots and lots of "tensor" (multidimensional array) data. And they are almost entirely 32-bit floats, not 64-bit.
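To make that concrete, here is a minimal sketch (assuming JAX with a TPU backend; the sizes are arbitrary) of a dense float32 matmul much larger than 128x128, which the compiler tiles onto the 128x128 unit automatically:

    import jax
    import jax.numpy as jnp

    # Dense float32 operands, much larger than the 128x128 unit size; the
    # compiler breaks the product into 128x128 passes automatically.
    key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
    a = jax.random.normal(key_a, (4096, 4096), dtype=jnp.float32)
    b = jax.random.normal(key_b, (4096, 4096), dtype=jnp.float32)

    @jax.jit
    def dense_matmul(x, y):
        # float32 (or bfloat16) is the native case on TPU; 64-bit floats are
        # not natively supported by the matrix unit.
        return x @ y

    c = dense_matmul(a, b)
    print(c.shape, c.dtype)  # (4096, 4096) float32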

So far TPUs have not really proved their worth for general purpose simulations across a wide range of fields. They weren't built for that, and we have other systems that can run those workloads faster, although it's unclear what would happen if you took a really great team and had them work on a hard simulation problem (for example, my previous area was molecular dynamics, and TPUs can do great work on n-body simulations, but not so great on other parts of the force field).

If you have sparse mixed problems then CPUs are still the most cost effective. If you have dense matrix problems GPUs are the most cost effective. If you have problems that don't fit on other systems and you have a good team to optimize to the hardware, TPUs are an option and could be cost effective in principle.


Presumably finite element analysis involves multiplying many such matrices? Hopefully in parallel?

If so, you can represent them as a Nx12x12 "tensor" for some large N (presumably proportional to the number of elements?), and I'm reasonably sure that's within the realm of what TPUs accelerate well.
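For what it's worth, here is a minimal sketch of that batching idea in JAX (the names and sizes are illustrative, not from any real solver): N element stiffness matrices stacked into an (N, 12, 12) array, applied to N element displacement vectors in one call.

    import jax
    import jax.numpy as jnp

    N = 100_000  # number of elements (made up for illustration)
    key_k, key_u = jax.random.split(jax.random.PRNGKey(0))
    k_elem = jax.random.normal(key_k, (N, 12, 12), dtype=jnp.float32)  # element stiffness matrices
    u_elem = jax.random.normal(key_u, (N, 12), dtype=jnp.float32)      # element displacement vectors

    @jax.jit
    def element_forces(k, u):
        # Batched matrix-vector products: f_e = K_e @ u_e for every element at once.
        return jnp.einsum("nij,nj->ni", k, u)

    f_elem = element_forces(k_elem, u_elem)
    print(f_elem.shape)  # (100000, 12)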


FEA is essentially solving the spring equation F = -kx (with -cv and ma terms added if you're going dynamic), scaled up to an entire mesh.

There are a few steps.

1) Discretize the problem, breaking the part into a mesh: maybe triangles/tetrahedra, but maybe also cubes (hexahedra), as the element math is easier.

2) For each element, apply a geometric transform to convert its stiffness matrix from a unit stiffness matrix to global coordinates. Note that this step accounts for the shape of the element as well.

3) Assemble the global stiffness matrix by iterating over all the degrees of freedom of all the nodes and adding the contribution from each element stiffness matrix. This results in a (nodes * dof) square symmetric matrix that's generally sparse and banded toward the diagonal. If you're doing dynamic, the damping and mass matrices need to be assembled as well.

4) Solve the system using some factorization method: LU or similar for static, or an eigenvalue solution for dynamic.

If you're doing nonlinear/plastic, then repeat, regenerating the stiffness matrix at each iteration.
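For anyone who hasn't seen it, here is a minimal sketch of steps 2-4 for the simplest possible case, a 1D chain of bar elements (illustrative only; real codes use 2D/3D elements and sparse storage):

    import jax.numpy as jnp

    n_elem = 4              # elements in the chain
    n_node = n_elem + 1     # one DOF per node in 1D
    k = 1000.0              # element stiffness (EA/L), assumed constant

    # Step 2: element stiffness matrix (identical for every element here).
    k_e = k * jnp.array([[ 1.0, -1.0],
                         [-1.0,  1.0]])

    # Step 3: assemble the global stiffness matrix by adding each element's
    # 2x2 contribution at its two nodes.
    K = jnp.zeros((n_node, n_node))
    for e in range(n_elem):
        dofs = jnp.array([e, e + 1])
        K = K.at[dofs[:, None], dofs[None, :]].add(k_e)

    # Fix node 0 and pull on the free end.
    F = jnp.zeros(n_node).at[n_node - 1].set(50.0)
    K_ff = K[1:, 1:]        # drop the fixed DOF's row and column
    F_f = F[1:]

    # Step 4: solve the reduced static system (an LU factorization under the hood).
    x_f = jnp.linalg.solve(K_ff, F_f)
    print(x_f)              # displacements grow linearly along the chain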

It's been years since I did this, but at the time ('97) I went to a lecture that asserted that, up to that point, hardware improvements and software improvements since the first Cray had each contributed about a 10^6 speedup for FEA problems.


These are usually solved with iterative methods.
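For example, a minimal sketch of a matrix-free conjugate gradient solve (the tridiagonal operator below is just a stand-in for a real assembled stiffness matrix, and the names are mine):

    import jax.numpy as jnp
    from jax.scipy.sparse.linalg import cg

    n = 1000
    b = jnp.ones(n, dtype=jnp.float32)

    def apply_k(x):
        # Matrix-vector product for a symmetric positive-definite tridiagonal
        # operator (2 on the diagonal, -1 off the diagonal), applied without
        # ever forming the matrix.
        y = 2.0 * x
        y = y.at[:-1].add(-x[1:])
        y = y.at[1:].add(-x[:-1])
        return y

    # Conjugate gradient only needs the matrix-vector product, which is why
    # it suits large sparse systems like assembled FEA matrices.
    x, _ = cg(apply_k, b, tol=1e-5, maxiter=5000)
    print(jnp.linalg.norm(apply_k(x) - b))  # final residual norm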



