kristianp on Aug 29, 2017 | on: An adventure in trying to optimize math.Atan2 with...

I'm no optimisation expert, but I'm wondering if the FMAs are slow because the result of each one is dependent on the previous one? The dependency on the result may mean that the processor can't pipeline the operations. Could it be faster if the two chains of FMAs on either side of the division are interleaved and use different registers?

z := x * x

z = z * fma(fma(fma(fma(P0, z, P1), z, P2), z, P3), z, P4) / fma(fma(fma(fma((z+Q0), z, Q1), z, Q2), z, Q3), z, Q4)

z = fma(x,z,x)

This article is quite the nerd-snipe.
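A minimal Go sketch of the interleaving idea above, for anyone who wants to benchmark it. The function names `chained` and `interleaved` are made up for illustration, and the coefficients are passed in as arrays rather than taken from the article; the values in `main` are arbitrary placeholders, not the real atan constants. Both functions perform exactly the same floating-point operations, so they should return identical results; only the source order of the two FMA chains differs. Since the two chains are already data-independent, the compiler and an out-of-order CPU may overlap them either way, so only a benchmark would show whether the rewrite changes anything.

```go
package main

import (
	"fmt"
	"math"
)

// chained evaluates the expression as written in the comment: the numerator
// chain is nested in full, then the denominator chain.
func chained(x float64, p, q [5]float64) float64 {
	z := x * x
	num := math.FMA(math.FMA(math.FMA(math.FMA(p[0], z, p[1]), z, p[2]), z, p[3]), z, p[4])
	den := math.FMA(math.FMA(math.FMA(math.FMA(z+q[0], z, q[1]), z, q[2]), z, q[3]), z, q[4])
	z = z * num / den
	return math.FMA(x, z, x)
}

// interleaved does the same arithmetic but alternates one step of each
// chain, keeping the two running values in separate variables.
func interleaved(x float64, p, q [5]float64) float64 {
	z := x * x
	num := p[0]
	den := z + q[0]
	for i := 1; i < 5; i++ {
		num = math.FMA(num, z, p[i]) // depends only on the previous num
		den = math.FMA(den, z, q[i]) // depends only on the previous den
	}
	z = z * num / den
	return math.FMA(x, z, x)
}

func main() {
	// Placeholder coefficients (assumption), chosen only to exercise the code.
	p := [5]float64{1, 2, 3, 4, 5}
	q := [5]float64{6, 7, 8, 9, 10}
	fmt.Println(chained(0.5, p, q), interleaved(0.5, p, q))
}
```

Timing the two with `testing.B` on the target machine would be the next step; the sketch itself makes no claim about which is faster.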

DannyBee on Aug 31, 2017

It will definitely be faster on the newest Intel processors, which have dedicated FMA units. In fact, to max out floating-point throughput on them, you'd likely have to intermix FMAs into the normal FP stream (i.e., FMA by 1, or FMA of something plus 0).
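To make "FMA by 1" and "FMA of something plus 0" concrete, here is a small Go sketch. The helper names `addViaFMA` and `mulViaFMA` are hypothetical, and this only shows that the rewrites preserve the result; it does not demonstrate any throughput gain, which would depend on the CPU and on whether `math.FMA` compiles to a hardware instruction on the target.

```go
package main

import (
	"fmt"
	"math"
)

// addViaFMA computes x + y as an FMA by multiplying x by 1.
// x*1 is exact, so the single rounding matches a plain add.
func addViaFMA(x, y float64) float64 {
	return math.FMA(x, 1, y)
}

// mulViaFMA computes x * y as an FMA by adding zero.
// Adding +0 can differ from a plain multiply only in the sign of a
// zero result (for example -0 becomes +0).
func mulViaFMA(x, y float64) float64 {
	return math.FMA(x, y, 0)
}

func main() {
	fmt.Println(addViaFMA(1.5, 2.25) == 1.5+2.25)
	fmt.Println(mulViaFMA(1.5, 2.25) == 1.5*2.25)
}
```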
