Great write-up. As I'm perusing the math/somefeature_arm.s and math/somefeature_amd64.s files in Go's math package, I can't help noticing that most of them look like this:
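(Roughly the following; I'm paraphrasing from the remainder_arm.s linked further down rather than pasting it verbatim, so the exact flags and directives may differ between Go versions:)

TEXT ·Remainder(SB),NOSPLIT,$0
	B	·remainder(SB)	// branch straight to the pure-Go remainder implementation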
That is, very short code that, to me, a complete asm noob, just looks like a function call. What is going on here: are all these maths functions implemented in hardware on the amd64 and arm architectures? Or is this "asm code" just falling back to calling the Go implementation of the particular function (i.e. not really "optimized" at all)?
Yeah that was kinda my point ;)) so basically 386 users get some highly specialized hand-tuned asm optimization whereas arm and amd64 merely get a commoner's gc treatment -- or am I missing something? ;)
- The fprem1 instruction is actually a long microcode sequence and is quite slow: 26-50 cycles on Sandy Bridge according to Agner Fog's tables. Several iterations of that loop (sketched below, after this list) are necessary for a complete reduction of some operands.
- There is no analogous instruction on arm (or really, any platform that isn't x86), anyway.
- If you're using the floating-point remainder operation in a performance-sensitive context, You're Doing It Wrong. Programmers have gotten so used to remainder being slow that there is little value in optimizing it; it is rarely used in situations where the optimization would matter.
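For contrast, here is roughly the shape of the 386 fprem1 loop being discussed. This is a sketch reconstructed from memory of the old math/remainder_386.s, not a verbatim copy, so treat the flags and frame offsets as approximate:

TEXT ·Remainder(SB),NOSPLIT,$0
	FMOVD	y+8(FP), F0	// push y onto the x87 stack: F0=y
	FMOVD	x+0(FP), F0	// push x: F0=x, F1=y
	FPREM1			// partial IEEE remainder of F0 by F1
	FSTSW	AX		// copy the x87 status word into AX
	ANDW	$0x0400, AX	// test C2, which stays set while the reduction is incomplete
	JNE	-3(PC)		// loop back to FPREM1 until C2 clears
	FMOVDP	F0, F1		// pop y, leaving the remainder in F0
	FMOVDP	F0, ret+16(FP)	// store the result and pop
	RET

Each pass through FPREM1 is one of those 26-50 cycle trips, and operands whose exponents are far apart need several passes before C2 clears.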
http://golang.org/src/pkg/math/remainder_arm.s