Great write-up. As I'm perusing the math/somefeature_arm.s and math/somefeature_amd64.s files in Go's math package, I can't help noticing that most of them look like this:
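(Roughly the following; I'm paraphrasing from the remainder_arm.s linked further down rather than pasting it verbatim, so the exact flags and directives may differ between Go versions:)

TEXT ·Remainder(SB),NOSPLIT,$0
	B	·remainder(SB)	// branch straight to the pure-Go remainder implementation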
That is, very short code that, to me, a complete asm noob, just looks like a function call. What is going on here: are all these maths functions implemented in hardware on the amd64 and arm architectures? Or is this "asm code" just falling back to calling the Go implementation of the particular function (i.e. not really "optimized" at all)?
Yeah that was kinda my point ;)) so basically 386 users get some highly specialized hand-tuned asm optimization whereas arm and amd64 merely get a commoner's gc treatment -- or am I missing something? ;)
- The fprem1 instruction is actually a long microcode sequence and is quite slow: 26-50 cycles on Sandy Bridge according to Agner Fog's tables. Several iterations of that loop (sketched below, after this list) are necessary for a complete reduction of some operands.
- There is no analogous instruction on arm (or really, any platform that isn't x86), anyway.
- If you're using the floating-point remainder operation in a performance-sensitive context, You're Doing It Wrong. Programmers have gotten so used to remainder being slow that there is little value in optimizing it; it is rarely used in situations where the optimization would matter.
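For contrast, here is roughly the shape of the 386 fprem1 loop being discussed. This is a sketch reconstructed from memory of the old math/remainder_386.s, not a verbatim copy, so treat the flags and frame offsets as approximate:

TEXT ·Remainder(SB),NOSPLIT,$0
	FMOVD	y+8(FP), F0	// push y onto the x87 stack: F0=y
	FMOVD	x+0(FP), F0	// push x: F0=x, F1=y
	FPREM1			// partial IEEE remainder of F0 by F1
	FSTSW	AX		// copy the x87 status word into AX
	ANDW	$0x0400, AX	// test C2, which stays set while the reduction is incomplete
	JNE	-3(PC)		// loop back to FPREM1 until C2 clears
	FMOVDP	F0, F1		// pop y, leaving the remainder in F0
	FMOVDP	F0, ret+16(FP)	// store the result and pop
	RET

Each pass through FPREM1 is one of those 26-50 cycle trips, and operands whose exponents are far apart need several passes before C2 clears.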
http://golang.org/src/pkg/math/remainder_arm.s