Speaking of gcflags, did you try building the Go version with `-gcflags=-B` to see how it performs without bounds checking?
I know this isn't 'the right way', given that Go is supposed to be a safe language, but it would be interesting to see how much difference it makes.
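
To make the comparison concrete, here is a rough sketch of the kind of loop where it might matter. It is not the benchmark under discussion: `sumIndexed`, the slice size, and the repeat count are all made up for illustration. The indices come from data, so the compiler generally can't prove they are in range and keeps a check on every access.

```go
package main

import (
	"fmt"
	"time"
)

// sumIndexed adds xs[i] for every i in idx. The indices are data-dependent,
// so each xs[i] access normally carries a bounds check.
// (Hypothetical helper, for illustration only.)
func sumIndexed(xs []int, idx []int) int {
	total := 0
	for _, i := range idx {
		total += xs[i]
	}
	return total
}

func main() {
	const n = 1 << 20 // arbitrary size, chosen only for the sketch
	xs := make([]int, n)
	idx := make([]int, n)
	for i := range xs {
		xs[i] = i
		idx[i] = (i * 31) % n // scattered but always in range
	}

	start := time.Now()
	total := 0
	for r := 0; r < 50; r++ {
		total += sumIndexed(xs, idx)
	}
	fmt.Println("sum:", total, "elapsed:", time.Since(start))
}
```

Build it twice, once with plain `go build` and once with `go build -gcflags=-B`, and compare the timings. With `-B` an out-of-range index is no longer caught, so this is only sensible as an experiment.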