You wrote that Unity FFI is much faster than cgo. I'd be interested in knowing more about this. Why is it faster? Can you share a link to some reference material and/or benchmark?
my understanding is go is slow because each function call has to copy over to a bigger stack (goroutines have tiny stacks to start, and grow on demand, but c code can't do that, natch) and because it has to tell the goroutine scheduler some stuff for hazy reasons.
these benchmarks https://github.com/dyu/ffi-overhead seem to show that a c call via cgo will be about 14 (!) times slower than a c call from c# in mono, which itself is about 2 times slower than just a plain c call.
cgo doesn't copy the stack to a "bigger stack" when calling a C function. It just uses another stack, dedicated to cgo, and disallows passing pointers into the stack:
i wonder if it's possible to speed up ffi in go...