It probably has something to do with NUMA. The HotSpot JVM in Java 7 has a NUMA-aware allocator (enabled with -XX:+UseNUMA) that makes it more likely for threads to allocate and access memory on their local node. I'd be interested to see the Go test run as multiple processes, each pinned to its own NUMA node.
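For what it's worth, here is a rough sketch of how that experiment could be launched on Linux: one copy of the benchmark per NUMA node, each wrapped in numactl so that its CPUs and memory stay on that node. The node count of 2, the ./bench binary name, and the numactlArgs helper are assumptions for illustration, not anything from the original test. It also assumes numactl is installed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// numactlArgs builds the numactl arguments that bind a command's CPUs and
// memory allocations to a single NUMA node. (Hypothetical helper.)
func numactlArgs(node int, cmd ...string) []string {
	n := strconv.Itoa(node)
	args := []string{"--cpunodebind=" + n, "--membind=" + n}
	return append(args, cmd...)
}

func main() {
	const nodes = 2 // assumption: check the real count with `numactl --hardware`

	bench := []string{"./bench"} // hypothetical benchmark binary
	if len(os.Args) > 1 {
		bench = os.Args[1:] // or pass the real command on the command line
	}

	// Start one benchmark process per node so they run concurrently.
	var procs []*exec.Cmd
	for node := 0; node < nodes; node++ {
		c := exec.Command("numactl", numactlArgs(node, bench...)...)
		c.Stdout, c.Stderr = os.Stdout, os.Stderr
		if err := c.Start(); err != nil {
			fmt.Fprintln(os.Stderr, "node", node, "failed to start:", err)
			continue
		}
		procs = append(procs, c)
	}

	// Wait for every process that was started.
	for _, c := range procs {
		if err := c.Wait(); err != nil {
			fmt.Fprintln(os.Stderr, "process exited with error:", err)
		}
	}
}
```

If the per-node processes together get noticeably more throughput than a single process spanning all nodes, that would point toward remote memory access as the cause.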