This is an excellent description of a false sharing performance issue, and I wouldn't envy the work involved in tracking that code down through the JVM. These sorts of issues are common inside the OS kernel itself. Kernels manage lots of shared resources, and as you add more CPUs, this kind of contention emerges as a bottleneck. It's enough of a problem that the perf tool recently got a new sub-command, "perf c2c", which tries to help identify false sharing (and other cache-line ping-ponging) much more quickly.
If you're interested in learning more about this class of issues and the new tool, there was an excellent talk at LPC 2022 by Arnaldo Carvalho de Melo. And as luck would have it, the videos just came out, so I can link it here. In my opinion it was one of the best talks of the conference.
https://youtu.be/A23UopXKq6E