> So now we get to thinking about whether this gives the developer an advantage over just knowing "Dirtying cache lines across different cores/threads is slow". I don't think I would conclude so here.
The hidden thing behind all this is that even if the data is just read-shared, it can still generate traffic between cores and sockets.
Since these communication links are a shared resource [0], doing things wrong hurts performance in unrelated code and on other cores, simply because a storm of cache coherency packets is flying between cores.
So yeah, you really do want to minimize this to maximize performance and scalability across the whole system!
[0]: In Intel's case, this shared resource is the ring bus inside a CPU socket and QPI between CPU sockets.
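To make that concrete, here's a minimal C++ sketch of the effect (the 64-byte line size, thread counts and iteration counts are just illustrative assumptions): one thread keeps dirtying a cache line while several others only read it, compared against readers hitting a line nobody writes. Pinning the writer and the readers to different sockets, e.g. with taskset or numactl, makes the interconnect cost easier to see.

```cpp
// Sketch: readers of a line that a writer keeps dirtying vs. readers of a
// quiet line. The constants below are assumptions; measure on your own box,
// ideally with threads pinned and `perf stat` watching coherency events.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct alignas(64) Line {            // keep each value on its own cache line
    std::atomic<uint64_t> value{0};
};

static Line contended;               // the writer dirties this line
static Line quiet;                   // nobody writes this line after startup

static uint64_t read_loop(Line& line, size_t iters) {
    uint64_t sum = 0;
    for (size_t i = 0; i < iters; ++i)
        sum += line.value.load(std::memory_order_relaxed);   // loads only
    return sum;
}

static double time_readers(Line& line, int readers, size_t iters) {
    std::atomic<bool> stop{false};
    // Writer thread: repeatedly invalidates the readers' cached copies of
    // `contended`, forcing them to refetch it over the shared interconnect.
    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed))
            contended.value.fetch_add(1, std::memory_order_relaxed);
    });
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int r = 0; r < readers; ++r)
        pool.emplace_back([&] { read_loop(line, iters); });
    for (auto& t : pool) t.join();
    auto t1 = std::chrono::steady_clock::now();
    stop = true;
    writer.join();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const int readers = 4;           // assumption: enough hardware threads
    const size_t iters = 50'000'000;
    std::printf("readers on written line:   %.3fs\n",
                time_readers(contended, readers, iters));
    std::printf("readers on untouched line: %.3fs\n",
                time_readers(quiet, readers, iters));
}
```

The readers on the written line will typically come out noticeably slower even though they never store anything; the difference is the coherency traffic crossing the ring bus (and QPI on a multi-socket box).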
True, heavy coherency traffic on read-shared data can indeed have an effect in some cases, especially on multi-socket systems, and it can sometimes be beneficial to give each thread a private copy of even read-only data if the data is small.
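As a rough illustration of that last point (the table, its size and the thread count below are made up), each thread can copy a small read-only table onto its own stack before its hot loop, so the hot loads only touch lines private to that core and its local NUMA node:

```cpp
// Sketch of the "private copy of read-only data" idea, under the assumption
// that the table is small enough for the per-thread copy to be cheap.
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

constexpr std::size_t kTableSize = 256;            // small enough to copy cheaply
using Table = std::array<std::uint8_t, kTableSize>;

const Table g_shared_table = [] {                  // built once, read by every thread
    Table t{};
    for (std::size_t i = 0; i < t.size(); ++i)
        t[i] = static_cast<std::uint8_t>(i * 7);
    return t;
}();

std::uint64_t worker(const std::uint8_t* data, std::size_t n) {
    const Table local = g_shared_table;            // one-time private copy per thread
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += local[data[i]];                     // hot loads stay in this core's cache
    return sum;
}

int main() {
    std::vector<std::uint8_t> input(1 << 20, 3);   // dummy input
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&] { total += worker(input.data(), input.size()); });
    for (auto& th : threads) th.join();
    std::printf("total = %llu\n", static_cast<unsigned long long>(total.load()));
}
```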