
I actually don't think that's true. My understanding is that on x86, some instructions are implicitly atomic, and others can be made atomic by adding a LOCK prefix. A locked instruction asserts a lock (historically, a bus lock) that prevents other cores or SMT threads from accessing that memory concurrently. In that way, you can safely perform an atomic operation on a value in the cache.

Note that this implies that atomic operations slow down other cores and SMT threads.




Locks are often implemented using an xchg instruction, which is implicitly locked.

All processors' caches are committed/flushed for the affected cache line. So it's correct to say other processors are slowed down. But in that sense it also IS a main memory operation, just not yours.


To be clear, then we agree that haberman was correct, and the value can be changed in cache.


Sure. It just isn't useful in cache. If it is a real lock, it has to be shared. Either the caches have to reconcile, or it has to go to main memory.


Just because a lock is shared does not mean that it's contended.

For example, some multi-threading techniques attempt to access only CPU-local data but use locks purely to guard against the case where a process is moved across CPUs in the middle of an operation (thus defeating the best-effort CPU-locality).


But it has to be available to those multiple CPUs, right? So it has to go down to memory and back up to cache, contended or not.


Maybe I'm missing something, but if the cache line is only in use by one CPU, I don't see why the value would need to be immediately propagated to main memory or to any other CPU's cache until it is written as part of the normal cache write-back.


Correct. Typically the cache snoops the main memory bus. If a remote CPU starts a read on a cached memory location, the caching CPU sends a "stall" or "retry" signal to the reader, does a cache flush to main memory, and then lets the remote CPU proceed with the (now correct) main memory read.


It's useful for fields (not just locks; I'm thinking of lock-free algorithms that do a compare-and-swap directly on a value) that have low contention.


Even uncontended, the value has to be reconciled between CPUs/caches. So it has to be plumbed down through all the caches, then back up.


That's my point: uncontended values may not be in any other caches.



