
The idea that made this click for me 10 years or so ago was that distributed systems express concurrent processes but can't reasonably use locks; model your threads/processes/whatever as if they were nodes in a distributed system, and use distributed system techniques to ensure things converge.
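
To make that concrete, here's a minimal C++ sketch of the mindset (the names and the fixed pool size are my own assumptions, not from the comment above): each thread writes only to its own slot, like a node owning its local state, and readers converge on a total by merging, the same shared-nothing-plus-merge pattern you'd use across cluster nodes.

    #include <array>
    #include <atomic>
    #include <cstdint>

    constexpr int kMaxThreads = 64;  // assumed fixed worker pool size
    std::array<std::atomic<uint64_t>, kMaxThreads> per_thread_count{};

    // Each "node" (thread) writes only to its own slot: no locks, no contention.
    void record_event(int thread_id) {
        per_thread_count[thread_id].fetch_add(1, std::memory_order_relaxed);
    }

    // Any reader converges by merging all slots: an eventually consistent
    // snapshot, just like summing per-node counters in a distributed system.
    uint64_t merged_total() {
        uint64_t sum = 0;
        for (const auto& c : per_thread_count)
            sum += c.load(std::memory_order_relaxed);
        return sum;
    }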



That's not exactly a shared memory model. A better example would be a static look-up table for fixed game state information and a bunch of AI worker threads.

With the correct protocol you can even write to that shared memory. For example, a sudoku solver where each thread looks for contradictions and then overwrites the shared state with its findings. This works because dirty reads are OK and there is no real contention: all threads write 0's and nobody writes 1's, so concurrent writes can't disagree. You can go even further: any thread can flip the error flag if a contradiction shows up, or the solution flag if a solution is found, but again this works only because nothing ever clears those flags.
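
Something like this, as a rough C++ sketch (the bitmask layout and function names are my assumptions, not anyone's actual solver): candidate bits only ever go from 1 to 0, and the flags only ever go from 0 to 1, so any interleaving converges to the same state.

    #include <atomic>
    #include <cstdint>

    // A real solver would start each cell at 0x1FF (all nine digits possible).
    std::atomic<uint16_t> candidates[81];  // bit d set => digit d+1 still possible
    std::atomic<bool> error_flag{false};   // only ever flips 0 -> 1
    std::atomic<bool> solved_flag{false};  // only ever flips 0 -> 1

    // Any worker may call this: it only clears a bit (writes a 0), so a
    // concurrent dirty read can never observe an invalid state.
    void eliminate(int cell, int digit) {
        uint16_t mask = static_cast<uint16_t>(~(1u << digit));
        uint16_t prev = candidates[cell].fetch_and(mask);
        if ((prev & mask) == 0)      // no candidates remain: contradiction
            error_flag.store(true);  // safe because nothing ever clears it
    }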

PS: The article's author uses a less strict definition of lock-free where it's OK to use locks for individual atomic operations like counters. But that still creates performance issues at scale.


Actually, I thought it was pretty clear in the article that it's not ok to use locks (mutexes) under the definition given. I tried pretty hard to use the most widely accepted definition of lock-free, even exchanging e-mails with Maurice Herlihy about it.


In this sense, the lock in lock-free does not refer directly to mutexes, but rather to the possibility of “locking up” in some way, whether it’s deadlock, livelock — or even due to hypothetical thread scheduling decisions made by your worst enemy.

Suppose you had an operation that was guaranteed to either add your input to some location in memory or fail gracefully within a fixed amount of time. It's first come, first served, and implemented at the hardware level. This fits the definition of lock-free in that no other process can prevent your process from finishing, and it works just fine if you want, say, a count of the moves examined by your AI threads: a thread can always keep an internal counter and try again after the next batch goes through. However, you still get resource starvation issues depending on how it's used, and with enough threads you encounter a wide range of issues.
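
Roughly like this, as a sketch (bounded_add is hypothetical; here it's faked with a single CAS attempt rather than the real hardware primitive being imagined):

    #include <atomic>
    #include <cstdint>

    std::atomic<uint64_t> moves_examined{0};

    // Stand-in for the "add or fail gracefully in fixed time" primitive:
    // one CAS attempt, success or immediate failure, never blocks.
    bool bounded_add(std::atomic<uint64_t>& target, uint64_t delta) {
        uint64_t cur = target.load(std::memory_order_relaxed);
        return target.compare_exchange_weak(cur, cur + delta,
                                            std::memory_order_relaxed);
    }

    thread_local uint64_t pending = 0;  // the thread's internal counter

    void count_move() {
        ++pending;                                // always succeeds locally
        if (bounded_add(moves_examined, pending))
            pending = 0;                          // flushed; else retry next batch
    }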

Lock-free programming is not a binary condition so much as a hierarchy of conditions, where various forms of resource starvation (e.g., at the cache or bus level) are perhaps the most stringent. Scale up to, say, a billion cores and 100 billion threads, and many 'safe' operations create significant issues.

PS: Wikipedia lists Wait-freedom > Lock-freedom > Obstruction-freedom, all under http://en.wikipedia.org/wiki/Lock_free


Based on your comment, I should probably be more precise and say: "the possibility of 'locking up' the entire application in some way". I'll update the post. There's nothing about this definition that prohibits a single thread from starving.


I think it's easier to phrase it in the positive sense: at least one thread is always making progress towards completion.
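
The textbook illustration of that phrasing is a CAS retry loop; a thread only retries because some other thread's operation just succeeded, so someone is always making progress. A rough Treiber-stack-style sketch (mine, not from the article):

    #include <atomic>

    struct Node { int value; Node* next; };
    std::atomic<Node*> head{nullptr};

    void push(int v) {
        Node* n = new Node{v, head.load(std::memory_order_relaxed)};
        // The CAS fails only if another thread's push/pop succeeded first,
        // so the system as a whole always advances (lock-free), even though
        // an individual thread may retry indefinitely (not wait-free).
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
            // on failure, n->next was reloaded with the current head; retry
        }
    }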


The article's author uses a less strict definition of lock-free where it's OK to use locks for individual atomic operations like counters.

The author's definition does not allow one to use locks to make individual operations atomic. A process could always crash after obtaining the lock, but before releasing it, causing the entire application to become stuck.
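
Side by side, as a sketch (the names are mine; the locked version mirrors the "lock per operation" pattern the definition rules out):

    #include <atomic>
    #include <mutex>

    std::mutex m;
    long counter_locked = 0;

    void increment_locked() {
        m.lock();
        ++counter_locked;   // die here and every other thread blocks forever
        m.unlock();
    }

    std::atomic<long> counter_lockfree{0};

    void increment_lockfree() {
        counter_lockfree.fetch_add(1);  // a stalled or killed thread can't
                                        // wedge anyone else
    }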


Lock free algorithms still use primitives like memory barriers. A memory barrier in a shared memory system is expensive, but not nearly as difficult as a memory barrier in a distributed system.
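
In the shared-memory case, a barrier typically boils down to a release/acquire pair. A rough sketch of the standard publish pattern (names are mine):

    #include <atomic>

    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;                                   // (1) write the data
        ready.store(true, std::memory_order_release);   // (2) publish: barrier
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire))  // barrier on the read
            ;                                           // spin
        // the release/acquire pair guarantees payload == 42 is visible here
    }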

Additionally, distributed systems not only experience failures of individual components, but they're often in a state where you can't reliably detect failures (e.g., tell a failed node apart from a node that's slow, or a node you temporarily can't talk to while other nodes can).

It's also possible to think of it as a continuum: for example, failure detection is simpler in a distributed system connected via InfiniBand than in one connected via commodity Ethernet; locality matters in NUMA CPUs, and so on. The idea that a commodity multi-core machine has some properties of a distributed system is not without merit, but it isn't particularly useful or actionable as a working model.




