Hacker News
Multithreaded data structures for parallel computing (ibm.com)
79 points by AndreyKarpov on April 25, 2012 | hide | past | favorite | 20 comments



The code in listings 3 and 5 is not thread safe. The list could easily become empty between the call to empty() and acquiring the mutex, leading to a race condition. I see two other subtle race conditions as well. I haven't looked at the rest but this density of bugs is enough to convince me that this author doesn't understand multithreaded programming well enough to give advice on the topic.

Other bugs:

1. Assumes that std::list<T>::empty() is an atomic operation.

2. Assumes it's safe to use separate locks for reading and writing, which std::list says nothing about. It probably breaks in practice for at least the case where the list holds a single element and two threads execute push() and pop() at the same time.
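The check-then-act race described above is fixed by holding one lock across both the empty() test and the removal. A minimal sketch of that fix (class and method names are illustrative, not the article's code):

```cpp
#include <list>
#include <mutex>
#include <optional>

// Hypothetical thread-safe queue: the emptiness check and the removal
// happen under the same lock, so no other thread can empty the list
// between them.
template <typename T>
class SafeQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(std::move(value));
    }

    // Returns std::nullopt instead of blocking when the queue is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty())         // checked while holding the lock...
            return std::nullopt;
        T front = std::move(items_.front());
        items_.pop_front();         // ...so this cannot race with the check
        return front;
    }

private:
    std::list<T> items_;
    std::mutex mutex_;
};
```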


Agreed. A better idea is to use a mutex + 2 semaphores to implement a blocking queue: http://en.wikipedia.org/wiki/Producer-consumer_problem#Using...

(it's a classical computer science problem with classical solution)


In fairness to the rest of the article, in Listing 9 the author does protect the call to empty() with (the same) lock and also moves the condition inside of a while loop.
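That pattern in Listing 9 (re-testing empty() under the same lock, inside a loop) is essentially the standard condition-variable idiom. A sketch of the idiom, not the article's exact code:

```cpp
#include <condition_variable>
#include <list>
#include <mutex>

template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push_back(std::move(value));
        }
        not_empty_.notify_one();
    }

    // Blocks until an item is available. The condition is re-checked
    // after every wakeup, which handles spurious wakeups and the case
    // where another consumer grabbed the item first.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (items_.empty())            // re-test under the lock
            not_empty_.wait(lock);
        T front = std::move(items_.front());
        items_.pop_front();
        return front;
    }

private:
    std::list<T> items_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
};
```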


I agree. It also looks like their read implementation means every thread pops off the same item, since they clone it to a temp variable.


Threads have separate stacks and the call to front() and pop_front() is protected by the mutex, so that part is fine as far as I can see.


I'm not familiar with the STL implementation, but it seems the pop would need to be coherent with the push on an empty stack. If one thread is trying to pop the only entry while another is trying to push, intuitively there's a race condition there if the STL isn't thread safe (which is the whole point of the article, I think).


I agree. Listing 11 won't even compile as was_empty is declared const.


That use of const is fine, as it's never modified after the initial value is assigned.


No, Jabbles is correct: look at listing 11. It's reassigned five lines down.


Oops, I was looking at listing 8. Thanks.


And for priority queues, there's a wonderfully clever scheme which involves using a skiplist:

http://www-cs-students.stanford.edu/~itayl/ipdps.pdf

The probabilistic nature of the data structure lets them get those nice O(lg n) and O(1) expected times without the wide-ranging memory conflicts that slow down concurrent min-heaps. Some more basic info on skiplists for people who aren't familiar with them:

http://en.wikipedia.org/wiki/Skip_list


Yes, I'm working on an implementation of those myself.

There are even designs for lock-free skiplist priority queues, but they tend to be a little slower than the locking versions.


Lock-free skiplist priority queues, incidentally, become very elegant if your processor supports even the most basic hardware transactional memory.


There are much better ways to implement a concurrent queue than the ones shown. Taking a lock for every single enqueue/dequeue is a very good way to completely kill performance. A much better alternative is to use hand-over-hand locking, or CAS for a lock-free implementation.

TAOMP ( http://www.amazon.com/The-Multiprocessor-Programming-Maurice... ) goes over this in great detail and is overall an excellent book.
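One concrete step in that direction (not hand-over-hand itself, but a classic contention-reducing design) is the two-lock queue from Michael & Scott's 1996 paper: a dummy node decouples head from tail, so producers and consumers take different locks. A rough sketch with an int payload and no exception safety:

```cpp
#include <mutex>
#include <optional>

// Two-lock queue after Michael & Scott: enqueue takes only tail_lock_,
// dequeue takes only head_lock_, so producers never contend with
// consumers except when the queue is near empty.
class TwoLockQueue {
    struct Node {
        int value = 0;
        Node* next = nullptr;
    };

public:
    TwoLockQueue() : head_(new Node{}), tail_(head_) {}

    ~TwoLockQueue() {
        while (head_) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }

    void enqueue(int value) {
        Node* node = new Node{value};
        std::lock_guard<std::mutex> g(tail_lock_);
        tail_->next = node;
        tail_ = node;
    }

    std::optional<int> dequeue() {
        std::lock_guard<std::mutex> g(head_lock_);
        Node* first = head_->next;     // first real node, if any
        if (!first)
            return std::nullopt;
        int value = first->value;
        delete head_;                  // retire the old dummy
        head_ = first;                 // first node becomes the new dummy
        return value;
    }

private:
    Node* head_;                       // always points at the dummy node
    Node* tail_;
    std::mutex head_lock_;
    std::mutex tail_lock_;
};
```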


The following is a great resource for concurrent programming without locks and software transactional memory: http://www.cl.cam.ac.uk/research/srg/netos/lock-free/


These are rather basic (producer/consumer queues) and, as some have pointed out, buggy. I'd highly suggest looking at Boost's threading library (for an example of an object-oriented approach to threading that takes advantage of RAII -- much of it is now standard in C++11), Intel's Threading Building Blocks, Java's built-in concurrency utilities (the java.util.concurrent package), and Doug Lea's fork/join framework. A great book on the subject is Maurice Herlihy's The Art of Multiprocessor Programming:

http://www.amazon.com/The-Multiprocessor-Programming-Maurice...

The book is in Java, but C++11 has the required primitives (a cross-platform compare-and-swap for integral types, a defined memory model), so you could follow along in C++11.


Intel's Threading Building Blocks (http://threadingbuildingblocks.org) has a very nice collection of threaded data structures and algorithms.

I've found the performance to be quite good, and the interfaces are compatible with the STL in most cases, which can be useful.


As Jey has so eruditely pointed out, this advice borders on useless. I've always implemented linked-list queues as a segment of a list where each element is protected by its own mutex: a thread may add or remove a node by taking a lock on the first node, then moving to the second node, letting go of the first lock, and so on. Although you'll end up holding more than one lock while traversing (and three while deleting), this is at the core of powerful ideas like hand-over-hand locking.
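A sketch of that hand-over-hand (lock coupling) scheme on a sorted list, with illustrative names: each node carries its own mutex, and a thread locks the successor before releasing the node it currently holds, so the segment it is working on can never change underneath it. (Nodes are leaked here; a real list needs a destructor.)

```cpp
#include <climits>
#include <mutex>

// Sorted singly linked list with per-node locks and a sentinel head.
struct Node {
    int key;
    Node* next = nullptr;
    std::mutex lock;
};

class FineGrainedList {
public:
    FineGrainedList() : head_(new Node{INT_MIN}) {}

    // Lock coupling: at every step the thread holds the locks on `prev`
    // and `curr`, and releases prev's lock only after locking curr.
    void insert(int key) {
        head_->lock.lock();
        Node* prev = head_;
        Node* curr = prev->next;
        while (curr) {
            curr->lock.lock();          // take the next lock...
            if (curr->key >= key)
                break;
            prev->lock.unlock();        // ...then drop the previous one
            prev = curr;
            curr = curr->next;
        }
        prev->next = new Node{key, curr};
        if (curr) curr->lock.unlock();
        prev->lock.unlock();
    }

    bool contains(int key) {
        head_->lock.lock();
        Node* prev = head_;
        Node* curr = prev->next;
        while (curr) {
            curr->lock.lock();
            prev->lock.unlock();
            if (curr->key >= key) {
                bool found = (curr->key == key);
                curr->lock.unlock();
                return found;
            }
            prev = curr;
            curr = curr->next;
        }
        prev->lock.unlock();
        return false;
    }

private:
    Node* head_;                        // sentinel, never removed
};
```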


Another approach: use a persistent data structure à la Clojure's:

http://blog.higher-order.net/2009/02/01/understanding-clojur...


If you're using C++ (like the article), you should be using boost::thread instead of low-level pthreads, or std::thread if you're using C++11.
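For illustration, a minimal C++11 sketch of that: std::thread plus std::lock_guard gives RAII-style locking with none of the pthread_mutex_init/destroy boilerplate. (The function name here is made up for the example.)

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Four workers increment a shared counter 1000 times each; the
// lock_guard serializes the increments, so no updates are lost.
int increment_in_parallel() {
    std::mutex mutex;
    int counter = 0;

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&] {
            for (int j = 0; j < 1000; ++j) {
                std::lock_guard<std::mutex> guard(mutex);
                ++counter;
            }
        });
    }
    for (auto& t : workers) t.join();
    return counter;    // always 4000 because every increment is locked
}
```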



