This is all good, but I want to point out that threads are not the unsafe bit; the unsafe bit is shared mutable state. If you can use a model where that is not a problem (dataflow architectures, STM), threads cease to be scary.
Someone has to implement the data flow engine or the STM. The article is about the low-level details (read: hardware memory models and basic synchronisation primitives) required to do so.
3) If you really, really do need threads, keep it as simple as possible and make sure that every access that needs to be synchronized is synchronized.
I've been programming for a long time now and I've seen precious few cases where multithreading is legitimately a better option, let alone the only option. Usually even then it's just about using more cores.
Even then, threads are rarely the correct architectural abstraction for your code. Something like a work-stealing job scheduler that, under the hood, happens to have spun up as many worker threads as there are cores on the system generally works way better. That is, there should ideally be only one line in your code base that says "new Thread()" or the like.
There's "using threads at all" and there's "using threads as a component of your exported architecture". The vast majority of your code should work whether there's one worker thread or a hundred. You shouldn't be using threads to save state while some long-running IO operation blocks, for instance. Even if you have computationally expensive work that screams threads, you should split it into small jobs backed by a priority queue anyway, because that's how you most effectively distribute work regardless of whether you have one core or many.
A thread pool can have two threads pick up tasks and run them concurrently. What you're talking about is basically an event loop that runs one task at a time (on any core), giving you sequential execution.
Lol, 3) is such generic advice. It's like "keep your doors locked". A bunch of things may need atomic updates. Message passing is probably the easier approach.
Yes, but in many, many cases where people think they need to, there's actually a simpler and safer way to do it. Too many programmers see a long-running task or an unresponsive UI and immediately declare that threads are the answer, when they're almost always not.
Admittedly they're much more likely to be useful now with so many spare cores on every machine, but they're still a hammer when most problems are screws.
Long running computationally expensive work? Break it into small jobs and have your job queue be a priority queue.
At that point you're correct regardless of whether you've got one thread or many servicing the queue. And generally the best way to take advantage of multiple cores is to have the code be totally agnostic to the number of cores it's running on, like this, anyway.
IMHO, threads are mostly useful for scaling a workload beyond the capacity of a single core. For all other types of concurrency problems, a single thread of execution using some kind of event loop is usually a better fit.
It is amazing how often I've seen developers reach for threading when it really wouldn't solve their particular problem. The downsides regularly outweigh the benefits.
1. SIMD "threads" being programmed with traditional programming languages (i.e. CUDA C++). These are "false threads": they run in groups of 32 (NVidia) or 64 (AMD) and have unique characteristics compared to traditional threads.
2. The push for "Task-based" parallelism. Instead of writing threads, you should write tasks. Tasks are then run on a thread-pool. The difference being: Tasks don't always spin up a new thread. You track "dependencies" between tasks to minimize communication and synchronization instead. Tasks are both more efficient than threads AND easier to reason about. The crazy example is Intel's "Inefficient Fibonacci": https://software.intel.com/en-us/node/506102 . The task-based Fibonacci is both efficient and simple. Threads simply cannot compete.
-------------------
It turns out that threads themselves are both inefficient and complex. Pure performance pushes us towards SIMD / GPGPU compute. Even on CPUs, the fastest computational model is AVX or AVX512.
Simplicity pushes us towards the Tasking model, which happens to be more efficient in practice anyway. It's far cheaper for thread pools to swap tasks than to spin up threads for the OS scheduler to pass around and manage.
Classical thread-based programming seems like a dead end to me. It's too complex and doesn't give enough returns on modern architectures. The Task model is built on top of threads, but I expect most programmers to switch to the Tasking model, leaving "Threads" to the OS devs or systems-level devs far below.
The SIMD model gets everything done efficiently with barriers in most cases (no need for semaphores or mutexes... indeed, mutexes don't work on traditional SIMD GPUs like Pascal or AMD Vega, due to how SIMD wavefronts execute). Other issues are handled with "CUDA Shared Memory" or "OpenCL Local Memory" and atomics. It's weird to lose the mutex or semaphore as a tool, but you learn that you really didn't need it once you learn to use barriers and atomics effectively.
The SIMD model gets a LOT done with atomics, popcount, ballot, and various primitives that are alien to CPU programmers. Watch a "Merge Path based Parallel Merge Sort" on a GPU if you wanna see what I mean. It's very alien, but it works extremely efficiently.
Tasks similarly have a different model compared to Threads. They're the natural successor. Almost everything I can think of can be more elegantly expressed as a Task instead of as a Thread. That'd be OpenMP 4.0, Intel TBB, or Microsoft PPL.