This article mostly covers how _performant_ (as in fast / resource-savvy) parallel programming is hard. Robust parallel programming can be tricky too, though, and the way you "[Avoid] race conditions" will have an impact on performance. So the trade-off the article evokes between speed, maintenance, and memory could also take robustness into account.
Exactly. When I see people getting themselves tangled up with parallel programming, it's because they can't accept the performance trade-offs of simple, tractable solutions. 90% of the time, the inability to accept the trade-offs has nothing to do with actual performance measurements and everything to do with groundless assumptions that certain techniques are always way too slow.
The raw performance increase that you can get from parallel processing is, at the absolute maximum, proportional to the number of processors you can add to your program. Usually it is much less.
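As a rough illustration of the "usually much less" part (my numbers, not the article's), Amdahl's law shows how quickly the serial fraction of a program caps that speedup; the 95% parallel fraction below is just an example figure:

```python
# Rough sketch of Amdahl's law: speedup = 1 / ((1 - p) + p / n),
# where p is the parallelizable fraction of the work and n is the processor count.
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)

# Even with 95% of the work parallelizable, 16 processors give well under 16x.
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.95, n), 2))  # ~1.9, 3.5, 5.9, 9.1
```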
A raw performance increase is not the most common good reason for concurrency (multiple threads or processes in an application), though it is a common bad reason. Reducing latency is one common good reason for concurrency on a single machine.
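A minimal sketch of that latency case, assuming the typical situation of overlapping independent I/O waits (the `fetch_*` helpers and their delays are hypothetical stand-ins for network or disk calls): the user waits roughly as long as the slowest call instead of the sum of all of them, even on a single core.

```python
# Concurrency used to cut latency rather than add raw throughput:
# three independent slow lookups are overlapped with threads.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_profile():  time.sleep(0.3); return "profile"  # pretend I/O
def fetch_orders():   time.sleep(0.4); return "orders"   # pretend I/O
def fetch_ads():      time.sleep(0.2); return "ads"      # pretend I/O

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    results = [f.result() for f in (pool.submit(fetch_profile),
                                    pool.submit(fetch_orders),
                                    pool.submit(fetch_ads))]
print(results, f"{time.perf_counter() - start:.2f}s")  # ~0.4s instead of ~0.9s
```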