It follows in a weird way: processes require more effort to communicate, so you are less likely to send things across processes, and when you do, it is because it is worth it. You are also likely to batch the communication and then go off and compute.
All of the above improves performance. Nothing about it requires processes, though; you can get the same results with threads if you take just as much care.
However, processes do add one more thing: you can scale to multiple physical computers in ways threads can't.
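To make the "batch the communication and then go off and compute" point concrete, here is a minimal C sketch, assuming a POSIX system. The names `sum_in_child`, `write_all` and `read_all` are made up for illustration. The parent ships the whole batch over a pipe as one message, the child computes on its own copy, and only a single result comes back.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads. Returns 0 on success. */
static int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send the whole batch to a child process as one
 * message, let the child compute locally, and read back one result.
 * The child also inherits `items` through fork(); it deliberately reads
 * from the pipe instead, to model a worker that does not share memory. */
int sum_in_child(const int32_t *items, size_t count, int64_t *result) {
    int to_child[2], from_child[2];
    if (pipe(to_child) != 0) return -1;
    if (pipe(from_child) != 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: receive the batch, then compute with no further IPC. */
        close(to_child[1]); close(from_child[0]);
        int32_t *buf = malloc(count * sizeof *buf);
        if ((buf == NULL && count > 0) ||
            read_all(to_child[0], buf, count * sizeof *buf) != 0)
            _exit(1);
        int64_t sum = 0;
        for (size_t i = 0; i < count; i++) sum += buf[i];
        _exit(write_all(from_child[1], &sum, sizeof sum) == 0 ? 0 : 1);
    }
    /* Parent: one batched send, one small reply. */
    close(to_child[0]); close(from_child[1]);
    int rc = write_all(to_child[1], items, count * sizeof *items);
    close(to_child[1]);
    if (rc == 0) rc = read_all(from_child[0], result, sizeof *result);
    close(from_child[0]);
    waitpid(pid, NULL, 0);
    return rc;
}
```

Calling `sum_in_child(items, n, &total)` returns 0 on success and stores the sum in `total`. The same shape works with threads and a queue; the pipe just makes the cost of each message hard to ignore.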
> However, processes do add one more thing: you can scale to multiple physical computers in ways threads can't.
Although this is technically true, communication with another system has orders of magnitude higher latency and lower bandwidth than communication between cores in the same CPU. So in practice, you'll typically have to redesign your approach anyway.
If you put the ring buffers in hugepages, the number of TLB entries involved is small, typically one. And if you pin the processes to cores, each core's TLB holds its own set of entries.
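A rough Linux-only sketch of the two pieces mentioned above, mapping a shared region from a huge page and pinning the calling process to a core. It assumes a 2 MiB huge page size (the usual x86-64 default) and that huge pages have been reserved on the machine; when they have not, it falls back to ordinary pages. `map_ring`, `pin_to_cpu` and `RING_BYTES` are illustrative names, and the ring buffer protocol itself is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for cpu_set_t, CPU_SET, sched_setaffinity */
#endif
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, so the whole ring fits in one TLB entry. */
#define RING_BYTES (2UL * 1024 * 1024)

/* Map a shared anonymous region for a ring buffer. Tries a huge page
 * first; falls back to normal pages if none are available.
 * Sets *used_hugepage to 1 or 0. Returns NULL if both attempts fail. */
void *map_ring(int *used_hugepage) {
    void *p = mmap(NULL, RING_BYTES, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_hugepage = 1;
        return p;
    }
    *used_hugepage = 0;
    p = mmap(NULL, RING_BYTES, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Pin the calling process to a single CPU. Returns 0 on success. */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

The intended use is to call `map_ring` before `fork()`, so that producer and consumer share the mapping, and then have each side call `pin_to_cpu` with a different core.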