Exactly, which is why I call it bidirectional partitioning: one instance scans forward, the other scans backward. It's a strange case where you can exploit parallelism (here, instruction-level parallelism), but you only get two independent instances, one per end of the array, with no way to recurse and split further.
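A minimal sketch of the idea, written in Python for readability. This is a simplified, stable, out-of-place layout I'm assuming for illustration, not necessarily the exact scheme any particular sort uses. It also assumes an extra counting pass so both cursors know where the less/greater boundary `c` falls. The forward cursor only touches `lt_f`/`ge_f` and the backward cursor only touches `lt_b`/`ge_b`, so in a compiled version those are two independent dependency chains a CPU can overlap. Python itself will of course run them serially.

```python
def bidirectional_partition(v, pivot):
    """Stable out-of-place partition of v around pivot (illustrative sketch).

    Returns (out, c): out[:c] holds the elements < pivot and out[c:] the
    elements >= pivot, each group in its original order.
    """
    n = len(v)
    h = n // 2                       # forward cursor owns v[:h], backward owns v[h:]
    c = sum(x < pivot for x in v)    # assumed extra pass: final size of the "less" block
    out = [None] * n

    # Forward instance: reads left to right, both write positions move right.
    lt_f, ge_f = 0, c
    # Backward instance: reads right to left, both write positions move left,
    # which keeps its elements in original order.
    lt_b, ge_b = c - 1, n - 1

    for i in range(n - h):           # n - h >= h, so the backward side always has work
        if i < h:
            x = v[i]
            less = x < pivot
            out[lt_f if less else ge_f] = x
            lt_f += less             # bools act as 0/1: only one pointer advances
            ge_f += not less
        y = v[n - 1 - i]
        less = y < pivot
        out[lt_b if less else ge_b] = y
        lt_b -= less
        ge_b -= not less
    return out, c
```

For example, `bidirectional_partition([5, 1, 4, 2, 3, 6], 4)` returns `([1, 2, 3, 5, 4, 6], 3)`. The four write regions meet exactly because the forward and backward "less" counts sum to `c` and the "greater" counts sum to `n - c`, so no fix-up pass is needed after the loop.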
You can of course make partitioning embarrassingly parallel; see IPS4o for that. But it is vastly more complicated, and it pays the overhead of shuffling blocks around after the partition step.