Acquire-Release seems to be an outstanding success. ARMv8 added new assembly lan...

namibj · on July 13, 2021

I wish there were actual software support for acquire/release-like semantics that are somewhat more relaxed, e.g. by specifying that two stores (data and pointer-to-data) must become visible in order, without enforcing a strong ordering of that store pair relative to other (semantically unrelated) stores.
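For reference, here is a minimal sketch of how the data/pointer pair is published in today's C++. All names (`Payload`, `g_payload`, `g_ptr`, `publish`, `consume`, `run_publish_demo`) are made up for illustration. Note that the release store orders the pointer after every earlier write by that thread, not only the one data store, which is the stronger-than-needed behaviour described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
struct Payload { int value; };

static Payload g_payload;
static std::atomic<Payload*> g_ptr{nullptr};

// Writer: store the data, then publish the pointer to it.
void publish(int v) {
    g_payload.value = v;  // plain store (the data)
    // Release store: ordered after ALL earlier writes of this thread,
    // not just the single data store we actually care about.
    g_ptr.store(&g_payload, std::memory_order_release);
}

// Reader: wait until the pointer is visible, then read through it.
int consume() {
    Payload* p;
    while ((p = g_ptr.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    assert(p == &g_payload);
    return p->value;  // sees the value written before the release store
}

// Runs one writer and one reader; returns what the reader observed.
int run_publish_demo(int v) {
    g_ptr.store(nullptr, std::memory_order_relaxed);  // reset before threads start
    int seen = 0;
    std::thread reader([&] { seen = consume(); });
    std::thread writer([&] { publish(v); });
    writer.join();
    reader.join();
    return seen;
}
```

Calling `run_publish_demo(42)` should hand 42 to the reader. C++ has no way to say "order only these two stores against each other", so the sketch can only show the coarser tool.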

Barrier-based abstractions could handle that if they supported more than one barrier. For loads, this would allow efficient reordering of dependent loads by enforcing ordering only where concurrency requires it. This mostly helps with speculating loads before the address is confirmed, without having to snoop for invalidations of the cache line that contains the speculated address and kill the load. Similarly, it would take pressure off the store buffer by being less strict about the order in which it commits to L1D$.

RISC-V's proposed WMM has such a weak default ordering, but because it uses fences, it is overly strict, to the point where it performs worse on heavily concurrent code that's littered with atomics than a TSO version (of the same softcore) that "just" prefetches exclusive access for writes. That holds even when RMWs are given relaxed semantics, so the slowdown comes from the overly strict load fence, which effectively trashes all shared-state L1D cache lines.
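To make the fence-based style concrete, here is a rough C++ sketch of the same publication pattern written with standalone fences and relaxed atomic accesses. The names (`g_data`, `g_ready`, `fence_writer`, `fence_reader`, `run_fence_demo`) are hypothetical, and how these fences are lowered to RISC-V instructions depends on the compiler and target and is not shown. The point is only that a fence is not tied to one particular access: it orders all earlier accesses against later ones, which is where the over-strictness comes from.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
static int g_data = 0;
static std::atomic<bool> g_ready{false};

void fence_writer(int v) {
    g_data = v;  // plain store
    // Release fence: every earlier access of this thread is ordered
    // before the relaxed store that follows the fence.
    std::atomic_thread_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
}

int fence_reader() {
    while (!g_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: every later access of this thread is ordered
    // after the relaxed load above, not just the read of g_data.
    std::atomic_thread_fence(std::memory_order_acquire);
    return g_data;
}

// Runs one writer and one reader; returns what the reader observed.
int run_fence_demo(int v) {
    g_ready.store(false, std::memory_order_relaxed);  // reset before threads start
    int seen = 0;
    std::thread reader([&] { seen = fence_reader(); });
    std::thread writer([&] { fence_writer(v); });
    writer.join();
    reader.join();
    assert(g_ready.load(std::memory_order_relaxed));
    return seen;
}
```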

nyanpasu64 · on July 12, 2021

> Fully relaxed is... not a model at all and does the job spectacularly! Some people don't want any ordering what so ever, lol.

Many people wish C++ memory_order_relaxed had no effect on synchronization and could be optimized like a normal access. It can't be. https://internals.rust-lang.org/t/unordered-as-a-solution-to...
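A small sketch of what relaxed still guarantees, and therefore why it is not treated as a plain access: each operation remains atomic, and all threads agree on a single modification order for the variable. The function name `count_relaxed` is made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one counter using
// only relaxed operations, then the final value is returned.
long count_relaxed(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Relaxed: no ordering with other memory, but the
                // read-modify-write itself is still atomic.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load(std::memory_order_relaxed);
}
```

With a plain `long` this would be a data race and the total could come out short. With relaxed atomics, no increment is lost.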