Could you explain what the DEC Alpha did differently here? It was before my time...

jcranmer · on Nov 21, 2019

It's not what DEC Alpha did, but what it didn't do. ;-)

The issue that comes up on Alpha is this code:

  thread1() {
    x = …; // Store to *p
    release_barrier(); // guarantee global happens-before
    p = &x; // ... and now store the p value.
  }
  
  thread2() {
    r1 = p; // If this reads &x from thread1,
    r2 = *r1; // this doesn't have to read the value of x!
  }

The Alpha's approach to memory was to impose absolutely no constraints on memory unless you asked it to. And each CPU had two cache banks, which means that from the hardware perspective, you can think of it as having two threads reading from memory, each performing their own coherency logic. So you can have one cache bank reading the value of p who, having processed all pending cache traffic, saw both the stores, and then you can turn around and request the other cache bank to read *p who, being behind on the traffic, hasn't seen either store yet.

Architectures with only one cache bank don't have this problem. Other architectures with cache banks feel obligated to solve the issues by adding extra hardware to make sure that the second cache bank has processed sufficient cache coherency traffic to not be behind the first one if there's a dependency chain crossing cache banks.

benibela · on Nov 21, 2019

So on every other platform than alpha thread2 works correctly without any barrier?

Does this mean when you use double-checked locking on p on non-alpha systems, you do not need any kind of synchronization on the fast path where p is initialized?

So this would be correct?

    if (!p) {
      T *x = new T;
      release_barrier();
      if (!compare_and_swap(p, 0, x))
        delete x;
    }

brandmeyer · on Nov 21, 2019

IMO, the various blogs and tutorials out there that help to make sense of the C++11 memory model make better tutorials than the Linux kernel's own shenanigans.

The C++11/C11 memory model added memory_order_consume specifically to support the Alpha.

https://preshing.com/20140709/the-purpose-of-memory_order_co...

gpderetta · on Nov 21, 2019

> The C++11/C11 memory model added memory_order_consume specifically to support the Alpha.

yes and no. Alpha is relevant because it is the only architecture where consume requires an explicit barrier, but then again, I think the revised C++11 memory model might not even be fully implementable on Alpha;

Consume primarily exist because acquire is very expensive to implement in traditional RISCs like POWER and 32 bit ARM, while consume can be implemented with a data or control dependency. Aarch64 has efficient acquire/release barriers, so it is less of an issue.

/pedantic

brandmeyer · on Nov 21, 2019

No, that's not pedantic, its a fair point.

acq/rel remains efficient on a platform that doesn't need barriers to implement it (ie, x86).

consume remains efficient on a platform that doesn't need barriers to enforce a dataflow dependency (ie, ARMv7).

p_l · on Nov 22, 2019

C++11/C11 are not implementable on Alpha CPUs that don't have BWX extensions, BTW.

This is because it requires individual bytes to be accessible in atomic way between multiple threads, which without BWX is not possible on Alpha, which started out with minimum 32bit read/writes.

gpderetta · on Nov 22, 2019

yes, but even beyond that, I think think there were changes and/or clarifications to the semantics of releaxed atomics that made the straightforward no-barriers-at-all implementation on alpha non-conformant strictly speaking. But I might be misremembering.

ridiculous_fish · on Nov 21, 2019

memory-barriers.txt from Linux is a lovely way to be introduced to memory barriers in general, and the Alpha memory model in particular.

https://github.com/torvalds/linux/blob/master/Documentation/...

rawoke083600 · on Nov 21, 2019

Man I just LOVE how the linux kernel can have such detailed and important information in a simple TEXT file. Fantastically functional and prove they focusing on the right stuff.

At my previous company. I would have to spend the better part of a day to get my "documentation" in the "correct" Confluence Style And Manner. They were adamant THAT'S were the value is, to have documentation is the most beautiful and absurd style" and double-linked structure. You would have to block out a day or two in your scrum(what nonsense) just to focus on your documentation. And this is not some sort of important* software like Linux or Banking... but a stupid website.