Bridging the Gap Between Programming Languages and Hardware Weak Memory Models (arxiv.org)
64 points by godelmachine on July 25, 2018 | 13 comments



Could someone provide a non-academic summary of what this means? I am a programmer but not a computer scientist...


The simplest way to think about memory is as a place where one executed instruction writes a value into a slot, and the next can read it back.

In practice, things are more complicated. The next instruction might be from a different process, executing on a different (hyper)core. Or the next instruction executed might not be the next instruction in the program, because the CPU may be performing out-of-order execution. The CPU will have several layers of cache as well, and cache coherency protocols aren't always engaged.

In a language like C, you can declare a variable to be volatile. When the program is compiled, this causes the compiler to emit memory barrier instructions - these barriers force the CPU to do the work to ensure some semblance of sequential memory access is restored across the two sides of the barrier. These may be implicit, in the memory semantics of the language itself, or explicit operations in the language.

However, it's not easy to be sure that the barriers that end up in compiled and executed code actually provide the memory semantics that the language specified. This is exacerbated by the different semantics of different hardware architectures, and the overall complexity of the problem space.

The paper addresses this problem by introducing an intermediate memory model (IMM), which is amenable to formal verification of compiler outcomes. They map this model to a handful of popular architectures, showing that the mappings are correct (more or less, that for all possible execution flows, the observable behaviour is the same). They also map RC11 ("Repaired C11", which fixes an unsound corner of the C/C++11 concurrent memory semantics) and the "promising" model to the IMM.

Assuming the proofs are correct (a non-trivial assumption in this field, unfortunately) this offers a step forward in the process of formally verifying that the output of compilers actually behaves in the way that the program should, according to the programming language's semantics. There aren't many formally verified compilers around, and those that do exist are for subsets of languages we might use, limiting their applicability.


> In a language like C, you can declare a variable to be volatile. When the program is compiled, this causes the compiler to emit memory barrier instructions - these barriers force the CPU to do the work to ensure some semblance of sequential memory access is restored across the two sides of the barrier.

That is the case in Java, but not in C (which is a source of confusion to those who learned Java first). In C, volatile only guarantees that the compiler will not use a cached value when reading from the variable, and will not omit an apparently redundant write when writing to it. It's mostly useless except when dealing with memory-mapped hardware devices (where reads and writes have side effects), with Unix signals in single-threaded programs (for instance, having SIGTERM set a variable read by the main loop - without volatile, the compiler might read it only once), or with setjmp/longjmp.
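
For the signal case, a minimal sketch (assuming a SIGTERM handler that sets a flag read by the main loop, as described above):

    #include <signal.h>
    #include <stdio.h>

    /* volatile tells the compiler to re-read the flag on every
       iteration instead of caching it in a register; sig_atomic_t
       makes the read/write atomic with respect to the handler. */
    static volatile sig_atomic_t stop = 0;

    static void on_sigterm(int sig) {
        (void)sig;
        stop = 1;
    }

    int main(void) {
        signal(SIGTERM, on_sigterm);
        while (!stop) {
            /* do work; without volatile, the compiler could read
               stop once and turn this into an infinite loop */
        }
        puts("shutting down");
        return 0;
    }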


Yes, it's useful for the problems that were around when it was invented and useless for the new problems that have arisen since.

I want my old problems back.


> In a language like C, you can declare a variable to be volatile. When the program is compiled, this causes the compiler to emit memory barrier instructions

Not true (except maybe on MSVC). This unfortunately very common misbelief has led to so many concurrency bugs on weakly ordered CPU architectures.


> In a language like C, you can declare a variable to be volatile. When the program is compiled, this causes the compiler to emit memory barrier instructions

I'm going to guess you're a Java programmer. What you say is true in Java but not in C.

In C, volatile doesn't necessarily generate memory barriers. Volatile just means that the value can't be cached in a register; the generated code re-reads the memory address each time. To get memory barriers, you need the C11 atomics: either atomic operations with an explicit memory_order, or a combination of "atomic_store" and "atomic_thread_fence". In the C11 model, fences take effect relative to specific atomic operations.

See the C11 header file #include <stdatomic.h> for more details.
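
Something like this (a rough sketch of the release/acquire pattern; the producer/consumer names are just for illustration):

    #include <stdatomic.h>
    #include <stdbool.h>

    int payload;                /* plain, non-atomic data */
    atomic_bool ready = false;  /* synchronization flag */

    void producer(void) {
        payload = 42;
        /* release store: everything written before this line becomes
           visible to a thread that sees ready == true via an acquire
           load; emits a barrier on weakly ordered CPUs */
        atomic_store_explicit(&ready, true, memory_order_release);
    }

    /* equivalent effect using a standalone fence plus a relaxed store */
    void producer_fenced(void) {
        payload = 42;
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&ready, true, memory_order_relaxed);
    }

    int consumer(void) {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                   /* spin until the flag is set */
        return payload;         /* guaranteed to read 42 */
    }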


Java is "a language like C" if you squint? ;-)

Thanks to you and others for the clarification on C's (lack of) behaviour!


The C volatile keyword doesn't have well-defined behaviour, but I'm pretty sure there is no compiler where volatile causes extra memory fence instructions to be emitted. It's just a flag hinting not to keep the value in a register.
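
For instance (a sketch of the optimization the flag suppresses):

    int done = 0;
    volatile int vdone = 0;

    void spin(void) {
        /* without volatile, the compiler may load done once, keep it
           in a register, and compile this into an infinite loop */
        while (!done) { }

        /* with volatile, every iteration re-reads vdone from memory;
           note this still emits no fence instructions */
        while (!vdone) { }
    }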


I believe Microsoft compilers have traditionally treated volatile as having ordering properties (giving reads acquire semantics and writes release semantics). On x86, this just results in compiler-only barriers, but on Itanium this would result in actual barrier instructions.

This is far from a standard behavior, though.


I think this behaviour has since been considered a mistake; recent MSVC versions let you disable it with the /volatile:iso switch (the default when targeting ARM).


Thank you so much for this nice summary.


Modern CPUs have a degree of memory inconsistency. If Core #1 executes "A = 10, B = 12, A = 15", Core #2 might only see "B = 12, A = 15".

However, the above is still relatively strong ordering (as on x86). On Power or ARM, you might instead see "A = 15 THEN B = 12" (the so-called store-store reordering).

This happens because you write to cache first, and THEN the caches synchronize at a later time. So the caches can independently change their values. Every core has its own cache and therefore its own view of memory.

If you want the variables to be assigned in the correct order, you have to use so-called "barrier" or "memory-fence" instructions. Otherwise, if the CPU's interconnect decides it's more efficient to order memory a certain way, it will use the more efficient order rather than the "correct" one.
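
Roughly, in C11 terms (a sketch reusing the A/B example above; the reading side needs a matching acquire for the guarantee to hold):

    #include <stdatomic.h>

    atomic_int A, B;

    void writer(void) {
        atomic_store_explicit(&A, 10, memory_order_relaxed);
        atomic_store_explicit(&B, 12, memory_order_relaxed);
        /* store-store barrier: on Power/ARM this emits a fence so no
           other core can observe A == 15 before B == 12 */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&A, 15, memory_order_relaxed);
    }

    void reader(void) {
        if (atomic_load_explicit(&A, memory_order_acquire) == 15) {
            /* B is guaranteed to read 12 here */
            int b = atomic_load_explicit(&B, memory_order_relaxed);
            (void)b;
        }
    }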

In most cases, such as single-threaded code, it doesn't make a difference at all. Problems only occur in multi-threaded code that shares memory between cores - the territory of mutexes and semaphores.

A memory model is a programmer-invented abstraction for reasoning about these cases. Everyone wants to use the faster, more efficient CPUs, but right now it's really, really, really complicated to think about. The paper presents a new memory model that they claim is easier to think about.


This appears to be a way to formally verify the correctness of concurrent programs when translated from a modified version of C (RC11, "Repaired C11") to native machine code. At least, that's what I got from the paper; I didn't read it all the way through, I just skimmed it.



