Hacker News new | past | comments | ask | show | jobs | submit login
Concurrent Programming, with Examples (begriffs.com)
330 points by begriffs on March 24, 2020 | hide | past | favorite | 88 comments



Next article in the series: now that you know about the dangerous/complicated primitives, don't ever touch them again. Instead use the high-level safe concurrency/parallelism mechanisms in your programming language: futures/promises, nurseries, channels, observers, actors, monitors. Ideally, these should be built-in, but a library whose API composes well into most programs will also do.

Data races can be statically removed by carefully restricting certain parts of the language design, see Pony. https://tutorial.ponylang.io/#what-s-pony-anyway

Bonus: learn aspects of deadlocking by playing a game: https://deadlockempire.github.io/


I disagree, first you need to program a few toy projects with the dangerous primitives, to really internalize how tricky they are.

It's because of my concurrent C homework assignments that I was able to really appreciate Go's channels in my first internship.


“Never touch them again” means don’t reach for those primitives when higher level tools are available. Learning about them is fine.

It’s like with cryptography - don’t roll your own solutions just because you know what xor is. You’ll fail miserably.


I am just getting into concurrency. Is there any point to use threads or forkjoinpool in Java anymore? Should you always just use CompletableFutures and Suppliers?


Somebody has to maintain the operating systems and a metric ton of system software written in C. It cannot be magically all switched to Pony or Rust overnight.


See also Rust for the data races portion.

Though IIUC Pony is also supposed to prevent deadlocks.


There are no locks in Pony, so yes: no deadlocks. (livelocks / other kinds of "lack of progress" are of course still possible tho)


The article is very nice, thanks a lot for it. Especially since I hear the word concurrency and parallelism often thrown around without any distinction.

Very off topic, but I have read several times the argument that the rise of functional programming is due to it's easy concurrency (since functions don't have side effects) and that concurrency becomes more and more important due to moores law being dead (i.e. we can't scale the hardware up, we have to add cores to our processors).

Could someone with more experience comment on that? Is concurrency really easier in functional languages and is the rising importance of concurrency a valid reason to look into functional programming?


> Is concurrency really easier in functional languages and is the rising importance of concurrency a valid reason to look into functional programming?

I think it depends on who's writing the code. A little bit of shared mutable state here and there isn't gonna kill you, but if the program's architecture is poor, that'll blow up and spread everywhere and the next thing you know is you're spending half your time worrying about data races, deadlocks, and a locking nightmare.

Shared-nothing architectures fix that; your synchronization will happen over a narrow range of well defined primitives (e.g. message queues), though this doesn't prevent race conditions.

Functional programming that encourages pure functions & immutable state is a bit of a straight jacket that keeps you from making a mess that blows up. I think it helps, and I wish $stuffAtWork were written in such a language because I'm tired of wrestling with terrible architecture (that I'm not really allowed to fix, given the time constraints).

The other thing that makes a difference is what your problem looks like. Sometimes concurrency is absolutely trivial, like dispatching sections of an array for threads to perform (independent) compute on, and come back with a result. Sometimes it's way more complicated...


Yes. Concurrency is easier in functional languages.

Many of them provide a concurrency monad which allows you to write expressions like this:

   read_from_socket_into_buffer params >>= 
   process_buffer >>= 
   ...
Where you roughly read the '>>=' as follows: "the left side might take an arbitrary amount of time to produce a result. While it's off and doing its thing, go and do something else. Once it has produced the result, take it and feed it as a parameter to function on the right."

BTW, It's quite easy to implement your own concurrency monad, however: the real work is the wrapping of all the blocking system calls into this framework.


This sounds a bit like async in some other languages.


Lots of functional language featured end up copied into traditional languages (monads, immutability, pattern matching, lambdas, etc). The difference is that these are almost always more ergonomic in a functional language that was designed with these features in mind vs. having them bolted on afterwards


Concurrency is all about simultaneous effects. A language like Haskell is nice enough because it makes effects explicit, so you at least know where things can go wrong, but if you have a bunch of concurrent effectful Haskell threads running at the same time, you have the usual issues with nondeterminism. Functional languages to tend to emphasize higher-level concurrency patterns than old-school shared mutable state, so in practice it works OK.

Specifically, concurrent Haskell is good and also quite fast. It doesn't quite compare to Erlang in the concurrency department although the languages are very different in other ways, so there are reasons to prefer Haskell anyway. I think this shows that just having a functional language that emphasizes effect-free programming is not sufficient. If you want a good concurrent language, then you have to design it like that, and not simply rely on the absence of shared mutable state.

In practice, people write massively concurrent systems all the time, in the form of dynamic web sites that handle many concurrent requests, and synchronise access to a shared mutable state via a transactional database of some kind. These don't depend on fancy languages, but on a rigid architecture that tries to isolate synchronisation issues in the web server and database components.

While this article is about concurrency, I'll also add a note on parallelism: Parallel functional programming has IMO not been shown to work well in practice. While it is trivial to parallelise more or less any pure functional program correctly, it is very hard to actually gain meaningful speedup from it. The single point of parallelism is performance, and most current implementations of functional languages (including all general purpose functional languages I know of) simply have too many bottlenecks or pitfalls to scale for computational work as well in practice as one might expect in theory. For example, concurrent Haskell is fast because most of the actual work tends to be IO and GHC has an excellent parallel IO manager and scheduler, while parallel Haskell is hampered by, among other things, a slow garbage collector.


Do you refer to Erlang excelling in distributed concurrency, due to how its built to rely on the actor model? Or Erlang also excels in concurrency on a single machine?

Btw, what are some of those other things hampering Haskell's performance with parallel computing? Wouldn't the garbage collector mainly produce pauses instead of really destroying parallel performance? Although a concurrent garbage collector has recently been merged to the nightly GHC.


> Do you refer to Erlang excelling in distributed concurrency, due to how its built to rely on the actor model? Or Erlang also excels in concurrency on a single machine?

Erlang has great concurrency also on a single machine. That is scales well to clusters is of course also good. And yes, it is the actor model that makes it work so well. That the individual processes are written in a (mostly) functional language is less important, I think. It could have been an imperative language and it wouldn't have made much difference, as long as process state was still isolated from other processes.

> Btw, what are some of those other things hampering Haskell's performance with parallel computing? Wouldn't the garbage collector mainly produce pauses instead of really destroying parallel performance?

Haskell's garbage collector is pretty slow when run on multiple cores, and when run on a single core it becomes a bottleneck to parallelism (and can also trash the caches of the other cores). I don't know all the specific technical details of GHCs GC, though.

Another problematic thing is laziness, which is operationally control flow plus side effects. Those are exactly the things that make efficient parallelism difficult.

> Although a concurrent garbage collector has recently been merged to the nightly GHC.

A concurrent GC is a different thing, that interleaves garbage collection with ordinary computation. That can be very useful, but it doesn't help with exploiting more cores.


Yep Erlang is fairly procedural as a language in a sense. I don't think being functional was ever a priority, more like a side-effect. I don't know how mailboxes and message passing would really work in a purely functional language anyway, could be awkward to work with them. Particularly with a functional (ML or what have you) type system (how to type mailboxes?).

> it doesn't help with exploiting more cores.

Yeah you're right, I believe the aim is to get rid of those garbage collection pauses for better concurrency (not parallelism).

From what I understand, GHC can do pre-emptive or cooperative scheduling. Pre-emptive being better for concurrency, just like Erlang does it.


> how mailboxes and message passing would really work in a purely functional language anyway, could be awkward to work with them.

It works really well and intuitively I would say, and even more; this is a common application to show the advantages of functional programming. Here is an article comparing sending messages asynchronously between threads in Elixir and Clojure (two common functional languages) https://bentomi.github.io/posts-output/2017-06-04-message-se...


Elixir and Clojure are not pure functional languages at all. They allow you to perform side effects freely, such as sending messages to different threads. They're not reflective of what the actor model would look like when implemented in a more purely functional language.

Moreover, I was mainly referring to using the actor model in a language with a static type system of functional languages (ML). What would be the type of the mailbox? The message? I believe even Facebook (Whatsapp) has struggled to find a solution to this question, they're developing a statically typed language on top of BEAM right now.

There's been plenty of good discussion around implementing a strong static type system around BEAM, here's some bits for reference: https://elixirforum.com/t/how-hard-would-it-be-to-have-a-str...


What you heard is correct. If you have pure functions and provided, that you have working primitives of concurrency, you can go to any place in you code, where you call a pure function and change the code, so that the function runs in a new concurrency (or parallelism) primitive. For example in a new thread or process. The places where this is useful can be discovered by profiling.

Another advantage is, that you can unit test each function on its own, without setting up a whole host of preconditions. That's because of referential transparency.


You are confusing Moores' Law with Dennard scaling. Without Moores' Law you're not adding more transistors and therefore more cores.


I'm just an idiot EE who learned to program, but it seems like functional languages without side-effects accomplish their task by basically working on data structures stored on the stack. That basically means each call stack has its own data structure. If you wrote threaded code so that each thread had its own data structure, and worked off a shared set of read-only data structures, you wouldn't need to worry about locking.

IME locking isn't really all that hard until you need to squeeze out performance. If you can handle one big dumb lock around everything, it's easy. The finer grained your locking gets, the harder it is to get right(even then, lock hierarchies can help to a great degree).


One big dumb lock only protects against data races. There are other classes of concurrency bugs that require smart locking to avoid, and it's always a pain.

For example, iterator invalidation. Which is probably the most common thing I've seen when people use dumb locks to make a concurrent program sound. Either your iterators need to be aware of the lock and hold it through their lifetime (which means practically only one iterator can exist, which is undesirable) or your datastructure needs to be aware of outstanding iterators to it and update them (which means you have added logic to simple mutations, which is also undesirable).

It's problems like that you look for in non-functional languages that require library authors to say things like "this is/isn't thread safe" in their documentation, while providing those guarantees manually. And as a consumer of the library you need to be aware of these things and look for them, and what an error looks like when someone messes up.

I don't know what you're trying to say about stacks/the call stack in functional languages, because they implement their patterns in variety of ways - including locking. It's just about providing particular guarantees in the API, and the benefits of those guarantees are that it prevents entire classes of bugs like iterator invalidation and data races.

There are also all sorts of clever data structures you can use that don't require duplication of the underlying data. The easiest is a singly linked list without insertion. No locks required.


The problem is that if you have one big dumb lock around everything then you have no actual concurrency or parallelism.


Depends on what the big dumb lock protects. You may still be able to process data you accessed while holding the big dumb lock.


> Could someone with more experience comment on that? Is concurrency really easier in functional languages and is the rising importance of concurrency a valid reason to look into functional programming?

Only you know if concurrency is relavant to your work. A lot of internet stuff is deeply concurrent, but there's a lot of sequential work out there, too.

I have experience with Erlang. It's amazing for concurrency, but the reason is really not anything it gives you, but everything it prevents you from doing, or requires you to do.

Erlang's immutability is important because it means you can't save a reference to something and have it magically updated; if you want a value updated somewhere, you need to send the current value or request the current value at time of use; explicit synchronization makes concurrency clearer, and helps influence system design to reduce synchronization. Immutability also makes message passing easier; the content of a message can't change after it's composed which means the implementation is free to make a copy of the message if it's convenient and as a programmer, there's no need to consider the effects of changing something that you sent to another process. Immutability also makes garbage collection very simple; it's impossible to form a looped datastructure, so there's no need for mark and sweep, a simple copying collector is sufficient. Individual stacks per Erlang process (like a green thread) means GC pauses don't stop the world, only individual processes, and pause length is bounded and proportional to the memory use of the process.

You could certainly set up a similar environment in C or whatever traditional language; the problem would be enforcing your system design on your own code, and partitioning off library code (including libc and/or kernel syscalls) to ensure it doesn't impact your system design either.


I don't have more experience, necessarily, but I think the argument builds on the facts that pure functions don't have side-effects, which avoids many of the problems of simultaneous data editing, and also the use of immutable data structures, which can be shared freely (in a read-only manner).


> Is concurrency really easier in functional languages

You can use the same approaches in non-functional languages. You can only use immutable data structures, and pure functions everywhere. Then you get the same benefits of easier reasoning about concurrent execution, as you get from a functional programming language.

So the point of using an actual functional language, is that the compiler pushes you very strongly towards programming this way. It's kind of like using a language with built in garbage collection. You could implement a garbage collector in C, and use it everywhere in your program where you allocate and release memory. But having it build into the language frees you up from having to think about it very much.


I don't get the immutable data structures thing though. If I have a vector of 1 billion numbers and I need to sort it it'll be unfeasible to make a copy every time I swap elements. RAM isn't infinite either.

So, deep down in the guts of the language's libraries there must be mutable operations. Which kind of defeats the whole point.

I don't get point of functional programming. Machines don't work that way.


Functional programming builds immutable semantics as an abstraction on top of mutable semantics. This abstraction layer includes internal optimizations that elide unnecessary copying, such as persistent data structures and "list fusion." When you're processing a list, the compiler can optimize away intermediate sublists. If you "copy" a list or sublist in Haskell, you don't actually get a copy, you get a new pointer to the same physical data, which is safe because it's immutable and garbage collected.

A key premise of functional programming is that the constraints of immutable semantics free the compiler up to do all sorts of performance optimizations, and the compiler can apply those optimizations much faster than a human developer can.

Also, when you really need a vector that you can mutate in place, you can still do that. Haskell has a mutable vector type. It's not the default idiom, but if you needed to sort a billion numbers in place, this is what you would use. Note all the function names prefixed with "unsafe": https://hackage.haskell.org/package/vector-0.12.1.2/docs/Dat...


>I don't get point of functional programming. Machines don't work that way.

FP is useful because it provides an abstracted model that is pretty easy to reason about. It doesn't matter how it gets implemented, that's the point of the abstraction.

We don't write programs for machines to understand. We write programs for humans to understand.


> We don't write programs for machines to understand. We write programs for humans to understand.

True, but that program ends up executing on a machine. You normally want that program to execute as efficiently as possible. 'Efficiency' can be measured several ways... time to complete, resources required, financial cost.

A case in point is our company switched from using Cassandra (JVM based) to ScyllaDB (a C++ clone of Cassandra). Query latencies were the same if not faster, and we needed fewer nodes to handle the same load. All because the language used has some mechanical sympathy for the CPU it's going to run on, and no GC.

I see Scala is somewhat the defacto language used for 'big data' stuff. I have no idea why. I reckon if some bean counter translated that choice into $$$ things would change.


> True, but that program ends up executing on a machine. You normally want that program to execute as efficiently as possible. 'Efficiency' can be measured several ways... time to complete, resources required, financial cost.

No, you normally want that program to execute efficiently enough to do its required job. Abstractions are often necessarily costly, but that's ok because their value is greater than their cost. Otherwise, you'd just throw all the abstractions away and use assembler.

> Query latencies were the same if not faster, and we needed fewer nodes to handle the same load. All because the language used has some mechanical sympathy for the CPU it's going to run on.

It seems like there's a whole load of reasons why it might have been more efficient on ScyllaDB that are unrelated to what language it was written in. Maybe the software design fit your requirements better? Maybe it was just better implemented? Maybe it was easier to configure for your load? Etc. etc. Programs aren't magically "more efficient" because they're written in C.


> No, you normally want that program to execute efficiently enough to do its required job

but... the "required job" is to go as fast as possible - in two ways :

1/ You must go faster than your competitors. Else people will switch, your software will get bad reviews, etc.

2/ you must go fast enough for anything that people will throw at you. And, in many cases, people can throw arbitrarily large problems to your program, and telling them "sorry but it will take 1 hour to process your 2TB dataset" just won't cut it at all. And even when you manage to cut it, then people come at you like "now can this run on a raspberry pi?"

Take for instance image processing software, audio or multimedia software... they will never be fast enough, and there is always an use case where getting a 1% perf improvement will allow a project to make instead of break (especially when working with uncompromising artists).


> but... the "required job" is to go as fast as possible - in two ways :

This is sometimes true, but hardly ever. Most software is not image/audio processing software. Most software is not even interacted with directly by humans. Most software exists as a component as part of a larger system, the performance of which is typically determined by a small fraction of that system. In other words, most software is not on the critical path.

Even if you're on the critical path, the performance requirements could be of the form "Here's some data, I need an answer in an hour". If your current system spends 2 minutes computing and delivering the result, why spend time and money making it faster? Just because you're on the critical path for you, doesn't mean you're on the client's critical path.

This is the whole point of performance engineering - identifying which bits are important and then figuring out how to optimise those. Optimising everything to run as fast as possible is a waste of time and effort - worse, it often degrades the overall system by making previously easy-to-understand components more complex and more difficult to reason about. See the whole "Worse is better" thing.

If you're on the critical performance path, go nuts. Throw away the abstractions and take it down to the metal. If you're not? Why bother?


Perhaps the 'I need answer in a hour' is currently acceptable because that's how long people are used to waiting for the answer. If they knew that you could get that answer in 2 minute do you not think they'd ask for that instead?

Like I mentioned previously, Scala's use in Big Data is perplexing. I reckon Cloud providers are making an absolute mint from design / language inefficiencies.


> the performance requirements could be of the form "Here's some data, I need an answer in an hour"

I have never been in the case where the answer to the question "when" wasn't "as fast as physically possible".


That works if an abstraction doesn't leak.

Since we are not quite there yet, could you explain in broad terms how a large collection is sorted while maintaining performance and immutability.


Here is a worked example of how to implement radix sort using only immutable data-parallel operations: https://futhark-lang.org/examples/radix-sort.html

There will of course be "impure" writes at the machine level, but at the language level, the abstraction doesn't leak.


> That works if an abstraction doesn't leak.

It seems like you're saying that abstractions are worthless unless they're perfect, which is obviously not true.

> Since we are not quite there yet, could you explain in broad terms how a large collection is sorted while maintaining performance and immutability.

What do you mean by "maintaining performance"? Relative to what? Who's asking for the sorted collection? How are they going to access it? What are the resource constraints? Give me a real-world example.

Don't get me wrong, I'm not saying that FP is the answer to everything. It's a useful tool, and part of being good at software development is knowing what tools to use and when.


No need for a real-world example or an exact definition of performance or efficiency, just make up something sensible.

How can efficient sorting be implemented with immutable data structures? Genuinly curious.


The simplest way, but which essentially dodged the question, is to utilize a language with uniqueness types. Where the data structure may only exist in a single place and there can be no aliasing. Then any operation on the data structure occurs and can be done ‘in place’ returning the new unique data structure. This is similar (but not identical) to linear and affine type systems, of which Idris is the only language I know of that actually implements such a type system.

Also, along the same lines, it is certainly possible to craft types in some FP languages which allow for this same logic to be captured, mostly via some monad construct.


Rust has an affine type system, and https://docs.rs/im/14.3.0/im/ implements this https://docs.rs/im/14.3.0/im/#in-place-mutation


This reply is late, so if you don’t see it or care I don’t blame you.

To start, I am aware of Rust and left it out of the above comment purposefully. While Rust does indeed make many ‘functional programming’ idioms available and, some would say, is basically an impure, eager functional language, I don’t tend to view Rust as a language featuring persistent, immutable data structures. I certainly don’t think that is a negative characterization, just how I see things. I find that one of the defining features of the functional programming ‘persistent’ data structure is the structural sharing that is all too frequent in most of the mainline FP languages (Haskell, ML, Clojure, etc.). Rudy’s focus on low level/systems level programming is much more likely to use standard imperative data structures without the sharing and deep pointer chasing style seemingly inherent in FP implementations.

I have had a few spirited web-discussions with people about the claim that Rust has an affine type system. I don’t think that stance is accurate, from a type theoretic stance. I subscribe to the theoretical standpoint that Rust has a standard non-substructural type systems with constraints placed on types that meet a mutability predicate. I think that those constraints are better described as a uniqueness constraint per instance of a type. But the sequent calculus presentations for Rust (especially concerning mutable data) would look very similar and act very similar as well. So really it’s much ado about nothing, just fun to talk about.

Sorry for the wall of text, and rambling about type theory. Stay safe.


The key idea is that there are functional data structures that only require you to copy a small part and reuse a lot of the old. For example, if you have a binary tree and you want to swap out the root value, you create a new root node that simply uses the existing children.

>So, deep down in the guts of the language's libraries there must be mutable operations. Which kind of defeats the whole point.

Why does it defeat the point? If you can guarantee that the mutability doesn't leak (there are ways to do this), it behaves just like a pure implementation.


The implementations of immutable data structures in most concurrency aware functional languages are smarter than that. Take Clojure as an example: "All of the Clojure collections are immutable and persistent. In particular, the Clojure collections support efficient creation of 'modified' versions, by utilizing structural sharing, and make all of their performance bound guarantees for persistent use." https://hypirion.com/musings/understanding-persistent-vector...


Mutability is an access pattern and not an underlying data layout.

In fact, some structures that lend themselves to immutability represent the hardware better. Take your vector of a billion elements. It probably is too big for L1/L2 cache. Is it actually a billion things next to each other in memory? Not at all.

If you used something like a radix balanced balanced tree to represent your vector, you get amortized O(1) access and mutation can be implemented using copy on write at individual leaves of the tree. And your abstraction is much friendlier with the cache, because it reflects how the hardware works much better than a simple vector.

In practice, immutable data structures can be much more performant at scale than pretty much anything. They're not just convenient for soundness. Trivial data structures like vectors are really only useful when you're talking hundreds to the low thousands of elements, at which point who cares how much memory they take up?


IMHO the advantages of functional programming is often looked for in examples that are too small for them to exist.

There is no advantage in sorting a billion numbers the functional way.

There is an advantage to gain, though, when the unsorted numbers are used for other purposes besides sorting. Then, functional programming demands that the output of sorting is written to a new 1-billion array instead of "in place", so other clients can consume the same array in parallel without being disturbed by the sorting.

> So, deep down in the guts of the language's libraries there must be mutable operations. Which kind of defeats the whole point.

It doesn't defeat the point if the abstraction doesn't leak. If a pure functional language offers sorting as a pure primitive, then language-wise it doesn't matter that it actually mutates memory internally.

The problem with "mostly" pure functional languages is that the abstraction leaks enough to make it useless.


> The problem with "mostly" pure functional languages is that the abstraction leaks enough to make it useless.

Does this make e.g. erlang useless? Processes have mutable dictionaries and you can perform I/O anywhere in your functions..


No machine works like the high level languages, including C.

Many believe the portable Assembler story and aren't aware that ISO C is defined in terms of an abstract machine.

The abstraction level that functional programming allows for, all the way back to Lisp, is to take advantage of the underlying hardware without being forced to hand code lowlevel details.

Something written in ML like language could be running on 70's, 80's, NUMA architectures, GPGPU, whatever, without changing one single line of code. Naturally we are not speaking about the same runtime implementation.

Whereas a language like C, requires a bunch of dialects to actually make use of the low level details, only exposed via Assembly and not part of ISO C.


Yeah but those "low level details" are not portable between computer architectures, so why not stop whining.

And you can actually USE these low level details if you really want, with very low friction (if you are willing to lock down to specific architectures). Are you STILL whining?

C level (or Pascal if you're masochistic) is still a very nice abstraction, and for a lot of people it's everything they care about 99% of the time, to get all the control they need, while simultaneously ensuring they're not wasting any time fighting pointless language features that are mutually incompatible.

In the end everybody's using whatever they like. https://twitter.com/fabynou/status/1242517783092400129


As long as people are lyabble for the security exploits they produce I am all for it, hammer them hard with lawsuits.


Sorry, but that's a rough shifting of goalposts. The statement that I argued against clearly was "C is not low level". You might not have noticed that you shifted, because you have an agenda.


I have always been quite clear about it, it is even part of my profile.


But please let's stay focused and keep discussing specific points. When changing topic, we can make a new subthread.


Fair enough.


Not defending fp but if memory is a limitation why not create a sortable index and leave the data alone?


Yes. I can say that doing concurrency in Clojure is particularly easy, check for yourself: https://www.braveclojure.com/core-async/


Does anyone know the history behind the distinction between concurrency and parallelism presented here? The most frequent reference I see is Pike's "Concurrency is not parallelism" talk, but I'm curious who first came up with this distinction.


More food for thought, as I've been thinking about this:

1. std::thread::hardware_concurrency

This is the number of threads that can execute in parallel, no?

2. "Memory-level parallelism"

How many memory operations can be "outstanding" at once - seems comparable to a single core issuing multiple disk reads. The memory operations aren't really serviced simultaneously, they just have overlapping lifetimes.

For more fun, some people refer to the case where performance is limited by the amount of memory-level parallelism available as "concurrency-limited": https://sites.utexas.edu/jdm4372/2018/01/01/notes-on-non-tem...


You can have concurrency without parallelism per the definition of the article - on a single processor system with timeslicing, for example.

SIMD systems effectively give you parallelism without concurrency - only one instruction is executing, but it's operating on multiple dataflows.

Your linked definition of "concurrency limited" seems to refer to utilisation. In the scenario described, how effectively the processor can be utilised depends on how many concurrent tasks it has in progress so it has something to do while one of them is waiting for a cache miss.


> The sched_yield() puts the calling thread to sleep and at the back of the scheduler’s run queue.

Not necessarily, but it is fine for this purpose I suppose. See https://news.ycombinator.com/item?id=21959692

Glad to see lock hierarchies mentioned. Barriers are new to me so that was nice.

IMO, it would be nice to at least have a mention of lock-free techniques and their advantages and disadvantages.


The main disadvantage of lock-free techniques is that you have to write code without any critical sections, that is, you have to properly manage arbitrary interleaving of actions. It's hard enough to manage staleness/inconsistency of data at high (business logic) level, never mind the low level where the code is not even executed in the written order.


Also lock free usually means atomics which means memory fences. Those can be slow. It's another tool you should have available though.


Memory fences and atomics are a part of lock-based code as well. They're just hidden by the locking and notification primitives.


Regarding sched_yield, another common (and, IMO, superior) approach to that issue is to drop the first lock and then acquire the second lock after the try-lock on it failed. Then you drop the second lock and retry the whole sequence again.


Agreed that sched_yield() is unlikely to be the right approach. A better approach is to use nanosleep() with an exponential backoff.


There is a good body of knowledge around dealing with concurrency issues within a single process. We've tools (locks, semaphores ...) to deal with the complexity as well as programming paradigms which help us write code which minimizes data races. It's interesting to realize that in a world with an increasing number of micro-services manipulating shared resources (a shared database, shared cloud resources), or even multiple nodes backing a single micro-service all reading and writing to shared resources, similar concurrency bugs arise all the time. Unlike a single process where you can use locks and other primitives to write correct code, there is no locking mechanism we can use to protect access to these global shared resources. We have to be more thoughtful so we write correct code in the presence of pervasive concurrency, which is easier said than done.


> Unlike a single process where you can use locks and other primitives to write correct code, there is no locking mechanism we can use to protect access to these global shared resources.

Databases provide transactions. This mechanism is also an inspiration for a synchronisation model called Software Transactional Memory proposed for Haskell, and used as "the" synchronisation model in Clojure. Locks and semaphores are rather lower-level primitives and it's harder for humans to reason about them with an ease comparable to using CSP or STM.


Yes, database transactions should be heavily leveraged wherever possible. We've often had to write services which create multiple resources in response to user requests. As an example, create an entry in the database and trigger the creation of, say, an Azure storage account. Transactions across independent services and resources don't work and correctness requires thoughtful design. In the more general case, whenever your service talks to more than micro-service to complete an operation, you will probably have to think through issues of consistency and transactionality.


Is anyone aware of good examples that can be used to explain and implement parallelism/concurrency that are not the bankers? I have seen it too many times.


The dining philosophers problem comes to mind, which originally was formulated by Dijkstra [0]. You can find implementations in different languages at Rosetta code [1]

[0] https://en.wikipedia.org/wiki/Dining_philosophers_problem [1] https://rosettacode.org/wiki/Dining_philosophers


Thank you! Actually I saw that too.

I was hoping for more real life examples where you can see the effects of concurrency parallelism.


No mention of volatile variables or the concept of stale cpu cache reads when a value is written to from another core. I think its a pretty common and fundamental concept that should be in a write up such as this.


If you use the standard blocking synchronization primitives, you cannot have stale reads. If you don't, the right way to introduce them would be with the C11 memory model relationships (synchronizes-with, happens-before), volatile shouldn't be touched with a 10 foot pole except for synchronization with signal handlers.


> stale cpu cache reads

Every multiprocessor system nowadays has coherent caches for normal memory, meaning that if you have reads that you perceive as stale, it isn't because of the cache, but because the hardware doesn't guarantee sequential consistency.

You can have problems due to relaxed consistency on a system without caches, and the problems on systems with caches aren't due to the caches.

There are situations when you have to worry about lack of coherency for other memory types, but AFAIK these are rarely exposed to userspace.


I agree it would be nice to add. Is that something you need to worry about when using the standard synchronization APIs though? They all handle memory fencing for you, no?


My experience is, single threaded execution and being able to replicate that with local data for each instance and remote lookup at a fine grained data level when needed, is a more easier way to maintain the code. Cocurrency and all those synchronisation is damn hard to code and debug.


In my opinion it's overhyped. I'll probably take point hits for claiming it, but so be it. Let the truth ring.

Why copy techniques meant for Netflix and Facebook when your org or app is most likely 1/1000th their size. Phallic size jealousy at work.

Most concurrent and parallel work can and should be done on a true-and-tried RDMBS for most orgs and apps. Use transactions/rollbacks properly and let the RDBMS manage most the grunt work instead of reinvent the wheel in app code.

K.I.S.S. and use-the-right-tool-for-the-job.


If you want extreme scale, most data in memory, such as a search engine processing millions of documents running data pipelines, then RDBMS isn't the way to go. For other MVC types, for a reasonable scale out, straight forward models with RDBMS should do.


Note that existing RDBMS are gradually adding and improving their text search engines. Of course there will always be specialized situations that need dedicated high-end text search engines.


It has tradeoffs and can easily result in higher complexity, eg if you need a shared cache to minimize latency/memory.

I like the attitude the article takes. To successfully use concurrency, you should understand the tools available to you and their tradeoffs so you can make the right decision for your requirements.


When one has nice tooling like Visual Studio graphical debugging, not so much.



Thank you for the great read! Wondering how io_uring would be put in place of this situation...would be very interested in the authors review: https://kernel.dk/io_uring.pdf


Interesting article and a great blog


I'm a bit disappointed that the article doesn't explain the need for a memory/consistency model and how it interacts with CPU caches. Locks are the easy part, and the article makes you think that with them you can now write at least simple concurrent programs.

Why is that? I'm pretty sure that the author's intention is not to equip the readers with the tools to make buggy programs, yet that is exactly what happens here.


Don't the standard synchronization APIs documented in the article handle memory barriers for you?


Yes. As long as you use correctly-constructed synchronization primitives (e.g. pthreads), you don't need to worry about memory consistency.

When you need to start worrying is when you start implementing your own synchronization (either rolling your own primitives or going lock-free).

'moring needs to clarify what they are talking about. It's perfectly possible to write correct code using pthreads on modern hardware with no understanding of memory consistency.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: