Erlang's not about lightweight processes and message passing (github.com/stevana)
543 points by stevan on Jan 27, 2023 | 274 comments



OTP came much later than the design of the language.

The language was designed around our ideas of what the problem really was and the best way of solving it. The massive and extremely lightweight concurrency was a critical part of attacking the problem, which was/is extremely concurrent. There are an enormous number of things going on in a switch which have to be handled concurrently, sometimes over 10k calls plus running the switch. So the concurrency was fundamental. The error handling primitives were of course also a critical part of the solution, as fault tolerance was a critical part of the problem.

A lot of effort went into working out what the concurrency and error handling should look like to be the right way to attack the issues and solve the problems. Fortunately we worked with a user group looking at designing a new architecture, who tested our ideas and came back with extremely important feedback on them: what was good, bad, or just not necessary. They then became the first group to build a product using Erlang. It didn't use OTP as it didn't exist then.

OTP came later and is “just” a formalised, generic wrapper around these ideas. We had client-servers, state machines and the equivalent to supervisors in products before OTP was developed. Behaviours came with OTP. And in the beginning there were many different versions of OTP alive at the same time.

Behaviours could not have been developed as they are without lightweight processes and communication. OTP is of course fundamental to the distribution and use of the language, as it does encapsulate most of our original ideas on how Erlang should/could be used to build systems.


It would be really interesting to know what sort of things the Erlang team considered but then dropped as bad ideas!


I found this page in one of the LFE books that explains in brief the original problem and constraints:

https://lfe.io/books/sicp/fm/preface-3/erlang.html


I think it is misleading to frame this as "behavior". What we really have are battle-tested concurrency patterns, which are then implemented using the language feature called "behavior". This collection of battle-tested concurrency patterns that has been in use for a long time, in a variety of settings, is called OTP. It's why you'll hear long-time Erlang folks talk about how it's really about OTP.

You can implement those same patterns, if you also have lightweight processes, messages, and message queues. There's a Rust library that aims to do exactly that. The new WASM runtime (Firefly, formerly known as Lumen) aims to bring those patterns into the WASM ecosystem, potentially using this kind of concurrency pattern client-side.

What's potentially novel is adding in newer concurrency patterns that are in use in the Kubernetes community -- use of selectors, services, scaling with replicasets -- so that, instead of a static supervision tree, it can be a dynamic supervision tree, spread out across the entire pool, even as that pool expands or shrinks.


I started using worker threads for some batch processing and development tools recently. We have a couple of tricky bits that pull data from different environments to compare them, so the code has to run 'as' two or more environments; forcing a state change was bug-ridden as hell, and spawning a child process was slow enough that we limited the scope of how much we scanned (running n worker threads for n environments dropped a lookup from 1.5s to 350ms).

Armed with those successes I turned to thinking about applying something similar to a production Node app, and every time I think about it I realize what I really want is OTP. Each type of processing we do has its own caches and its own CPU use, which could dovetail well with worker threads, but I have a chicken-and-egg problem with trying to move any of that work out of process, because I don't get any benefit until 1) none of the workers can silently crash and leave the application nonfunctional, 2) I'm handling half a dozen concurrent requests per Node process instead of one or two, and 3) message-passing overhead is cheaper.

It's either too late or too early to rewrite this whole thing in Elixir.


Perhaps Node is the problem, rather than Elixir being the solution?


Sometimes your problem is that you're using a reciprocating saw when you should be using an angle grinder, sometimes your problem is that you bought Ryobi instead of Milwaukee or Makita.


Indeed, but sometimes you just need to use the right tool for the job. That could be a precision guided flying laser chainsaw axe. Usually it’s a bog standard trusty dumb hammer that everyone else manages to bash similar looking nails with quite effectively.


If Node is the issue here, that would make Elixir a solution, no? Changing languages (i.e. Node is the issue) to get the features described would put you looking at Elixir (among other languages that fit the described properties).

Maybe what you mean is that elixir isn't the only solution?


Yeah, that’s what I meant: “elixir being the (only) solution”. Guess it was worded ambiguously.


Would running Nodejs in WebAssembly along with something like Firefly or Lunatic help make the transition smoother?


Or… use the right tool - Elixir. Monoglotism is bad, people.


Here's an attempt at implementing OTP in Go: https://github.com/ergo-services/ergo


I used this a couple of times in production: https://github.com/asynkron/protoactor-go

No problem launching 100k actors on a laptop.


That looks cool. Thanks for the link!


Why do you say "attempt"? Does it not work sufficiently well [yet]?

Edit: Thanks for the additional context, dangoor. Cheers.


I haven't tried it, it's just on my radar. It may be great! But OTP has been around a long time, so I'm not going to assume that they've nailed it unless I've either tried it myself or seen folks talking about using it successfully.


> This collection of battle-tested concurrency patterns that has been in use for a long time, in a variety of settings, is called OTP.

OTP seems like a pretty overloaded acronym—what does it mean here?


OTP in an Erlang context refers to the Open Telecom Platform - a bit of a misnomer since you can do a lot of things with it that do not inherently relate to telecoms, but that's how it evolved.


For context, Erlang itself was originally created at the telecom company Ericsson — the name is short for "Ericsson Language". But as concurrency and networking became bigger parts of software in general, the qualities that made Erlang good for telecom software turned out to be generally useful.


I think the name was a play on both "Ericsson Language" and the name of Agner Krarup Erlang, who pioneered queuing theory (before it was known as such) to address the fundamental question of inter-exchange capacity planning.

The unit "erlang", for "1 busy circuit/resource" is named in honor of that research.

https://en.m.wikipedia.org/wiki/Agner_Krarup_Erlang


> the name is short for "Ericsson Language"

But just for fun, it's also the name of a Chinese Buddhist god: https://en.wikipedia.org/wiki/Erlang_Shen


Or Outlaw Techno Psychobitch for the Bananarama of Languages. https://www.youtube.com/watch?v=rRbY3TMUcgQ


Open Telecom Platform, I think?


> It's why you'll hear long-time Erlang folks talk about how it's really about OTP.

Indeed, this didn't come as a surprise because I've heard Erlang folks rave about OTP. What was really nice about this article (complemented by some of the comments here) was that it gave me a really nice intuition for what OTP really is, beyond "supervisor trees, y'know".


Please, can you share the rust library??


I'm the author of something related https://github.com/lunatic-solutions/lunatic


I don’t remember the name and I don’t remember if it is tokio or something else. However, I did find this: https://docs.rs/genserver/latest/genserver/

I will keep looking. Keep in mind, BEAM uses a preemptive actor model rather than async, so it avoids the “storms” that can happen when an async reactor runs out of resources. EDIT: but I guess Tokio is capable of preemption through work-stealing too?


This looks interesting too, though no OTP out-of-the-box: https://docs.rs/axiom/latest/axiom/


Can you say more about these storms? Sounds pretty interesting. How does preempting solve it?


I don't know Python or Rust well, but I think this is the problem:

https://betterprogramming.pub/the-dangers-of-async-in-python...

And this is how some people are solving it:

https://async.rs/blog/stop-worrying-about-blocking-the-new-a...

This is how it works in BEAM:

https://github.com/happi/theBeamBook/blob/master/chapters/sc...

The commonality is to limit execution time and yield (preempt) so something else can run too.


Not sure this is what GP is talking about but to implement the actor model in https://letlang.dev I use tokio.


There's a component that seems to be missing here which is preemptive task scheduling. I've not seen another non-OS system do it like the BEAM VM (the VM behind Erlang), though there may be something I'm not aware of. It really prevents a whole class of concurrency issues where a hung process can freeze or slow down the entire system.

For example, if you recreated gen_server in a cooperative concurrency environment, one gen_server could use up all of the CPU and have an impact on the performance of the rest of the system. Maybe the other threads (microthreads, not OS threads) would still respond in <500ms, but if every request takes 500ms when they normally take 15ms you could essentially have outage-like conditions, particularly with upstream timeouts.

Instead, because BEAM is preemptive that one (or 10) hung gen_server doesn't hang up everything else on a node. Sure at some point performance will degrade, but that point is much further down the line than in cooperative concurrency models. There was a fantastic talk by Sasa Juric that demonstrates this in Erlang. [1] Otherwise you run a higher risk of even the supervisors being starved for CPU time, particularly if you are launching hundreds of processes that are all locking the CPU.

It's really the combination of the behaviors (OTP), the scheduler, lightweight threads, message passing, and immutability to me that makes the Erlang (Elixir for me) concurrency model so appealing.

Creating a language with the feel of a lisp, the environment of Smalltalk, and the concurrency of Erlang has been my dream for a long time.

[1] https://www.youtube.com/watch?v=JvBT4XBdoUE


> Creating a language with the feel of a lisp, the environment of Smalltalk, and the concurrency of Erlang has been my dream for a long time.

Hey! Now there’s two of us! It would be interesting to compare notes. :)

After 20 years of really dedicated Smalltalk evangelism, I jumped out of that balloon a little over 10 years ago. Then I wandered in many strange lands embracing the polyglot's “right tool for the right job” mantra. What I found was more like “here’s a lot of really mediocre tools; rarely is it crystal clear which of many is right for the job.”

A year or so ago, I built out an API server in Elixir (no Phoenix) and I’ve really loved it. Great community. I love that it’s built on basic fundamental principles, and not a bunch of edge cases. I’ve always wondered what some sort of mashup would look like. If I was independently wealthy I would tinker away at such a thing.


"out of that balloon" -- I see what you did there. :)

I've not really thought about it beyond that it would be nice. I really wanted to love Smalltalk, but Pharo at least didn't run well on Arch for me, and I didn't much love the web frameworks/concurrency story. I will say that Seaside is probably what got me interested in Liveview in the first place though. So many "if only"s...the developer experience is amazing in Smalltalk though.

There's a lot that scares me about making languages and I've not studied it much. I've started reading through SICP though as a starting place. Writing a basic Scheme interpreter (and future compiler/JIT compiler) as an image-based language (with change tracking), to make it interactive from the get-go, is about as far as I got, maybe basing it on an existing Scheme; but tacking on preemptive scheduling sounds hard. But so is making a language and entire development environment. :)

Shoot me an email though if you'd like to chat (in my profile, also username@gmail.com), not 100% sold if this is something I'd want to take on in the future or not, I really wish it already existed and someone better than me had already made it. :-D


>> Creating a language with the feel of a lisp, the environment of Smalltalk, and the concurrency of Erlang has been my dream for a long time.

Have you looked at LFE (Lisp Flavored Erlang), by one of Erlang's co-creators, Robert Virding? (No Smalltalk-like environment, but 2 out of 3 ain't bad, right? :-) )

https://lfe.io/


>> Creating a language with the feel of a lisp, the environment of Smalltalk, and the concurrency of Erlang has been my dream for a long time.

>Hey! Now there’s two of us! It would be interesting to compare notes. :)

Please make this happen. :)


Sure. Just need that “independently wealthy” thing to fall in place. Are you donating?


I have a cunning plan to become independently wealthy. Should come to fruition real soon now. :)

Perhaps an IDE for LFE would be a place to start? Does such a thing exist?


> Creating a language with the feel of a lisp, the environment of Smalltalk, and the concurrency of Erlang has been my dream for a long time.

I'm trying to eventually accomplish something like this: https://github.com/sin-ack/zigself

It's an implementation of the Self programming language in Zig, with an actor model inspired by Erlang.

The main thing to realize is that Lisp and Smalltalk are very much symmetrical in terms of structure. There is no real distinction between the two other than syntax and basic computation unit (closures vs. objects). And even closures can be used as objects and vice versa.

That only leaves the concurrency model. I have a basic implementation of actors using objects as the "context". It still has a long way to go to reach the supervisor tree model of Erlang, but interestingly enough, the ideas in the article are reflected here heavily; behaviorism is at the core of Self.


> There's a component that seems to be missing here which is preemptive task scheduling.

For Erlang, yes. For implementing behaviours (the point of my post), I don't think so (I sketch a "single threaded" solution towards the end).

I think one worker behaviour per CPU/core per stage in the processing pipeline is better than throwing thousands of processes at the problem and letting the scheduler deal with it. This is what I got from Martin Thompson's talks (I linked to one of them in the "see also" section).


I wrote a preemptive 1:M:N scheduler in C, Rust and Java.

https://github.com/samsquire/preemptible-thread

It is a 1:M:N scheduler where there is one scheduler thread, M kernel threads and N lightweight threads. I take advantage of the fact that loop indexes can be structures and can be modified by other threads. So we can set a thread's looping variable to the limit to end the current loop and pause it, and then schedule another thread.


Thanks for pointing this out. It is very important. The preemptive task scheduling is the secret sauce that makes Erlang unique. For example, if you look at how Akka implements the actor system, you'll see it riddled with the Future and/or async/await patterns, which is due to the non-preemptive Java scheduler. After playing with Erlang, this hack becomes irritating.


If I'm not wrong, Go added non-cooperative task scheduling that behaves similarly?


I'm not a Go expert but preemptive scheduling of goroutines is supposed to be one of the primary benefits of Go as a language [1]. Although I'm not sure if there's any runtime that adopts preemptive scheduling as fervently as the BEAM. Even the regular expression engine is preemptive in Erlang, which prevents regular expression denial of service attacks [2].

[1] https://go.dev/src/runtime/preempt.go

[2] https://www.erlang.org/doc/man/re.html#:~:text=run/3%20alway....


Regex DoSes can be made impossible without using any concurrency. See Go's and Rust's implementation. Also see https://swtch.com/~rsc/regexp/


I wasn't trying to imply that concurrency is the only way to mitigate ReDoS attacks, I was just pointing out that the BEAM applies preemption seemingly everywhere, which I found interesting.


The BEAM misbehaves when a scheduler gets stuck in a process (or other activity) for too long. As a result, anything that can take a long time needs to have yield points. This is a human process; when I started using Erlang, garbage collection didn't yield, List1 ++ List2 took forever with large lists and didn't yield, we didn't have line numbers in backtraces, and we had to hotload our code uphill in the snow both ways. It's not unusual for a new OTP release to have added new yield points in BIFs or NIFs that could run long or even in core VM workings.


There was someone recently on HN who didn't know about the possibility of non-backtracking implementations of regex.


Go didn't have non-cooperative pre-emptive scheduling for a while, which is why I kept pushing Elixir, until Go implemented it. Now Go has the same benefit as Elixir, but it's very, very fast.


Yeah with preemptive scheduling, static typing, and better cpu utilization Go is a pretty compelling option for Erlang and Elixir users.


It's not immutable, there is no messaging, no hot-swap, no supervisors, it's not FP, and it doesn't have transparent support for clustering.

Also, it's not memory-safe, 30 years old, and battle-tested pretty much everywhere.


There are definitely trade offs when switching from Elixir or Erlang to Go. If you're a functional programmer who can't live without immutability, or you plan on running a cluster of machines that can communicate with each other and hot-swap code into the running system, then Elixir and Erlang are good choices.

If you have some extremely CPU intensive code, or you like cross compiling to a lightweight binary, or you need static typing (without giving up preemptive scheduling), Go is a decent choice.

Elixir and Erlang are not slow by any metric, but there are faster languages. Discord famously had to augment their Elixir code with Rust for example to scale to 11 million users [1].

[1] https://discord.com/blog/using-rust-to-scale-elixir-for-11-m...


That is a much better point, but Erlang gives you a reliable system by default and you have to try really hard to make it crash, whereas Go is just a regular compiled language and such programs typically crash more easily/often.


Erlang is definitely more fault tolerant than most languages, but I've found that static typing tends to catch a lot of errors in development that would otherwise crash an application in production. The compiler won't catch every bug, and you'll still typically have to restart a crashed service periodically (via systemd, or a container orchestrator, or whatever process manager you use), but it definitely helps.

Gleam is a pretty good choice if you need type safety and you want to run on the BEAM https://gleam.run/. It still has the same performance characteristics as Erlang (which it compiles to), but at least it gives you type safety.


https://stackoverflow.com/a/73932230/312907 I found this answer.

I can't find the original announcement, but I found the accepted Go proposal: https://github.com/golang/go/issues/24543

I remember that being the straw that made me drop Elixir (which I love, to be clear). Go excels in many, many places where Elixir does, but it's way faster.

I do think Elixir has immutability on its side, which is huge for new developers, but there are far fewer developers in Elixir than Go, so the end result doesn't change unfortunately.


Thanks for sharing! That's really exciting, going to have to make sure we're on the latest Go version everywhere.


Interesting, at least when I was using it the goroutines themselves were cooperative, and of course the OS threads were preemptive when GOMAXPROCS > 1. Not finding much with a search; there was a proposal to make them preemptive. Curious if others will chime in that it's now preemptive. Even so I like Elixir better as a language, but the processing speed of Go with a preemptive scheduler would be tempting for some use cases.


Does Erlang prevent a thread from consuming lots of memory and starving other threads? Managing CPU does not seem to be particularly hard. Managing RAM with GC and objects moving between threads is interesting.


> just like the concurrent code of gen_server

I found this a really interesting read, but this stuck out because it doesn't jive with my mental model of gen_server.

gen_server is fully serialized. Even the code underpinning it is not concurrent.

Now I guess gen_server does expose some top level functions to simplify sending/receiving a message into the process, but the process itself is serial.

And this is part of the genius of gen_server to me. You don't need to think about your state concurrently because it processes a single message at a time. Your system executes concurrently, but the individual components do not.

Maybe that is what the post means and I misinterpreted it.
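
To make the serialization concrete, here's a minimal sketch of a counter gen_server (a hypothetical module, nothing from the post): handle_call is invoked for one message at a time, so the state update needs no locks.

    -module(counter).
    -behaviour(gen_server).
    -export([start_link/0, bump/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link() -> gen_server:start_link(?MODULE, 0, []).

    %% Synchronous API: the caller blocks until the server replies.
    bump(Pid) -> gen_server:call(Pid, bump).

    init(Count) -> {ok, Count}.

    %% Messages are pulled off the mailbox one at a time, so this
    %% read-modify-write of the state is race-free without locks.
    handle_call(bump, _From, Count) -> {reply, Count + 1, Count + 1}.

    handle_cast(_Msg, Count) -> {noreply, Count}.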


Yup, that's my understanding too -- what Erlang does extremely well is abstract away and isolate the concurrency mechanisms, allowing you to think of the details of (these concurrent) processes, linearly.


I remember when I was learning Elixir—I asked my co-worker / Elixir mentor how Elixir solves concurrent programming, data access, etc. His cheeky but accurate response was "How do you solve a problem that you don't have?"

His point was that the paradigm is so different at the VM level, that many things become irrelevant to the conversation. That said, there still are concurrent programming challenges on the BEAM, but it's very minimal compared to languages where it's not baked in.

Ruby Ractor is a good example of how a VM-backed concurrency mechanism will likely change how programs in that language can be built.


Yeah they mention that in 2 of the 6 points arguing in favor of behaviors:

2. The application programmer writes sequential code, all concurrency is hidden away in the behaviour;

4. Easier for new team members to get started: business logic is sequential, similar structure that they might have seen before elsewhere;


Adding on to sb8244's comments: writing sequential code is very nice, but it's not the gen_server behavior that gets you that. It's the lightweight processes.

gen_server is useful, but it's not much more than receive in a tail recursive loop and conventions around messages, most specifically about if the sender expects to receive a reply or not. It's not magic, and it's written in clear Erlang that anyone with a week of Erlang fiddling can understand (which kind of is magic; almost everything in OTP is clearly readable)

Concurrency comes from running many processes each with their own message queue.
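
As a rough, hand-rolled sketch of that "receive in a tail-recursive loop plus message conventions" point (made-up names, not the real OTP code):

    -module(mini_server).
    -export([start/1, call/2, cast/2]).

    start(State0) -> spawn(fun() -> loop(State0) end).

    %% Convention: the sender expects a reply, so tag the request
    %% with self() and a unique reference and wait for the answer.
    call(Pid, Request) ->
        Ref = make_ref(),
        Pid ! {call, self(), Ref, Request},
        receive {reply, Ref, Reply} -> Reply end.

    %% Convention: fire and forget, no reply expected.
    cast(Pid, Request) ->
        Pid ! {cast, Request},
        ok.

    %% Receive in a tail-recursive loop; the "real work" is elided
    %% and the state is just threaded through unchanged.
    loop(State) ->
        receive
            {call, From, Ref, Request} ->
                From ! {reply, Ref, {ok, Request, State}},
                loop(State);
            {cast, _Request} ->
                loop(State)
        end.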


Thanks for that. My thought is probably just semantics then.

I think it's even more nuanced, but doesn't really matter for this type of post.

(I am actively writing a chapter exploring this topic, so it has been top of mind.)


gen_server is a serializer; that is the point of it to begin with. It could be called gen_serializer ;) because we need to update the state inside of it without locks.

It is usually fine; for parallelism, just use a pool of gen_servers or many gen_server processes that map to your data model well (for example, one gen_server for each network socket).

Though, these properties of gen_server come from the fact that it is just a single Erlang process.


> just use a pool of gen_servers or many gen_server processes

I think architecting an Erlang system can be kind of tricky because you do have to think about all these processes sort of co-existing at the same time and how they interact. With one system I was involved with, we weren't really satisfied with how we'd divided things up among different behaviors, and did some refactoring. It wasn't difficult, but it did require a different way of thinking about things. When we had it all working, I was really satisfied with the results though. That thing was robust, and quite resilient.

I miss working with Erlang. It's tricky to find people who are using it or Elixir in the sweet spot, IMO. Lots of "it looked cool and we wanted to play around with it" out there, as well as some people thinking it'll magically make their systems "internet scale".

My best experiences with it have been semi-embedded systems where it's not doing too much distributed computing, but where "robust" and "predictable" are important qualities.


> I think architecting an Erlang system can be kind of tricky because you do have to think about all these processes sort of co-existing at the same time and how they interact.

I find it easier to think about each process in isolation --- what messages does it get, and what does it do with them; if it needs to send messages, who/where does it send them to and what will it get back, including errors or timeouts and not worrying about what the other process does in the moment.

The overall behavior of a system built from communicating processes can get hard to predict though. You build up observations, intuition, and escape hatches over time.



Yes, but... Merriam-Webster also acknowledges that the use of "jive" in this sense "seems to have increased each decade since the 1940s, and may often be found in reputable and edited sources today".

https://www.merriam-webster.com/words-at-play/jive-jibe-gibe

"... it seems possible that this use of jive will increase in the future, and if it does dictionaries will likely add it to the definition."

Language evolves.


The main insight is in Joe Armstrong's thesis. "Making reliable distributed systems in the presence of software errors" https://erlang.org/download/armstrong_thesis_2003.pdf (I think the original title was "In the presence of hardware errors", emphasis mine)

All the other things flow from that thesis and understanding. You can recreate behaviours described in the repo doc using Erlang primitives very easily, and they are very hard to recreate in pretty much any other language.

Because Erlang is very much literally about lightweight processes and message passing. Only:

- every part of the system knows about lightweight processes and messages

- every part of the system is engineered around them

- the underlying VM is not only re-entrant in almost every single function, it's also extremely resilient, and can almost guarantee that when something dies, only that process dies, and the rest of the system receives a guaranteed notification that this process died

There are more, but without at least that you can't really recreate Erlang's standard behaviours. For example, you can't recreate this in Go or Rust because `panic` will kill your entire program unconditionally.
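
A small shell-level sketch of that notification guarantee (nothing project-specific, just the primitives):

    %% Monitor a process, crash it, and receive the guaranteed
    %% 'DOWN' message; the rest of the VM keeps running.
    Pid = spawn(fun() -> receive boom -> exit(kaboom) end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! boom,
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            io:format("process ~p died: ~p~n", [Pid, Reason])
    end.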


18 or so years ago, I was a phone monkey for a government department. I was initially learning to code for fun, but quickly realised programming could be a great way to earn a living.

I read about Erlang on /r/programming, which was having a new fad for a new shiny language, as was the custom back then.

And I desperately followed all the /r/programming fads because I was worried that I'd end up irrelevant if I wasn't skilled up in the latest Haskell web framework.

But one of those fads was Erlang, and it intrigued me so much, that I ended up printing Mr Armstrong's thesis, stuck it in a manila folder, and read it, slowly, cover to cover while waiting for, or riding on, my bus to my contact centre job, and I've still got it in my bookcase.

His thinking on resiliency in the face of inevitable failures, and on safe concurrency, has shaped my thinking, and proved invaluable repeatedly and is very relevant today. It's like their phone switches were distributed microservices running in a container orchestrator, before it was cool.


> But one of those fads was Erlang, and it intrigued me so much, that I ended up printing Mr Armstrong's thesis, stuck it in a manila folder, and read it

I am now inspired to do the same, thank you!


Most modern async runtimes let you do that.

A (worker) thread dying isn't an issue in Rust, Go, C# and etc. Sure, each goes about error handling in a slightly different way, either opt-in or opt-out or enforced (when unwrapping Result<T, E>), but other than that the advantages of Erlang/Elixir have faded over time because the industry has caught up.

p.s.: C# has not one but two re-entrancy syntax options - 'async/await' and 'IEnumerable<T>/yield return'. You can use the latter to conveniently implement state machines.


> but other than that the advantages of Erlang/Elixir have faded over time because the industry has caught up.

It really hasn't. People fixate on the idea of just running some processes, and just catching some errors.

And yet, none of the languages that "solved this" can give you Erlang's supervision trees, which are built on Erlang and the Erlang VM's basic functionality. Well, Akka tried, and re-implemented half of Erlang in the process :)

But other advantages did fade: multi-machine configurations are solved by Kubernetes. And it no longer matters that you can orchestrate multiple processes doing something when even CI/CD now looks like "download a hundred docker containers for even the smallest of tasks and execute those".

> p.s.: C# has not one but two re-entrancy syntax options - 'async/await' and 'IEnumerable<T>/yield return'.

What I meant by re-entrancy in the VM is this:

Every process in Erlang gets a certain number of reductions, where a reduction is a function call or a message passed. Every time a function is called or a message is passed, the reduction count for that process is reduced by one. Once it reaches zero, the process is paused, and a different process running on the same scheduler is executed. Once that reaches zero, the next one is executed, etc.

Once all the processes have been executed, the reduction counter is reset and the paused process is woken up and resumed.

On top of that, if a process waits for a message, it can be paused indefinitely long, and resumed only when the message it waits for arrives.

So, all functions that this process executes have to be re-entrant. And it doesn't mean just the functions written in Erlang itself. It means all functions, including the ones in the VM: I/O (disk, network), error handling, date handling, regexps... You name it.
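
A rough way to see the preemption from the shell (the spinning fun is just a stand-in for a misbehaving process):

    %% The spinning process burns reductions on every call, so the
    %% scheduler preempts it and the shell process stays responsive.
    Spin = fun Loop() -> Loop() end,
    Busy = spawn(Spin),
    io:format("busy process so far: ~p~n", [process_info(Busy, reductions)]),
    exit(Busy, kill).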


Sometimes the advantage is not just the ability to do something -- I mean, by that logic, only bare-metal systems languages like Rust, C, Zig etc could have advantages over other languages.

Erlang (and by extension Elixir) has the advantage that the programmer can think at a higher level about their system. You don't have to write or configure a scheduler. You don't have to invent supervision trees. You can be sure that the concurrently-running parts of your system cannot possibly affect each other's memory footprint (though Rust gives a robust answer to this problem as well).

It doesn't make a perfect fit for every problem, but there is still a decent-sized space of problems -- I'd say "highly concurrent, but not highly parallel" -- where Erlang gives the programmer a headstart.


The thing is, we mostly try to keep applications stateless, unless it's necessary to keep state for performance (think realtime online games, HFT, ...).

And in those cases there is simply no need for e.g. supervision trees because there is nothing to restart. You still need stuff like retrying, but this is supported by all major concurrency libraries / effect systems. In fact, Erlang has fallen behind here in terms of what the language offers (not what the BEAM offers, the BEAM is still top notch imho)

State is mostly moved to either the database or message queues or similar, which is pretty good.


We try to keep applications stateless because state handling in most programming languages is pathological. Erlang presents a different solution to the underlying problem of state management, rather than trying to solve the higher-level question of "how best to support this one particular solution to the problem of state management".

Namely: The message queues are part of the language. They're built right in, for ease of use. Your caching layer is built into the language, for ease of use and faster performance.


> We try to keep applications stateless because state handling in most programming languages is pathological.

It's true that handling state is hard in most programming languages. But that's neither the only nor the most important reason why people try to keep applications stateless.


C#'s weakness here is that those two patterns are cooperative multitasking only. Under the hood they retain control of a thread until they yield execution. By default resource management is something that needs to be considered and the default thread pool is not an uncontested resource.

I don't use Erlang but my understanding is that while it is not exactly fully pre-emptive, there are safeguards in place to ensure process fairness without developer foresight.


C#'s async runtime is mixed mode: the threadpool will try to optimize the thread count so that all tasks can advance fairly-ish. This means spawning more worker threads than physical cores and relying on the operating system's thread preemption to shuffle them for work.

That's why synchronously blocking a thread is not a complete loss of throughput. It used to be worse, but starting from .NET 6 the threadpool was rewritten in C# and can actively detect blocked threads and inject more to deal with the issue.

Additionally, another commenter above mistakenly called Rust "bare metal", which it is not, because for async it is usually paired with tokio or async-std, which (by default; this is configurable) spawn one worker thread per physical CPU thread and actively manage those too.

p.s.: the goal of cooperative multi-tasking is precisely to alleviate the issues that come with the pre-emptive kind. I think Java's Project Loom approach is a mistake that made sense 10 years ago but not today, with every modern language adopting async/await semantics.


Hey, I also prefer C# and async. Alternatives have yet to prove they can handle gui patterns where main threads matter.

...but the problems stated are real. I'm excited to hear that this might be fixed in .NET 6, but it'll be a while before that rolls out to most deployments.


Apologies, but it seems you have gotten the wrong impression (or maybe I did a poor job of explaining).

It has never been a big issue in the first place, because by now everyone knows not to 'Thread.Sleep(500)' or 'File.ReadAllBytes' in methods that can be executed by the threadpool, and to use 'await Task.Delay(500)' or 'await File.ReadAllBytesAsync' instead. And even then you would run into threadpool starvation only under load and when quickly exhausting newly spawned threads. It is a relatively niche problem, not the main cornerstone of runtime design some make it out to be.

Also, .NET 6 is old news; it was released on Nov 8, 2021. It is the deployment target for many enterprise projects nowadays.


"Everyone knows to do it right" is no protection at all. And honestly, I would push back on this in general because no its not well known at all. A fresh grad will not intuitively know to look for WhateverAsync API in case they exist and veterans will miss this as well.

Knowing that file IO is too heavy and has *Async counterpart methods is somewhat obvious to a veteran, but other long running methods are not so obvious. In this case you would need to profile your use case to understand that certain calculations/methods might be best farmed off to a different threadpool.

Unity still uses Mono and has a very low max thread pool size, for example. The thread pool is easily starved in the latest version of that engine and I'm sure it's more common than you think.

Relatively niche, perhaps, but a critical problem when stumbled upon nonetheless. Again, I like async/await, but there are certainly footguns left to remove.


Unity is special and has its own API and popular patterns, if you block the main/render thread it will explode, regardless of the language of choice, and Erlang/Elixir performance is not acceptable for Gamedev and will likely stumble upon similar issues.

Again, and I cannot stress this enough, we're discussing a somewhat niche feature. You have to take into account that even the standard library still has a lot of semi-blocking code, simply due to the nature of certain system calls or networking code. From the runtime's standpoint there is no difference between blocking and computationally heavy logic - it will scale the number of threads to account for fairness automatically. It's just that blocking has extra cost due to being "better" at holding threads (you don't have to think about it). .NET 6 is comparatively better at dealing with such scenarios, but your app would work fine in PROD 9 times out of 10 with invalid code before or after that. It's the difference between running 'Task.Run(() => /* use up a thread for no reason for seconds / minutes */)' in a loop of hundreds of iterations going from terrible to very bad.

It's pointless to "fight against words". Just trust the runtime to do its thing right. That's why its baseline cost is somewhat higher than that of Golang or Rust/Tokio - you pay more upfront to get foolproof solution that has really good multi-threaded scaling.

If you don't want to believe the above, just look at average C# solutions on GitHub. There is no "special magic to learn"; that's just how people write code, new to the language or otherwise.

p.s.: This situation reminds me of one of my colleagues who would always come up with an excuse for his point regardless of context. It's counter-productive and self-defeating.


> A (worker) thread dying isn't an issue in Rust, Go, C# and etc

If your Rust thread panics while it holds a Mutex, you've got a bit of a mess, especially if it was halfway through updating shared mutable state. Probably similar in Go or C#, but I haven't used Go and only did cargo cult programming in C#; I didn't read any sources or see warnings about crashing in threads or async/await.


> A (worker) thread dying isn't an issue in Rust, Go, C# and etc.

Go channels are nice but they don’t come close to Erlang message passing. In Go you can’t just ignore whether the channel is bounded or unbounded, open or closed. Writing to a closed channel will blow you up. It takes some time to learn it. Messages in Erlang are easy FIFO serial execution.
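
For contrast, a quick sketch of the Erlang side: the send operator is asynchronous and never throws, even if the receiver is already gone.

    %% Sending to a dead process is a no-op for the sender, unlike
    %% writing to a closed Go channel.
    Pid = spawn(fun() -> ok end),   % exits immediately
    timer:sleep(10),                % let it finish
    false = is_process_alive(Pid),
    Pid ! hello,                    % no crash; the message is simply dropped
    ok.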


> For example, you can't recreate this in Go or Rust

If you're willing to make your "lightweight processes" OS threads you could kind of make it work. E.g. Rust gives you both panic hooks (to notify everyone else that you died) and catch_unwind to contain the effect of a panic (which generally stops at a thread boundary anyways). But of course that only scales to a couple hundred or thousand threads, so you probably have to sacrifice a lot of granularity.

And any library that links to C/C++ code has the potential to bring the whole process down (unless you make your "lightweight processes" just "OS processes", but that just makes the scaling problems worse)


Lightweight processes are explicitly not OS threads, in at least two senses: smaller footprint in memory, and no system call for every context switch.

It’s explained in many documents about lightweight processes, of course for Elixir/Erlang/BEAM but also for Go and Crystal, and even going back to Solaris Internals and the modern Project Loom for upcoming JVM versions.


Erlang's BEAM processes are undoubtedly awesome. But if we start with the premise of "achieving the same goals without Erlang", I think it's entirely valid to start with "what does our process primitive look like?". With just 16GB of RAM my laptop is quite memory constrained, but according to Task Manager I'm still running 5200 threads across 350 processes right now. Many use cases that required light threads/processes 37 years ago or even 20 years ago would work with OS threads by now. Of course many others don't, which is where the Erlang popularity comes from.


It takes on the order of nanoseconds to start an Erlang process. They are also extremely lightweight memory-wise. The Erlang VM tries to keep context switching between CPU cores to a minimum. And all processes get a more-or-less equal share, so it's hard for one process to consume all of the CPU and never yield back to the others.

So firing off and monitoring processes becomes second nature easily. Rarely so in other languages


OS threads would defeat the purpose though. You can run millions of BEAM processes. Threads are a lot more expensive and still run the risk of taking over your system with an infinite loop.


OS threads are actually preemptive, so an infinite loop is really less of a deal than one in Erlang. Erlang is really not preemptive; it's just that function calls are automatically (potential) yield points, and you can't loop without function calls because of the construction of the language. If somehow you did manage to get a real loop into a process, that scheduler would be stuck forever, and you'd end up with a broken BEAM when you triggered something that requires coordination between schedulers --- I forget what sorts of things do that, but code purging can't finish without each scheduler doing housekeeping, etc.

Meanwhile, your OS thread will just keep eating CPU, but everything else is fine.

You're right about the number of threads though. It takes a lot more memory to run an OS thread than a BEAM process, and I'm not sure OS schedulers will manage that many threads even if the memory is there (but I could be wrong... getting things to work nicely at 50k threads may be sufficient for millions)


Unless code is exiting the BEAM itself, an infinite loop that bypasses yield points shouldn't be possible.

Doing this at such a granular level is also one of the reasons that you can run a database that thousands of processes are accessing, within the same runtime, without a performance impact to the overall system.


> For example, you can't recreate this in Go or Rust because `panic` will kill your entire program unconditionally.

You can recover() panic() just fine in Go.


Yeah, OP took a shortcut here :)

The nuance is that most languages let you build reliable programs _if your code is correct_ - if you're using defers, context handlers, finalizer, cleaning up state in shared data structure, etc.

Erlang's goal is to be reliable "in the presence of software errors", that is, even if your code is buggy. If a request handler process dies inside an Erlang web app, whatever file or socket it opened, memory it allocated, shared resource it acquired (e.g. a DB connection) will be reclaimed. This is true without having to write any error handling in your code.

The way it's done is that the VM handles the basic stuff like memory and files, and it provides signals that libraries (like a DB connection pool for instance) can use to know if a process fails and clean up as needed. In other words the process that fails is not responsible for its own clean up.

At some point some code must of course be correct for this to work. Like, if the DB connection pool library doesn't monitor processes that borrow connections, it could leak the connection when such a process dies. But the point is that this part (the "error kernel") can be a small, well-tested part of the overall system; whereas in a classic program, the entire codebase has to handle errors correctly to guarantee overall stability.
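
The monitoring half of that looks roughly like the fragment below (a sketch of the pattern, assumed to run inside a hypothetical pool process, not any particular library):

    -module(pool_sketch).
    -export([checkout/3, handle_down/2]).

    %% Hand out a connection and monitor the borrower, keyed by the
    %% monitor reference (called from within the pool process).
    checkout(Borrower, Conn, Borrowed) ->
        Ref = erlang:monitor(process, Borrower),
        maps:put(Ref, Conn, Borrowed).

    %% The pool's receive loop gets a {'DOWN', Ref, process, Pid, Reason}
    %% message if a borrower dies for any reason, and reclaims the
    %% connection instead of leaking it.
    handle_down(Ref, Borrowed) ->
        case maps:take(Ref, Borrowed) of
            {Conn, Rest} -> {reclaimed, Conn, Rest};
            error        -> {unknown, Borrowed}
        end.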


These are like sledgehammers when you need a screwdriver.

It's very hard to build Erlang-like versions of supervision trees using those tools.


> These are like sledgehammers when you need a screwdriver.

Also by the very nature of Go your application state is as likely as not to be fucked, so even if you did handroll monitors and links, and thus could build supervision trees, they wouldn’t be of much use.


> You can recover() panic() just fine in Go.

You can catch_unwind() panic!() in Rust too :-).


I agree up to the last point. You can catch panics at the thread level right? I mean, more generally, what is Erlang implemented in?


> I mean, more generally, what is Erlang implemented in?

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang." - Virding's Law ;)

So first you have to implement all those things. And then use them.


You can in C/C++. Though, the point is that Erlang has a strategy for how to clean up and recover from such errors (some data can be lost; rarely, we can crash the whole system for some errors).

I mean, you can write the Erlang way in almost any language; it's just that in that case you need to adapt all the libraries to follow the same principles.

Some languages implement a similar error handling strategy by just creating a separate process per request (hello, PHP). We know how to clean up after a worker dies (just let the worker die). In that case the supervisor strategy is very simple.


Of course you could implement a language like Erlang in Rust, but I think the point is that you would have to do exactly that in order to do in Rust what Erlang does at the language level.


Erlang is implemented on top of BEAM which has its own scheduler and schedules its own idea of lightweight processes.


It looks like features that the Go runtime provides.


In C. You don’t operate on the thread level directly; threads are just executors of processes managed by the VM, usually the same number as vCPU cores.


>> I mean, more generally, what is Erlang implemented in?

Pure magic, that can't be recreated in any regular language :)


As someone who worked with Erlang for a few years, I still wonder whether using parallel processing at the program's top level is worth it or not. There are many ways to process data efficiently using queues, pipelines, etc., and to be clear about when it happens, rather than the "wild west" of Erlang where you need to manage processes, links, restarts, and it's harder to focus on the business logic.


What do you mean by "process data efficiently"? Are you speaking about performance? But the main goal of Erlang's design is fault tolerance, not performance. You can still use queues or whatever in Erlang if you want, but having a fault-tolerant system with a process tree and process supervision in other languages is a different story.


Right, I was mainly describing arguments for parallel processing which Erlang is also famous for.

In terms of being fault-tolerant I think the modern approach with (micro)services is quite similar, one can have multiple services running and communicating using something like protobuf, having restart strategies, fallbacks and so on. From my experience Erlang doesn't offer any killer features in this case, does it?


Erlang pushes kubernetes down into the language and its libraries, instead of requiring Docker images and containers, etc.

This means that the processes can be much, much more lightweight than kubernetes pods and containers. Cheaper to kill and restart. Can provide concurrency and fault tolerance at a far more granular level. Much simpler to write and deploy than Docker images. Etc. etc.

At least that's my understanding (significant work experience with kubernetes, no real production experience with Erlang/Elixir).


As a dev working in Elixir, you're right on the money. IMO, the biggest benefit you gain from that granularity is that, since processes cost next to nothing to spin up and down, you can (ab)use your fault tolerance mechanism for error handling, leading to the famous "Let it crash" philosophy.

Restarting your docker image every time someone sends a malformed packet to your webserver is going to make for a trivial DoS attack, not even a distributed one. Killing the individual process that spins up to handle that particular packet/connection, though, is simply best practice.
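
In Erlang terms the shape is roughly this (a hedged sketch; conn_handler is a hypothetical module):

    %% One temporary child per accepted connection: a crash on a bad
    %% packet kills only that child, everything else keeps running.
    -module(conn_sup).
    -behaviour(supervisor).
    -export([start_link/0, start_handler/1, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    start_handler(Socket) ->
        supervisor:start_child(?MODULE, [Socket]).

    init([]) ->
        SupFlags = #{strategy => simple_one_for_one,
                     intensity => 10, period => 5},
        Child = #{id => conn_handler,
                  start => {conn_handler, start_link, []},
                  restart => temporary},   % let it crash, don't restart
        {ok, {SupFlags, [Child]}}.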


Except you still need to run and deploy your Erlang app somewhere which might end up on Kubernetes.

Kubernetes is just better than Erlang for process management because it does more and is completely language agnostic. Imagine you can't build some part of your system in Erlang but you need the same kind of functionality: what do you do?


Depending on the size of the component that can't be built in Erlang, you may be able to use a NIF and throw it on the dirty scheduler.


You can make a NIF, or you can implement the Erlang protocol to talk to the node (and I have done it). Both of those are possible, but is it worth it in the end? Is it worth having orchestration built into the language and locked to some particular features, instead of a full-fledged devops setup? The only argument I have is that in the early stages of development it might save some time.


The killer feature is that the fault-tolerant behaviour is something either implicit, builtin, or bundled [Erlang/OTP] with the language, instead of something that you need to either reinvent or bolt onto both your own code and your dependencies' code as an afterthought when things start to fail.


In Erlang the fault-tolerant behavior is not builtin either, only the tools to build it. You still have to design the right supervision tree, the dependencies between processes, the links, making sure you handle the process termination messages correctly, and many other details.

In my experience, in non-Erlang setups, while doing requests to other services you have to check the response status and add some code handling it, so it's not really a complete afterthought. The only difference I see here is that Erlang handles failure in a real-time way, but that can also be done using some periodic task to query important services. And implementing it outside of Erlang gives more flexibility (think of Erlang cluster size and network limitations).


Our team had worked on an Elixir app for a couple years, before splitting off game logic into Dotnet. Scaling the dotnet server was a much different beast:

- It wasn't designed to crash on failure. It uses thread pools with no supervision trees. We had to add liveness probes to check if it is alive. I've only had to use readiness checks for Elixir

- No REPL. With a REPL in production, we can debug things live, even try patches to see if those work. Can't do that with Dotnet. That's also something that contributes to reliability

Now, cluster size does matter. The way Erlang and the BEAM were designed favors vertical scaling. You can minimize cluster size by biasing towards vertical scaling. That's what we do on our systems. There's a way to do that with Kubernetes so that we scale vertically during our daily traffic cycle.

At some point though, you start looking at a partial clustering topology for the BEAM, or use one of the many process registries that are better suited for dynamic membership. (The one bitwalker wrote comes to mind.)


That's wrong. Fault tolerance is basically the default. Yes, you have to build a supervision tree, but unless you're writing a one-off script, you have to build one anyway to do anything.


Well, you're confirming what I said: you have to build it. It's not that Erlang programs automatically never fail and always handle problems correctly as required.


>From my experience Erlang doesn't offer any killer features in this case, does it?

Yes, the killer feature in this case (relative to microservices) is sane deployment, a single codebase, a unified language, being able to do integration tests without a full DevOps team, etc.


I agree that Erlang has some benefits in the early stage of development or with smaller projects, but beyond that you will need devops anyway and this argument no longer holds. One example I have is WhatsApp: it was started in Erlang by a few engineers and was a huge success from an engineering point of view, but it was later rewritten (I think in C++) anyway. They had their reasons, but it's clear that something in Erlang was not satisfactory, and fault tolerance with a distributed nature was not enough to stop that.


Do I understand you correctly in that you'd like more structure? E.g. that you can only deploy an `application` (= supervisor tree)?


I'm talking about development, deployment should be automated anyways.

As a developer I think it's easier to think in terms of how to send data to Hadoop, an SQS queue, etc. for processing and read the results later, than to keep in mind the supervision tree, messages, mailbox size, linking and so on. And the "processing" side can just as well be implemented in Erlang; I just don't feel Erlang's features are needed in the "top level development", and they create more problems and barriers.


But isn't the alternative wrangling infrastructure with yaml files?


It is, yes. When you're just starting a project Erlang's approach might be favorable, but once the whole infrastructure is there the difference is negligible.


Not important to the conclusion or reasoning... but stevana's post says:

  "the whole team working on Erlang quit and started their own company."
The same event as described in "A history of Erlang":

  "In December, most of the group that created Erlang resigned
  from Ericsson and started a new company called Bluetail AB."
  https://www.labouseur.com/courses/erlang/history-of-erlang-armstrong.pdf
'most of the group that _created_ Erlang' is not the same thing as 'the whole team _working_ on Erlang'. Or, quantifying it, at the end of the 'history' paper, there's a list of 45 people under 'implementation' and 'tools'. Around 35 were at Ericsson in 1998. Of those, nine or ten quit to form Bluetail, and another two or three left for Bluetail later on.

(In 1998, there were two connected groups working on Erlang in the same building in Älvsjö, Stockholm. One was the computer science laboratory (CSLAB), where Erlang was created, the other was "Open Systems", which had more of a development role. A significant part of CSLAB left. Almost all of 'Open Systems' stayed. Many that stayed were already doing a stellar job on Erlang and many still are.)


Fixed, thanks!


What I learned from Erlang is: naming is hard.

What does `gen` mean in `gen_server` ?

Thanks for the replies saying "generic". So this article is not well organized in that case; with a simple explanation of the naming, it would be easy to guess what the system does.

Abbreviation doesn't save the writings.

Some more examples, in Ruby, instead of calling `implement Module`, it uses `include Module` . I do think include is not as clear as implement.


I always thought it was “generate” and that something about the implementation/immutability required the server to be “generated” at runtime. I never did any actual Erlang programming; it’s just how my brain plugged in the gaps when I heard about all the gen stuff.


I also dislike this name, as it was not clear what it does just by looking at the name.

Recently, I was adding a similar abstraction in Rust on a project I'm working on and I called it `AbstractProcess`. Like, it is some kind of template for a process. I still don't think it's clear what it does just by looking at the name. Does anyone have a better idea of how to name such a pattern?


The suffix `Factory` often gets made fun of, but there are a few places where it makes sense. Sounds like this is one of them.


Factory went out of favor. Nowadays people just add an s as a suffix in Java.


Could also be called "ProcessTemplate" ?


> Some more examples, in Ruby, instead of calling `implement Module`, it uses `include Module` . I do think include is not as clear as implement.

Agreed, though I do think the `include` naming might just be a relic from a time before `prepend` and `extend` also existed in Ruby. I find it a lot more intuitive when I remember it as `append`ing to the ancestors, and the language even sort of calls it that internally with the naming mismatch between `include` and `append_features`: https://ruby-doc.org/3.2.0/Module.html#method-i-include

Then one can see that's exactly what it does to a Module's ancestor chain:

    irb(main):001:0> RUBY_VERSION         => "3.2.0"
    irb(main):002:0> lol = ::Module::new  => #<Module:0x00007f2969821740>
    irb(main):003:0> lol.ancestors        => [#<Module:0x00007f2969821740>]
    irb(main):004:0> lol.singleton_class.ancestors => [#<Class:#<Module:0x00007f2969821740>>, Module, Object, PP::ObjectMixin, Kernel, BasicObject]
    irb(main):005:0> rofl = ::Module::new => #<Module:0x00007f296982dfe0>
    irb(main):006:0> lmao = ::Module::new => #<Module:0x00007f296982fd40>
    irb(main):007:0> omg  = ::Module::new => #<Module:0x00007f2969822a00>
    irb(main):008:0> lol.include(rofl)    => #<Module:0x00007f2969821740>
    irb(main):009:0> lol.prepend(lmao)    => #<Module:0x00007f2969821740>
    irb(main):010:0> lol.extend(omg)      => #<Module:0x00007f2969821740>
    irb(main):011:0> lol.ancestors        => [#<Module:0x00007f296982fd40>, #<Module:0x00007f2969821740>, #<Module:0x00007f296982dfe0>]
    irb(main):012:0> lol.singleton_class.ancestors => [#<Class:#<Module:0x00007f2969821740>>, #<Module:0x00007f2969822a00>, Module, Object, PP::ObjectMixin, Kernel, BasicObject]


> Some more examples, in Ruby, instead of calling `implement Module`, it uses `include Module`

While (as a sibling comment notes) “append” might be better than “include” given other terms used in Ruby, “implement” would be completely wrong. Modules aren’t interfaces; in fact, they are almost exactly the opposite. A class in a language with interfaces declares that it “implements” an interface because the interface provides guarantees, and the class provides an implementation of those guarantees. An included Ruby module provides implementation, not guarantees that the class provides implementations for.

“include” describes what it does much better than “implement”.
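
To put the same contrast in the Go terms the article itself uses (a loose analogy, with made-up names, not anything from the article): an interface is pure guarantee, while struct embedding is implementation-sharing, which is much closer to what `include` does.

    package main

    import "fmt"

    // Greeter is the "interface" side: a guarantee with no implementation.
    type Greeter interface {
        Greet() string
    }

    // Politeness is the "module" side: a reusable implementation to mix in.
    type Politeness struct{}

    func (Politeness) Greet() string { return "Hello!" }

    // Robot embeds Politeness and so gains Greet without writing it;
    // only as a consequence does it also happen to satisfy Greeter.
    type Robot struct{ Politeness }

    func main() {
        var g Greeter = Robot{}
        fmt.Println(g.Greet())
    }

Embedding `Politeness` gives `Robot` an implementation for free; satisfying `Greeter` is only a side effect of that, which is roughly the module/interface distinction above.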


generic, I believe. i.e. it’s an interface that has to be implemented.


Yep, “generic” it is.


If they called it 'generic_server' then people would make fun of it for being too verbose. If they call it 'gen_server' then everyone pretends they know what it means and stays quiet.


gen_server to me is “generate a server”


I associate “gen” more with “generation”, in the sense of “gen Z”. In any case, it’s confusing.

With code completion you shouldn’t have to resort to abbreviations (unless they are already part of the (business) domain language). This was different in the 1980s though.


generic?


> The application programmer writes sequential code, all concurrency is hidden away in the behaviour;

Yeah, until the business problem itself involves inherent concurrency, which usually happens much faster than people think. Or until I, as the non-expert, want to dig in to make changes or debug a problem.

This distinction into "expert" and "lowlife (SCNR) using the expert abstractions" is really one that doesn't hold in practice most of the time.

I think it's much better to embrace that concurrency is a cross-cutting-concern and reality is that it can happen on any level, so the language should better support reality.


I think you're talking about two different things. When they say concurrency is hidden away, I think they mean the programmer isn't dealing directly with locks/mutexes. They are also lying, since async functionality like `handle_cast` is built into the `gen_server` behaviour and every caller is calling either `gen_server:cast` or `gen_server:call` depending on whether or not it is going to block on a response. Whenever you invoke async behaviour concurrency will invade all the domain logic and can't be ignored.
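
For anyone who hasn't used gen_server, the cast/call distinction boils down to whether the caller waits for a reply. Roughly, in Go (since that's what the article uses for its examples; the names here are invented, this is not the gen_server API):

    package main

    import "fmt"

    type request struct {
        msg   string
        reply chan string // nil means "cast": fire-and-forget
    }

    // serverLoop plays the role of the gen_server process: one goroutine, one mailbox.
    func serverLoop(mailbox <-chan request) {
        handled := 0
        for req := range mailbox {
            handled++
            if req.reply != nil { // "call": the sender is blocked waiting on this reply
                req.reply <- fmt.Sprintf("handled %q as message #%d", req.msg, handled)
            }
            // "cast": nothing to send back, the sender has already moved on
        }
    }

    func main() {
        mailbox := make(chan request, 8)
        go serverLoop(mailbox)

        // cast: asynchronous, the caller does not wait
        mailbox <- request{msg: "log this"}

        // call: synchronous, the caller blocks until the server replies
        reply := make(chan string)
        mailbox <- request{msg: "get status", reply: reply}
        fmt.Println(<-reply)
    }

Once callers start choosing between the two, the concurrency is back in the domain logic, which is the point above.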


Very interesting read. Also very timely for me (I am always amazed how HN often has posts that resonate strongly with what I am currently doing), as I am just now designing a programming language based on a generalisation of Algebra, which I call Abstraction Algebra. I think there is a strong connection to this post: Behaviours are just an abstraction algebra you program against. Switch out the algebra implementation against another one, and you can adapt the same program to different scenarios without changing the program itself.


I always thought it would be neat to have a system that takes math symbols / formulas / functions etc in the notation of a mathematician, and automatically generates highly performant implementations — sort of like Taichi but instead of using python, you just use mathematics.

Seems to me all the tools are available to accomplish this effectively nowadays.

Good luck on your project, in any case.


Thank you!

Yes, a large part of programming will be just a special case of doing mathematics.


> I am just now designing a programming language based on a generalisation of Algebra, which I call Abstraction Algebra.

I admire your ambition!


To be honest, I am quite amazed by how nicely all the pieces are coming together now. It all just feels right, it feels like collecting fruit from under a huge apple tree.


I am also designing a language based on algebra.

https://GitHub.com/samsquire/algebralang

It's based on the idea there are relations between variables and every function is a concurrent process.


I see you are starting from what you want it to look like. That's a good idea! Did the same for Practal [1]. You might be interested in trying to express your language within Practal. Within the next 1 to 2 weeks something you can start playing with should be available. You will then be able to define your own syntax freely (as Practal contains an LR parser, hidden behind an easy-to-use syntax extension mechanism), together with its semantics. By the end of February functionality like execution via rewriting should be available. By the end of March modules are planned, and I think this is when it will start being real fun to use Practal. You could then define your language just as a module/theory in Practal. That's what I am going to do with the language I am working on anyways!

[1] https://practal.com


> In 1998 Ericsson decided to ban all use of Erlang.

Does anyone know why?


From Joe Armstrong's thesis (p. 6):

> In February 1998 Erlang was banned for new product development within Ericsson—the main reason for the ban was that Ericsson wanted to be a consumer of sodware technologies rather than a producer.

From Bjarne Däcker's thesis (2000, p. 37):

> In February 1998, Erlang was banned within Ericsson Radio AB (ERA) for new product projects aimed for external customers because:
>
> “The selection of an implementation language implies a more long-term commitment than selection of processors and OS, due to the longer life cycle of implemented products. Use of a proprietary language, implies a continued effort to maintain and further develop the support and the development environment. It further implies that we cannot easily benefit from, and find synergy with, the evolution following the large scale deployment of globally used languages.” [Ri98]


Also, from Wikipedia:

"In February 1998, Ericsson Radio Systems banned the in-house use of Erlang for new products, citing a preference for non-proprietary languages. The ban caused Armstrong and others to make plans to leave Ericsson. In March 1998 Ericsson announced the AXD301 switch, containing over a million lines of Erlang and reported to achieve a high availability of nine "9"s. In December 1998, the implementation of Erlang was open-sourced and most of the Erlang team resigned to form a new company Bluetail AB. Ericsson eventually relaxed the ban and re-hired Armstrong in 2004."

Not wanting to rely on a fairly esoteric in-house language makes some sense.

Since then things have changed significantly of course.


> Not wanting to rely on a fairly esoteric in-house language makes some sense.

Not necessarily… that language clearly was a competitive advantage.


Reminds of Paul Graham’s essay on Lisp being a secret weapon for Viaweb, Beating The Averages[1].

[1]: http://www.paulgraham.com/avg.html


There have been second-hand reports of the current leadership at Ericsson saying that open-sourcing Erlang was the worst decision they had taken, as it gave away a massive competitive advantage.


They are idiots then. Erlang would simply not be as good if it was not open. Companies not understanding open source is just so annoying.


Possibly but...

Nearly 80% of OTP development is, and has always been, done internally at Ericsson. And they barely use libraries from the outside world. From inside Ericsson, Erlang looks a lot like a proprietary language.


There are pros and cons to these kinds of things. "Best tool for the job" reasoned purely from a technical point of view isn't necessarily the "best tool for the job" when everything is factored in.

It's hard for me to judge one way or the other; I wasn't at Ericsson in 1998, or indeed, ever at Ericsson. I just figured that the language not being open source at the time, and that they came back on their decision just a few years later, were important bits missing from the previous comment.


What I recall from Erlang writings and videos is that, basically, Ericsson's C++ developers didn't want their cheese moved and successfully deployed the obvious (and not obviously unreasonable) arguments about how C++ was the industry standard, as above. I think Armstrong also admitted that the Erlang crew had a bit of a cocky attitude and wasn't great at winning others over.


This reminds me a lot of Ron Garret's story of the difficulties of advocating for the use of Lisp against the industry standard C for space projects. Sounds like a lot of the exact same dynamics.

https://www.corecursive.com/lisp-in-space-with-ron-garret/ https://news.ycombinator.com/item?id=34524552


> the Erlang crew had a bit of a cocky attitude

This is also pretty prevalent in the Erlang users of today.

I count myself among those, but I am hopefully not as cocky with it as I once was.


If you're cocky and you have the results to back it up then I'm ok with that.


Morally? Yes. Strategically? Not optimal. :/


I think the decision makes sense. There are cases where paying someone else for the effort can save you cost and a maintenance nightmare, and you can have your engineers focused on the things that matter: buy what already exists, and build in-house the things that maybe no third-party vendor gets quite right, which is okay.


It wasn't an obviously crazy argument, especially at the time. The problem is that, in the best case, Greenspun's Tenth Rule catches you. Putting things in-band doesn't make them go away.


Sidenote: Isn't there also a law / rule that says every concurrent system eventually ends up reinventing Erlang?


Virding's First Rule of Programming:

> Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.

http://rvirding.blogspot.com/2008/01/virdings-first-rule-of-...


> sodware

This made me giggle.


When it comes to buy vs build, the grass _is_ always greener.


Then what you buy turns out to be not nearly as good as you thought, meanwhile lots of motivated open source developers are turning your homegrown build into something really game changing.

Sod's Law.


People really did love "synergy" in the 90s.


I work on a project that uses a proprietary language. It's very good, but it's miserable to be the only place where it's used. Tools are sloppy and old, there's no external documentation or videos, you can't hire anyone who knows it, and the people on the team are only half committed, because they have to use it but also want to keep up with industry trends for their next job.


There might be a case for open-sourcing it.


Management decided it was easier to hire c++ programmers.


Yes, they built a platform Cello [0] that would allow for that if I remember correctly

[0] https://web.archive.org/web/20170829230730/https://www.erics...


Easier to find C/C++ devs according to one friend who used to work there in early 2000s.


That is a cop out. You always need to train your people. I've been doing C++ for 20 years, and consider myself somewhat of an expert, but if I join your team I will still need to learn a lot of things that are just how your team does it. If I also need to learn Erlang/Java/Go (pick any language I haven't used much), that only adds a short time.

Even as complex as C++ is, I can teach a great programmer C++ a lot faster than I can train a junior C++ programmer to be a great programmer. Yes, you will encounter the rough edges of whatever language often in the first 5 years, but a great programmer will be great in any language quickly. You need one expert in the language on the team for the weird complex stuff, but most code isn't that complex.


I mean… sort of. It’d be like trying to write an ML stack in Lisp circa 2023. Sure, you can find devs willing to learn it, but it’ll be at least two orders of magnitude easier to find Python devs.

I love lisp, but the thought of being forced to use it exclusively for ML makes me uneasy. And I’ve tried building lisp ML stacks. :)


I’d probably use Hy and JAX, :)

Should be beautiful.


Beautiful perhaps, but not productive for ML engineers who are joining the team and familiar with the standard stack.

Sure, they can be retrained, but I feel that there’s a real cost to this, and it’s too tempting to handwave it away as “they should just learn.”


C++ was all the rage in late 90s / early 00s.


Java was all the rage then. C with classes was dominant (not to be confused with C++), but Java was the rage. Just like today Rust is all the rage, but it isn't as popular as C++ in the real world.


I remember that. What puzzled me is that it was the time when software written in Java ran up to 100 times slower than native processes. Now it is not nearly as bad.

I got a very interesting reaction from the CTO of one telecom when I showed them a prototype running on a PC that handled, without breaking a sweat, the same amount of transactions as their equivalent Java backend running on beefy HP servers.


Java is a language designed so that IBM can shovel interchangeable programmers into a furnace and produce something functional.

Once you start with that goal, all the compromises and quirks make more sense.


See, it's funny because as a consumer of software, I've overall been pretty pleased with solutions that come out of that process (think Jenkins, Artifactory, Elastic, Bazel).

So it does seem capable of leading to reasonable results on the eventual timescale, despite various inconveniences and tradeoffs along the way. Of course, lots of terrible software has certainly been written in Java too, so it's not fair to compare Java's cream of the crop with some random PyPI package and draw conclusions.


I think Java got a lot better when it stopped trying to bring along an OS GUI toolkit.

Hence the vastly different experience between Java-exposed-as-app and Java-exposed-as-webapp.


Do you mean big red and not big blue with that statement?

Also, this reads as unnecessarily snide. Having a language that makes it easy to onboard new people is a major strength. A lot of Golang's success is attributed to that strength of the language.


IMHO, modern IBM (inasmuch as it exists) is the textbook target for Java, from a professional services standpoint.

And it's snide and isn't.

There's a lot to be said of being able to shovel programmers into a furnace and produce something functional. Decreasing risk to timeline and of failure makes large project PMs very happy. And generally makes stakeholders happy, because things don't outright fail as often.


> Do you mean big red and not big blue with that statement?

I assume you're referring to Oracle here, but Oracle didn't invent Java either. Java came out of Sun Microsystems, whose color scheme seems to shift between blue and purple, so "big violet?" Just doesn't have the same ring to it.


Java is also a language with extensive library support, some of the most sophisticated tooling, and state-of-the-art virtual machine and garbage collection implementations of any language. The most recent versions of the language are much more programmer friendly than previous versions, greatly cutting the expressibility gap with other languages.

Just stay away from Spring, and you can have a nice development experience.


The topic here isn't modern Java, it is late-90s / early-2000s Java. Spring was either not invented yet (though the thought process behind it was already in place) or was the new hotness everyone needed to know.


Sun was the one creating and designing Java....


Pretty sure Sun wasn't working for IBM when they designed Java, were they?


No, but IBM went all in on it, which was an important boost.


Microsoft was big into C++ then.

C++ was terrible at cross-platform (incompatible support for language features across compilers), which was a deal breaker for many orgs.


Fascinating! Long ago I came to the conclusion that multithreaded applications, in the sense of shared memory protected by locks, were simply too difficult for humans to reliably work with. As a result, it turns out I reinvented Erlang behaviours in C++ (along with light processes and message passing in the form of queues and threads). I may have been influenced by Erlang at the time. All of the behaviours listed were replicated. Convergent design, I suppose.

So, you can probably create Erlang behaviours in any language, but it requires building a framework and then training every developer in how to use it. There is probably value in having a standard version of Erlang behaviours for a range of different languages.
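
Sketching the generic part in Go rather than C++ (the names Behaviour/Spawn/Call are made up for illustration, not any existing library): the framework owns the mailbox, the loop and the state, and the user only supplies sequential callbacks.

    package main

    import "fmt"

    // Behaviour is the callback module: purely sequential user code.
    type Behaviour[S, Req, Resp any] interface {
        Init() S
        HandleCall(req Req, state S) (Resp, S)
    }

    type call[Req, Resp any] struct {
        req   Req
        reply chan Resp
    }

    // Spawn owns all the concurrency: one goroutine, one mailbox, state threaded through.
    func Spawn[S, Req, Resp any](b Behaviour[S, Req, Resp]) chan<- call[Req, Resp] {
        mailbox := make(chan call[Req, Resp])
        go func() {
            state := b.Init()
            for c := range mailbox {
                var resp Resp
                resp, state = b.HandleCall(c.req, state)
                c.reply <- resp
            }
        }()
        return mailbox
    }

    // Call is the synchronous client-side wrapper.
    func Call[Req, Resp any](mailbox chan<- call[Req, Resp], req Req) Resp {
        reply := make(chan Resp)
        mailbox <- call[Req, Resp]{req: req, reply: reply}
        return <-reply
    }

    // counter is a "callback module": no channels or goroutines in sight.
    type counter struct{}

    func (counter) Init() int { return 0 }

    func (counter) HandleCall(n int, state int) (int, int) { return state + n, state + n }

    func main() {
        c := Spawn[int, int, int](counter{})
        fmt.Println(Call(c, 5))  // 5
        fmt.Println(Call(c, 37)) // 42
    }

The `counter` module contains no concurrency at all, which is exactly the property that makes a standard set of behaviours worth having.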


I predict the next several generations of programmers will spend a lot of time and energy to reimplement most of Erlang/OTP in curly brace languages in a variation of Greenspun's 10th rule.


That's already the case. Why stop now?



Is there a rust equivalent?


> Please don't share this repo or any of the linked to documents or repos just yet!

Oh no :P

Very interesting read though, thanks for sharing!


When I started my first job at a grad scheme some 15 years ago, on the first day, we were asked to introduce ourselves to the group and say what name we'd prefer to be called by. This one guy says: "My name is XXX XXXXXXX. I don't care what you call me, just don't call me Ziggy... I HATE being called Ziggy".

I heard Ziggy still works there.


This reminds me of Lord of the Flies: I'm not a native English speaker, and only read it recently. When Piggy says he doesn't want to be called Piggy, I figured that's what he would be called the whole book. Alas, he was, and I cannot remember his real name.


I'm guessing that guy's real name was "Zbigniew".


Wonder if he did it intentionally or did not realize when he opened his mouth and said that.


> I heard Ziggy still works there.

He can’t have hated it that much, then. ;)


This is not the first time I'm reading about Ziggy!


Hopefully stevan is the same as stevana from GitHub.


IMHO the differentiator is deeper, yet ingrained everywhere in OTP/behaviours and the famous "Let it crash" slogan: it is the OS-style process and resource isolation. That is something you can't port with a library into a language that doesn't have it built in. Lightweight processes are not the same if they can crash each other, or mess up some shared objects/resources/...

You can implement actor behaviours in Go or Node or even C, but without that lower-level support it will never give you the stability guarantees that Erlang's process isolation gives.

To draw a weird comparison, Elixir (with Erlang process isolation) brings two worlds together. First, it has the PHP/Ruby level of fire-and-forget productivity, because each HTTP request is handled in an independent isolated process which, if it crashes, won't affect the system but instead automatically provides a nicely debuggable crash log. And second, it natively provides all the distributed tools for long-lived systems. E.g. pub/sub, sessions and database connections don't have to be rebuilt on a per-request basis like in Ruby/PHP but can be their own long-lived processes.

If there were a library that could bring this easy-to-use process isolation + communication to, say, C programming, it would be a game changer. But the best you get in all other languages I'm aware of is to use actual process isolation (fork multiple Node/Ruby/Go processes) and then use some kind of IPC manually, or Redis, or k8s...
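
To make the gap concrete, here is roughly the best a library can do in Go (just a sketch): it can catch the panic, but nothing isolates whatever the crashed worker already did to shared state, and a panic in a goroutine nobody wrapped still takes the whole program down.

    package main

    import "fmt"

    // supervised catches a panic in f, a rough stand-in for "let it crash".
    func supervised(name string, shared map[string]int, f func()) {
        defer func() {
            if r := recover(); r != nil {
                // The crash is contained, but `shared` is whatever the worker left behind.
                fmt.Printf("%s crashed: %v (shared=%v)\n", name, r, shared)
            }
        }()
        f()
    }

    func main() {
        shared := map[string]int{"balance": 100}

        supervised("worker", shared, func() {
            shared["balance"] -= 40 // half-finished update...
            panic("boom")           // ...then the crash; nothing rolls the mutation back
        })

        fmt.Println("after 'restart':", shared) // balance is now 60, not 100
    }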


To me the really amazing thing about Erlang is the fact that these patterns based on queues and messages and supervisors scale so well. They are easy to implement in a small, single-node application. You can compose them. You don't need to completely reauthor your calls to start distributing them and fanning out responsibility across your nodes, because you've been doing RPCs from the start. A multi-node application is not that much more difficult to reason about than a single-node application.


I get that this is probably written for people familiar with Erlang, but a quick suggestion if I may:

In the Behaviors section you talk about behaviors being similar to interfaces in Go and give a Go example. But then you switch examples for Erlang. Maybe show the Joe/Mike example written in Erlang and then say, here's a more complicated example (the key-value example) that really describes behaviors better.


There is a very common pattern in the world where people conflate goals with results. Of course, when I say that, it's obvious that the two things aren't the same, but the unexamined assumption that they are the same sneaks in anyhow when you aren't looking, a cognitive shortcut hard to catch yourself making and harder yet to get in front of and deal with.

In the light of this statement, the answer to what I think is the thesis question of that entire piece:

"This begs the question: why aren't language and library designers stealing the structure behind Erlang's behaviours, rather than copying the ideas of lightweight processes and message passing?"

Is that while Erlang has a lot of good goals, the results of how they got there are simply not the state of the art. Or, to put it another way, language designers are not copying Erlang, and they are correct to not copy Erlang.

I respect Erlang a lot. They were a good 10-15 years ahead of their time. However, you will note that if you add 10-15 years to the creation date of Erlang, you still end up in the past. If Erlang were to come out today, fresh, nobody had seen it before, into an otherwise identical programming language environment, I would say it makes several mistakes.

One I've written about before is that Erlang is a non-Algol language for no reason: https://news.ycombinator.com/item?id=7277957 (Note the library mentioned in that post, suture, is now mature, and I use it all the time. It works for what I need.) But in the context of this post, that's less critical.

The other major mistake I'd say it made if it came out in 2023 is that it is a totalizing environment. By that I mean that it has this built in implicit assumption that it is responsible for all the reliability in the system, and you don't get Erlang's features very conveniently if you don't use it as the complete control backplane for your entire system. You run an Erlang cluster, and it bundles all the message passing, restarting, reliability, cluster management, software deploy, and everything into one system.

But for the world we live in today, that's a price not worth paying. We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good, and it's actively terrible if you want to use it for one non-Erlang process to communicate to another. We don't need the Erlang message bus. We have a dozen message busses, some in the cloud, some commercial, some open source, some that double as retention (Kafka), all of which scale better, none of which tie you to Erlang's funky data types.

And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.

We don't need Erlang clusters for redundancy any more. You just run multiple copies of a service against the message bus, on multiple systems for redundancy.

We don't need Erlang's behaviors. We have interfaces, traits, object orientation, and even just pushing that entire problem up to the OS process level, or writing a cloud function, and any number of ways of achieving the same goal.

Erlang's software deploy is interesting, but we have a lot of options for it. The whole attempt to do live updates is interesting, but it also imposed a lot of constraints that systems without that need, which is the vast majority of them, don't want. This is perhaps the space where the state of the art isn't that far ahead of Erlang. It's still a mess, despite all the churn in this space. But even so, with all the options available, you can probably find something better for your system than the Erlang way of upgrading software, even if it isn't necessarily much easier.

The cognitive hazard that Erlang presents the community in 2023 is that it has some very good writing on the topic of reliability and its other goals, and then, naturally, one segues into the discussion of how Erlang solved the problem. And it was a very interesting solution for the time. I used Erlang for many, many years back when it was effectively the only solution to these problems.

But it isn't the only solution anymore. The space has exploded with options. Unsurprisingly, the ones that a highly innovative pioneer tried out first are not the best, or the only. They chose well. Let me again emphasize my respect for the project. But it's not on the cutting edge anymore.

Granted, the diversity of options does mean the real world has gotten quite chaotic, where you may have three message busses attaching systems implemented in a dozen different languages, but that's something you can take up with Conway's Law. Erlang couldn't work with Conway's Law without totally converting your entire company to it, which just isn't going to happen.

The reason why language designers aren't rushing to copy Erlang is that what was excellent and amazing in 2000 (and, again let me underline, I mean that very seriously, it was a cutting edge platform built with a lot of vision and moxie) is, in 2023, mediocre. Erlang is a mediocre language (Elixir is at least "good", Erlang is mediocre), attached to a mediocre message bus, with a type system that doesn't even reach mediocre, with a mediocre totalizing approach to system design where there's a very significant impedance mismatch between it and the rest of the world, with an at-par-at-best VM (I won't call that mediocre, but where it used to be head-and-shoulders above everything else in certain ways, it is now merely competitive), with mediocre standard libraries, and a mediocre product fit to its own stated goals. It just isn't the best any more.

The state of the art right now is super chaotic. I can hardly get two systems deployed on the same infrastructure any more, because there's always some reason something in that list has changed. But when the chaos settles and best practices emerge, something that I'd say is at least a good 5 years away, the result will clearly have Erlang inspiration in it for sure... but it won't look a lot like Erlang on the surface.

What is worth copying has largely been copied. It doesn't look exactly like Erlang, but this turns out to be a good thing.


> And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.

I agree with pretty much all of your comment (which clearly comes from a place of deep experience), but the thing that keeps bringing me back to the ideas of Erlang--potentially trying in vain to implement similar concepts in the languages I actually work in (including developing a way to manage fibers in C++ coroutines that work similar to Erlang processes so I could debug background behaviors)--is the idea that these restartable and isolated units simply aren't large enough to be managed by the operating system and systemd or kubernetes of all things: they are things like individual user connections. While there are plentiful easy ways to do shared-nothing concurrency in the world attached to virtually every software project and framework these days, they are all orders of magnitude more expensive than what Erlang was doing, even with its silly little kind of inefficient VM.


It's worth noting that jerf implemented process trees including restarting services in golang - so I'm curious in what way he means that we don't need Erlang's restart capability when he added it to his own library.


We do not need Erlang's exact solution.

The suture interface for a service in the latest version is:

    type Service interface {
        Serve(ctx context.Context) error
    }
Where's all the stuff for gen_server? https://www.erlang.org/doc/man/gen_server.html Where's the "start_link" versus "start_monitor" distinction? Where's "cast" versus "call"? Heck, where's "stop"?

The answer is, those things are all Erlangisms. They make sense in Erlang. But in Go, the supervisor tree doesn't need to enforce any of those things. Cast or call or multicall or whatever else you like in the code that talks to those services. When I got down to it I couldn't even justify a "Start" or "Initialize" call; I just couldn't help but notice it was completely redundant and it could just be in the Serve function.

What you need is that a single crash does not take down your entire service. It does not have to be in Erlang supervisor trees with Erlang gen_servers that have the exact features and behaviors as Erlang supervisors and the exact APIs. It doesn't even have to be in an OS process the same way as Erlang; cloud lambda functions in many ways solve the same problems in a completely different way. Getting too stuck in Erlang's thoughtspace inhibits your ability to design solutions. There are many ways to make it so a single crash does not take down your entire service. Erlang's way is not the only way.
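
To be concrete about "a single crash does not take down your entire service": that interface is enough to write a dumb restart loop around. This is only a sketch of the idea, not suture's actual implementation.

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    type Service interface {
        Serve(ctx context.Context) error
    }

    // supervise reruns the service until the context is cancelled,
    // turning panics into errors so one crash can't sink the process.
    func supervise(ctx context.Context, s Service) {
        for {
            err := func() (err error) {
                defer func() {
                    if r := recover(); r != nil {
                        err = fmt.Errorf("panic: %v", r)
                    }
                }()
                return s.Serve(ctx)
            }()

            if ctx.Err() != nil {
                return // shutting down, don't restart
            }
            fmt.Println("service stopped, restarting:", err)
            time.Sleep(100 * time.Millisecond) // naive backoff
        }
    }

    // flaky is a toy service that fails every time it runs.
    type flaky struct{ runs int }

    func (f *flaky) Serve(ctx context.Context) error {
        f.runs++
        return fmt.Errorf("failure #%d", f.runs)
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
        defer cancel()
        supervise(ctx, &flaky{})
    }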


> Heck, where's "stop"?

I'm intermediate (at best) in Go, but I've always found server teardown in Go a little clunky: most solutions I've seen are variations of sending a KILL signal to a channel that the server is listening on, which technically is still message-passing, though unstandardized, unlike stop/n
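
Roughly the pattern I mean, with the now-conventional ctx.Done() standing in for an ad-hoc quit channel; either way the "stop" is just a message the loop selects on (a sketch, details may be off):

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    func serverLoop(ctx context.Context, work <-chan string) {
        for {
            select {
            case job := <-work:
                fmt.Println("handling", job)
            case <-ctx.Done(): // the "stop" message: a receive, not a dedicated API
                fmt.Println("tearing down:", ctx.Err())
                return
            }
        }
    }

    func main() {
        ctx, cancel := context.WithCancel(context.Background())
        work := make(chan string)
        go serverLoop(ctx, work)

        work <- "one request"
        cancel()                          // ask the server to stop
        time.Sleep(50 * time.Millisecond) // give it a moment to print
    }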


That sounds very interesting. What's the project that interface is from?



I read that list as "Erlang doesn't have exclusive rights to X"


In a way, this "totalizing" can be looked at as a pro rather than a con.

Yes, today we have an entire collection of ecosystems of services that can provide the key functionalities as described above. Each of these technologies comes with its own long tail of dependencies, security issues, maintenance effort, plain old computational overhead, etc.

Meanwhile, this 30-year-old technology provides matching functionality (yes, admittedly with syntax and object types that simultaneously induce vertigo and motion sickness), but all the bugs have long been eradicated or encased in amber, and pound-for-pound it will run circles around an alternative solution that's dragging a Java VM or a megaton of node_modules along with it, wrapped in docker images and k8s yamls.

I use yaws as my go-to webserver. It's nuke-proof. It's simple*, and it Just Works. Good luck finding a haxxor that can breach it. I believe that it's in large part due to the very simple conceptual building blocks it's constructed out of (the gen_ behaviours OP describes).


> In a way, this "totalizing" can be looked at as a pro rather than a con.

Yep, if you want to write a new concurrent system, you will move much faster living inside this integrated environment.

And of course vertical scaling is easier than horizontal. It'll be a long time before you outgrow a huge server.

And if it's not, that's a nice problem to have. So you split off parts of the app into other services, and Erlang/Elixir is wonderful at communicating with/controlling other network-addressable services.

The problem with erlang is that it's both harder to get started and has a lower ceiling than some other languages. But there's a huge middle class of software that would really benefit from it if they got over the initial hump.


> We don't need Erlang clusters for redundancy any more. You just run multiple copies of a service against the message bus, on multiple systems for redundancy.

The thing about Erlang is that you never needed clusters at all. The redundancy was built into each instance with the runtime. When you build that way, everything naturally scales out horizontally with additional processors and/or physical nodes.

You can't do that in any other language without building the entire system for it from the ground up.

Using multiple systems for redundancy means counting on the entire system going down. The Erlang way isolates this impact to 1 of potentially millions of parallel actions on the system itself. Using other systems for redundancy, the other million actions in progress on this server go down with it. The difference in the level of redundancy is significant.

But I do agree with you that we don't need it for most systems because most systems simply aren't that complex. The benefit of the BEAM comes from simplifying complexity, which tends to evolve over time. Elixir, Phoenix and LiveView will likely lead to earlier adoption of the BEAM in projects before the complexity ramps up which will show a long term benefit.


"The thing about Erlang is that you never needed clusters at all."

IIRC, if you read the original thesis, the reason for clusters is just that there's always that chance an entire machine will go down, so if you want high reliability, you have no choice but to have a second one.

The OP is correct in that the key to understanding every design decision in Erlang is to look at it through the lens of reliability. It also helps to think about it in terms of phone switches, where the time horizon for reliability is in milliseconds. I am responsible for many "reliable" systems that have a high need for reliability, but not quite on that granularity. A few seconds pause, or the need for a client to potentially re-issue a request, is not as critical as missing milliseconds in a phone call.


That's true. You do always need to plan for machine redundancy but hopefully machines don't completely fail that often. I can't remember the last time I experienced an instance failure that wasn't a data center wide impact.

It impacts how you architect certain solutions though. For example, if you've got users connected with websockets you're suddenly able to maintain their state right there with the connection.

In a situation where you can't rely on state on the server itself, every websocket connection has to relay to some backend system like Redis/DB, etc since the state can't be counted on at the connection layer.


Your comment seems to consider Erlang as a tool to build heterogeneous systems, where you have multiple distinct applications, written in the same or different technologies, talking to each other.

In such cases, I would agree with part of your criticisms because it is indeed the wrong tool for the job. Erlang was not designed to solve this problem: the serialization format is centered around Erlang. The distribution messages reflect the semantics of processes, messaging, monitoring, etc. Inter-communication is not the focus. Even in its early days, the distribution was used to provide tolerance against hardware failures by running two identical systems. So I find comparing Kafka and Erlang to be an apples to oranges scenario.

In my opinion, Erlang shines for building homogeneous systems: multiple instances of the same application running in a cluster. Precisely because all I need is Erlang. It comes with a unified programming model for both local and distributed execution. Look at how Phoenix uses it to provide features such as distributed pubsub and presence out-of-the-box, features which either require external tools - and additional complexity - or simply do not exist in other platforms. And the beauty in designing such solutions is that you start with the concurrent version and then naturally evolve into making it distributed (if necessary).

I also find the comparison equally misses the mark between restarts/fault-tolerance and Kubernetes. Because, once again, they mostly work at different levels. The classical example is using supervisors to model database connections, something you simply cannot delegate to k8s. But a more recent example comes from working on Nx, which communicates to the GPU. You can stream data in and out of the GPU, but what happens when the code streaming data errors out? You need to develop a synchronization mechanism to make sure the GPU does not get stuck. And what happens if the synchronization mechanism fails? With Erlang I can model and test all of those failure scenarios quite elegantly. Perhaps there are better approaches lurking out there, but it certainly isn't k8s.

When it comes to k8s, they mostly complement each other. Erlang tool for restarting _machines_ is basically non-existent (there is -heart or the stand-by system described by Joe) and k8s addresses that. Erlang doesn't have service discovery, k8s covers that gap. But, even then, there is no assumption you must use Erlang clustering. It is opt-in, you don't have to use it, and in the "worst case" scenario, you can deploy Erlang just as any other language.


The totalizing environment is a huge boon, rather than a downside. The utility of tooling is directly tied to how predictable the entire architecture of your application is. If, at the highest level, your projects always consist of a random hodgepodge of services written in a variety of flavor-of-the-week programming languages, 100% of your top-level tooling will need to be custom built for your project. In contrast, if all of your projects consist of "Erlang" or maybe "Erlang+a database", then suddenly you can create tooling that works on any project. For example, Observer lets you look at the entire process tree of your application, for any Erlang application. If your entire project is an Erlang application, you now have complete visibility into your entire application's current state. It's practically magic.

That was my biggest point, but you posted a long comment with a few other disagreeable points, too:

1. re:Conway's Law : Just write the code in Erlang (read: Elixir in this day and age) to begin with. Rewriting large applications in new languages is rarely worth it anyways, that's not a failure of the better language you want to switch to, it's a failure of the poor choice of language you started with.

2. re: But it's not on the cutting edge anymore: And yet it is. Erlang and Elixir are the only languages in anything remotely resembling widespread use to not have an utterly pathological concurrency story. Async-Await is an awkward crutch to shoehorn concurrency into languages that were never designed to support it, Go and its goroutines entirely misses the point of the exercise and simultaneously encourages you to write mutable state and punishes you for doing so, and the Actor model libraries for other languages are just half-baked, bug-ridden implementations of a small fraction of Erlang.


> Erlang couldn't work with Conway's Law without totally converting your entire company to it, which just isn't going to happen.

I think this says more than you think it does.

I've worked in software development professionally for 20 years. I have never worked at a company that had more than 1 backend language. A large percentage of developers work at small companies.

Erlang/Elixir/BEAM isn't for the Googles of the world. And that's fine. Tech needs to stop its infatuation with these companies. Just because something is right for those companies, doesn't mean it's right for yours. Ignoring this has done a lot of damage over the years. Production complexity is a killer.

People love to talk about scaling (and here, I'm talking about scaling a company), but only ever mention one aspect: scaling up. Things also have to scale down. Technological choices tend to sacrifice one for the other.

But BEAM languages have a wider scaling band than most other technological choices. And that is in large part because of everything it offers out of the box.


No one ever talks about scaling down systems, everything is always additive.


> In the light of this statement, the answer to what I think is the thesis question of that entire piece:

> "This begs the question: why aren't language and library designers stealing the structure behind Erlang's behaviours, rather than copying the ideas of lightweight processes and message passing?"

> [...]

> But for the world we live in today, that's a price not worth paying. We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good, and it's actively terrible if you want to use it for one non-Erlang process to communicate to another. We don't need the Erlang message bus. We have a dozen message busses, some in the cloud, some commercial, some open source, some that double as retention (Kafka), all of which scale better, none of which tie you to Erlang's funky data types.

I asked why they were not stealing the structure behind Erlang's behaviours; I didn't suggest anyone should steal Erlang's message bus or anything else.

> And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.

I don't think restarting the process from systemd or kubernetes is comparable with a supervisor tree. First of all the tree gives you a way to structure and control the restarts, e.g. frequently failing processes should be further down the tree or they will cause their sister nodes to get restarted etc. The other obvious difference is speed.

> We don't need Erlang's behaviors. We have interfaces, traits, [...]

Yet I don't know of any other language which uses interfaces in a way which they achieve the benefits (listed in the article) that behaviours in Erlang (e.g. gen_server) give you, do you?


"I asked why they were not stealing the structure of behind Erlang's behaviours, I didn't suggest anyone should steal Erlang's message bus or anything else."

To some extent I know... but to some extent the answer is these things are all tied together. Erlang is a really tight ball of solutions to its own problems at times. I don't mean that in a bad way, but it all works together. It needs "behaviors" because it didn't have any of the other things I mentioned.

When I went to implement behaviors (https://www.jerf.org/iri/post/2930/ ), I discovered they just weren't worth copying into Go. You ask if any other language uses interfaces to achieve what Erlang does; my perspective is that I've seen people try to port "behaviors" into two or three other languages now, and they're always these foreign things that are very klunky, and solve problems better solved other ways.

"I don't think restarting the process from systemd or kubernetes is comparable with a supervisor tree."

It isn't, but the problem is...

"First of all the tree gives you a way to structure and control the restarts, e.g. frequently failing processes should be further down the tree or they will cause their sister nodes to get restarted etc."

I don't need that. I've been using supervisor trees for over a decade now, and they rarely, if ever, go down more than "application -> services". Maybe somebody out there has "trees" that go down six levels and have super complicated bespoke restart operations on each level and branch, but they must be the exception.

To the extent that I have deep trees, they're for composition, not because I need the complicated behaviors. A thing that used to be a single process service is now three processes, and to hide that, I make that thing a supervisor of its own so that the upper levels still just see an ".Add()" operation, instead of having to know about all the bits and pieces.

"The other obvious difference is speed."

Certainly, but those are only one of the options.

"Yet I don't know of any other language which uses interfaces in a way which they achieve the benefits (listed in the article) that behaviours in Erlang (e.g. gen_server) give you, do you?"

This is a case of what I'm talking about. Don't confuse Erlang's particular solution for being the only possible solution. Erlang's behaviors are basically the Template pattern (https://en.wikipedia.org/wiki/Template_method_pattern ) written into the language rather than implemented through objects. If you look for the exact Erlang behaviors out in the wild, you won't hardly find anything. If you look for things that solve the same problems, there's tons of them. A lambda function in AWS is a solution to that problem. The suture library I wrote is a different one. Java frameworks have their own solutions in all sorts of different ways.

To put it another way, whereas in 1998 people having the problems Erlang solved was rare, today we all have them. We can't be blundering around with no solutions since we are all too blinkered to use Erlang which just solves them all. That makes no sense. There are far more distributed systems concerned with reliability out there now implemented in not-Erlang than in Erlang. We are not all just blundering along in a fog of confusion, unaware of the existence of architecture, modularity, and abstraction. If programmers have a flaw, it's too much architecture rather than too little.

Maybe that's one of the problems with the Erlang writing. It's all implicitly written from a perspective of the 90s, where this is all a surprise to people, and it kind of seeps in if you let it. But that's not where the world is right now. It is not news that we need to be reliable. It is not news that we want to run on multiple systems. I've got non-technical managers asking me about this stuff at work whenever I propose a design. There's been a ton of work on all of these issues. It's not all good, by any means! But there's now too many solutions moreso than not enough.


> To some extent I know... but to some extent the answer is these things are all tied together. Erlang is a really tight ball of solutions to its own problems at times. I don't mean that in a bad way, but it all works together. It needs "behaviors" because it didn't have any of the other things I mentioned.

I can see why you would say that regarding the supervisor behaviour, but I don't see how your argument applies to the other five behaviours I wrote about. Let's keep it simple and focus on, say, `gen_server`?

> This is a case of what I'm talking about. Don't confuse Erlang's particular solution for being the only possible solution.

I'm not, in fact I mention that I've started working on a small prototype to explore implementing behaviours in a different way at the end of the post.

> Erlang's behaviors are basically the Template pattern (https://en.wikipedia.org/wiki/Template_method_pattern ) written into the language rather than implemented through objects. If you look for the exact Erlang behaviors out in the wild, you won't hardly find anything. If you look for things that solve the same problems, there's tons of them.

My understanding of Joe's thesis is that we can compose a small set of behaviours into complex systems. He and OTP expose and highlight this structure. I don't doubt that others have also discovered the usefulness of this structure, but I don't see anyone else trying to help bring that structure to the forefront so that we can improve the state of software.

Your argument seems to be that this structure has now become implicit in the tools that have been developed since then, but I think again this misses the point: the structure is simple and should be made explicit not hidden away behind 500k lines of C (systemd) or almost 4M lines of Go (kubernetes).


Additionally, the fact that a very small set of standard behaviours has been re-used time and time again for big software projects means they’re extremely well-tested, and well-documented.


> We don't need Erlang's behaviors. We have interfaces, traits, object orientation, and even just pushing that entire problem up to the OS process level, or writing a cloud function, and any number of ways of achieving the same goal.

None of these achieve the same goal. Or they result in significantly more complicated and brittle systems. Or they only achieve that goal insofar as you need to glue several heterogenous systems together.

> The reason why language designers aren't rushing to copy Erlang is that what was excellent and amazing in 2000 (and, again let me underline, I mean that very seriously, it was a cutting edge platform built with a lot of vision and moxie) is, in 2023, mediocre.

The main reason is that it is borderline impossible to retrofit the Erlang model onto an existing language. Adding concurrency alone may be a decade-long project (see OCaml). Adding all of the guarantees that the Erlang VM provides... well.

And on top of that too many people completely ignore anything in Erlang beyond "lightweight processes/actors".

The fact that you can have an isolated process that you can monitor and observe, and have a guaranteed notification that it failed/succeeded without affecting the rest of the system is a) completely ignored and b) nearly impossible to retrofit onto existing systems.

And there are exceedingly few new languages that even think about concurrency at all. And async/await is not even remotely state of the art (but people are busy grafting them onto all languages they can lay their hands on).

State of the art still is mutexes and killing your entire program if something fails. Often both of those.


>But when the chaos settles and best practices emerge, something that I'd say is at least a good 5 years away

That's incredibly optimistic. I am 30 and have basically no hope this will happen in my lifetime, and in the meantime, BEAM works great. I agree with all your points, except when you write 2023 as if we're doing things better now. Research in this area hasn't really borne as much fruit over the last 30 years as you make it look like.


I'm not quite sure how Erlang's world is totalizing. It has ways to ship things in a very integrated manner, but I have shipped and operated Erlang software that was containerized the same as everything else, in the same K8s cluster as the rest, with the same controls as everything else, with similar feature flags and telemetry as the rest, using the same Kafka streams with the same gRPC (or Thrift or Avro) messages as the rest, invisibly different from other applications in the cluster to the operator in how they were run aside from generating stack traces that look different when part of it crashes.

That it _also_ ships with other ways of doing things in no way constrains or limits your decisions, and most modern Erlang (or Elixir) applications I have maintained ran the same way.

You still get message passing (to internal processes), supervision (with shared-nothing and/or immutability mechanisms that are essential to useful supervision and fault isolation), the ability to restart within the host, but also from systemd or whatever else.

None of these mechanisms are mutually exclusive so long as you build your application from the modern world rather than grabbing a book from 10-15 years ago explaining how to do things 10-15 years ago.

And you don't _need_ any of what Erlang provides, the same way you don't _need_ containers (or k8s), the same way you don't _need_ OpenTelemetry, the same way you don't _need_ an absolutely powerful type system (as Go will demonstrate). But they are nice, and they are useful, and they can be a bad fit to some problems as well.

Live deploys are one example of this. Most people never actually used the feature. Those who need it found ways (and I wrote one that fits in somewhat nicely with modern kubernetes deployments in https://ferd.ca/my-favorite-erlang-container.html) but in no way has anyone been forced to do it. In fact, the most common pattern is people wanting to eventually use that mechanism and finding out they had not structured their app properly to do it and needing to give it a facelift. Because it was never necessary nor totalizing.

Erlang isn't the only solution anymore, that's true, and it's one of the things that makes its adoption less of an obvious thing in many corners of the industry. But none of the new solutions in the 2023 reality are also mutually exclusive to Erlang. They're all available to Erlang as well, and to Elixir.

And while the type system is underpowered (and there are ongoing areas of research there -- I think at least 3-4 competing type systems are being developed and experimented with right now), and the syntax remains what it is, I still strongly believe that what people copied from Erlang were the easy bits that provide the least benefit.

There is still nothing to this day, whether in Rust or Go or Java or Python or whatever, that lets you decompose and structure a system for its components to have the type of isolation they have, a clarity of dependency in blast radius and faults, nor the ability to introspect things at runtime interactively in production that Erlang (and by extension, languages like Elixir or Gleam) provide.

I've used them, I worked in them, and it doesn't compare on that front. Regardless of if Erlang is worth deploying your software in production for, the approach it has becomes as illuminating as the stacks that try and push concepts such as lack of side-effects and purity and what they let you transform in how you think about problems and their solutions.

That part hasn't been copied, and it's still relevant to this day in structuring robust systems.


> We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good

Genuinely curious why it's not very good. Were you speaking solely from the perspective of non-Erlang processes? And also specifically regarding remote messages rather than local?


There's two ways a non-erlang process can speak Erlang terms.

It can implement the Erlang node protocol and show up as a full Erlang node. Neat capability, but the impedance mismatch between systems is something fierce, because the protocol deeply assumes that you're basically Erlang, e.g., not just that processes have mailboxes but that mailboxes have the exact same semantics as Erlang, and you have to implement process linking with the exact same semantics, etc. It's difficult.

Alternatively, you can write a proxy in Erlang where the first process speaks to an Erlang server to send a term to some other Erlang process that will then relay the message in whatever form. This will either be a custom protocol, in which case this is an awful lot of code to write for such a task, or a common messaging protocol in which case you don't need Erlang. That is, you can speak rabbitmq, but that doesn't need to be implemented in Erlang.

The production system I maintained on Erlang did this a lot, because I was forced to have a lot of Perl code interacting with the Erlang system. Back in the day it was a fine choice; it was a time when you couldn't just pop on to google and turn up a dozen battle-ready message busses in two minutes. Now, though, it is far better for the Erlang code to just be another node on your common bus than for it to be the message bus.


I agree with the headline because uptime was the goal of Erlang.

Lightweight processes using message passing is how Erlang stays up (and perhaps ironically, let-it-crash is how Erlang avoids going down).

Lightweight processes and message passing are the architectural decisions that address the business problem Erlang was designed to solve.


Relevant, I’ve written a bit about the various actor models here including Erlang under process-based actors: http://dist-prog-book.com/chapter/3/message-passing.html


opening the root page http://dist-prog-book.com/ seems to load Tufte CSS documentation


It makes me sentimental to see some Erlang code. I truly miss writing Erlang, as it's probably the closest to how I think about systems. But it is so hard to justify using it when most of my problems can be solved with JavaScript edge workers. These days it's rare for me to even come across a backend-heavy project, let alone something that would fit Erlang perfectly. Does anyone else feel this way, or is it because of my drift in the industry? When was the last project you started and thought "this is perfect to solve with Erlang"?


We need a backend-heavy Erlang prototype.


Arguably that’s what Elixir is. They’re really trying to push Phoenix as a batteries included full stack solution. I know it’s not exactly Erlang, but the interfaces for things like gen_server are nearly identical. It’s effectively Erlang with a more modern syntax.


It's also got much better tooling, and a few features that don't exist in Erlang yet (Tasks -- and no, Tasks are not just proc_lib; they're actually special).


I don't mean to disparage the article (quite the contrary, it's quite the eye opener if you're not accustomed to architecting your code in this fashion), but choosing Go of all languages to mock up the ideas was a terrible choice. Its syntax and type system are really quite the red-headed stepchild, and for anyone unfamiliar with them it's incredibly hard to make heads or tails of.

(And I say this as someone who's at least passably familiar with Go and has worked on/contributed to Go projects in the past.)


As far as I can tell, the article contains only Erlang snippets, directly quoted from the paper being reviewed.


I think you may be alone in this assessment. Go is uniquely easy to understand, due to its small set of keywords and its lack of esoterica such as sigils.


Yeah, I had no idea how to read the Go code.

I'm also coming from a ruby/js/ocaml/elixir/erlang background.


The Go code, in its entirety:

    package main

    import "fmt"

    type HasName interface { Name() string }
    
    func Greet(n HasName) { fmt.Printf("Hello %s!\n", n.Name()) }
    
    type Joe struct {}
    
    func (_ *Joe) Name() string { return "Joe" }
    
    type Mike struct {}
    
    func (_ *Mike) Name() string { return "Mike" }
    
    func main() {
            joe := &Joe{}
            mike := &Mike{}
            Greet(mike)
            Greet(joe)
    }

You have no idea how to read this?


Yeah sorry, I had to drop it in chatgpt3 to explain it to me.

It’s like some sort of weird pseudo object oriented code.

The * and := are completely foreign. The string interpolation is weird.

Not sure what or why the &.

At the end of the day, if you went to college and learned programming (CS with Java or C), maybe this makes sense.

I learned CS with Scheme, and OCaml… so your mileage may vary.


I wrote a multithreaded message-passing system in Java, and I was trying to get async/await to work.

The thing I want to define with behaviours is observable effects on objects.

Such as interactions between tasks and the objects shared between them: not necessarily method calls, but state interactions across multiple objects.

Some objects should be colocated on threads and send behaviours to arbitrary groups of objects. Think of it as a collection that responds to events.

I wrote a state-machine serialization that looks like the following. It defines a state progression between threads and an async/await state machine.

I think organised state machines like this would make actor programming much easier.

    next_free_thread = 2
    task(A) thread(1) assignment(A, 1) = running_on(A, 1) | paused(A, 1)

    running_on(A, 1)
    thread(1)
    assignment(A, 1)
    thread_free(next_free_thread) =
        fork(A, B)
      | send_task_to_thread(B, next_free_thread)
      | running_on(B, 2)
        paused(B, 1)

    running_on(A, 1)
      | { yield(B, returnvalue) | paused(B, 2) }
        { await(A, B, returnvalue) | paused(A, 1) }
      | send_returnvalue(B, A, returnvalue)

I think this serialization is truly powerful and easy to understand.


J2EE could have had all that, but it was missing an OTP. [Java's competing 'application server' vendors competed themselves out of a massive market.] Components with well-defined life cycles, request/reply and message processing, dynamic remote interface binding, isolated execution environments, pure single-threaded application-level code ('business logic'), declarative composition ... that's all in the specs.

Erlang is mainly about OTP. OTP delivered an opinionated take on distributed components -- think of Erlang on OTP as Ruby on Rails -- and did it exceptionally well.

But one could still do this with Java btw. The specs are still there and they are solid.


Erlang isn't, but the BEAM is.


Yes came to say this. The Erlang VM is about lightweight processes and message passing.


One thing I like about "behaviours" is that I can immediately recognise and discard the boilerplate, skim quickly, and spot deviations from the expected pattern just by knowing which behaviour the process implements.
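
For example (a hypothetical counter, not taken from the article or the paper): once you know the gen_server behaviour, everything here except the increment clause is boilerplate you can skim straight past.

    -module(counter).
    -behaviour(gen_server).
    -export([start_link/0, increment/0]).
    -export([init/1, handle_call/3, handle_cast/2]).

    %% Client API
    start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
    increment()  -> gen_server:call(?MODULE, increment).

    %% gen_server callbacks
    init([]) -> {ok, 0}.

    handle_call(increment, _From, N) ->
        {reply, N + 1, N + 1};      %% the only clause with real logic
    handle_call(_Other, _From, N) ->
        {reply, {error, unknown_call}, N}.

    handle_cast(_Msg, State) -> {noreply, State}.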


Without lightweight processes that handle messages sequentially from their own mailbox, these behaviors wouldn't give the same concurrency guarantees.

I don't know why the author is turning this into a competition over whether processes or behaviors are more important -- they are both parts of a well-designed system and they work well together.

GenServers etc can't be written equivalently in Go/Java since goroutines and Java threads (even the new virtual threads) are not pre-emptive, whereas Erlang processes are truly independent.


Isn't Erlang's model a giant ball of microservices?


Nanoservices.


> In 1998 Ericsson decided to ban all use of Erlang.

What's the story there? Why did they decide to ban it?

EDIT: doh, this has been answered elsewhere in this thread.


Wait till this person learns about links, monitors, gen_server:call's default automatic backpressure mechanism, etc...


This is fucking cool, thanks for sharing


>"The next interesting behavior is supervisor. Supervisors are processes which sole job is to make sure that other processes are healthy and doing their job. If a supervised process fails then the supervisor can restart it according to some predefined strategy."

This is an interesting software pattern -- especially when it's supported entirely inside one programming language... We see this pattern in such things as Windows Services (they can restart automatically if they fail), Unix/Linux daemons, one of the original purposes of the Unix 'init' process, and high-availability, mission-critical 24/7 systems (databases, etc.).

Having all of that fault-tolerant infrastructure entirely inside a programming language is indeed somewhat novel.
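
A sketch of what that looks like in Erlang (the my_sup and my_worker names are hypothetical; any module with a start_link/0 will do as a child). The one_for_one strategy means a crashed child is restarted on its own, and the intensity/period pair caps how many restarts the supervisor tolerates before giving up itself:

    -module(my_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        SupFlags = #{strategy  => one_for_one,  % restart only the failed child
                     intensity => 5,            % at most 5 restarts...
                     period    => 10},          % ...within 10 seconds
        Children = [#{id      => my_worker,
                      start   => {my_worker, start_link, []},
                      restart => permanent}],   % restart on any exit
        {ok, {SupFlags, Children}}.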

But let's work up what a given language must have as prerequisites to support this...

First, we need the ability to run multiple processes inside of a language. Many languages can accomplish this with threads, but many of those need additional, cumbersome programming to guarantee thread safety...

So the language needs to support that notion of a process natively -- without extra code just to keep those processes safe.

Next, the language needs some form of communication between supervisor (process A) and supervised (process B) code. Message passing is the solution -- but that requires each process to have its own message queue.
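
In Erlang terms, that boils down to a handful of primitives. A bare-bones sketch, not how OTP's supervisors are actually written, but it shows the shape: the supervising process gets a 'DOWN' message in its own mailbox when the supervised process dies, and decides whether to start a fresh one.

    %% Keep restarting Fun until it exits normally.
    supervise(Fun) ->
        {Pid, Ref} = spawn_monitor(Fun),
        receive
            {'DOWN', Ref, process, Pid, normal} ->
                ok;                    % finished cleanly, nothing to do
            {'DOWN', Ref, process, Pid, _Reason} ->
                supervise(Fun)         % it crashed: start a fresh copy
        end.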

And message passing requires interfaces...

Here we're starting to sound like we're duplicating a mini Operating System inside a language(!) (should the language have pipes, too?) -- and/or like the complexity of most OSes would be deeply mitigated by designing them inside a language that supported these constructs...

Whatever the case, I think it's a fascinating software pattern.

Yes, Go exists and supports features like this, Rust exists and people are using it to write Operating Systems, and there are probably all kinds of other languages (both existing and yet to come) that do or will support some form of these constructs...

But there's another interesting, purely academic, purely theoretical question here...

That is: what is the simplest full-featured OS (processes, IPC, synchronization primitives, memory management, hardware resource management, syscalls, scheduling, etc.) that could be written in the smallest number of lines of code, if the language it was written in had intrinsic knowledge of all of those underlying constructs?

Anyway, a fascinating article about Erlang, and it definitely gave the impression that Erlang was a whole lot more than I thought it was... I will be checking out Erlang for future projects!


>Here we're starting to sound like we're duplicating a mini Operating System inside a language

In case you didn't know, Joe Armstrong actually calls Erlang/OTP an "Application Operating System (AOS)" in his thesis.

That is the entire point of that language's design.


I did not know that! Interesting! I will have to check out Joe Armstrong's paper...


The discussion in this thread is actually missing the forest for the trees and probably giving a wrong impression of Erlang for people new to it.

Erlang is always spelled as Erlang/OTP (i.e. the language plus the patterns/components provided in libraries) + the Erlang Run Time System (ERTS) (the BEAM VM + support components). The whole "System" is what makes it so powerful and uniform, with a single environment. Every single "feature" exists only in the context of the whole and cannot be discussed in isolation (see https://blog.stenmans.org/theBeamBook/#_layers_in_the_execut... for a graphic).

Compared to the babel/hodge-podge of languages/tools/frameworks used to duct-tape current-day distributed apps, Erlang/OTP is a godsend. It is purely an accident of fate that it never became mainstream for distributed Web programming, for which it is eminently suited. Imagine Erlang running in the Browser frontend and also on the Server backend. Everything would be uniform and clean and one would have a robust distributed system with all parameters taken care of by the "System" itself; we would just need to focus on the "business logic" and be done with "app development".


>Imagine Erlang running in the Browser frontend and also on the Server backend. Everything would be uniform and clean and one would have a robust distributed system with all parameters taken care of by the "System" itself; [...]

That is a very compelling idea -- a language which supports concurrency natively, such that a given source code base could be split up into "client" (web browser) and "server" (web server) components easily, ideally automatically...

I think that is a very compelling idea, indeed!

This language might indeed be Erlang/OTP + ERTS + the BEAM VM -- but it could conceivably be another language, if that language were designed with the appropriate concurrency in mind: (https://en.wikipedia.org/wiki/List_of_concurrent_and_paralle...)

But that being said, Erlang/OTP+ERTS+BEAM VM -- definitely looks like it is worth further exploration!


This fucking rocks, great work and thanks for sharing.


Erlang vs OTP


These days, if you have Kubernetes, queues, and worker queues, does OTP still have an edge?


Yes.

Being able to do this at the programming language level is extremely powerful and creates an entirely different way of building applications. My go-to analogy is building a city (lots of individual, isolated, separate stack processes) instead of a big skyscraper (deep stack requests, concurrency difficult, error recovery manual).

You can build that city in a limited way with k8s, but there's way more overhead along the way, to the point of it not being enjoyable for me.


Yes, because concurrency and fault tolerance are literally at the core of the BEAM. You get all of the above with zero configuration, all through convention, and you only need to learn one set of tools.

When some new container technology or worker-queue platform comes along, Erlang and the BEAM will still be exactly the same.

One of the advantages of working with Erlang I found is that it's just as easy to scale down as it is to scale up.

How does one scale down from Kafka once you have vendor lock-in with them?


I believe the only argument left is that Erlang/OTP provides all of those in a single cohesive environment.



