The amazing thing about Erlang and the BEAM is its depth of features. For the OP, Erlang's behaviours (its take on interfaces) are the biggest takeaway. For me, it is how you need far, far fewer development resources to build complex systems than you would in any other language (given comparable experience in both stacks). And for many, it is the lightweight processes and the programming model.
OTP itself has so much in it. We've been working on compiling Elixir to run on iOS devices. Not only can we do that through the release process, but by using the ei library provided with Erlang we can compile a node in C that will interface with any other Erlang node over a typical distributed network, just as you would for Erlang, Elixir, Gleam, etc. Furthermore, there is an rpc module in Erlang through which we can make function calls from C and interface with our Elixir application. Yes, the encoding/decoding has overhead and FFI would be faster, but we're still well within our latency budget, and we got this stood up in a few days without ever having heard of it before.
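For a sense of how little code the distribution side takes, here's a minimal sketch from the Elixir end (node names, the cookie, and MyApp.Worker are placeholders, not our actual setup):

    # Join the cluster and call a function on another node.
    # A C node does the equivalent through ei's connect and rpc calls.
    Node.set_cookie(:my_secret_cookie)
    true = Node.connect(:"backend@host")

    result = :rpc.call(:"backend@host", MyApp.Worker, :status, [])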
The larger point here is that Erlang solved many of the problems that modern tech stacks are still struggling with, both at the level of scale and of implementation cost, and it solved them decades ago. I know HN has a bit of a click-bait love relationship with Erlang/Elixir but it hasn't translated over to adoption and there are companies that are just burning money trying to do what you get out of the box for free with the Erlang stack.
I went from a company that used Elixir in the backend to one that uses Nodejs.
I had gone in neutral about Nodejs, having never really used it much.
The projects I worked on were backend data pipelines that did not even process that much data. And yet, somehow, it was incredibly difficult to isolate the main bug. Along the way, I found out all sorts of things about Nodejs, and when I compared it with Elixir/Erlang/OTP, I came to the conclusion that Node.js is unreliable by design.
Don't get me wrong. I've done a lot of Ruby work before, and I've messed with Python. Many current-generation language platforms are struggling with building reliable distributed systems, things that the BEAM VM and OTP platform had already figured out.
Elixir never performs all that well in microbenchmarks. Yet in every application where I've seen an Elixir/Erlang project compared to a more standard Node, Python, or even C# project, the Elixir one generally has way better performance and feels much faster, even under load.
Personally, I think much of it is due to async being predominant in Node and Python. Async seems much harder to debug for performance issues than actors or even threads. Sure, async feels easier at first. But async leads to small bloat adding up, which makes problems very difficult to debug and track down. It makes profiling harder, etc.
In BEAM, every actor has its own queue. It's trivial to inspect and analyze where performance is backing up. Async, by contrast, puts everything into one giant processing queue. Plus, every function call in async code gets extra overhead added. It all adds up.
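To show what "trivial to inspect" means, here's a sketch you can paste into any iex shell to find the ten processes with the deepest message queues:

    # Top ten processes by message-queue depth.
    Process.list()
    |> Enum.map(&{&1, Process.info(&1, :message_queue_len)})
    |> Enum.flat_map(fn
      {pid, {:message_queue_len, len}} -> [{pid, len}]
      {_pid, nil} -> []  # process exited between list and info
    end)
    |> Enum.sort_by(fn {_pid, len} -> len end, :desc)
    |> Enum.take(10)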
This has to do with how async works without preemption and resource limits.
There's a counter-intuitive thing when trying to balance load across resources: applying resource limits helps the system run better overall.
One example: when scaling a web app, there comes a point where scaling up the database doesn't seem to help. So we're tempted to increase the connection pool, because that looks like the bottleneck. But increasing the pool can make the overall system perform worse, because oftentimes it is slow, poorly performing queries that are clogging up the system.
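In Elixir terms this is the Ecto/DBConnection pool knob (a sketch; the app name and numbers are invented, though pool_size, queue_target, and queue_interval are the real option names):

    # config/runtime.exs (illustrative values)
    import Config

    config :my_app, MyApp.Repo,
      pool_size: 10,        # resist the urge to raise this first
      queue_target: 50,     # ms a checkout may wait before the pool counts as slow
      queue_interval: 1_000 # window (ms) over which queue_target is evaluated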
Another example: one of the systems I worked on had over 250 Node runtimes running on a single large server. It used pm2 and did not apply cgroups to limit CPU resources. The whole system was a hog, and I temporarily fixed it by consolidating things down to about 50 Node runtimes.
When I moved them over to Kubernetes, I also applied CPU resource limits, each in its own pod. I set the limits based on what I measured when they were all running on PM2 ... but the same code running on Kubernetes ran with 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
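For reference, this is the standard Kubernetes knob involved (an illustrative pod spec fragment; the values are invented, not the ones from that migration):

    # Per-container CPU request/limit in the pod spec.
    resources:
      requests:
        cpu: "250m"   # scheduler guarantee
      limits:
        cpu: "500m"   # hard ceiling, enforced via the cgroups CFS quota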
There's probably some math that folks who know Operations Research could use to prove all this.
> When I moved them over to Kubernetes, I also applied CPU resource limits, each in its own pod. I set the limits based on what I measured when they were all running on PM2 ... but the same code running on Kubernetes ran with 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
As someone who has advocated against Kubernetes CPU limits everywhere I've worked, I'm really struggling to see how they helped you here. The code used 10x less CPU with CPU limits, with no adverse effects? Where were all those CPU cycles going before?
> The code used 10x less CPU with CPU limits, with no adverse effects?
The normal outcome is that defective requests get much larger latency, while the correct requests run much faster.
It's a problem in the cases where the first set isn't actually defective. But those normally take a reevaluation of the entire thing to solve, and the unlimited setup isn't any good there either.
> Async, by contrast, puts everything into one giant processing queue
How can you make performance claims while getting the details completely wrong?
Neither .NET's nor Rust's Tokio async implementations work this way. They use all available cores (unless overridden) and implement a work-stealing thread pool. .NET in addition uses hill-climbing and a cooperative blocking-detection mechanism to quickly adapt to workloads and ensure optimal throughput. All that while spending 0.1x the CPU on computation when compared to BEAM, and having much lower memory footprint. You cannot compare Erlang/Elixir with top of the line compiled languages.
That sounds about right for .NET. One of the Elixir projects I worked on lived alongside a C#/.NET game server backend. The guy who architected and implemented it made it so that large numbers of people could interact in realtime without having to shard. Pretty amazing stuff in my book.
On the other hand, I have yet to need to implement a liveness probe for an Elixir app, and I've had to do that with .NET because it can and does freeze. That game server also didn't make use of all the available cores as well as the Elixir app did. We also couldn't attach a REPL directly to the .NET app, though we certainly tried.
I would be curious to see if Rust works out better in production.
> I swear, the affliction of failing to understand the underlying concepts upon which a technology A or B is built is a plague upon our industry. Instead, everything clearly must fit into the concepts limited to whatever “mother tongue” language a particular developer has mastered.
Ironic, since any time you post about a programming language it's to inform that C# does it better.
Not just here; someone with your nick also whined that the creator of C# made a technically deficient decision when choosing Go over C# to implement TypeScript.
It's hard for a rational person to believe that someone would make the argument that the creator of the language must have made a mistake just because he reached for (in his words) a more appropriate language in that context.
You have a blind spot when it comes to C#. You also probably already know it.
> Not just here; someone with your nick also whined that the creator of C# made a technically deficient decision when choosing Go over C# to implement TypeScript.
You know you could have just linked the reply instead? It states "C#, F# or Rust". But that wouldn't sound that nice, would it? I use and enjoy multiple programming languages and it helps me in day-to-day tasks greatly. It does not prevent me from seeing how .NET has flaws, but holistically it is way less bad than most other options on the market, including Erlang, Go, C or what have you.
> It's hard for a rational person to believe that someone would make the argument that the creator of the language must have made a mistake just because he reached for (in his words) a more appropriate language in that context.
So appeal to authority trumps observable consequences, technical limitations, and arguments about lackluster technical vision at Microsoft? Interesting. No, I think it is the kind of people who refuse to engage with the subject on its own merits that are the problem, delegating all the argumentation to the powers that be. Even in a team environment, sure, it is easier to say "team/person X made choice Y", but you could also, if the situation warrants it, expand on why you think this way; and if you can't, maybe you shouldn't be making a statement?
So no, "the TypeScript team, including Anders Hejlsberg, choosing Go as the language to port the TS compiler to" does not suddenly make pigs fly. If anything, being seen as an endorsement from a key C# figure is certainly a bad look.
> So appeal to authority trumps observable consequences, technical limitations, and arguments about lackluster technical vision at Microsoft?
Your argument is that you have a better grasp of "technical limitations" than Anders Hejlsberg?
You'll forgive the rest of us for not buying that; he has proven his chops, you haven't. Especially as the argument (quite a thorough explanation of the context) from the TypeScript team is a lot more convincing than anything we've seen from you (a few nebulous phrases about technical superiority).
> If anything, being seen as an endorsement from a key C# figure is certainly a bad look.
Yeah, well, the team made their decision with no regard to optics. That lends more weight to their decision, not less.
The issue is not that Anders is incapable. His best argument was that they wanted to have the new code look like the old code. Many of the other arguments Anders brought forward were confusing, since some of them were technically incorrect. This raises some questions.
TypeScript is a huge success for Microsoft in terms of recapturing developers without them knowing it. MS is not a charity; look at how little love they give F# compared to TS.
* My personal guess is that the age-old MS instinct came into play: be backwards compatible at all costs, port all the bugs, do not disturb anything.
* A second reason might be that the TS people did not want to learn .NET because of vibes. Do not underestimate vibes. Almost every day on HN I see Python programs posted where the creator would most often have been better off learning some other programming language. Decisions are seldom made on a technical basis. We as humans decide emotionally, sometimes with rationalizations afterwards.
And so, maybe Anders was rational in acknowledging the dev-social situation as is.
Whatever the reason, this will not be without consequences. The team now has to invest in Go, and now depends on Google to take TS forward. And yes, this is also typical MS: one department can easily undo the other.
TLDR: the technical arguments were mostly nonsense; the real reasons likely have more to do with age-old reflexes and dev-cultural issues.
> Neither .NET's nor Rust's Tokio async implementations work this way.
Well, that's great. I didn't mention Rust in that list because it does seem to perform well. Its async is also known to be much more difficult to program.
> and having much lower memory footprint. You cannot compare Erlang/Elixir with top of the line compiled languages.
And yet I do, and have. Despite all the cool tech in C# and .NET, I've seen simple C# web apps struggle to even run on Raspberry Pis for IoT projects, while Elixir ones run very well.
Also note Elixir is a compiled language and BEAM has JIT nowadays too.
I did hesitate to add C# to that list because it is an impressive language and can perform well. I also know the least about its async.
Nothing you said really counters that async as a general paradigm is more likely to lead to worse performance. It’s still more difficult to profile and tune than other techniques even with M:N schedulers. Look at the sibling post talking about resource allocation.
Even for Rust, there was an HN post recently where they got a Rust service to run a fair bit faster than their initial Golang implementation. After months of extra work, that is. They mentioned that Golang's programming model made it much easier to write fairly performant networking code. Since Go doesn't use async, it seems reasonable to assume goroutines are easier to profile and track than async, even if I lack knowledge of Go's implementation details on the matter. Now, I am assuming their Rust implementation used async, but I don't know for sure.
> Also note Elixir is a compiled language and BEAM has JIT nowadays too.
Let's see it perform faster than Python first :)
Also, if the target is supported, .NET is going to unconditionally perform faster than Elixir. This is trivially provable.
> Nothing you said really counters that async as a general paradigm is more likely to lead to worse performance. It’s still more difficult to profile and tune than other techniques even with M:N schedulers. Look at the sibling post talking about resource allocation.
Can you provide any reference to support this claim as far as actually good implementations go? Because so far it looks like vibe-based reasoning with zero knowledge to substantiate the opinion presented as fact.
That's not surprising, however: Erlang and Elixir as languages tend to leave their heavy users with big knowledge and understanding gaps, and their communities are rather dogmatic about BEAM being the best thing since sliced bread. Lack of critical thinking leads to such a sorry place.
> Can you provide any reference to support this claim as far as actually good implementations go?
Ah yes, now we get to the No True Scotsman fallacy. Async only works well when it's "properly implemented", which apparently means only in .NET.
Even some .NET folks prefer the actor model for concurrent programming:
> Orleans is the most underrated technology out there. Not only does it power many Azure products and services, it is also the design basis for Microsoft Service Fabric actors, which also power many Azure products. Virtual actors are the perfect solution for today’s distributed systems.
> In my experience Orleans was able to handle insane write load (our storage/persistence provider went to a queue instead of direct, it was eventually consistent) so we were able to process millions of requests without breaking a sweat. Perhaps others would want more durability, we opted for this as the data was also in a time series database before Orleans saw it.
Ironically what got me into Elixir was learning about Orleans and how successful it was in scaling XBox services.
> Because so far it looks like vibe-based reasoning with zero knowledge to substantiate the opinion presented as fact.
Aside from personal experience and years of writing and deploying performance sensitive IoT apps?
Well, quick googling shows quite a few posts detailing async issues:
> What tools and techniques might be suited for this kind of analysis? I took a quick glance at a flamegraph but it seems like I would need a relatively deep understanding of the async runtime internals since most of what I see looks like implementation details.
> Reading a 1GB file in 100-byte chunks leads to at least 10,000,000 IOs through three async call layers. The problem becomes catastrophic since these functions are essentially language-level abstractions of callbacks, lacking optimizations that come with their async nature. However, we can manually implement optimizations to alleviate this issue.
> I’m not going to say all async frameworks are definitely slower than threads. What I can say confidently is that asyncio isn’t faster, and it’s more efficient only for huge numbers of mostly idle connections. And only for that.
Do you realize that the actor model, virtual/green threads, and stackful coroutines vs. stackless coroutines (async/await) are orthogonal concepts?
Also picking asyncio from Python. Lol. You can't be serious, can you?
The only impression I get is that most Elixir/Erlang practitioners simply have a very ossified perception and deep biases that prevent them from evaluating implementation/design choices fairly and reaching balanced conclusions on where their capabilities lie. A very far cry from the link salad you posted, which does not answer my question, e.g. about the performance issues with the .NET and Rust async implementations.
It's impossible to have a conversation with someone deeply committed to their bias and unwilling to accept that BEAM is not the shining paragon of concurrent and multi-threaded runtimes it once was.
Starting with the most general: Nodejs suffers in the same way that other async systems do -- the lack of preemption means that certain async threads can starve other async threads. You can see this on GUI desktop apps when the GUI freezes because it wasn't written in a way to take that into account.
In other words, the runtime feature that Nodejs is the most proud of and markets to the world as its main advantage does not scale well in a reliable way.
The BEAM runtime has preemption and will degrade in performance much more gracefully. In most situations, because of preemption (and hot code reloading), you still have a chance of attaching a REPL to the live runtime while under load. That allows someone to understand the live environment and maybe even hot patch the live code until the real fix can run through the continuous delivery system.
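Concretely, attaching that REPL is one command against the running node (a sketch; the node names and cookie are placeholders):

    # Attach a live remote iex shell to a production node under load.
    iex --sname debug --cookie my_cookie --remsh app@prod-host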
I'm not going to go into the bad JavaScript syntax bloopers that still haunt us, only partially mitigated by TypeScript. That is documented in "JavaScript: The Good Parts". Or how the "async" keyword colors function calls, forcing everything in a call chain to also be async, or forcing you to use the older callbacks. Most people I talk to who love TypeScript don't consider those to be issues.
The _main_ problems are:
1. Async threads can easily get orphaned in Nodejs. This doesn't happen when using OTP on the BEAM, because you typically start a gen_server (or another gen_*) under a supervisor. Even processes that are not supervised can be tracked. Because pids (identifiers for processes) are first-class primitives, you can always ask the runtime for _all_ of the running processes. If you were to attach a Nodejs REPL, you can't really tell what is running. This is because there is no encapsulation of the process, no way to track when something went async, and no way to send control messages to those async processes.
2. Because async threads are easily orphaned, errors that get thrown get easily lost. The response I get from people who love TypeScript on Nodejs tells me that is what the linter is for. That is, we're going to use an external tool to enforce that all errors get handled, rather than having the design of the language and the runtime handle the error. In the BEAM runtime, an unhandled error within a process crashes the process, without crashing anything else; processes that are monitoring the crashed process get notified by the runtime that it has crashed. The engineer can then define the logic for handling that crash (retry? restart? throw an error?).
3. The gen_server behavior in OTP defines ways to send control messages. This allows more nuanced approaches to managing subsystems than just restarting when things crash.
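To make those three points concrete, here is a minimal Elixir sketch (MyApp.Worker and its :drain message are invented for illustration; everything else is standard OTP):

    # 1) Pids are first-class: enumerate every live process from a shell.
    Process.list() |> length()

    # 2) A crash becomes a message to whoever is monitoring the process.
    {pid, ref} = spawn_monitor(fn -> raise "boom" end)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} ->
        IO.inspect(reason, label: "worker went down")
    end

    # 3) Control messages let you manage a gen_server without restarting it.
    {:ok, worker} = GenServer.start_link(MyApp.Worker, [])
    :sys.get_state(worker)          # built-in OTP debug call, works on any gen_server
    GenServer.call(worker, :drain)  # app-defined control message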
I'm pretty much at the point where I would not really want to work on deploying Nodejs on the backend. I don't see how something like Deno would fix anything. TypeScript is incapable of fixing this, because these are design flaws in the runtime itself.
Just to further hammer home point 2 and how it's a problem in the real world: Express, probably the go-to server library for close to a decade, has only within the last couple of months sorted out not completely swallowing, by default, errors that happen in async middleware. And only because some new people came in to finally fix it! It's absolutely insane how long that took and how easy it was to get stung by that issue.
An invocation of a Nodejs async function is automatically tracked within the code as a locally-scoped promise. The runtime will track it, but unless you register that Promise elsewhere, it can only be accessed within that local scope. You'd better hope that you immediately chain it with the success callback or capture errors from it.
Spawning a lightweight process in BEAM returns a first-class primitive called a pid. That pid is recorded by the scheduler, so even if it gets lost by the code, you can still find out if it has been taking up resources (when debugging problems live in production).
The supervisor behavior is written so that any gen_server-compliant child processes are linked. That means any crash of a spawned process notifies the supervisor. That's not something we are doing with Nodejs async; there is no mailbox to notify, you either await completion or make sure you add the error handling, which is where people write linters to check.
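A minimal sketch of that linkage (the module names are invented; the supervisor wiring is the standard OTP pattern):

    defmodule MyApp.WorkerSupervisor do
      use Supervisor

      def start_link(arg) do
        Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
      end

      @impl true
      def init(_arg) do
        children = [
          # If MyApp.Worker crashes, the supervisor is notified
          # through the link and restarts it per this strategy.
          MyApp.Worker
        ]

        Supervisor.init(children, strategy: :one_for_one)
      end
    end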
The problem with Node is observability. They've optimized away observability to where it's hard to find performance problems compared to the JVM or BEAM.
I have been looking for an Erlang thing akin to Apache Airflow or Argo Workflows. Something that allows me to define a DAG of processes, so that they run one after the other. How would you implement something like that?
Adding to this, the primitives Erlang and its descendants give you are very easy to work with, and therefore very easy to test.
Take GenServer, the workhorse of most BEAM systems. Everything it does is basically just calling various functions with simple parameters. So you can test it just by calling those functions directly, manually passing parameters, and asserting on the output. No need to set up complex testing systems capable of dealing with asynchronous code, no need to handle pauses and wait for code to finish running in your tests. It's something a lot of juniors tend to miss, but it's liberating once figured out.
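A minimal sketch of that style of test (MyApp.Counter is a made-up module; the point is that the callback is just a function):

    defmodule MyApp.Counter do
      use GenServer

      @impl true
      def init(start), do: {:ok, start}

      @impl true
      def handle_call(:increment, _from, count) do
        {:reply, count + 1, count + 1}
      end
    end

    # Test the callback directly, without starting a process.
    defmodule MyApp.CounterTest do
      use ExUnit.Case, async: true

      test "increment bumps the state" do
        assert {:reply, 1, 1} =
                 MyApp.Counter.handle_call(:increment, {self(), make_ref()}, 0)
      end
    end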
C nodes are underappreciated. We have one (Cgo) for communicating between Go and Elixir services running in the same Kubernetes pod. The docs for Erlang and its C libs are also pretty good.
> I know HN has a bit of a click-bait love relationship with Erlang/Elixir but it hasn't translated over to adoption and there are companies that are just burning money trying to do what you get out of the box for free with the Erlang stack.
Elixir is "bad" because it is not a friendly language for people who want to be architecture astronauts at the code level (you can definitely be an architecture astronaut at the process management level but that's a very advanced concept). And a lot of CTOs are architecture astronauts.
That's the opposite of my experience. I tend to get those "architecture astronauts" on teams using other language platforms, while the folks I work with in Erlang or Elixir tend to be pragmatic and willing to dig down the stack to troubleshoot problems.
> When you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don’t know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don’t actually mean anything at all.
> These are the people I call Architecture Astronauts. It’s very hard to get them to write code or design programs, because they won’t stop thinking about Architecture. They’re astronauts because they are above the oxygen level, I don’t know how they’re breathing. They tend to work for really big companies that can afford to have lots of unproductive people with really advanced degrees that don’t contribute to the bottom line.
Joel was wrong about one thing: they also work at startups. My roommate worked at a startup where the senior frontend developer was basically building React in Svelte + Zod. Once a week he would see all his work deleted and completely rewritten in a fever-dream PR that the senior produced. Completely impossible for a grug developer to follow what's going on; his job eventually became "running this guy's code through ChatGPT and adding comments and documentation".
My personal opinion as a fan and adopter of the stack is that the benefit is often seen down the line, with the upfront adoption cost being roughly the same.
E.g. the built-in telemetry system is fantastic, but when you are first adopting the stack it still takes a day or two to read the docs and get events flowing into, say, Datadog, which is roughly the same amount of time as basically every other solution.
The benefit of Elixir here is that the telemetry stack is very standardized across Elixir projects and libraries, and there are fewer moving pieces - no extra microservices or docker containers to ship with everything else. But that benefit comes 2 years down the line when you need to change the telemetry system.
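For anyone who hasn't seen it, the standardized piece looks roughly like this (:telemetry.attach/4 is the real API; the handler id is made up, and the event shown is Phoenix's router event, swap in whatever your libraries emit):

    # Attach a handler to a telemetry event; libraries across the
    # ecosystem emit events under this same contract.
    :telemetry.attach(
      "forward-to-datadog",
      [:phoenix, :router_dispatch, :stop],
      fn _event, measurements, metadata, _config ->
        # Ship measurements.duration, metadata.route, etc. to your backend.
        # (Fine for a sketch; production code attaches a named module function.)
        IO.inspect({measurements, metadata})
      end,
      nil
    )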
These incremental benefits don't translate to an order of magnitude more productivity, or stability, or profitability. For a business owner given the choice, future-proofing is about being able to draw from the most plentiful and cheapest pool of workers. The sausage all looks the same from the outside.
That is not true, especially with Section 174 (for the US). Right now, if you want to hire an Elixir engineer, you're better off finding a generalist willing to learn and use Elixir, and you would probably get someone who is very capable.
With Section 174 in play in the US, it tends to drive companies toward hiring specialists and attempting to use AI for the rest.
My own experience is that ... I don't really want to draw from the most plentiful and cheapest pool of workers. I've seen the kind of tech that produces. You basically have a small handful of software engineers carrying the rest.
Elixir itself is a kind of secret, unfair advantage for the tech startups that use it.
>you're better off finding a generalist willing to learn and use Elixir, and you would probably get someone who is very capable.
This is a thing I really don't get. People are like "but what about the hiring pool". A competent software engineer will learn your stack. It's not that hard to switch languages. Except maybe going from Python to C++.
I'm biased, because I worked at WhatsApp, but it may be one of the most famous users of Erlang... and from its start until I left (late 2019), I think we only hired three people with Erlang experience. Everyone else who worked in Erlang learned on the job.
We seemed to do pretty well, although some of our code/setup wasn't very idiomatic (for example, I'm pretty sure we didn't use the Erlang release feature properly at all)
We just pushed code, compiled, and hotloaded... Pretty much ignoring the release files; we had them, but I think the contents weren't correct and we never changed the release numbers, etc.
For OTP updates, we would shut down BEAM in an orderly fashion, replace the files, and start again. (Potentially installing the new version before shutting down; I can't remember.)
Post-Facebook, it was more boring OS packages and slow rollouts than hotloading.
There's no killer app, as in a reason to add it to your tech stack.
The closest I've come across was trying to maintain an ejabberd cluster and add some custom extensions.
Between mnesia and the learning curve of the language itself, it was not fun.
There are also no popular syntax-alikes. There is no massive corporation pushing Erlang either directly or indirectly through success. Supposedly Erlang breeds success but it's referred to as a "secret" weapon because no one big is pushing it.
Erlang seems neat but it feels like you need to take a leap of faith and businesses are risk averse.
Well, jayd did the same thing as that small company (which I joined in 2011 when it was small and left in 2019 when it was not so small): run ejabberd to solve a problem. In our case, Erlang subsumed pretty much the rest of our service over time. When I started, chat was Erlang, but status messages, registration, and contacts were PHP with MySQL, and media was PHP (with no database); those all got sucked into Erlang with mnesia because it was better for us.
But I guess it doesn't always work that way. FB chat was built on ejabberd and then migrated away.
Also, a lot of the power of Erlang is OTP (the Open Telecom Platform), even more than Erlang itself. You have to internalize those architectural decisions (expect crashes; do fast restarts) to get the full power of Erlang.
Elixir seems like it has been finding more traction by looking more like mainstream languages. In addition, languages on the BEAM (like Elixir) made the BEAM much better documented, understood and portable.
Anyway, the options seem to be either summoning transcendent threats through superficial syntax or through well-entrenched semantics. There seems to be no other choice.