It’s not even remotely similar. Node’s cluster module is just bog-standard OS subprocesses, each running its own event loop.
To spread work over multiple cores with the cluster (or even worker_threads) modules you have to do so explicitly and manually. It’s essentially the same model you get with pthreads, or Java, or Python.
BEAM is a completely different model: the “processes” are internal tasks which the runtime schedules appropriately over the available cores (the scheduler has been multithreaded for 15 years or so). Spawning processes is as common as spawning tasks in JS, except each of these is scheduled on an SMP runtime.
BEAM will literally run processes across machines (distributed Erlang) more easily than Node will balance load over cores.
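For a concrete sense of this, a minimal Elixir sketch of both points (the node name :"crawler@other-host" is a made-up example, and assumes a named node sharing this node's cookie):

```elixir
# Spawning processes is as cheap and routine as spawning tasks in JS;
# the runtime schedules all of them across every available core.
for n <- 1..100_000 do
  spawn(fn -> n * n end)
end

IO.puts("schedulers online: #{System.schedulers_online()}")

# With distributed Erlang, running a process on another machine is one
# call; :"crawler@other-host" is a hypothetical node name.
Node.spawn(:"crawler@other-host", fn -> IO.puts("hello from afar") end)
```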
At the Abstractions conference in Pittsburgh in 2016, Joe Armstrong was hanging out in the hallway with his swag bag, just a regular engineer complaining about Jira and his manager (he had no interest in managing) and asking people's opinions about the schedule and lunch places. We were looking at the program for the next sessions, and someone said there was a talk about ideas for adding concurrency features to Node, and we said great, let's go, and a few of us went to stand in the back of that one.
On a number of the features being proposed, like message passing, immutable structures, and process tree management, the presenter would say, "but Erlang's had this for many years..." and the room would laugh and turn around and acknowledge Joe. He was modest, but the validation must have been nice.
I'm unfamiliar with BEAM. How does it compare to goroutines? Obviously they won't migrate between machines, but concurrency feels very easy and ergonomic in Go.
Goroutines were pretty directly inspired by Erlang processes, so in terms of primitives I'd say they are very similar, aside from Go lacking the distributed features you already mentioned.
Where Erlang/Elixir adds value beyond goroutines is what OTP (kind of the standard library) provides on top: pre-built abstractions like GenServer for long-running processes, GenStage for producer/consumer pipelines, and supervision trees for making sure your processes haven't crashed, restarting them and their dependents if they have.
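For a sense of the ergonomics, a minimal GenServer sketch (the Counter module is made up for illustration):

```elixir
defmodule Counter do
  use GenServer

  # Client API: callers just call functions, OTP does the messaging.
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def increment, do: GenServer.cast(__MODULE__, :increment)
  def value, do: GenServer.call(__MODULE__, :value)

  # Server callbacks: OTP owns the process loop, timeouts, and
  # integration with supervisors; we only fill in state transitions.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end
```

Usage: Counter.start_link(0), then Counter.increment(), then Counter.value() returns 1.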
At the most basic level it's a bit similar: there is a multithreaded runtime which schedules work in userland. Green threads if you will.
The devil, however, is in what happens once you go beyond the trivial.
First, the units of work operate completely differently. BEAM follows the actor model rather than CSP: every actor has an address / mailbox, any actor can send messages to any other actor it's aware of, and actors can process their mailboxes however they want.
But BEAM is also completely strict and steadfast about its actor model: its actors are processes, each with its own stack but also its own heap. When one actor sends a message to another, the message content gets copied (/ moved) from one stack or heap to another; processes do not share memory[0]. Incidentally, this is what makes distribution relatively transparent (not entirely so, but impressively still): if everything you do to interact with others is send asynchronous one-way messages, it doesn't really matter whether they're in the same OS process, in a different OS process, or on a different machine entirely.
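A tiny Elixir sketch of the mailbox model (names are illustrative):

```elixir
# The spawned process gets its own heap; the tuples sent below are
# copied into the receiver's heap, nothing is shared.
pid =
  spawn(fn ->
    receive do
      {:greet, from, name} -> send(from, {:ok, "hello #{name}"})
    end
  end)

send(pid, {:greet, self(), "world"})

receive do
  {:ok, reply} -> IO.puts(reply)
end
```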
The reason BEAM works like that, however, is not any sort of theoretical purity; it is in service of reliability, which is the second big difference between BEAM and Go: BEAM error handling is inter-process, not intra-process. BEAM's error handling philosophy is that processes encounter errors for all sorts of reasons, and when that happens you can't know the entire state of the process, so you just kill it[1]; it should have a buddy linked to it whose job is to handle the crash and orchestrate what happens next.
BEAM has built-in support for linking and monitoring. In Erlang, linking means that if one process dies (crashes), the other is sent a special signal which also kills it. This signal can be received as a normal message instead, in order to handle the crash of your sibling (in which case you receive various metadata on the crash). Monitoring means you just receive the crash notification as a message, without dying yourself. The reason you might prefer linking to monitoring is that if you're a manager of other processes and you crash, you probably want all the processes you manage to die as well, which doesn't happen with monitors.
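Sketched in Elixir, under the assumption both children crash immediately:

```elixir
# Linking: if the linked process dies, we die too, unless we trap
# exits, in which case the exit signal arrives as a regular message
# carrying metadata about the crash.
Process.flag(:trap_exit, true)
child = spawn_link(fn -> exit(:boom) end)

receive do
  {:EXIT, ^child, reason} -> IO.puts("linked child died: #{inspect(reason)}")
end

# Monitoring: we just get a :DOWN message; our process is unaffected.
{pid, ref} = spawn_monitor(fn -> exit(:boom) end)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} -> IO.puts("monitored process died: #{inspect(reason)}")
end
```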
That is because BEAM has its origins in telecommunications, where reliability means redundancy and oversight. So the way you structure an application in BEAM is (often) a tree of processes, where many of the processes have oversight of a subtree: they handle fuckups (maybe by restarting, maybe by something else), serve as entry points to their workers, etc. If one of the leaves dies, that's just a signal sent to its parent, which might just die and signal its own parent, which will handle it somehow. This is the design principle known as the supervision tree: https://www.erlang.org/doc/design_principles/des_princ#super...
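In Elixir that structure looks roughly like this (reusing the hypothetical Counter from the earlier sketch):

```elixir
defmodule MyApp.Supervisor do
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg) do
    children = [
      # If Counter crashes, the supervisor restarts it; :one_for_one
      # restarts only the failed child, other strategies also restart
      # its siblings/dependents.
      {Counter, 0}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

Supervisors are themselves children of higher supervisors, which is how the tree forms.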
The third big difference is more philosophical and has to do with code reuse. Because of (2) above, a lot of Erlang / BEAM / OTP is communicating between processes in a subtree, moving messages between them, exit-signal strategies, etc., which leads to behaviours (https://www.erlang.org/doc/design_principles/des_princ#behav...). These are pretty alien because they're more or less mini frameworks, which is odd twice over: "mini" and "framework" are usually put in opposition to one another, and many people don't really want to hear about frameworks at all.
But that's what they are: behaviours encode entire prototypal lifecycle and communication patterns, where the user / implementer of the behaviour fills in the "business" bits.
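A toy behaviour in Elixir (Worker and PdfWorker are invented names):

```elixir
defmodule Worker do
  # The behaviour is a contract: the framework half owns the lifecycle
  # and messaging, implementers fill in the business callbacks.
  @callback handle_job(job :: term()) :: :ok | {:error, term()}
end

defmodule PdfWorker do
  @behaviour Worker

  # Only the "business" bit lives here; looping, mailbox handling,
  # supervision, etc. would live in the shared behaviour module.
  @impl Worker
  def handle_job(job) do
    IO.inspect(job, label: "rendering")
    :ok
  end
end
```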
Oh yeah, and BEAM comes with an entire suite of introspection tooling, which is kind of linked to (2): all the oversight eventually ends at people, so you can connect to a running system and look at, and touch, more or less all the things.
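A few things you can do from a live shell, assuming the hypothetical Counter from above is registered:

```elixir
# Count live processes on the node.
Process.list() |> length() |> IO.inspect(label: "live processes")

# Inspect a specific process: mailbox depth, memory, what it's running.
pid = Process.whereis(Counter)
Process.info(pid, [:message_queue_len, :memory, :current_function])

# Peek at a GenServer's internal state, and open the GUI browser.
:sys.get_state(pid)
:observer.start()
```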
BEAM is a bit of an OS running on an OS, really; probably closer in philosophy to the image-based languages of the 80s, in part because it is a language from the 80s. It's not quite image-based, though; rather, it is designed to go even further and just run forever, as it includes built-in support for hot code reloading and migrations (though from what I remember, actually doing that properly is not super great or fun; it was quite messy and involved).
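The basic mechanism is simple enough to sketch (LongRunner is made up); the messy part alluded to above is doing this safely across a whole release:

```elixir
defmodule LongRunner do
  def loop(state) do
    receive do
      {:work, from} -> send(from, {:done, state})
    end

    # A fully-qualified call goes through the module table, so after
    # :code.load_file/1 loads a new version of this module, the next
    # iteration runs the new code; a bare loop(state) would stay on
    # the old version forever.
    __MODULE__.loop(state)
  end
end
```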
By comparison to all that, goroutines are just threads which happen to be cheap so you can have lots.
[0] Kinda: some objects live on shared heaps as an optimisation, but they're immutable and reference-counted, so it's an implementation detail.
[1] And here, if actors shared any memory, an actor might die in the middle of updating or holding onto shared state, which means its error would corrupt other actors touching the same state.
Async/await and whatever module won't save you from global state, data races, and the fact that you're running an imperative language with mutable state. Additionally, the ergonomics are not the same, so even if you could replicate the BEAM in Node or any other language, you'd have to be a masochist to do it.
Lastly, the concurrency primitives belong to the entire runtime, not to a set of external libraries maintained by whoever, which might be incompatible with other libraries you might want to use.
I think Node is still single-core by default? Elixir (or rather the BEAM) will handle core utilisation, so if you start a load of Elixir processes they'll be spread across multiple cores.
Node itself has never been single-threaded. The execution model for JavaScript is single-threaded, so there’s no working around that, but libuv uses threads to build async IO on top of blocking operations.
Then there are worker threads, which are pretty similar to web workers AIUI, and give you parallel execution for CPU-intensive work.
Obviously, though, none of these facilities compares with BEAM.
It doesn't effectively do that. It does about 10% of that, ineffectively.
Instead of running a single VM with full knowledge of how to run lightweight processes, designed to take full advantage of modern multi-core CPUs, with multiple guarantees enforced by the runtime, you have multiple single-threaded VMs awkwardly communicating with each other over a bolted-on API.
Yeah, but then you have to handle a lot of the memory synchronization yourself. It's hard to convey what's possible on the Erlang VM to someone who hasn't tried it.
In particular, it is preemptive. This... makes a lot of stuff easier.
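A quick Elixir illustration of what preemption buys you (module name invented):

```elixir
defmodule Busy do
  # An infinite, CPU-bound loop with no explicit yield point.
  def spin, do: spin()
end

# On BEAM this does not freeze anything: the scheduler counts
# reductions and preempts the looping process, so every other
# process, including the shell, keeps running.
spawn(&Busy.spin/0)
IO.puts("still responsive")
```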
NodeJS was designed to run single-threaded. Sure, you can use the cluster module to run multiple instances, but there's memory overhead, and the ergonomics of sharing state and message passing are nowhere near GenServer's. Not to mention all the other benefits of BEAM.
But on the specifics of implementing a web crawler: a NodeJS way to do it would be to parallelize using lambdas.