How is runtime fault-tolerance achieved? My understanding of Erlang is that the BEAM VM implements these capabilities (custom threads, supervision, restarts, hot reload), but it is one level removed and above actual code. And they implement their own user-space threading runtime in order to support them. But in Rust, there is no such runtime (or is Bastion implementing one?) and it seems like this is used as a library. I'm very curious.
I think another way to frame my question would be: which is the basic unit of parallel execution in Bastion? A thread? Or a separate process? There are mentions of lightweight processes and subprocesses in the README but it is rather vague what these are.
By runtime fault-tolerance they probably just mean an ability to do programmable supervisors that can react to actors dying, nothing special. And it's not like you can do a lot from a user space process anyway, apart from catching signals and destroying a currently running actor that caused it.
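To make the "programmable supervisor" idea concrete, here is a minimal sketch using only the Rust standard library. It is not Bastion's API; `supervise` is a hypothetical helper made up for illustration. It runs a worker on its own OS thread and restarts it when the thread panics, relying on the fact that `JoinHandle::join` returns `Err` if the child panicked.

```rust
use std::thread;

/// Hypothetical helper (not Bastion's API): runs `worker` on its own OS
/// thread and restarts it after a panic, up to `max_restarts` times.
/// Returns the worker's result plus the attempt number that succeeded,
/// or `None` if every attempt panicked.
fn supervise<F, T>(worker: F, max_restarts: u32) -> Option<(T, u32)>
where
    F: Fn(u32) -> T + Send + Clone + 'static,
    T: Send + 'static,
{
    for attempt in 0..=max_restarts {
        let w = worker.clone();
        // A panic unwinds only the child thread; join() reports it as Err.
        let handle = thread::spawn(move || w(attempt));
        match handle.join() {
            Ok(value) => return Some((value, attempt)),
            Err(_) => continue, // the "actor" died: restart it
        }
    }
    None
}
```

This only works with the default `panic = "unwind"` setting. With `panic = "abort"`, or on a segfault in unsafe or C code, the whole OS process goes down and no in-process supervisor gets a chance to react, which is the limitation described above.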
1. A lightproc, which, per my understanding, is an async-spawned thread returning an (optional?) Future [0].
2. A ProcHandle [1] that lets you define process state (like pid), control process execution (like cancel, suspend?), and listen on the progress of a given lightproc that is run by the Bastion executors [2], whilst the message passing / supervisor semantics are handled by Bastion [3].
https://akka.io/ on JVM would be a better comparison to this than BEAM's implementation of actors, I think.
If an Erlang process crash cannot crash the entire system while a crash in Bastion's concept of a process can, then threading is an important part of fault-tolerance, isn't it?
It definitely is not orthogonal. Suppose an OS thread goes into an infinite loop. How do you cleanly stop it (feel free to assume Linux/Windows/macOS)? In Erlang this is possible because of the custom threading implementation.
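For reference, Rust's `std::thread` offers no way to kill or cancel a running thread from the outside, so the usual workaround is cooperative: the loop itself has to check a shared flag. The sketch below assumes exactly that; `run_until_stopped` is a hypothetical name, and a loop that never checks the flag would still be unstoppable.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical example: spawns a busy loop that checks a shared stop flag
/// on every iteration, lets it run for `run_for`, then asks it to stop.
/// Returns how many iterations the loop completed.
fn run_until_stopped(run_for: Duration) -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let worker_stop = Arc::clone(&stop);

    let worker = thread::spawn(move || {
        let mut iterations: u64 = 0;
        loop {
            iterations += 1;
            // The only exit: the loop must cooperate by checking the flag.
            if worker_stop.load(Ordering::Relaxed) {
                break;
            }
            std::hint::spin_loop();
        }
        iterations
    });

    thread::sleep(run_for);
    stop.store(true, Ordering::Relaxed);
    worker.join().expect("worker panicked")
}
```

Calling `run_until_stopped(Duration::from_millis(20))` returns once the worker notices the flag. The contrast with the BEAM is that there the scheduler can take the CPU away from a process without the process's code doing anything.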
Could we hear a little more about the background of the project, including what it's being developed for? Really interested to learn more about the project, this looks great
I really appreciate the work on Bastion, which truly captures the spirit of Erlang actor programming with its supervision strategies!
The code is clean and well documented, and I cannot believe the project is not better known in the Rust community.
Yes, the msg!() macro is a little painful to write.
I think it can be refactored into the pattern like `impl Handler<Msg>`.
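A rough sketch of what such a pattern could look like, purely as an illustration. The `Handler` trait, `Counter`, `Increment`, and `Get` below are all hypothetical and not part of Bastion: the idea is one trait impl per message type, so dispatch is resolved by the type system instead of by a macro.

```rust
/// Hypothetical trait (not part of Bastion): one impl per message type.
trait Handler<M> {
    type Reply;
    fn handle(&mut self, msg: M) -> Self::Reply;
}

/// Two made-up message types for the example.
struct Increment(u64);
struct Get;

/// A made-up actor state.
#[derive(Default)]
struct Counter {
    total: u64,
}

impl Handler<Increment> for Counter {
    type Reply = ();
    fn handle(&mut self, msg: Increment) -> Self::Reply {
        self.total += msg.0;
    }
}

impl Handler<Get> for Counter {
    type Reply = u64;
    fn handle(&mut self, _msg: Get) -> Self::Reply {
        self.total
    }
}
```

With this shape, `counter.handle(Increment(2))` and `counter.handle(Get)` pick the right impl from the argument type, and sending a message the actor has no impl for is a compile error.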
But beyond that, it supports async/await naturally. :)
So did it. I didn't know if it was a service or a library or if it integrates with something. Looks like it's a library for Rust. I think mentioning Rust would speed up the understanding of where this sits
If it's 'Erlang for Rust developers' I'd be curious to get a feel for how well it integrates with everything. A lot of what Erlang does is kind of difficult to shoehorn in via a library, but I don't know Rust well so maybe it all integrates in a very natural way.
Out of curiosity, what would you be looking for in "a better Erlang"? Most if not all of my issues were syntactical, or things that were given up as tradeoffs that I can't qualify as "better", so I'm curious what someone else's impressions are here.
I'm honestly no expert on the subject. I have virtually zero experience with the Erlang language, and I don't know a thing about Rust. However, if that counts, I have read large parts of Joe Armstrong's PhD thesis about distributed systems and written plenty of Elixir.
It may be wrong to say we need a better Erlang, but for modern web development we certainly could use a fair bit more compiler enforced type safety for writing more correct programs. Mostly due to the lack of a proper type system, Erlang is not particularly expressive as a language, although its primitives (processes & message passing) are well geared exactly for what it focuses on.
Erlang focuses strongly on fault-tolerance and consequently on high availability, but I'm not sure this perk is such a significant benefit anymore compared to the various other runtimes spinning the wheels of your typical web server in a cloud setup. We rarely need to write servers anymore that keep running for many years straight without a restart, only patching code via hot reloads.
Just my take but since you asked, trading some fault tolerance for more correct programs would therefore probably be a fair trade-off for a new kind of Erlang.
Yeah; I always added type annotations to my code and ran Dialyzer...I wish there was a stricter compilation mode for that. That said, I really liked the 'optimistic' part of it, so I didn't spend a large amount of time correcting my type specs when proper testing proved the code worked.
I can agree with bfrog's comment too; Rust's typing can definitely help eliminate a bunch of classes of bugs...but honestly I'm not sure I ever ran into them using Erlang in production anyway. Immutable data + message passing tends to make it easy to implement things correctly, whereas some of Rust's more standout features seem designed to solve problems caused by the language itself picking a different set of tradeoffs (i.e., borrow checking as part of what is necessary to manage memory without risking it going out of scope prematurely, or being unable to be reclaimed at a deterministic time by the compiler). I'm not familiar enough with Rust myself to really comment though; I just know that in two and a half years of production Erlang, the only bug we ever encountered was caused by a C driver, plus our own bad design in not circuit breaking calls to it, which led to restarts that trickled up the supervisor chain under heavier load than we'd tested for. Rust would not have been able to help us with any of that. In fact, even in the driver, the issue was an unnecessary network call that sometimes hit an empty cache, causing a huge network hiccup that led to an unhandled timeout. We got 'correctness' as well as 'fault tolerance', at least as much as a language can give you (i.e., we could still implement the wrong things, such as when we implemented an O(N^2), but correct, algorithm that we had to fix to an O(N) when we noticed certain calls being too slow).
Syntax is fine; to me, the weak typing gets old. Yes, there's tooling to help annotate and check usage of annotated types, but it's not nearly as good as Rust's typing at helping you write correct programs.
- Zero-copy mechanics out of the box (borrow checker helps there) and especially in message passing. A lot of the overhead of actor style systems seems to come from copying data and not so much in the actual message queue reading and dispatching.
- Strong static typing. It's 2020 and we should all stop pretending that it doesn't eliminate whole classes of bugs. It does.
- Goes without saying since we mention Erlang but still -- full async support for both CPU and I/O intensive tasks. If one actor goes 100% CPU in an infinite loop everybody else should be unaffected (as much as the hardware allows for that). And if the compiler itself can detect infinite loops and just yell at you for them, even better! (But that borders on sci-fi at the moment.)
- Again related to the above: full non-voluntary preemption. You can insert your own yield points if you like but the runtime will choose if it will respect them or not.
- Obviously, raw speed. I love Elixir and with time I learned how to make its code minimally intrusive for the CPU -- most of my Elixir code basically glues things together, does a quick processing and gets the hell out of the way. 100% of my Elixir code always waits on DB or network requests and I am very proud of that. Still, I'd love such an amazing concept like the actor style parallelism and all the BEAM's goodies in general to come in a package that's 10x or 100x faster and make full use of modern hardware! SIMD / AVX included.
---
There could be more but these are just off the top of my head.
A much shorter answer would likely be: a BEAM VM with minimal memory, I/O, and CPU overhead, fully utilising the hardware, and bringing correctness and fewer bugs through strong static typing.
So short aside - zero copying (locally; it will always be a copy in distributed Erlang) is possible as is with large binaries. But those can (and have) created memory issues themselves (by keeping a large binary around when you're only referencing a small part).
For other types, it's not possible without either adopting Rust's memory model (but that's a huge set of tradeoffs to take on; completely change the language), or giving up the GC benefits of everything being process(stack) allocated, which would entail a whole new set of tradeoffs (and probably not be much faster).
I, personally, don't see that direction as a plus, as such.
Not sure about your third/fourth point; that's Erlang's behavior, no? One actor being busy won't starve others, for CPU or IO. It's pre-emptive; the VM counts reductions and moves on (and not really sure why, given that, you'd want user controlled pre-emption; I never had an issue with one actor starving others, unless calling out to some C code or something, and even that is mitigated with dirty schedulers). Detecting infinite loops could be optimistic, like Dialyzer's typing, I think, but could never be 100% because, yeah, halting problem.
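As a toy illustration of reduction counting (a sketch under simplifying assumptions, not how the BEAM is actually implemented): each task gets a fixed budget of steps per turn and then goes to the back of the run queue. The names `Task`, `Spin`, `Countdown`, and `run` are all made up for this example.

```rust
use std::collections::VecDeque;

/// A unit of work that advances one small step ("reduction") at a time.
/// `step` returns true while the task still has work left.
trait Task {
    fn step(&mut self) -> bool;
}

/// A task that never finishes, standing in for an infinite loop.
struct Spin;
impl Task for Spin {
    fn step(&mut self) -> bool {
        true
    }
}

/// A task that finishes after `remaining` steps.
struct Countdown {
    remaining: u32,
}
impl Task for Countdown {
    fn step(&mut self) -> bool {
        if self.remaining > 0 {
            self.remaining -= 1;
        }
        self.remaining > 0
    }
}

/// Round-robin scheduler: each turn pops one task, runs it for at most
/// `budget` steps, and requeues it if it is still alive. Stops after
/// `max_turns` turns and returns task indices in completion order.
fn run(tasks: Vec<Box<dyn Task>>, budget: u32, max_turns: usize) -> Vec<usize> {
    let mut queue: VecDeque<(usize, Box<dyn Task>)> =
        tasks.into_iter().enumerate().collect();
    let mut finished = Vec::new();
    for _ in 0..max_turns {
        let (id, mut task) = match queue.pop_front() {
            Some(entry) => entry,
            None => break,
        };
        let mut alive = true;
        for _ in 0..budget {
            alive = task.step();
            if !alive {
                break;
            }
        }
        if alive {
            queue.push_back((id, task)); // budget used up: go to the back
        } else {
            finished.push(id);
        }
    }
    finished
}
```

With `Spin` first in the queue, the two `Countdown` tasks still complete, because `Spin` is suspended whenever its budget runs out. The catch is that the work here is already split into steps. The BEAM can count reductions for arbitrary code because it runs the code itself, whereas a Rust library only regains control where the task hands it back (for async code, at `.await` points).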
And as above, totally agree on typing, though I personally like Dialyzer's optimistic approach (I just wish type specs were enforced).
But, per the original, I am totally on board with more languages supporting actors...I just was curious what 'a better Erlang' would look like. Some of your bullets seem "a better language (for my use case)", and not really aligned with Erlang's goals.
As for the zero-copy stuff, I am not educated enough to claim anything. I have just seen some claims by Rust and liked the idea of the borrowing -- but I know data in the stack is always going to be faster than data in the heap.
I am just wondering if we can't stop with this whole "copy your data" thing all the time.
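For what it's worth, here is a small sketch of what "not copying" looks like with Rust's ownership model, using only the standard library (`send_without_copy` is a hypothetical name). Sending a `Vec<u8>` over a channel moves the small handle (pointer, length, capacity) to the receiver; the heap buffer itself stays where it is, which the function checks by comparing addresses.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical example: sends a buffer of `len` bytes to another thread
/// and reports whether the receiver sees the same heap allocation,
/// i.e. whether only ownership moved and the bytes were not copied.
fn send_without_copy(len: usize) -> bool {
    let payload = vec![0u8; len];
    let sent_addr = payload.as_ptr() as usize;

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let receiver = thread::spawn(move || {
        let msg = rx.recv().expect("sender dropped");
        (msg.as_ptr() as usize, msg.len())
    });

    // `payload` is moved into the channel; using it after this line
    // would be a compile error, so sender and receiver never share it.
    tx.send(payload).expect("receiver dropped");

    let (recv_addr, recv_len) = receiver.join().expect("receiver panicked");
    recv_addr == sent_addr && recv_len == len
}
```

This only holds inside one OS process with a shared heap. As noted above for distributed Erlang, anything that crosses a process or network boundary still has to be serialized and copied.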
My comment was a rant + a general musing of a guy who lately mostly codes Elixir; the lack of strong typing is a major pain at times, and sometimes I also wonder if the BEAM VM can't be much faster.
That's all. Apologies if my comments were off-topic.
Currently Elixir is pretty much done and can't be made [much?] better. I have worked with it for 3.5 years and I can't see how it can be improved inside its current environment.
So reaching for strong static typing and zero-copy message passing is the natural next step -- and if that's not possible with Elixir itself, we'll search for it in other languages.
To provide context, understanding this requires a little bit of background knowledge about concurrency paradigms.
In concurrent programming, there are a few mental models/approaches you can use. Each of them has a different "value system" and set of tradeoffs, if you will.
In a nutshell, you have:
- Locks (Mutex/Semaphore)
- Communicating Sequential Processes
- Software Transactional Memory
- Actor Model
The Actor Model is a particularly powerful paradigm because it isolates processes and works via message passing and spawning. Erlang/Elixir are fault-tolerant because of the BEAM's process model: any given process (more or less) can fail, and that is not a problem thanks to isolation.
What this library allows you to do is architect applications in ways such that they are much more resilient to failure and easier to scale out + parallelize/distribute.
It doesn't have to be a networked application either. Any piece of code can be an actor, so the model applies to any software.
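As a bare-bones illustration of that idea, here is a sketch of an actor built from nothing but a thread and a channel in the Rust standard library. It is not Bastion's API; `CounterMsg` and `spawn_counter` are hypothetical names. The state lives inside one thread and the only way to touch it is to send a message.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages the hypothetical counter actor understands.
enum CounterMsg {
    Add(u64),
    /// Carries a channel on which the actor sends back the current total.
    Get(Sender<u64>),
}

/// Spawns the actor and returns its mailbox address plus a join handle.
fn spawn_counter() -> (Sender<CounterMsg>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0u64; // state owned by the actor alone
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Get(reply) => {
                    // Ignore the error if the asker has already gone away.
                    let _ = reply.send(total);
                }
            }
        }
    });
    (tx, handle)
}
```

A caller sends `CounterMsg::Add(2)` and later `CounterMsg::Get(reply_tx)` and reads the total from its own receiver. No locks are involved because nothing outside the actor can reach `total` directly.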
If you want a great overview of the Actor model, there are some slides here which do a fantastic job of illustrating it: