Bastion – Highly-available distributed fault-tolerant runtime

elteto · on Feb 24, 2020

How is runtime fault-tolerance achieved? My understanding of Erlang is that the BEAM VM implements these capabilities (custom threads, supervision, restarts, hot reload), but it is one level removed and above actual code. And they implement their own user-space threading runtime in order to support them. But in Rust, there is no such runtime (or is Bastion implementing one?) and it seems like this is used as a library. I'm very curious.

I think another way to frame my question would be: what is the basic unit of parallel execution in Bastion? A thread? Or a separate process? There are mentions of lightweight processes and subprocesses in the README, but it is rather vague about what these are.

zzzcpan · on Feb 24, 2020

By runtime fault-tolerance they probably just mean an ability to do programmable supervisors that can react to actors dying, nothing special. And it's not like you can do a lot from a user space process anyway, apart from catching signals and destroying a currently running actor that caused it.
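For illustration, here is a minimal sketch of that kind of programmable supervisor using only the Rust standard library. It is not Bastion's API: `supervise` and `flaky` are hypothetical names, and the "actor" is just an OS thread. It relies on the default `panic = "unwind"` strategy, and it cannot recover from aborts or segfaults, which is the limitation described above.

```rust
use std::thread;

/// Runs `worker` on its own OS thread and restarts it whenever it panics,
/// up to `max_restarts` times. Returns the worker's result together with
/// the number of restarts that were needed, or None if it kept dying.
fn supervise<F>(worker: F, max_restarts: u32) -> Option<(u32, u32)>
where
    F: Fn(u32) -> u32 + Send + Clone + 'static,
{
    for attempt in 0..=max_restarts {
        let w = worker.clone();
        // A panic inside the child thread surfaces as Err from join()
        // instead of taking down the supervisor.
        match thread::spawn(move || w(attempt)).join() {
            Ok(value) => return Some((value, attempt)),
            Err(_) => continue, // the "actor" died: restart it
        }
    }
    None
}

/// Hypothetical flaky worker: it crashes on its first two attempts.
fn flaky(attempt: u32) -> u32 {
    if attempt < 2 {
        panic!("simulated crash on attempt {}", attempt);
    }
    42
}

fn main() {
    // The first two attempts panic; the third one succeeds.
    let outcome = supervise(flaky, 5);
    println!("{:?}", outcome);
}
```

A real supervisor would also need restart strategies and back-off. The sketch only shows that a dying worker can be observed and replaced from ordinary library code.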

ignoramous · on Feb 24, 2020

> How is runtime fault-tolerance achieved?

An actor is:

1. A lightproc, which, per my understanding, is an async-spawned thread returning an (optional?) Future [0].

2. A ProcHandle [1] that lets you define process state (like pid), control process execution (like cancel, suspend?), and listen on the progress of a given lightproc that is run by BastionExecutors [2], whilst the message passing / supervisor semantics are handled by Bastion [3].

https://akka.io/ on JVM would be a better comparison to this than BEAM's implementation of actors, I think.

[0] https://github.com/bastion-rs/bastion/blob/2d9dc705962f30fbf...

[1] https://github.com/bastion-rs/bastion/blob/2d9dc705962f30fbf...

[2] https://github.com/bastion-rs/bastion/blob/2d9dc705962f30fbf...

[3] https://docs.rs/bastion/0.3.4/bastion/struct.Bastion.html

blattimwind · on Feb 24, 2020

Erlang's use of m:n threading is orthogonal to fault-tolerance (perhaps not inside the implementation, but conceptually).

StreamBright · on Feb 24, 2020

If a crashing Erlang process cannot crash the entire system, while a crash in Bastion's concept of a process can, then threading is an important part of fault-tolerance, isn't it?

elteto · on Feb 24, 2020

It definitely is not orthogonal. Suppose an OS thread goes into an infinite loop. How do you cleanly stop it (feel free to assume Linux/Windows/MacOS)? In Erlang this is possible because of the custom threading implementation.

blattimwind · on Feb 24, 2020

In Erlang that's possible because the program runs in a VM. Erlang could do the same with 1:1 and m:1 threading.

zzzcpan · on Feb 24, 2020

It's because of the VM interpreter, which calls into the scheduler within loop iterations. Nothing to do with threads.

chrisseaton · on Feb 24, 2020

> How do you cleanly stop it

The JVM manages this - there are many options:

- reading a flag

- patching the running code

- changing memory protection and causing a segfault
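The first option, reading a flag, can be sketched in plain Rust. This is a cooperative version with hypothetical names (`spin_until_stopped`, `run_and_stop`): the loop has to poll the flag itself, whereas a VM can insert such checks into generated code on the programmer's behalf.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spins in an "infinite" loop that polls a shared stop flag on every
/// iteration and returns how many iterations it completed.
fn spin_until_stopped(stop: Arc<AtomicBool>) -> u64 {
    let mut iterations = 0u64;
    while !stop.load(Ordering::Relaxed) {
        iterations = iterations.wrapping_add(1);
        std::hint::spin_loop();
    }
    iterations
}

/// Starts the loop on an OS thread, asks it to stop after `after`,
/// and waits for it to finish cleanly.
fn run_and_stop(after: Duration) -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let worker = {
        let stop = Arc::clone(&stop);
        thread::spawn(move || spin_until_stopped(stop))
    };
    thread::sleep(after);
    stop.store(true, Ordering::Relaxed);
    worker.join().expect("worker should exit without panicking")
}

fn main() {
    let n = run_and_stop(Duration::from_millis(10));
    println!("loop ran {} iterations before it was stopped", n);
}
```

A loop that never reads the flag cannot be stopped this way, which is why the other two options exist.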

saagarjha · on Feb 24, 2020

> Suppose an OS thread goes into an infinite loop. How do you cleanly stop it (feel free to assume Linux/Windows/MacOS)?

ptrace it?

paulsutter · on Feb 24, 2020

Could we hear a little more about the background of the project, including what it's being developed for? Really interested to learn more about the project, this looks great

windor · on Feb 24, 2020

I really appreciate the work on Bastion, which really captures the spirit of Erlang actor programming with its supervision strategy! The code is clean and well documented, and I can't believe the project isn't better known in the Rust community.

windor · on Feb 24, 2020

BTW: They are working on Hot-Code Swap.

mkj · on Feb 24, 2020

This looks promising, though the "No Forced Trait Implementations" feature seems to require using a strange-looking msg!() macro instead?

https://docs.rs/bastion/0.3.4/bastion/macro.msg.html

Seems less clean to read than Riker (https://riker.rs), though that doesn't really do async well.

windor · on Feb 24, 2020

Yes, the msg!() macro is a little painful to write. I think it can be refactored into the pattern like `impl Handler<Msg>`. But beyond that, it supports async/await naturally. :)
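For readers who haven't seen the `impl Handler<Msg>` pattern, a rough sketch of what it could look like follows. The `Handler` trait and the `Counter`, `Add` and `Get` types here are hypothetical and written for illustration; this is not Bastion's API.

```rust
/// Hypothetical trait, named after the `impl Handler<Msg>` pattern
/// mentioned above; it is not Bastion's API.
trait Handler<M> {
    type Reply;
    fn handle(&mut self, msg: M) -> Self::Reply;
}

/// Two message types the actor understands.
struct Add(i64);
struct Get;

#[derive(Default)]
struct Counter {
    total: i64,
}

// One impl per message type: the compiler picks the right one from the
// argument type, so no macro-based dispatch is needed.
impl Handler<Add> for Counter {
    type Reply = ();
    fn handle(&mut self, msg: Add) {
        self.total += msg.0;
    }
}

impl Handler<Get> for Counter {
    type Reply = i64;
    fn handle(&mut self, _msg: Get) -> i64 {
        self.total
    }
}

fn main() {
    let mut counter = Counter::default();
    counter.handle(Add(2));
    counter.handle(Add(40));
    println!("{}", counter.handle(Get));
}
```

The tradeoff is the one raised above: every message type needs a trait implementation, which is exactly what "No Forced Trait Implementations" avoids.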

jokoon · on Feb 24, 2020

I have a hard time understanding what this is. Is it an alternative to Docker somehow? What other frameworks/platforms would Bastion compete with?

kitd · on Feb 24, 2020

It provides a distributed actor runtime a la Erlang, but for Rust.

The Getting Started example gives some useful insights:

https://github.com/bastion-rs/bastion/blob/master/bastion/ex...

windor · on Feb 24, 2020

It's a Rust library for actor-model programming, like what Erlang offers.

coenhyde · on Feb 24, 2020

So did I. I didn't know if it was a service or a library or if it integrates with something. Looks like it's a library for Rust. I think mentioning Rust would speed up the understanding of where this sits.

windor · on Feb 24, 2020

Yes, the title was changed, which I wasn't aware of.

Original title: `The missing part of actor-model programming in rust`.

sheeshkebab · on Feb 24, 2020

It’s more like Nats.io - an async message server, just for Rust

ronmex · on Feb 24, 2020

Looks like Akka Cluster for Rust?

davidw · on Feb 24, 2020

Looks like good work! I'm curious about why I might use this instead of Erlang.

hopia · on Feb 24, 2020

I was also wondering if this is aiming to be the Erlang for Rust developers, or rather a better Erlang. Either one would probably be worthwhile.

davidw · on Feb 24, 2020

Yeah, either one is pretty cool.

If it's 'Erlang for Rust developers' I'd be curious to get a feel for how well it integrates with everything. A lot of what Erlang does is kind of difficult to shoehorn in via a library, but I don't know Rust well so maybe it all integrates in a very natural way.

lostcolony · on Feb 24, 2020

Out of curiosity, what would you be looking for for "a better Erlang"? Most if not all of my issues were syntactical, or things that were given up as tradeoffs that I can't qualify as "better", so I'm curious what someone else's impressions are here.

hopia · on Feb 24, 2020

I'm honestly no expert on the subject. I have virtually zero experience with Erlang language. And I don't know a thing about Rust. However, if that counts I have read large parts of Joe Armstrong's PhD thesis about distributed systems and written plenty of Elixir.

It may be wrong to say we need a better Erlang, but for modern web development we certainly could use a fair bit more compiler enforced type safety for writing more correct programs. Mostly due to the lack of a proper type system, Erlang is not particularly expressive as a language, although its primitives (processes & message passing) are well geared exactly for what it focuses on.

Erlang focuses strongly on fault-tolerance and subsequently on high availability, but I'm not sure this perk is such a significant benefit anymore compared to the various other runtimes spinning the wheels of your typical web server in a cloud setup. We rarely need to write servers anymore that keep running for many years straight without a restart, only patching code via hot reloads.

Just my take but since you asked, trading some fault tolerance for more correct programs would therefore probably be a fair trade-off for a new kind of Erlang.

lostcolony · on Feb 24, 2020

Thanks for the reply!

Yeah; I always added type annotations to my code and ran Dialyzer...I wish there was a stricter compilation mode for that. That said, I really liked the 'optimistic' part of it, so I didn't spend a large amount of time correcting my type specs when proper testing proved it worked.

I can agree with bfrog's comment too; Rust's typing can definitely help eliminate a bunch of classes of bugs...but honestly I'm not sure I ever ran into them using Erlang in production anyway. Immutable data + message passing tends to make it easy to implement things correctly, whereas some of Rust's more standout features seem designed to solve problems caused by the language itself picking a different set of tradeoffs (i.e., borrow checking as part of what is necessary to manage memory without risking it going out of scope prematurely, or being unable to be reclaimed at a deterministic time by the compiler). I'm not familiar enough with Rust myself to really comment though; I just know that in two and a half years of production Erlang, the only bug we ever encountered was caused by a C driver (and our own bad design in not circuit-breaking calls to it, which led to restarts that trickled up the supervisor chain under heavier load than we'd tested for). Rust would not have been able to help us with any of that. In fact, even in the driver, the issue was an unnecessary network call that sometimes hit an empty cache, causing a huge network hiccup that led to an unhandled timeout. Erlang got us 'correctness' as well as 'fault tolerance', at least as much as a language can (i.e., we could still implement the wrong things, such as when we implemented an O(N^2), but correct, algorithm that we had to fix to an O(N) one when we noticed certain calls being too slow).

bfrog · on Feb 24, 2020

Syntax is fine; to me, it's the weak typing that gets old. Yes, there's tooling to help annotate and check usage of annotated types, but it's not nearly as good as Rust's typing at helping you write correct programs.

pdimitar · on Feb 25, 2020

- Zero-copy mechanics out of the box (borrow checker helps there) and especially in message passing. A lot of the overhead of actor style systems seems to come from copying data and not so much in the actual message queue reading and dispatching.

- Strong static typing. It's 2020 and we should all stop pretending that it doesn't eliminate whole classes of bugs. It does.

- Goes without saying since we mention Erlang but still -- full async support for both CPU and I/O intensive tasks. If one actor goes 100% CPU in an infinite loop everybody else should be unaffected (as much as the hardware allows for that). And if the compiler itself can detect infinite loops and just yell at you for them, even better! (But that borders on sci-fi at the moment.)

- Again related to the above: full non-voluntary preemption. You can insert your own yield points if you like but the runtime will choose if it will respect them or not.

- Obviously, raw speed. I love Elixir and with time I learned how to make its code minimally intrusive for the CPU -- most of my Elixir code basically glues things together, does a quick processing and gets the hell out of the way. 100% of my Elixir code always waits on DB or network requests and I am very proud of that. Still, I'd love such an amazing concept like the actor style parallelism and all the BEAM's goodies in general to come in a package that's 10x or 100x faster and make full use of modern hardware! SIMD / AVX included.
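On the first bullet, here is a small sketch of what zero-copy message passing looks like with Rust's ownership rules, using a standard-library channel rather than any actor framework. The function name `payload_is_moved_not_copied` is made up for this example. Only the `Vec`'s small header (pointer, length, capacity) is moved through the channel; the heap buffer stays where it is.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends a heap-allocated buffer to another thread and reports whether the
/// receiver sees the very same allocation (true) or a copy (false).
fn payload_is_moved_not_copied(len: usize) -> bool {
    let payload = vec![0u8; len];
    let sent_addr = payload.as_ptr() as usize;

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let receiver = thread::spawn(move || {
        let received = rx.recv().expect("sender hung up");
        received.as_ptr() as usize
    });

    // `send` takes ownership: `payload` can no longer be used here, so the
    // compiler rules out the sender mutating it behind the receiver's back.
    tx.send(payload).expect("receiver hung up");
    let received_addr = receiver.join().expect("receiver panicked");

    sent_addr == received_addr
}

fn main() {
    // 1 MiB payload; the two addresses are expected to match.
    println!("{}", payload_is_moved_not_copied(1 << 20));
}
```

This only holds within one address space. Sending the same message to another machine still means serializing and copying it.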

---

There could be more but these are just off the top of my head.

A much shorter answer would likely be: a BEAM VM with minimal memory and I/O and CPU overhead, fully utilising the hardware, and bringing correctness and fewer bugs with strong static typing.

lostcolony · on Feb 26, 2020

So short aside - zero copying (locally; it will always be a copy in distributed Erlang) is possible as is with large binaries. But those can (and have) created memory issues themselves (by keeping a large binary around when you're only referencing a small part).

For other types, it's not possible without either adopting Rust's memory model (but that's a huge set of tradeoffs to take on; completely change the language), or giving up the GC benefits of everything being process (stack) allocated, which would entail a whole new set of tradeoffs (and probably not be much faster).

I, personally, don't see that direction as a plus, as such.

Not sure about your third/fourth point; that's Erlang's behavior, no? One actor being busy won't starve others, for CPU or IO. It's pre-emptive; the VM counts reductions and moves on (and not really sure why, given that, you'd want user controlled pre-emption; I never had an issue with one actor starving others, unless calling out to some C code or something, and even that is mitigated with dirty schedulers). Detecting infinite loops could be optimistic, like Dialyzer's typing, I think, but could never be 100% because, yeah, halting problem.
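As a toy model of "the VM counts reductions and moves on", here is a sketch in Rust. It is only an illustration of the idea, not how the BEAM is implemented: each task is a closure that performs one unit of work per call, and all the names (`Task`, `run`, `busy_loop`, `countdown`) are hypothetical.

```rust
use std::collections::VecDeque;

/// One call performs one unit of work ("reduction").
/// Returns true when the task has finished.
type Task = Box<dyn FnMut() -> bool>;

/// Round-robin scheduler: each task gets at most `budget` reductions per
/// time slice before it is moved to the back of the run queue. Runs for at
/// most `max_slices` slices and returns the ids of the tasks that finished,
/// in completion order.
fn run(tasks: Vec<Task>, budget: u32, max_slices: u32) -> Vec<usize> {
    let mut queue: VecDeque<(usize, Task)> = tasks.into_iter().enumerate().collect();
    let mut finished = Vec::new();
    for _ in 0..max_slices {
        let (id, mut task) = match queue.pop_front() {
            Some(entry) => entry,
            None => break, // nothing left to run
        };
        let mut done = false;
        for _ in 0..budget {
            if task() {
                done = true;
                break;
            }
        }
        if done {
            finished.push(id);
        } else {
            queue.push_back((id, task)); // budget used up: preempt
        }
    }
    finished
}

/// A task that never finishes, standing in for an infinite loop.
fn busy_loop() -> Task {
    Box::new(|| false)
}

/// A task that finishes after `n` reductions.
fn countdown(mut n: u32) -> Task {
    Box::new(move || {
        n = n.saturating_sub(1);
        n == 0
    })
}

fn main() {
    // Task 0 loops forever, yet task 1 still completes.
    let finished = run(vec![busy_loop(), countdown(250)], 100, 10);
    println!("{:?}", finished);
}
```

The catch is that the toy tasks hand control back after every step. The BEAM can do the accounting itself because it controls how code is executed, whereas a native Rust closure that loops inside a single call could not be interrupted by this scheduler.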

And as above, totally agree on typing, though I personally like Dialyzer's optimistic approach (I just wish type specs were enforced).

But, per the original, I am totally on board with more languages supporting actors...I just was curious what 'a better Erlang' would look like. Some of your bullets seem "a better language (for my use case)", and not really aligned with Erlang's goals.

pdimitar · on Feb 28, 2020

Yep, you're right. I was kind of rambling. :)

As for the zero-copy stuff, I am not educated enough to claim anything. I have just seen some claims by Rust and liked the idea of the borrowing -- but I know data in the stack is always going to be faster than data in the heap.

I am just wondering if we can't stop with this whole "copy your data" thing all the time.

My comment was a rant + a general musing of a guy who lately mostly codes Elixir; the lack of strong typing is a major pain at times, and sometimes I also wonder if the BEAM VM can't be much faster.

That's all. Apologies if my comments were off-topic.

davidw · on Feb 24, 2020

Elixir is pretty nice in that it cleans up some of the cruft from however many years ago.

Making it faster for certain things, as long as that doesn't hurt it in other ways, is always going to be a win.

pdimitar · on Feb 25, 2020

Currently Elixir is pretty much done and can't be made [much?] better. I have worked with it for 3.5 years and I can't see how it can be improved inside its current environment.

So trying to reach for strong static typing and zero-copy message passing are natural next steps -- and if that's not possible with Elixir itself, we'll search for it in other languages.

At least that's the case for me.

xanth · on Feb 25, 2020

I wonder how this performs in comparison to actix[1] & axiom[2]?

[1] https://github.com/actix/actix

[2] https://github.com/rsimmonsjr/axiom

dana321 · on Feb 24, 2020

Runtime for what? Does it only run rust code?

pronoiac · on Feb 25, 2020

Odd name - bastion hosts, aka jumpboxes or homeboxes, are also the access points that bridge different security zones, like internet to a secure VPC.

spurdoman77 · on Feb 24, 2020

Can someone elaborate use cases for this?

gavinray · on Feb 24, 2020

To provide context, understanding this requires a little bit of background knowledge about concurrency paradigms.

In concurrent programming, there are a few mental models/approaches you can use. Each of them has a different "value system" and set of tradeoffs, if you will.

In a nutshell, you have:

- Locks (Mutex/Semaphore)

- Communicating Sequential Processes

- Software Transactional Memory

- Actor Model

The Actor Model is a particularly powerful paradigm because it isolates processes and works via message passing and spawning. Erlang/Elixir are fault-tolerant because of the BEAM's process model: any given process (more or less) can fail, and it's not a problem due to isolation.

What this library allows you to do is architect applications in ways such that they are much more resilient to failure and easier to scale out + parallelize/distribute.

It doesn't have to be a networked application either, any code process can be an actor. It applies to any software.
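To make the isolation and message-passing idea concrete, here is a bare-bones actor built from a thread and a standard-library channel. It is a sketch for illustration, not Bastion's API, and `Msg`, `spawn_counter` and `demo` are names invented for it.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Messages understood by the counter actor.
enum Msg {
    Add(i64),
    /// Ask for the current total; the answer is sent back over the
    /// enclosed channel.
    Get(Sender<i64>),
}

/// Spawns the actor and returns the only handle to it: its mailbox.
fn spawn_counter() -> Sender<Msg> {
    let (mailbox, inbox) = mpsc::channel();
    thread::spawn(move || {
        let mut total = 0i64; // private state, never shared
        for msg in inbox {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    // Ignore the error if the asker has gone away.
                    let _ = reply.send(total);
                }
            }
        }
        // The loop ends once every Sender has been dropped.
    });
    mailbox
}

/// Sends a few messages and returns the total reported by the actor.
fn demo() -> i64 {
    let counter = spawn_counter();
    counter.send(Msg::Add(2)).unwrap();
    counter.send(Msg::Add(40)).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    counter.send(Msg::Get(reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    println!("{}", demo());
}
```

Nothing outside the actor can touch `total` directly; the only way in is the mailbox. That isolation is what lets a supervisor replace a failed actor without corrupting anyone else's state.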

If you want a great overview of the Actor model, there are some slides here which do a fantastic job of illustrating it:

https://cs.nyu.edu/wies/teaching/ppc-14/material/lecture10.p...

michael_j_ward · on Feb 24, 2020

Do you have any good resources in learning more about these models / approaches?

macintux · on Feb 25, 2020

I’ve only skimmed this, but it seemed reasonable.

https://pragprog.com/book/pb7con/seven-concurrency-models-in...

sbarre · on Feb 24, 2020

There's a whole section in the repo for examples and use-cases

https://github.com/bastion-rs/bastion/tree/master/bastion/ex...

hopia · on Feb 24, 2020

Not knowing anything about Rust, I would imagine they are similar to Erlang's. Basically, when you need servers that communicate with each other.