
Depends. MASQUE implies QUIC and a lot of corporate networks still block QUIC, forcing a fallback to TCP (for web browsers).

Not sure how they want to address this in the case of WARP?

Nevertheless, MASQUE is a pretty exciting development in the networking space.


I find myself defining my own traits very rarely these days. Traits enable polymorphic actions on data. Most of the time, you simply don't need that.

An enum often suffices to express the different combinations.

Of course, from a pattern perspective, that is "not extensible" but it often ends up being easier to maintain and in Rust, often compiles down to more efficient code.
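
To illustrate, a minimal sketch (invented types, not from any particular codebase):

    // Instead of a `trait Shape { fn area(&self) -> f64; }` plus `Box<dyn Shape>`,
    // a closed set of variants:
    enum Shape {
        Circle { radius: f64 },
        Rect { width: f64, height: f64 },
    }

    impl Shape {
        // One `match` replaces dynamic dispatch; the compiler sees every
        // variant and can inline each arm.
        fn area(&self) -> f64 {
            match self {
                Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
                Shape::Rect { width, height } => width * height,
            }
        }
    }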


WASM shouldn't care about which language the host is written in vs what the apps are written in.

Writing the host in C seems like a bad choice though. Isn't that exactly the kind of software you want to ensure is memory-safe?


This isn't an adequate comparison in my eyes. Your example moves a very critical component out of the "protocol logic": message parsing.

Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.
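
To sketch what I mean (hypothetical names, nothing from a real library): the state machine takes raw bytes and treats a malformed message as just another input:

    use std::time::Instant;

    struct Message; // a parsed STUN message, details elided
    struct StunBinding { /* transaction IDs, timeouts, ... */ }

    impl StunBinding {
        // Hypothetical `&[u8]`-based entry point: parsing lives inside
        // the state machine, so invalid input is part of the protocol logic.
        fn handle_input(&mut self, packet: &[u8], _now: Instant) -> bool {
            match Self::parse(packet) {
                Some(_msg) => true, // valid: advance the state machine
                None => false,      // malformed or not ours: reject explicitly
            }
        }

        fn parse(packet: &[u8]) -> Option<Message> {
            // A real parser would check the STUN header, magic cookie, etc.
            (packet.len() >= 20).then_some(Message)
        }
    }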

In addition, you want to be able to parse messages without copying, which introduces lifetimes that make it difficult to run this future on an executor like tokio. You can use a structured-concurrency approach instead; I linked to an article from withoutboats about this at the end.

Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.

EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.


> This isn't an adequate comparison in my eyes. Your example moves a very critical component out of the "protocol logic": message parsing.

(for context in case you didn't see the link in the other comments - https://gist.github.com/joshka/af299be87dbd1f64060e47227b577... is the full working implementation of this). It uses the tokio framed codec approach, where parsing effectively only succeeds once the full message has been received. For STUN that doesn't really seem like a big deal, but I can imagine that for other protocols it might be.

> Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.

The stream is a stream of Result<ParsedMessage, ErrorMessage>; the error message can represent the invalid message in whatever fashion makes sense for the protocol. The framed approach suggests that message framing is better handled as a simpler, effectively linear state machine over bytes. Nothing is stopping you from treating the smallest unit of the protocol as a byte and handling that in the protocol handler, though; it still just comes down to a stream transformation from the bytes to a stream of messages or errors.

Additionally, when working with the stream approach, there's usually a way to mutate the codec (e.g. UdpFramed::codec_mut()) which allows protocol level information to inform the codec how to treat the network byte stream if necessary.

> In addition, you want to be able to parse messages without copying, which introduces lifetimes that make it difficult to run this future on an executor like tokio. You can use a structured-concurrency approach instead; I linked to an article from withoutboats about this at the end.

The Bytes crate handles the zero-copy part of this orthogonally, so lifetimes aren't necessary. A quick proof of concept is to add a bytes field and push the reference to the bytes representing the STUN response, like:

    // Split off the first 12 bytes and freeze them into a cheap,
    // reference-counted `Bytes` handle -- no copy of the payload.
    let bytes = data.split_to(12).freeze();
    Ok(Some(BindingResponse { address, bytes }))

> Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.

I'm not sure that I see your point here. I think it's just that you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?

> EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.

I wonder if you'd be able to expand on an example where these constraints you mentioned above come up. The Stun example is simple enough that it is achievable just with an async state machine. I'd be curious to work through a more detailed example (mostly for interest's sake).

In the article you mention the runtime overhead of mutexes / channels. I'm curious what sort of percentage of the full system time this occupies.


> The stream is a stream of Result<ParsedMessage, ErrorMessage>

Fair point! Yeah, that would allow error handling in the state machine.

> > Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.

> I'm not sure that I see your point here. I think it's just that you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?

Yeah sure! If you are up for it, you can dig into the actual code here: https://github.com/firezone/firezone/tree/main/rust/connlib/...

The requirements are:

    1. Run N clients of the TURN protocol.
    2. Run M connections (ICE agent + WireGuard tunnel).
    3. Allow each connection to use each TURN client.
    4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).

In theory, (3) could be relaxed a bit to not be an NxM matrix. It doesn't really change much though because one TURN allocation (i.e. a remote address) needs to be usable by multiple connections.

All of the above is managed as a single `Node` that is composed into a larger state machine that manages ACLs and uses this library to route IP packets to and from a TUN device. That is where the concurrent access comes in: We receive ACL updates via a WebSocket and change the routing logic as a result. Concurrently, we need to read from the TUN device and work out which WireGuard tunnel the packet needs to go through. A WireGuard tunnel may use one of the TURN clients to relay the packet in case a direct connection isn't possible. Lastly, we also need to read from the UDP socket, index into the correct connection based on the IP and decrypt incoming IP packets.

In summary, there are lots of state updates happening from multiple events and the state cannot be isolated into a single task where `&mut` would be an option. Instead, we need `&mut` access to pretty much everything upon each UDP packet from the network or IP packet from the TUN device.
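
To give a rough idea of the shape this takes (heavily simplified, invented field names, not the actual connlib code):

    use std::collections::HashMap;
    use std::net::SocketAddr;
    use std::time::Instant;

    struct TurnClient;
    struct Connection;

    // Everything lives behind one `&mut self`; no locks, no channels.
    struct Node {
        turn_clients: HashMap<SocketAddr, TurnClient>,
        connections: HashMap<SocketAddr, Connection>,
    }

    impl Node {
        // Called for every datagram read from the single UDP socket.
        fn handle_udp_input(&mut self, from: SocketAddr, _packet: &[u8], _now: Instant) {
            // Index into the right TURN client or connection by source
            // address and update its state; a TURN client may in turn
            // produce new packets for a connection, all under one borrow.
            if let Some(_client) = self.turn_clients.get_mut(&from) { /* ... */ }
            else if let Some(_conn) = self.connections.get_mut(&from) { /* ... */ }
        }
    }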

> The Stun example is simple enough that it is achievable just with an async state machine.

I am fully aware of that. It needed to be simple to fit into the scope of a blog post that can be understood by people without a deep networking background. To explain sans-IO, I deliberately wanted a simple example so I could focus on sans-IO itself and not explaining the problem domain. With the basics in place, people can work out by themselves whether or not it is a good fit for their problem. Chances are, if they are struggling with lots of shared state between tasks, it could be!


Got it - thanks for the update. I'll take a run at it sometime - if only to understand more thoroughly where the pain points in doing this are.


Thinking about your example again made me think of another requirement: Multiplexing.

STUN ties requests and responses together with a transaction ID that is encoded as an attribute in the message.

Modelling this using `Sink`s and `Stream`s is quite difficult. Sending out the request via a `Sink` is easy, but how do you correctly dispatch the incoming response?

For example, let's say you are running 3 TURN clients concurrently in 3 async tasks that use the referenced `Sink` + `Stream` abstractions. How do you go from reading from a single UDP socket to 3 streams that each contain the correct responses based on the transaction ID that was sent out? To read the transaction ID, you'd have to parse the message. We've established that we can do that partially outside and only send the `Result` in. But in addition to the 3 tasks, we'd need an orchestrating task that keeps a temporary state of sent transaction IDs (received via one of the `Sink`s) and then picks the correct `Stream` for the response.

It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.
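
Roughly, the orchestrating task would have to look something like this (invented names, just to illustrate the indirection):

    use std::collections::HashMap;
    use tokio::sync::mpsc;

    type TransactionId = [u8; 12];

    // The demux task has to remember, for every transaction ID sent out
    // via a `Sink`, which client's `Stream` the eventual response belongs to.
    struct Demultiplexer {
        routes: HashMap<TransactionId, mpsc::Sender<Vec<u8>>>,
    }

    impl Demultiplexer {
        fn on_request_sent(&mut self, id: TransactionId, client: mpsc::Sender<Vec<u8>>) {
            self.routes.insert(id, client);
        }

        async fn on_response(&mut self, id: TransactionId, packet: Vec<u8>) {
            // Response routing now lives here, far away from the TURN
            // client that actually understands the message.
            if let Some(tx) = self.routes.remove(&id) {
                let _ = tx.send(packet).await;
            }
        }
    }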


The way I look at async in this case is that it kind of does the opposite of the inversion of control (in the original sense[1], not the Java/C# IoC container sense) that you're doing here. Your approach replaces a linear method with a driver that calls into a struct that explodes out each line of the method. An async method basically does the same, but it means your logic doesn't have to handle the polling, orchestration, selection, ordering, etc. manually. It doesn't necessarily mean, however, that you have to throw away your full state machine / structs and implement everything in a single method.

I don't think UdpFramed would fully work in your example, as it only works in terms of a single destination address for sending. I'd write a similar FramedMultiplexer implementation that combines a stream and a sink. I think this is probably what Node[2] already is, effectively.

The stream is of either bytes / packets, or properly decoded messages (e.g. `enum Message { WireguardMessage, TurnMessage, StunMessage, IceMessage }`)

The sink is of a tuple `(SocketAddr, Message)` (i.e. effectively your Transmit struct), which is hooked up externally to a UdpSocket. It basically aggregates the output of all the allocations / agents / connections.

The *_try_handle stuff doesn't really change much - it's still a lookup in a map to choose which allocation / agent / connection to write to.
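
In types, roughly (a sketch of what I'm proposing, not code from the repo):

    use std::net::SocketAddr;

    // One decoded-message type for everything arriving on the socket.
    enum Message {
        Wireguard(Vec<u8>),
        Turn(Vec<u8>),
        Stun(Vec<u8>),
        Ice(Vec<u8>),
    }

    // What the components push into the shared sink; effectively the
    // same information as the article's `Transmit` struct.
    struct Outgoing {
        dst: SocketAddr,
        payload: Vec<u8>,
    }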

> How do you go from reading from a single UDP socket to 3 streams that each contain the correct responses based on the transaction ID that was sent out

I think right now you're sending the response to the allocation that matches the server address[3] and then checking whether that allocation has the transaction ID stored[4]. I guess you lose the ability to return false if the input to the allocation is a stream rather than just a method call, but that doesn't preclude writing a function that's effectively the transaction-check part of the allocation, while still having your allocation's inner loop just be "read from stream, write to sink" (if the transaction is one we sent, send the message to the allocation's task and let it handle it; otherwise return false).

> It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.

I read through the node code and it feels a bit spaghetti to me as an outsider, because of the sans-io abstractions, not in spite of them. That comes from the nature of the driving code being external to the code implementing the parts.

The counterpoint to the testing is that you're already doing this orchestration and need to test it. I'm not sure why having any of this as async changes that. The testing should effectively be "If a TURN message comes in, then the right TURN client sees the message" - that's a test you'd need to write regardless, right?

[1]: https://en.wikipedia.org/wiki/Inversion_of_control (Incidentally, there's a line in the article where you call out the dependency inversion principle, but this is a bit more accurate a description of what's happening - they are related however).

[2]: https://github.com/firezone/firezone/blob/main/rust/connlib/...

[3]: https://github.com/firezone/firezone/blob/92a2a7852bad363e24...

[4]: https://github.com/firezone/firezone/blob/92a2a7852bad363e24...


> An async method basically does the same, but it means your logic doesn't have to handle the polling, orchestration, selection, ordering, etc.

In order to take advantage of that, I'd need to create one `async` method for each handshake with the TURN server, right? Like every `allocate`, every `refresh`, every `bind_channel` would need to be its own async function that takes in a `Sink` and `Stream` in order to get the benefit of having the Rust compiler generate the suspension point for me while waiting on the response. Who is driving these futures? Would you expect these to be spawned into a runtime or would you use structured concurrency a la `FuturesUnordered`?

A single `Allocation` sends multiple of these requests and in the case of `bind_channel` concurrently. How would I initialise them? As far as I understand, I would need to be able to "split" an existing `Stream` + `Sink` pair by inserting another multiplexer in order to get a `Stream` that will only return responses for the messages that have been sent via its `Sink`, right?

I just don't see the appeal of all this infrastructure code to "gain" the code-gen that the Rust compiler can do for these request-response protocols. It took me a while to wrap my head around how `Node` should work but the state isn't that difficult to manage because it is all just request-response protocols. If the protocols had more steps (i.e. more `.await` points in an async style), then it might get cumbersome and we'd need to think about an alternative.

> > It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.

> I read through the node code and it feels a bit spaghetti to me as an outsider, because of the sans-io abstractions, not in spite of them. That comes from the nature of the driving code being external to the code implementing the parts.

It is interesting you say that. I've found that I never need to think about the driving code because it is infrastructure that is handled on a completely different layer. Instead, I can focus on the logic I want to implement.

The aspect I really like about the structure is that the call-stack in the source code corresponds to the data flow at runtime. That means I can mentally trace a packet via "jump to definition" in my editor. To me, that is the opposite of spaghetti code. Channels and tasks obfuscate the data flow and you end up jumping between the definition site of a channel, tracing where it was initialised, followed by navigating to where the other end is handled, etc.


The line between applications and libraries is fairly blurry, isn't it? In my experience, most applications grow to the point where you have internal libraries or could at least split out one or more crates.

I would go as far as saying that whatever functionality your application provides, there is a core that can be modelled without depending on IO primitives.


In my eyes an ideal library should not contain state, internally allocate (unless very obviously), or manage processes. The application should do that, or provide primitives for doing it which the library can make use of. That makes applications and libraries very very different in my mind.


The thing about state is a good point. With the sans-IO pattern we have inversion of IO and Time, but adding memory to that would be a nice improvement too.


Those C libraries that have initializers which take ** and do the allocation for you drive me nuts! I’m sure there’s some good reason, but can’t you trust me to allocate for myself, you know?


Yes, true. The one difference might be that you don't expect other consumers with a different approach to IO to use your internal libraries, although it does help if you want to change that in the future, and the testability is still useful.


> I would have been more interested in seeing how they could implement an encapsulated function in the sans-IO style that had to do something like wait on an action or a timer

The "encapsulated function" is the `StunBinding` struct. It represents the functionality of a STUN binding. It isn't a single function you can just call, instead it requires an eventloop.

The point though is that `StunBinding` could live in a library and you would be able to use it in your application by composing it into your program's state machine (assuming you are also structuring it in a sans-IO style).

The linked `snownet` library does exactly this. Its domain is to combine ICE + WireGuard (without doing IO) which is then used by the `connlib` library that composes ACLs on top of it.

Does that make sense?

EDIT: There is no busy-waiting. Instead, `StunBinding` exposes what it is waiting for via `poll_timeout`. How the caller (i.e. the event loop) makes that happen is up to them. The appropriate action will happen once `handle_timeout` gets called with the corresponding `Instant`.


> The "encapsulated function" is the `StunBinding` struct. It represents the functionality of a STUN binding. It isn't a single function you can just call, instead it requires an eventloop. > > The point though is, that `StunBinding` could live in a library and you would be able to use it in your application by composing it into your program's state machine (assuming you are also structuring it in a sans-IO style).

What I was thinking was that the functionality being executed in main could just as easily be in a library function -- that's what I meant by encapsulated function; maybe I should have said "encapsulated functionality".

If the thing I want to do is the incredibly common read-or-timeout pattern, how do I do that in a sans-IO way? This is why I was quite surprised to see the inclusion of tokio::select -- that's not very sans-IO, but is absolutely the domain of a random library function that you might want to expose.

It's a bit jarring to introduce the concept as not requiring choices like async vs not, then immediately require the use of async in the event loop (required to drive the state machine to completion).

Or is the point that the event loop should be async? That's a reasonable expectation for event loops that are I/O bound -- it's the whole point of an event loop/reactor pattern. Maybe I'm missing some example where you show an event loop that is not async, to show that you can drive this whether or not you want async?

So if I try to condense & rephrase:

If I want to write a function that listens or times out in sans-IO style, should I use tokio::select? If so, where is the async runtime coming from, and how will the caller of the function be able to avoid caring?


> If I want to write a function that listens or times out in sans-IO style, should I use tokio::select? If so, where is the async runtime coming from, and how will the caller of the function be able to avoid caring?

To "time-out" in sans-IO style means that your state machine has an `Instant` internally and, once called at a specific point in the future, compares the provided `now` parameter with the internal timeout and changes its state accordingly. See [0] for an example.

> but is absolutely the domain of a random library function that you might want to expose.

That entire `main` function is _not_ what you would expose as a library. The event loop should always live as high up in the stack as possible, thereby deferring the use of blocking or non-blocking IO and allowing composition with other sans-IO components.

You can absolutely write an event loop without async. You can set the read-timeout of the socket to the value of `poll_timeout() - Instant::now()` and call `handle_timeout` in case your `UdpSocket::recv` call errors with a timeout. str0m has an example [1] like that in their repository.
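
For example, a blocking sketch of such a loop (error handling condensed; `Binding` is the sketched state machine from above):

    use std::io::ErrorKind;
    use std::net::UdpSocket;
    use std::time::{Duration, Instant};

    fn run(socket: UdpSocket, mut binding: Binding) -> std::io::Result<()> {
        let mut buf = [0u8; 1500];

        loop {
            // Block at most until the state machine's next deadline.
            // (`set_read_timeout` rejects a zero duration, hence the floor.)
            let timeout = binding
                .poll_timeout()
                .saturating_duration_since(Instant::now())
                .max(Duration::from_millis(1));
            socket.set_read_timeout(Some(timeout))?;

            match socket.recv(&mut buf) {
                Ok(_n) => { /* binding.handle_input(&buf[.._n], Instant::now()) */ }
                Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                    binding.handle_timeout(Instant::now());
                }
                Err(e) => return Err(e),
            }
        }
    }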

> It's a bit jarring to introduce the concept as not requiring choices like async vs not, then immediately require the use of async in the event loop (required to drive the state machine to completion).

All the event loops you see in the post are solely there to ensure we have a working program but are otherwise irrelevant, esp. implementation details like using `tokio::select` and the like. Perhaps I should have made that clearer.

[0]: https://github.com/firezone/firezone/blob/1e7d3a40d213c9524a...

[1]: https://github.com/algesten/str0m/blob/5b100e8a675cd8838cdd8...


> To "time-out" in sans-IO style means that your state machine has an `Instant` internally and, once called at a specific point in the future, compares the provided `now` parameter with the internal timeout and changes its state accordingly. See [0] for an example.

This part of the post was clear -- I didn't ask any clarifications about that; my point was about what I see as "read or timeout", a reasonable functionality to expose as an external-facing function.

The question is still "If I want to read or timeout, from inside a function I expose in a library that uses sans-IO style, how do I do that?".

It seems like the answer is "if you want to accomplish read-or-timeout at the library-function level, you either busy-wait or pull in an async runtime; whatever calls your state machine has to take care of that at a higher level".

You see how this doesn't really work for me? Now I have to decide if my read_or_timeout() function is exposed as either the default sync (and I have to figure out how long to wait, etc.) or async.

It seems in sans-IO style read_or_timeout() would be sync, and do the necessary synchronous waiting internally, without the benefit of being able to run other tasks from unrelated state machines in the meantime.

> That entire `main` function is _not_ what you would expose as a library.

Disagree -- it's entirely reasonable to expose "read your public IP via STUN" as a library function. I think we can agree to disagree here.

> The event loop should always live as high up in the stack as possible, thereby deferring the use of blocking or non-blocking IO and allowing composition with other sans-IO components.

Sure... but that means the code you showed me should never be made into a library (we can agree to disagree there), and I think it's reasonable functionality for a library...

What am I missing here? From unrelated code, I want to call `get_ip_via_stun_or_timeout(hostnames: &[String], timeout: Duration) -> Option<String>`, is what I'm missing that I need to wrap this state machine in another to pass it up to the level above? That I need to essentially move the who-must-implement-the-event-loop one level up?

> You can absolutely write an event loop without async. You can set the read-timeout of the socket to the value of `poll_timeout() - Instant::now()` and call `handle_timeout` in case your `UdpSocket::recv` call errors with a timeout. str0m has an example [1] like that in their repository.

Didn't say you couldn't!

What you've described is looping with an operation-supported timeout, which requires timeout integration at the function-call level below you to return control. I get that this is a potential solution (I mentioned it in my edits on the first comment), but not mentioning it in the article was surprising to me.

The code I was expecting to find in that example is like this bit in str0m:

https://github.com/algesten/str0m/blob/5b100e8a675cd8838cdd8...

Clearly (IMO evidenced by the article using this method), the most ergonomic way to do that is with a tokio::select, and that's what I would reach for as well -- but I thought a major point was to do it sans IO (where "IO" here basically means "async runtime").

Want to note again, this is not to do with the state machine (it's clear how you would use a passed in Instant to short circuit), but more about the implications of abstracting the use of the state machine.

> All the event loops you see in the post are solely there to ensure we have a working program but are otherwise irrelevant, esp. implementation details like using `tokio::select` and the like. Perhaps I should have made that clearer.

I personally think it exposes a downside of this method -- while I'm not a fan of simply opting in to either async (and whichever runtime: smol/tokio/async-std/etc.) or sync, it seems like this pattern will force me to do one of the following:

- Write all code as sync

- Write sync code that does waiting based on operations yielding back control early

- Hold my own tokio runtime so I can do concurrent things (this, you argue against)

Async certainly can be hard to use and have many footguns, but this approach is certainly not free either.

At this point if I think I want to write a library that supports both sync and async use cases it feels like feature flags & separate implementations might produce an easier to understand outcome for me -- the sync version can even start as mostly `tokio::Runtime::block_on`s, and graduate to a more performant version with better custom-tailored efficiency (i.e. busy waiting).

Of course, I'm not disparaging the type state pattern here/using state machines -- just that I'd probably just use that from inside an async/sync-gated modules (and be able to share that code between two impls).


> What am I missing here? From unrelated code, I want to call `get_ip_via_stun_or_timeout(hostnames: &[String], timeout: Duration) -> Option<String>`, is what I'm missing that I need to wrap this state machine in another to pass it up to the level above? That I need to essentially move the who-must-implement-the-event-loop one level up?

Essentially yes! For such a simple example as STUN, it may appear silly because the code that is abstracted away in a state machine is almost shorter than the event loop itself.

That very quickly changes as the complexity of your protocol increases though. The event loop is always roughly the same size yet the protocol can be almost arbitrarily nested and still reduces down to an API of `handle/poll_timeout`, `handle_input` & `handle_transmit`.

For example, we've been considering adding a QUIC stack next to the WireGuard tunnels as a control protocol in `snownet`. By using a sans-IO QUIC implementation like quinn, I can do that entirely as an implementation detail because it just slots into the existing state machine, next to ICE & WireGuard.
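
For a flavour of what "slots into the existing state machine" means, a rough sketch (not the actual snownet code; quinn's sans-IO core is the real `quinn_proto`, the rest is invented):

    use std::time::Instant;

    struct IceAgent;
    struct WireGuardTunnel;
    struct QuicStack; // would wrap e.g. a `quinn_proto` endpoint

    impl IceAgent { fn poll_timeout(&self) -> Option<Instant> { None } }
    impl WireGuardTunnel { fn poll_timeout(&self) -> Option<Instant> { None } }
    impl QuicStack { fn poll_timeout(&self) -> Option<Instant> { None } }

    struct Node {
        ice: IceAgent,
        wireguard: WireGuardTunnel,
        quic: QuicStack,
    }

    impl Node {
        // The composed state machine's timeout is simply the earliest
        // timeout of any of its components; the event loop never changes.
        fn poll_timeout(&self) -> Option<Instant> {
            [
                self.ice.poll_timeout(),
                self.wireguard.poll_timeout(),
                self.quic.poll_timeout(),
            ]
            .into_iter()
            .flatten()
            .min()
        }
    }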

> At this point if I think I want to write a library that supports both sync and async use cases it feels like feature flags & separate implementations might produce an easier to understand outcome for me -- the sync version can even start as mostly `tokio::Runtime::block_on`s, and graduate to a more performant version with better custom-tailored efficiency (i.e. busy waiting).

> Of course, I'm not disparaging the type state pattern here/using state machines -- just that I'd probably just use that from inside an async/sync-gated modules (and be able to share that code between two impls).

This is what quinn does: It uses tokio + async to expose an API that uses `AsyncRead` and `AsyncWrite` and thus fully buys into the async ecosystem. The actual protocol implementation however - quinn-proto - is sans-IO.

The way I see this is that you can always build more convenience layers; whether they are in the same crate or not doesn't really matter for that. The key thing is that they should be optional. The problems of function colouring only exist if you don't focus on building the right thing: an IO-free implementation of your protocol. The protocol implementation is usually the hard bit, the one that needs to be correct and well-tested. Integration with blocking or non-blocking IO is just plumbing work that isn't difficult to write.


Ahh, thanks for clarifying this! Makes a ton of sense now -- I need to try writing some programs in this style (in the high-perf Rust style) to see how they feel.

> For example, we've been considering adding a QUIC stack next to the WireGuard tunnels as a control protocol in `snownet`. By using a sans-IO QUIC implementation like quinn, I can do that entirely as an implementation detail because it just slots into the existing state machine, next to ICE & WireGuard.

Have you found that this introduces a learning curve for new contributors? Being able to easily stand up another transport is pretty important, and I feel like I can whip together an async-required interface for a new protocol very easily (given I did a decent job with the required traits and used the typestate pattern), whereas sans-IO might be harder to reason about.

Thanks for pointing out quinn-proto (numerous times at this point) as well -- I'll take a look at the codebase and see what I can learn from it (as well as str0m).

[EDIT]

> The problems of function colouring only exist if you don't focus on building the right thing: an IO-free implementation of your protocol. The protocol implementation is usually the hard bit, the one that needs to be correct and well-tested.

The post, in a couple lines!

[EDIT2] Any good recommendations of a tiny protocol that might be a good walk through intro to this?

Something even simpler than Gopher or SMTP? Would be nice to have a really small thing to do a tiny project in.


> [EDIT2] Any good recommendations of a tiny protocol that might be a good walk through intro to this?

> Something even simpler than Gopher or SMTP? Would be nice to have a really small thing to do a tiny project in.

I only have experience in packet-oriented ones, so I'd suggest sticking to that. Perhaps WireGuard could be simple enough? It has a handshake and timers, so some complexity, but nothing too crazy.

DNS could be interesting too, because you may need to contact upstream resolvers if you don't have something cached.


If you want a web protocol, try oauth2. There's complexity in the number of things you can support, but in essence there's a state machine that can be modeled.


Ahh didn't even think of that level of the stack… It is true that the OAuth2 tango can be represented by a state machine…

I’d probably do CAS instead, it’s simpler IMO.


The Stun protocol is surprisingly easy to implement, but see my comment at https://news.ycombinator.com/item?id=40879547 about why I'd just use async instead of making my own event loop system.

https://gist.github.com/joshka/af299be87dbd1f64060e47227b577...


Thanks for the code! Going to pore over this.

I read the comment and I definitely agree (though it took me a while to get to where you landed). I think there are some benefits:

- More controllable / easier to reason about cancel safety (though this gets pushed up the stack somewhat). You just can't cancel a thread, but it turns out a ton of places in an async function are cancellation points (everywhere you or some function calls .await, most obviously), and that can cause surprising problems.

- Ability to easily slap on both sync and async shells (I personally think it's not unforgivable to smuggle a tokio current thread runtime in as a dep and use block_on for async things internally, since callers are none the wiser)

Great comment though, very succinctly explained what I was getting at... I personally land on the "just make everything async" side of things. Not necessarily everything should be Send + Sync, but similar to Option/Result, I'd rather just start using async everywhere than try to make a sync world work.

There's also libraries like agnostic[0] that make it somewhat easier to support multiple runtimes (though I've done it in the past with feature flags).

> The problem with the approach suggested in the article is that it splits the flow (event loop) and logic (statemachine) from places where the flow is the logic (send a stun binding request, get an answer).

Very concisely put -- if I'm understanding OP's point of view, the answer to this might be "don't make the flow the logic"? Basically, encode the flow as a state machine and pass that up to an upper event loop (essentially requiring the upper layer to drive it).

Feels like there are at least 3 points in this design space:

- Sync-only state machines (event loop must be at the outermost layer)

- Sync state machines with possibly internal async (event loops could be anywhere)

- Async everything (event loops are everywhere)

[0]: https://crates.io/crates/agnostic


also a bit late, but you've seen anyhow & miette right? noticed the color_eyre usage and was just wondering


yep - color_eyre is a better anyhow (and there are plans afoot to merge them into just one at some point[1]). Miette occupies a space that I generally don't need (except when processing data), while color-eyre is in the goldilocks zone.

[1]: https://github.com/eyre-rs/eyre/issues/177


Thanks for the pointer! Rustaceans are spoiled for choice with good error-handling libraries; great to have so many options.


I too came from the OOP world to Rust (6 years ago now) and in my first 2-3 years I produced horrible code!

Type parameters and traits everywhere. Structs being (ab-)used as class-like structures that provide functionality.

Rust works better if you avoid type parameters and defining your own traits as much as possible.

Encapsulation is good if we talk about ensuring invariants are maintained. This blog post about "parse, don't validate" comes to mind: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
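
The gist of that post, as a tiny sketch of my own (not from the article): parse once at the boundary into a type whose existence proves the invariant:

    // The invariant "non-empty" is established once, in the constructor...
    struct NonEmpty<T>(Vec<T>);

    impl<T> NonEmpty<T> {
        fn parse(items: Vec<T>) -> Option<Self> {
            if items.is_empty() { None } else { Some(Self(items)) }
        }

        // ...so everything downstream can rely on it without re-checking.
        fn first(&self) -> &T {
            &self.0[0]
        }
    }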


Channels work fine if you are happy for your software to have an actor-like design.

But as you say, it comes with problems: actors / channels can be disconnected, for example. You also want to make sure they are bounded, otherwise you don't have backpressure. Plus, they require copying, so achieving high throughput may be tricky.


If Rust ever gets native generator syntax, this might become achievable, because one would be able to say `yield transmit` to "write" data whilst staying within the context of your async operation. In other words, every `socket.write` would turn into a `yield transmit`.

To read data, the generator would suspend (.await) and wait to be resumed with incoming data. I am not sure if there is nightly syntax for this but it would have to look something like:

    // Made-up `gen` syntax: gen(yield_type, resume_type)
    gen(Transmit, &[u8]) fn stun_binding(server: SocketAddr) -> SocketAddr {
        let req = make_stun_request();

        yield Transmit {
            server,
            payload: req,
        };

        let res = .await; // Made-up "suspend and resume with argument" syntax.

        let addr = parse_stun_response(res);

        addr
    }


Rust has had native generator syntax for a few years FYI. It's what async-await is built on. It's just gated behind a nightly feature.

https://doc.rust-lang.org/stable/std/ops/trait.Coroutine.htm... and the syntax you're looking for to resume with a value is `let res = yield ...`

Alternatively there is a proc macro crate that transforms generator blocks into async blocks so that they work on stable, which is of course a round-about way of doing it, but it certainly works.
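
For reference, roughly what that looks like on a current nightly (unstable features, so the exact syntax may have shifted by the time you read this):

    #![feature(coroutines, coroutine_trait, stmt_expr_attributes)]

    use std::ops::{Coroutine, CoroutineState};
    use std::pin::Pin;

    fn main() {
        // Yields a request (String), is resumed with a response (&str).
        let mut stun_binding = #[coroutine]
        |_first_resume: &'static str| {
            // Suspend here; `yield` evaluates to the next resume argument.
            let response = yield String::from("binding request");
            format!("parsed: {response}")
        };

        // First resume: runs up to the first `yield`, handing us the request.
        match Pin::new(&mut stun_binding).resume("") {
            CoroutineState::Yielded(request) => println!("send: {request}"),
            CoroutineState::Complete(_) => unreachable!(),
        }

        // Second resume: the response flows back into the coroutine.
        match Pin::new(&mut stun_binding).resume("binding response") {
            CoroutineState::Complete(result) => println!("{result}"),
            CoroutineState::Yielded(_) => unreachable!(),
        }
    }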


They actually play together fairly well higher up the stack. Non-blocking IO (i.e. async) makes it easy to concurrently wait for socket IO and time. You can do it with blocking IO too, by setting a read-timeout on the socket, but using async primitives makes it a bit easier.

But I've also been mulling over the idea of how they could be combined! One thing I've arrived at is the issue that async functions compile into opaque types. That makes it hard / impossible to use the compiler's facility for code-generating the state machine, because you can't interact with it once it has been created. This also breaks the borrow checker in some ways.

For example, say I have an async operation with multiple steps (i.e. `await` points) but only one section of those needs a mutable reference to some shared data structure. As soon as I express this using an `async` function, the mutable reference is captured in the generated `Future` type, which spans all steps. As a result, Rust doesn't allow me to run more than one of those concurrently.

Normally, the advice for these situations is "only capture the mutable reference for as short a time as possible" but in the case of async, I can't do that. And splitting the async function into multiple functions also gets messy and kind of defeats the point of wanting to express everything in a single function again.
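
A minimal sketch of what I mean (toy types standing in for the real protocol state):

    // Only the two `push` lines need `&mut`, but the future returned by
    // this function captures the `&mut` borrow for its entire lifetime.
    async fn handshake(shared: &mut Vec<u8>) {
        shared.push(1);  // step 1: mutate shared state
        step().await;    // suspension point
        shared.push(2);  // step 2: mutate shared state again
    }

    async fn step() {}

    async fn run(shared: &mut Vec<u8>) {
        let first = handshake(shared);
        // error[E0499]: cannot borrow `*shared` as mutable more than once;
        // the first borrow lives inside `first` until it is dropped.
        // let second = handshake(shared);
        first.await;
    }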

