Async Rust in Practice: Performance, Pitfalls, Profiling

carllerche · on Jan 12, 2022

Tokio author here. Generally speaking, I recommend strongly against using FuturesUnordered unless you know all the pitfalls. We are working on an alternative utility that should hopefully avoid the issues described here and others: https://github.com/tokio-rs/tokio/pull/4335

psarna · on Jan 12, 2022

That's great news! Especially since the observed performance of the test program based on FuturesUnordered, even though it stopped being quadratic, was still considerably slower than the task::unconstrained one, which suggests there's room for improvement. That's probably because you still pay with a constant number of polls (32) each time you run out of budget.

carllerche · on Jan 12, 2022

IMO FuturesUnordered should stop executing futures when it sees a "yield". An explicit yield signals control should be returned to the runtime. FuturesUnordered does not respect this.

dathinab · on Jan 13, 2022

IMHO the main problem is tokio introducing preempting behaviour which in subtle ways can mess with all kinds of normally fully valid rust futures.

Sure it sometimes magically fixes problematic code, but it's in effect still a magic extension to how rust polling works which can have surprising side effects.

In a certain way, tokio's cooperative preemption is not adequate to handle any future which multiplexes multiple futures, but such futures have been a staple of Rust async from the get-go.

carllerche · on Jan 13, 2022

I don't think it is an issue w/ the pre-emption code. I believe FuturesUnordered is just doing the wrong thing: not respecting yields.

psarna · on Jan 13, 2022

wrt. preemption - in Seastar we have maybe_yield(), which gives up the cpu, but only if the task quota (more or less a semantic equivalent for Tokio's budget) has passed. Wouldn't it make sense to have a similar capability in Tokio? Then, if somebody is not a big fan of the default preemption, they could run their tasks under tokio::task::unconstrained and only check the budget in very specific, explicitly chosen places - where they call maybe_yield(). That could of course also be open-coded by trying to implement maybe_yield on top of yield_now and some time measurements, but since the whole budgeting code is already there... Do you think it's feasible?

dathinab · on Jan 15, 2022

The problem is it requires you to write your code in a way which now _only_ will work with tokio.

Which isn't an option for a lot of libraries.

While the rest of the Rust ecosystem is moving towards having more and more parts runtime-independent...

Furthermore, I think `maybe_yield` wouldn't be quite the right solution. The problem is that tokio's magic works based on the assumption that a single task (a future scheduled by the runtime) represents a single logical strand of execution. (Which isn't guaranteed in Rust.)

So I think a better tokio specific solution would be to teach tokio about the multiplexing effect in some way.

For example, you could have some way which snapshots the budget when reaching the multiplexer, and resets the budget to the snapshot before every multiplexed future is polled. With this, each logical strand of execution would have its "own" budget (more or less).
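
A minimal sketch of that snapshot-and-restore idea, using only the standard library. Everything here is a hypothetical stand-in: `BUDGET` is a plain thread-local counter (not tokio's real budget), and `poll_child` only simulates polling one multiplexed future that wants to make `work` units of progress, each costing one unit of budget.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-task budget; NOT tokio's actual internal state.
    static BUDGET: Cell<u32> = Cell::new(0);
}

/// Simulated child poll: tries to do `work` units, each costing one unit
/// of budget. Returns how many units were actually completed.
fn poll_child(work: u32) -> u32 {
    BUDGET.with(|b| {
        let done = work.min(b.get());
        b.set(b.get() - done);
        done
    })
}

/// All children draw from the same budget, in list order.
fn poll_all_shared(children: &[u32], budget: u32) -> Vec<u32> {
    BUDGET.with(|b| b.set(budget));
    children.iter().map(|&w| poll_child(w)).collect()
}

/// The budget is snapshotted on entry and restored before each child,
/// so every child sees its "own" budget.
fn poll_all_snapshot(children: &[u32], budget: u32) -> Vec<u32> {
    BUDGET.with(|b| b.set(budget));
    let snapshot = BUDGET.with(|b| b.get());
    children
        .iter()
        .map(|&w| {
            BUDGET.with(|b| b.set(snapshot));
            poll_child(w)
        })
        .collect()
}

fn main() {
    println!("{:?}", poll_all_shared(&[3, 3, 3], 4));
    println!("{:?}", poll_all_snapshot(&[3, 3, 3], 4));
}
```

With a budget of 4 and three children wanting 3 units each, the shared version lets them complete 3, 1 and 0 units, while the snapshot version lets each complete all 3.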

dataangel · on Jan 15, 2022

Any extension of executors will require having a trait abstracting the executor used, and there just isn’t one in std yet. Your code already has to be tokio specific if you do something as mundane as spawn a task.

dathinab · on Jan 16, 2022

> mundane as spawn a task.

There are only a few things you need the specific runtime for:

- spawn

- IO

- timeout

But you can mix the executor and reactor doing the IO (not recommended but you can).

Similarly, you can run your own timer.

And you can abstract over all of this in various ways, sure with limitations, hence why there is no std abstraction. But there are enough high-profile libraries which support multiple runtimes just fine.

But tokio's preemptive-cooperative scheduling requires any code which does any form of future multiplexing to:

- be tokio specific (which btw. isn't fully solving the problem)

- add a bunch of additional complexity, including memory and runtime overhead

If you multiplex futures on tokio you must:

- use custom wakers to detect yields

- (and) not poll futures in a "repeating" order (preferably in a fully random order).

This is a lot of additional complexity for something like a join_all (for a "small" N).

(Reminds me I should check if I need to open an issue with futures-rs, as their join_all impl. was subtly broken last time I checked.)

And even with that you have the problem that the multiplexed futures are subtly de-prioritized, as they share a budget.

The problem I have with this feature is not that it's a bad idea. It isn't; in general it's a good idea. The problem is that it completely overlooks the multiplexing case. And worse, it further divides the ecosystem in subtle ways (that is what I'm worried about).

So maybe we could find a way to provide a std (or just common-library) standardized way for just that feature. (I mean it's a task local int, it might not even need to be atomic, maybe. So there might be a way which doesn't have the problem async-std standardization has).

psarna · on Jan 15, 2022

maybe_yield may or may not be the right solution here, but I think it may be useful in general - e.g. when you have long I/O-less computations. In such a case, I'd like to be able to say "yield here if my budget is drained, but continue otherwise and don't put my task at the end of the queue". Although for that the only thing I really need is a way to peek at your budget - with that, open-coding maybe_yield is trivial

carllerche · on Jan 13, 2022

It wouldn’t be too hard. The trickiest bit would be putting together a consistent API.

psarna · on Jan 13, 2022

Now that I think of it, it would probably be beneficial even outside of the unconstrained scope, especially for long computations. When iterating over millions of elements, it would be great to have a mechanism for maybe yielding if we're past the budget, but we don't really want to force-yield on every X iterations and put the task at the back of the queue. If the maybe_yield API is potentially controversial, a sufficient building block would be a function that allows peeking into the state of your budget - and then, if you're out of it, you just explicitly call yield_now().
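
A rough sketch of how such a building block could be open-coded, using only the standard library. All names here are hypothetical: `BUDGET` is a simple thread-local counter standing in for the runtime's budget (Seastar's quota is time-based, this one just counts operations), `budget_exhausted` plays the role of the "peek" function, `YieldNow` mimics a yield-once future, and `run_counting_yields` is a toy executor that refills the budget before every poll.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Hypothetical per-task budget; not Tokio's real implementation.
    static BUDGET: Cell<u32> = Cell::new(0);
}

/// Hypothetical "peek" building block: is the budget used up?
fn budget_exhausted() -> bool {
    BUDGET.with(|b| b.get() == 0)
}

/// Spend one unit of budget, saturating at zero.
fn consume_budget() {
    BUDGET.with(|b| b.set(b.get().saturating_sub(1)));
}

/// Returns Pending once (after waking itself), then Ready.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Yield only if the budget is drained; otherwise continue immediately.
async fn maybe_yield() {
    if budget_exhausted() {
        YieldNow(false).await;
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: refills the budget before each poll and counts yields.
fn run_counting_yields<F: Future>(fut: F, budget: u32) -> u32 {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut yields = 0;
    loop {
        BUDGET.with(|b| b.set(budget));
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(_) => return yields,
            Poll::Pending => yields += 1,
        }
    }
}

/// A long I/O-less loop that checks the budget after every element.
async fn long_computation(items: u32) -> u64 {
    let mut sum = 0u64;
    for i in 0..items {
        sum += u64::from(i);
        consume_budget();
        maybe_yield().await;
    }
    sum
}

/// Number of times the computation yields for a given budget.
fn yields_for(items: u32, budget: u32) -> u32 {
    run_counting_yields(long_computation(items), budget)
}

fn main() {
    println!("yields: {}", yields_for(10, 4));
}
```

In this toy setup, 10 items with a budget of 4 yield twice, and a loop shorter than the budget never yields at all.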

heftig · on Jan 13, 2022

How could it respect them? The Future trait doesn't let it distinguish between "please yield now" and "please poll again".

carllerche · on Jan 13, 2022

It wouldn’t be too hard to tell it apart. A yield is defined as the task waking itself vs something else waking it. The yield methods already do this.
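
A minimal sketch of that distinction using only the standard library; it is not the actual change in futures-rs. A hypothetical `FlagWaker` records whether it was woken while the child was being polled. If the child returns `Pending` and the flag is already set, the child woke itself (a yield); if the flag is clear, it is waiting for something else. `YieldOnce`, `NeverReady` and `polled_and_yielded` are illustrative names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that only remembers that it was woken.
struct FlagWaker {
    woken: AtomicBool,
}

impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.woken.store(true, Ordering::SeqCst);
    }
}

/// Yields once: wakes itself and returns Pending on the first poll.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Pending without waking itself: it waits for an external event.
struct NeverReady;

impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Polls `fut` once. Returns true if it returned Pending AND woke
/// itself during that poll, i.e. it yielded.
fn polled_and_yielded<F: Future + Unpin>(fut: &mut F) -> bool {
    let flag = Arc::new(FlagWaker { woken: AtomicBool::new(false) });
    let waker = Waker::from(flag.clone());
    let mut cx = Context::from_waker(&waker);
    let pending = Pin::new(fut).poll(&mut cx).is_pending();
    pending && flag.woken.load(Ordering::SeqCst)
}

fn main() {
    println!("{}", polled_and_yielded(&mut YieldOnce { yielded: false }));
    println!("{}", polled_and_yielded(&mut NeverReady));
}
```

A multiplexer built this way could stop polling further children and return `Pending` itself as soon as `polled_and_yielded` reports a yield.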

heftig · on Jan 13, 2022

Again, FuturesUnordered cannot know the difference between a task wanting to yield and a task that wants to be polled immediately. The waker does not get this information, either. It cannot distinguish.

carllerche · on Jan 13, 2022

Here is the PR: https://github.com/rust-lang/futures-rs/pull/2551

Yield = wake the `waker_ref`. Avoiding the yield would be clone().wake().

That said, "poll immediately" isn't actually a thing nor was it ever a thing except in incorrect implementations.

dathinab · on Jan 15, 2022

But polling multiple futures independent of each other inside of a future is a thing (like join, race, etc.).

And that means that just because you get "Pending" (i.e. not ready) from one of the futures, it doesn't mean you should return Pending now. It only means this future is not ready, but other futures might still be ready.

But in tokio it means this future is not ready, and we magically as a side-effect might have forced all other futures to be non-ready even if they are.

Which means tokio redefined what Pending means in a subtle but potentially massively breaking way.

Which is a problem.

And not a problem of futures-rs, but one of tokio.

And forcing all of the ecosystem to increase the complexity of their code by trying to subtly detect whether something yielded or was force-yielded IS NOT OK. That's not how Rust standardized yielding or polling.

dathinab · on Jan 15, 2022

> wrong thing: not respecting yields

This is not quite right.

As far as I understand they did respect yield in the way it's defined by rust.

There, a yield just means that your future returns `Pending`.

Which means only _this_ future is not ready.

But in tokio returning `Pending` means "not-ready" and "maybe as a _magic side effect_ also make all other futures return pending even if they are ready internally".

So let's step away from FuturesUnordered for a moment and instead just look at a future which multiplexes X futures and polls each (not yet completed) one once and then yields, which should be 100% fine.

But with tokio it isn't, as after just polling the first few, the "budget" might be consumed and polling all the others will forcefully fail where it shouldn't, adding a lot of overhead. Worse, if you just poll them in the same order every time, you will de facto starve all futures later in the list. Which means you need to add a bunch of complexity which should be unnecessary, just because tokio changed what `Poll::Pending` means.
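
A tiny simulation of that starvation effect, under loudly simplified assumptions: no real futures are involved, each child poll costs exactly one budget unit, and the hypothetical `progress` function just counts how often each child gets polled while budget remains. Rotating the start index is used here as a deterministic stand-in for a randomized polling order.

```rust
/// Simulates `rounds` polls of a multiplexer over `n` child futures.
/// Each round has `budget` units; polling one child costs one unit, and
/// children reached after the budget is gone make no progress.
/// Returns how many times each child made progress.
fn progress(n: usize, budget: usize, rounds: usize, rotate: bool) -> Vec<usize> {
    if n == 0 {
        return Vec::new();
    }
    let mut counts = vec![0; n];
    for round in 0..rounds {
        // Fixed order always starts at child 0; rotation starts each
        // round where the previous one stopped (assuming budget <= n).
        let start = if rotate { (round * budget) % n } else { 0 };
        for k in 0..budget.min(n) {
            counts[(start + k) % n] += 1;
        }
    }
    counts
}

fn main() {
    println!("fixed order:   {:?}", progress(4, 2, 4, false));
    println!("rotated order: {:?}", progress(4, 2, 4, true));
}
```

With 4 children, a budget of 2 and 4 rounds, the fixed order gives the first two children 4 polls each and the last two none, while the rotated order gives every child 2.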

Also, if you write scheduler-independent futures (you should if you can), then you cannot opt out of it. Generally in a multiplexed future you don't want to opt out anyway; you want to have a budget per logical strand of execution, which tokio doesn't provide.

Instead tokio assumes a task (which in the end is just a future polled by it) represents a single logical strand of execution. But that is simply NOT how Rust futures are designed!! (Though often it happens to play out that way.)

This doesn't mean that tokio's idea is bad, it's just not compatible with Rust's future design in subtle edge cases.

I think it would be a nice thing to add it to the future design, but then you would need a standardized way to properly handle multiplexing. (Which, as a side note, tokio doesn't provide: it only provides the opt-out `unconstrained`, but what you need is to snapshot and restore the budget or something similar, i.e. snapshot it when entering the multiplexer and restore it after each multiplexed future. The only solution FuturesUnordered could take is speculatively adding yields, but that's _not_ a proper solution, as it subtly de-prioritizes multiplexed futures compared to spawned futures and can also easily fall apart if you nest multiplexed futures....)

Also I have no idea why they call it cooperative scheduling. Futures are cooperative scheduling. What they do is preempt futures (i.e. forcefully yield them), but only in places where they could have yielded in the context of cooperative scheduling. So it's some in-between solution.

psarna · on Jan 12, 2022

Hi, author of the post here. I'll also be talking about this issue in a little more detail at an online Rust Warsaw meetup tomorrow, feel free to join, especially if you're up for a live discussion later: https://www.meetup.com/Rust-Warsaw/events/282879405/

petr_tik · on Jan 12, 2022

Can you please shed some light on Rust's adoption at Scylla, given the cumulative team expertise in Cpp and unsafe programming like thread-per-core architectures?

In my experience so far, highly proficient Cpp programmers tend to find rust constraints overly restrictive, because they kind of internalise the borrow checker and are comfortable enough to bend the rules sometimes.

How receptive are fellow scyllians (is that the term?) to Rust? How much scope is there for future projects to be in Rust over Cpp?

Thanks for the honest and detailed write-up!

psarna · on Jan 13, 2022

> In my experience so far, highly proficient Cpp programmers tend to find rust constraints overly restrictive

For me it's the exact opposite - I find Rust righteously restrictive exactly in places I wish the C++ compiler would complain. For instance, the fact that variables are legal to use after move is an obvious C++ footgun - you almost never want it to actually happen in your code. And then, if one really knows better, Rust has `unsafe`, which I actually never used yet except for providing C/C++ bindings to link a Rust project with Scylla.
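
A small illustration of the use-after-move point, as a sketch; the names `consume` and `data` are made up for the example. In Rust the commented-out line is rejected at compile time, whereas the equivalent C++ (using a variable after `std::move`) compiles.

```rust
/// Takes ownership of the vector and returns its length.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    let data = vec![1, 2, 3];
    let n = consume(data); // `data` is moved into `consume` here

    // Uncommenting the next line fails to compile with error E0382,
    // because `data` was moved above:
    // println!("{:?}", data);

    println!("{n}");
}
```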

So, to sum up, I think that Rust has much better defaults (e.g. move the value by default, everything is const by default, etc.), but still lets you bend the rules explicitly, while in C++ the rules are already slightly bent for you, just in case you need it, which is instead a common source of bugs.

And the adoption of Rust is going great at ScyllaDB. More of it is hopefully coming soon, including a rewrite of user-defined functions support in Rust, which would allow us to fully utilize wasmtime (a very neat Rust project) as the WebAssembly engine.

tialaramex · on Jan 13, 2022

> comfortable enough to bend the rules sometimes.

Mmm. The world we actually have suggests that this comfort is dangerous.

Robert Browning was a famous poet. Certainly competent enough with English that you'd assume he knows what he's doing and can bend the rules. So, dictionary makers were a little confused by the passage in Pippa Passes, "Then owls and bats / Cowls and twats / Monks and nuns in a cloister's moods / Adjourn to the oak-stump pantry". Why did Browning use the word "Twat" in this context? Turns out that he had just assumed it was a word meaning a hat nuns wear...

Oops. Of course this goof just means a high school literature teacher covering Browning has to decide whether to skip this part of Pippa Passes maybe "for time" and hope nobody asks about it, or explain to the class that even the best of us screw up sometimes. In Computer Software our mistakes are often not treated so kindly.

psarna · on Jan 20, 2022

In case somebody stumbles upon this old thread, here's the recording of the meetup: https://youtu.be/pgzjPeIQ3Us

eminence32 · on Jan 12, 2022

Nice article, and nice analysis of the problem.

I have a personal theory that once a codebase gets complicated enough (no matter the language, no matter sync or async), you'll run into subtle bugs or performance problems that require a very deep understanding of all the relevant libraries in order to resolve the problem. One worry that I have with async rust is that the executors are so complex that the "baseline complexity" starts rather high.

If this theory is true, then one might expect async rust to run into such problems more often than comparable non-async code. I haven't personally written enough async code to have any data either way, except for a few unpleasant experiences while making errors when writing async code.

To the extent that this theory is a problem that needs solving, I don't think there is a solution. But I do think that, over time, weird async footguns will become less and less frequent as projects like tokio continue their excellent engineering efforts to plug up any weak spots and make things more robust in general.

marcus_cemes · on Jan 12, 2022

> One worry that I have with async rust is that the executors are so complex that the "baseline complexity" starts rather high.

I completely agree, however I like that you have the option to swap out the executor for your own if the situation requires it, which I think concerns a very small number of applications. In other languages with a baked-in runtime you would be left trying to come up with workarounds to make it play nicely.

valenterry · on Jan 13, 2022

> I have a personal theory that once a codebase gets complicated enough (no matter the language, no matter sync or async), you'll run into subtle bugs or performance problems

You probably, unintentionally, defined "complicated enough" to be at the point where these bugs occur - so that's a tautology.

I think the point of programming languages and techniques/libraries is to move this point of "complicated enough" as far out as possible. Without any libraries, it might be easier to understand, but it will happen so much earlier that it's not worth it. And the point of libraries is that not only do they allow you to push the problem out to a much later point in time - they also allow you to switch between projects and not lose all your knowledge.

Even considering the high upfront costs, it's a big net win imho.

rr808 · on Jan 13, 2022

I think this is true for every language, not just rust. I'm really not convinced async is good for complicated systems. Things like green threads are much easier to design with.

gxt · on Jan 12, 2022

It's a great story and it's pleasant to see end-devs investigations contribute to the overall performance of the ecosystem since Tokio is so widely used. Cheers

scottlamb · on Jan 13, 2022

Nice article. I'm in the process of absorbing it. Question:

> scylla-rust-driver issued at least 1 syscall per query, which might be the source of elevated latency – and with a super-fast network (of which loopback is a prime example) it also means that throughput suffers – a latency of 1ms means that we won’t be able to send more than 1000 requests per second.

Is 1 ms measured? Buffering and reducing syscalls is good, but still: that seems horribly slow for a small or EWOULDBLOCK-returning read/write. Why would it be that bad?

enedil · on Jan 13, 2022

I believe this is just an example of what the implications of 1 query per ms are, not an indication that it actually took 1 ms.

psarna · on Jan 13, 2022

Indeed - 1ms was a nice round number, but it was also not that far from the real, sub-1ms number. But note that by "latency" here I mean not only the round trip, but the total measured time of executing a single request, including the time its task spends in Tokio (and its queues), and until its response is fully processed as well. Since the program sent requests with configurable concurrency set to ~1024 or more, the overall throughput was still satisfying.
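
A back-of-the-envelope sketch of that arithmetic. It assumes, purely for illustration, that per-request latency stays constant as concurrency grows (in practice it will not), so the result is only an upper bound; `max_requests_per_sec` is a made-up helper name.

```rust
/// Upper bound on requests per second when `concurrency` requests are
/// in flight at once and each takes `latency_ms` milliseconds
/// (throughput = concurrency / latency).
fn max_requests_per_sec(concurrency: u32, latency_ms: f64) -> f64 {
    f64::from(concurrency) * 1000.0 / latency_ms
}

fn main() {
    // One request at a time at 1 ms each: the limit quoted above.
    println!("{}", max_requests_per_sec(1, 1.0));
    // ~1024 concurrent requests at the same latency.
    println!("{}", max_requests_per_sec(1024, 1.0));
}
```

With one request in flight the bound is 1000 requests per second; with 1024 in flight at the same latency it rises to 1,024,000, which is why high concurrency can hide a mediocre per-request latency.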

On the one hand it was (and still is) concerning that the observed latency per simple request was that high, on the other - it never really came up in our distributed tests, since there the network latency imposes a few milliseconds anyway. When we figure out the real source of this behavior, I'll be happy to describe the investigation process in a blog post (:

dagmx · on Jan 12, 2022

I'd love to read this article but man this site is painful on mobile. Multiple popups from the top, header that overlays the text, the chat bubble at the bottom.

It really reduces the amount of space for the actual content

When comparing to Reader View in Safari, the native site has at least 40% less viewing area

r00fus · on Jan 12, 2022

TBH, I've moved to setting Reader View as default for all Safari visits in iOS. You can whitelist sites or just undo reader view for that session if needed.

Zero ads, and mostly I'm there for the text anyway.

PeterCorless · on Jan 12, 2022

Hi Dagmx! Peter Corless here from ScyllaDB. Sorry to hear about your mobile experience. I've notified our web team. If you want to screenshot your mobile view to share with me, please email me at peter @ scylladb [dot] com. You have my commitment we'll be working on improving the page.

Aside from that though, hope you were able to glean something valuable from the article. Would love to hear your opinions.

DenseComet · on Jan 12, 2022

I've been using Scylla for a project and have ended up reading quite a few of those blog posts. They are generally very well written and useful, but I've noticed the same sort of issues. There's the chat popup at the bottom, a banner for Scylla Summit, a cookie banner that doesn't seem to remember what I clicked, and a header with broken transparency.

For me, Cloudflare is the benchmark. Cloudflare.com is very clearly a marketing site, likely run by their marketing team, but their technical content, such as their blog and docs, is on completely different subdomains with a clean design and no popups or banners. I think this is the best way to run a tech-focused blog, making both developers and marketing happy.

dagmx · on Jan 12, 2022

Thanks for looking into it. I see others (dmitriid) have uploaded screenshots already, so I won't duplicate.

But yes, otherwise the article itself was insightful and thank you for sharing your findings.

dmitriid · on Jan 12, 2022

Three screenshots: https://imgur.com/a/XUpP9ha

Including the cookie banner that's illegal under GDPR (and is likely illegal under CCPA)

throwaway81523 · on Jan 12, 2022

Xkcd 624?

dmitriid · on Jan 12, 2022

> If you want to screenshot your mobile view to share with me

Just, you know, visit your own site on mobile.

Or open Chrome dev tools and switch to mobile view.

> You have my commitment we'll be working on improving the page.

As with all in-your-face marketing shenanigans, I doubt you'll change anything in the long run.

PeterCorless · on Jan 12, 2022

I did this, in fact. I also had other colleagues do the same. A lot of UI/UX depends on the browser (Chrome, Firefox, Brave, Opera) and the OS. So I wanted to confirm precisely what he was seeing.

And we're already working on fixes.

bschwindHN · on Jan 12, 2022

On an iPhone SE 1, you literally get 5 lines of readable text, and 3 if the Safari navigation UI is showing.

I also can't close the chat bubble because the X is off the screen to the right, and the scroll position jumps around near the top of the page as you scroll because some calculation or CSS is wrong.

dijit · on Jan 12, 2022

Seems alright on my iPad. https://imgur.com/a/SJwg1E2

dagmx · on Jan 12, 2022

That's a much larger screen than the largest iPhone. Hence why I specified mobile.

throwaway81523 · on Jan 12, 2022

I haven't even clicked the link yet, but that the Scylla devs are doing something with Rust already is interesting. Seastar is very cool though constrained by the limitations of C++. It will be great to find out what effect Rust has.

AlexSW · on Jan 13, 2022

Which limitations of C++, sorry?