
The current popularity of the async stuff has its roots in the classic "c10k" problem. (https://en.wikipedia.org/wiki/C10k_problem)

There's a perception among some that threads are expensive, especially when "wasted" on blocking I/O, and that using them in that domain "won't scale."

Putting aside that not all of us are building web applications (heterodox here on HN, I know)...

Most people in the real world, with real applications, will not hit the limits of what is possible, efficient, and totally fine with thread-based architectures.

Plus the kernel has gotten more efficient with threads over the years.

Plus hardware has gotten way better, and better at handling concurrent access.

Plus async involves other trade-offs -- running a state machine behind the scenes that's doing the kinds of context switching the kernel & hardware already potentially do for threads, but in user space. If you ever pull up a debugger and step through an async Rust/tokio codebase, you'll get a good sense for what the overhead we're talking about here is.

That overhead is fine if you're sitting there blocking on your database server, or some HTTP socket, or some filesystem.

It's ... probably... not what you want if you're building a game or an operating system or an embedded device of some kind.

An additional problem with async in Rust right now is that it involves bringing in an async runtime, and giving it control over execution of async functions... but various things like thread spawning, channels, async locks, etc. are not standardized, and are specific per runtime. Which in the real world is always tokio.

So some piece of code you bring in via a crate uses async, and now you have to fire up a tokio runtime. Even though you were potentially not building something that has anything to do with the kinds of things that tokio is targeted for ("scalable" network services.)

So even if you find an async runtime that's optimized for some other domain (like glommio or smol or whatever) -- you're unlikely to even be able to use it with whatever famous upstream crate you want, which will have explicit dependencies on tokio.



> If you ever pull up a debugger and step through an async Rust/tokio codebase, you'll get a good sense for what the overhead we're talking about here is.

So I didn't quite do that, but the overhead was interesting to me anyway, and as I was unable to find existing benchmarks (surely they exist?), I instructed the computer to create one for me: https://github.com/eras/RustTokioBenchmark

On this wee laptop the numbers are 532 vs 6381 CPU cycles when sending a message (one way) from one async task to another (tokio) vs from one kernel thread to another (std::mpsc), when limited to one CPU. (It's limited to one CPU because rdtscp numbers are not comparable between different CPUs; I suppose pinning both threads to their own CPUs and actually measuring end-to-end delay would solve that, but this is what I have now.)
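For reference, the thread/mpsc side of such a measurement can be sketched with just the standard library (this uses Instant instead of the repo's rdtscp, so it reports nanoseconds rather than cycles, and it measures round trips rather than one-way latency):

```rust
// Rough sketch: min round-trip latency over std::sync::mpsc channels.
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

fn main() {
    let (tx_ping, rx_ping) = mpsc::channel::<Instant>();
    let (tx_pong, rx_pong) = mpsc::channel::<Instant>();

    let echo = thread::spawn(move || {
        // Echo each timestamp straight back to the sender.
        for t in rx_ping {
            tx_pong.send(t).unwrap();
        }
    });

    let mut min_ns = u128::MAX;
    for _ in 0..100_000 {
        tx_ping.send(Instant::now()).unwrap();
        let sent = rx_pong.recv().unwrap();
        min_ns = min_ns.min(sent.elapsed().as_nanos());
    }
    drop(tx_ping); // close the channel so the echo thread exits
    echo.join().unwrap();
    println!("min roundtrip: {min_ns} ns");
}
```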

So this was eye-opening to me, as I expected tokio to be even faster! But still, it's roughly 10x as fast as the thread-based method. A straight-up callback would of course be a lot faster still, but that would affect the way you structure your code.

Improvements to methodology accepted via pull requests :).


I'd want to see perf stats on branch prediction misses and L1 cache evictions alongside that though. CPU cycles on their own aren't enough.


It doesn't seem my perf provides a metric for L1 cache evictions (per perf list).

Here are the results for 100000 rounds, though, for:

    taskset 1 perf record -F10000 -e branch-misses -e cache-misses \
        -e cache-references target/release/RustTokioBenchmark (a)sync
    perf report --stat

async

    Task 2 min roundtrip time: 532
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0,033 MB perf.data (117 samples) ]

    ...    
    branch-misses stats:
              SAMPLE events:         54
    cache-misses stats:
              SAMPLE events:         27
    cache-references stats:
              SAMPLE events:         36
sync

    Thread 2 min roundtrip time: 7096
    [ perf record: Woken up 5584 times to write data ]
    [ perf record: Captured and wrote 0,367 MB perf.data (7418 samples) ]

    ...
    branch-misses stats:
              SAMPLE events:       6577
    cache-misses stats:
              SAMPLE events:        159
    cache-references stats:
              SAMPLE events:        682


Interesting. Thing is, all you're benchmarking is the cost of sending a message on tokio's channels vs mpsc's channels.

It would be interesting to compare with crossbeam as well.

But I'm not sure this reflects anything like a real application workload. In some ways this is the worst possible performance scenario: just two threads spinning at the fastest speed they can, dumping messages into a channel and pulling them out. It's a benchmark of the channels themselves and whatever locking/synchronization stuff they use.

It's a benchmark of a "shared concurrent data" situation, with constant synchronization. What would be more interesting is to have longer-running jobs doing some task inside themselves and only periodically (every few seconds, say) synchronizing.

What are the tokio executor's default settings there? Multithreaded or not? I'd be curious whether tokio is actually using multiple threads here.


Actually I wasn't that interested in throughput, only the latency from when a message is sent until it is received, though indeed the throughput is also superior with tokio.

For most applications this difference doesn't really matter, but maybe some applications do a lot of small things where it does matter? In those cases it might be an easy win to switch from standard threads to tokio async and gain 10x speed, as the structure of the application remains the same.

> It's a benchmark of the channels themselves and whatever locking/synchronization stuff they use.

Yeah, in retrospect some mutex benchmark might be better, though I don't expect a message channel implemented on top of one to be noticeably slower. A mutex benchmark is probably easier to get wrong.

> What would be more interesting is to have longer running jobs doing some task inside themselves and only periodically (ever few seconds, say) synchronizing.

I don't quite see how this would give any different results. Of course, in that case the time it takes to transmit the message would be completely meaningless.

> What's the tokio executor's settings by default there? Multithreaded or not? I'd be curious how e.g. whether tokio is actually using multiple threads or not here.

It's using the multithreaded executor. I tried the benchmark with #[tokio::main(worker_threads = 1)] and 2; with =1 the result was 529, with =2 it was 566.


> Putting aside that not all of use are building web applications

Perfect moment to mention "rouille" which is a very lightweight synchronous web server framework. So even when you decide to build some web application you do not necessarily have to go down the tokio/async route. I have been using it for a while at work and for private projects and it turned out to be pretty eye-opening.


Hit the nail on the head.

Unless you're really dealing with absurd numbers of simultaneous blocking I/O, async has entirely too many drawbacks.


>now you're having to fire up a tokio runtime

I've been developing in (mostly async) Rust professionally for about a year -- I haven't written much sync Rust other than my learning projects and a raytracer I'm working on, but what are the kinds of common dependencies that pose this problem? Like wanting to use reqwest or things like that?


> Like wanting to use reqwest or things like that?

Yes. Reqwest cranks up Tokio. The amount of stuff it does for a single web request is rather large. It cranks up a thread pool, does the request, and if there's nothing else going on, shuts down the thread pool after a while. That whole reqwest/hyper/tokio stack is intended to "scale", and it's massive overkill for something that's not making large numbers of requests.

There's "ureq", if you don't want Tokio client side. Does blocking HTTP/HTTPS requests. Will set up a reusable connection pool if you want one.


reqwest also has a blocking version, which I use in projects not already using an async rt

https://docs.rs/reqwest/latest/reqwest/blocking/index.html


The blocking implementation still depends on and uses tokio, last I looked.

I've seen this with multiple Rust packages. "Yes, we offer a synchronous blocking version..." and then you look and it's calling rt.block_on behind the scenes.

Which is a pretty large facepalm IMHO


You don't have to do that, Tokio also provides a single-threaded runtime that just runs async tasks on the main thread.



