If tasks A and B perform IO (e.g., a DB call) and the alternatives are running them sequentially on one thread or running them concurrently (via async/await) on one thread, then running them concurrently can both decrease end-to-end latency and increase throughput.
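A minimal sketch of that difference, assuming Rust with the tokio runtime and hypothetical `query_a`/`query_b` functions standing in for the DB calls: awaiting the futures one after the other costs roughly A + B, while joining them overlaps the IO waits and costs roughly max(A, B).

```rust
use std::time::Duration;

// Stand-ins for the two IO-bound DB calls (~100 ms each).
async fn query_a() -> u32 {
    tokio::time::sleep(Duration::from_millis(100)).await;
    1
}

async fn query_b() -> u32 {
    tokio::time::sleep(Duration::from_millis(100)).await;
    2
}

#[tokio::main]
async fn main() {
    // Sequential: ~200 ms end to end.
    let sequential = (query_a().await, query_b().await);

    // Concurrent, still on the same thread(s): ~100 ms end to end.
    let concurrent = tokio::join!(query_a(), query_b());

    assert_eq!(sequential, concurrent);
}
```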
> If you just synchronously run A then B, the overall time would be shorter (higher throughput) because of less context switching overhead.
There are no context switches: async/await isn't threads. The compiler generates state machines that are scheduled on a thread pool. Each time an event happens (e.g., a database request completes or times out, or a new request arrives), the state machine is scheduled again so that it can observe that event. This doesn't involve context switching: you can have 1 thread or N threads happily working away on many concurrent tasks without needing to context switch between them.
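To make that concrete, here is a hand-rolled sketch (in Rust, with tokio assumed as the executor; the `DbCall` type and its single completion event are invented for illustration) of the kind of state machine the compiler generates. The executor calls `poll` whenever an event wakes the task; returning `Pending` just hands the thread back to the pool so it can run other tasks.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// The two await-point states of our pretend DB call.
#[derive(Clone, Copy)]
enum Step {
    Start,   // request not yet issued
    Waiting, // request issued, waiting for the completion event
}

struct DbCall {
    state: Step,
}

impl Future for DbCall {
    type Output = &'static str;

    // Called by the executor each time an event wakes this task.
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.state {
            Step::Start => {
                // "Issue the request", then yield back to the executor.
                self.state = Step::Waiting;
                // Sketch only: a real IO future would register the waker
                // with the reactor; here we pretend the event fired at once.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Step::Waiting => Poll::Ready("row"),
        }
    }
}

#[tokio::main]
async fn main() {
    println!("{}", DbCall { state: Step::Start }.await);
}
```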
> There are no context switches: async/await isn't threads.
By "context switch", I didn't meant to imply "hardware thread context switch", just the general sense of "spend some CPU time messing about with scheduling".
There is overhead to async in that you're unwinding the stack, bouncing to the thread pool scheduler, loading variables from the heap (since your async code was compiled to closures) back onto the stack, etc.
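One way to see the heap point, sketched in Rust with tokio (the oversized local is contrived): any local that is alive across an `.await` has to be stored in the generated future itself, and that state is heap-allocated when the task is spawned.

```rust
// A local held across an .await lives inside the generated future,
// not on the OS stack; spawning then boxes that future on the heap.
async fn with_big_local() {
    let buf = [0u8; 4096];            // alive across the await below,
    tokio::task::yield_now().await;   // so it is stored in the future
    std::hint::black_box(&buf);       // keep it from being optimized out
}

#[tokio::main]
async fn main() {
    let fut = with_big_local();
    // The 4 KiB buffer is part of the future's own size:
    println!("future size: {} bytes", std::mem::size_of_val(&fut));
    tokio::spawn(fut).await.unwrap(); // spawned futures are heap-allocated
}
```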
As far as I know, it's always possible to complete a given set of work in less total time (i.e., higher throughput) with a carefully hand-written multithreaded program than with async. Of course, most people don't have the luxury of writing and maintaining that program, so async code can often be a net win for both throughput and latency, but the overhead is there.
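For contrast, a sketch of the hand-written threaded version of the same two-call workload (same hypothetical ~100 ms calls as above): plain blocking IO on two OS threads, no state machines, no scheduler bounce. For two tasks this is simple and cheap; the trade-off only tips toward async as concurrency grows.

```rust
use std::thread;
use std::time::Duration;

// Stand-ins for blocking DB calls (~100 ms each).
fn query_a() -> u32 {
    thread::sleep(Duration::from_millis(100));
    1
}

fn query_b() -> u32 {
    thread::sleep(Duration::from_millis(100));
    2
}

fn main() {
    // One OS thread per call; both block concurrently, ~100 ms total.
    let a = thread::spawn(query_a);
    let b = thread::spawn(query_b);
    println!("{:?}", (a.join().unwrap(), b.join().unwrap()));
}
```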
It's analogous to going from a manually memory-managed language to a language with GC. The GC makes your life easier and makes it much easier to write programs that are generally efficient, but it does incur some runtime overhead compared to a program with optimally written manual allocation and freeing.