To be clear, I do not believe that async disk I/O is never useful; I just think it's not as useful as people first imagine when they learn about async I/O.
Yes, disk may be 1000 times slower than memory. But there's a fundamental paradigm difference from network events: with network events you are waiting for some other entity to take action, with no implicit expectation that it will do so in any particular timeframe. Like, if you're waiting for connections on a listen socket, there's no telling how long you will be waiting.
Disk I/O is fundamentally different in that once you submit an operation, you expect it to complete within a reasonable, finite time period.
Async disk I/O is primarily useful for implementing read-ahead / write-behind scheduling behaviors. While databases tend to be the obvious use case, the OS is often so poor at this that there are large performance improvements even for much simpler use cases that are otherwise disk I/O intensive.
I'm not sure that's the primary use case any more. Fast SSDs require high queue depths to reach their full throughput, so async I/O is desirable any time an application knows it has several I/O requests it could issue in parallel; one thread per request has too much overhead.
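A minimal sketch of the pattern described above: several reads in flight at once instead of one after another. This uses a Python thread pool purely as a stand-in for true async I/O (io_uring, POSIX AIO); the file, chunk size, and helper name are all hypothetical, chosen just for illustration.

```python
# Sketch: keeping several reads in flight at once so the device queue
# stays deep. A thread pool stands in for a real async I/O interface;
# the pattern (many outstanding requests) is what matters, not the mechanism.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4096  # hypothetical request size

# Hypothetical test file: four distinct 4 KiB chunks.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"".join(bytes([i]) * CHUNK for i in range(4)))

    def read_chunk(i):
        # os.pread is positional, so concurrent reads never contend
        # on a shared file offset.
        return os.pread(fd, CHUNK, i * CHUNK)

    # All four reads are submitted together rather than sequentially.
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(read_chunk, range(4)))
finally:
    os.close(fd)
    os.unlink(path)
```

On a real SSD the win comes from the device servicing those requests concurrently; with a single thread issuing blocking reads, the queue depth never exceeds one.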
Sure, but that behavior is effectively read-ahead / write-behind on your I/O buffers: initiating I/O operations before the code strictly requires their completion to make efficient forward progress.
They're really not equivalent. Read-ahead only helps for predictable IO patterns. Issuing multiple read requests in parallel from the application is useful in a far broader range of scenarios. And for both reads and writes, being able to submit IO in batches (without having to wait for the entire batch to complete) can drastically cut down on overhead compared to submitting IOs sequentially as if they were a linear dependency chain, and makes it possible to keep the storage properly busy instead of it idly waiting on the host software to prepare and submit the next IO.
All cache replacement algorithms are literally equivalent to universal sequence prediction problems, per the optimality theorem. There is no implication of sequential decisions here. When you schedule a batch of disk I/O, you are essentially front-running the sequence predictor to avoid classes of prediction failure where successful prediction would be computationally intractable (and therefore not implemented in real systems), which is expected to produce better I/O throughput on average if done competently per the same theory. There is nothing magic about this, it is in the literature, and databases in particular have explicitly exploited non-sequential scheduling to circumvent fundamental sequence prediction limits for decades. Optimally anticipating future requirements for reads and writes can be called whatever you like, but that remains the primary use case for async I/O since you can't do it with blocking I/O in a single thread.
This becomes more important as caches become larger because cache efficiency increases are strongly sublinear as a function of size, as expected. Servers are already at the scale where very deep async I/O scheduling is required for consistent throughput with high storage density, beyond what can be done via traditional buffered disk I/O architectures, async or not. It is an active area of research with some interesting ideas.