2. If a read on a non-blocking file descriptor can't be satisfied from the cache, return -1, set errno to EAGAIN, and kick off an asynchronous task to load that range of bytes into the cache (an 'asynchronous load').
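In code, the proposal amounts to something like the sketch below, where cache_read() and cache_start_load() are hypothetical stand-ins for whatever cache the kernel (or a userspace library) would maintain--they are not real APIs:

    /* Sketch of the proposed semantics.  cache_read() and
     * cache_start_load() are hypothetical stand-ins, not real APIs. */
    #include <errno.h>
    #include <sys/types.h>

    ssize_t cache_read(int fd, void *buf, size_t len, off_t off);   /* hypothetical */
    void cache_start_load(int fd, off_t off, size_t len);           /* hypothetical */

    ssize_t nonblocking_read(int fd, void *buf, size_t len, off_t off)
    {
        ssize_t n = cache_read(fd, buf, len, off);
        if (n >= 0)
            return n;                   /* satisfied from the cache */
        cache_start_load(fd, off, len); /* kick off the asynchronous load */
        errno = EAGAIN;
        return -1;                      /* caller retries once it's pollable */
    }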
The way Windows and Unix operating systems are written, that "asynchronous load" will always be implemented with a pool of kernel-scheduled threads. The filesystem is a stack of software, several layers deep, with tentacles reaching all over the place (e.g. modern file buffer and VM page caches are unified). Existing APIs and implementations simply do not support truly asynchronous operation throughout the entire stack. This is one reason why AIO on Linux is limited to direct I/O--to cut out all the middle layers--and even then it's not truly asynchronous (e.g. file opens still block). Windows asynchronous I/O just uses a kernel thread pool. So does FreeBSD's. And so does every Linux proposal for buffered AIO.
If you want pollable file reads, just use eventfd and roll your own. Linux threads are lightweight enough that there's not much reason to care whether they're "kernel" or "user" threads. But pollable file I/O will really only be a win for streaming reads and writes, so you could just use sendfile(): then you're reading or writing a pollable socket, and a dedicated thread pool pumps data between the sockets and the file descriptors, in either direction, without any copying.
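A minimal sketch of the eventfd approach--one worker thread does the blocking pread(), and completion is signaled through an eventfd you can drop into the same poll()/epoll set as your sockets (error handling trimmed for brevity):

    /* Pollable file read: a worker thread does the blocking pread()
     * and signals completion through an eventfd. */
    #include <sys/eventfd.h>
    #include <sys/types.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <poll.h>

    struct req { int fd; off_t off; size_t len; char *buf; ssize_t rc; int efd; };

    static void *worker(void *arg)
    {
        struct req *r = arg;
        r->rc = pread(r->fd, r->buf, r->len, r->off); /* blocks here, not in the event loop */
        uint64_t one = 1;
        write(r->efd, &one, sizeof one);              /* makes the eventfd readable */
        return NULL;
    }

    int main(void)
    {
        struct req r = { .off = 0, .len = 4096 };
        r.efd = eventfd(0, 0);
        r.fd = open("/etc/hostname", O_RDONLY);
        r.buf = malloc(r.len);

        pthread_t t;
        pthread_create(&t, NULL, worker, &r);

        /* The eventfd can sit in the same poll set as your sockets. */
        struct pollfd p = { .fd = r.efd, .events = POLLIN };
        poll(&p, 1, -1);

        uint64_t n;
        read(r.efd, &n, sizeof n);   /* drain the counter */
        pthread_join(t, NULL);
        printf("read %zd bytes\n", r.rc);
        return 0;
    }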
BTW, Go doesn't need this solution (timeouts notwithstanding) because the Go runtime schedules goroutines across multiple kernel-scheduled threads: a goroutine blocked in a file read ties up one thread while the rest keep running.
If you're really worried about file I/O latency, you should be at least as worried about VM pagefault latency. I disable swap on all my Linux servers, and I also disable overcommit. But my languages of choice (C and Lua) are capable of gracefully handling allocation failure (contrast Python or Go), and I implement my solutions to handle allocation failure without crashing the entire service. Too many other languages assume memory is infinite. Alas, large parts of the Linux kernel make the same assumption, even with overcommit disabled :(
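(On Linux, "disable swap and overcommit" means swapoff -a plus the sysctl vm.overcommit_memory=2.) Handling allocation failure gracefully is mundane in C: check the malloc() return and shed load instead of aborting. A trivial sketch:

    /* Shed load on allocation failure instead of crashing the service. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int handle_request(size_t need)
    {
        char *buf = malloc(need);
        if (buf == NULL)
            return -1;        /* propagate failure; the caller drops this request */
        memset(buf, 0, need);
        /* ... service the request ... */
        free(buf);
        return 0;
    }

    int main(void)
    {
        if (handle_request((size_t)1 << 20) < 0)
            fprintf(stderr, "OOM: request dropped, service still alive\n");
        return 0;
    }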
Solaris and Windows are the only major OSs I'm familiar with that implement strict and rigorous memory accounting, permitting the system to stay alive (and consistent) under memory exhaustion. FreeBSD might as well; I'm not sure.