What is surprising to me is that it is 2017 and nobody has attempted to solve these problems. Maybe the current mechanisms, with all their faults, are considered good enough?
I think that's unfair: a lot of people have attempted to solve these problems, and the Windows Way™ has its own problems. The real answer is probably somewhere between elastic infrastructure and the fact that computers got faster and cheaper, so, at least for a while, the number of people needing solutions here went down.
However, a few companies have seen real commercial benefit from actually analysing their data, so we're seeing a lot of "big data" interest. Some of these companies have noticed that a bunch of Hadoop/Mongo/whatever boxes can't even approach the performance of one "big" box tuned right, so we're seeing something of a resurgence in interest here.
Other systems (Windows and I think Solaris, but some more obscure systems likely do this as well) can do asynchronous disk IO, but it's not perfect, either.
I think the main reason Linux doesn't really do it is that asynchronous IO is an intrusive change and there would be just too much work to implement it in all the file systems etc.; i.e. "not worth it".
Many applications that are not OK with synchronous disk I/O seem to find thread pools good enough: reasonably easy to implement, reasonably portable, and usually fast enough.
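To make the thread pool approach concrete, here is a minimal sketch in Python. The helper names `read_file` and `read_all` are made up for illustration; the only real API assumed is the standard library's `concurrent.futures.ThreadPoolExecutor`. Each read is still a plain blocking call, it just happens on a worker thread so the caller is not stalled by any single file.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read; runs on a worker thread (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read()


def read_all(paths, workers=4):
    """Read every path concurrently, returning contents in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `paths` in its results.
        return list(pool.map(read_file, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = os.path.join(d, f"f{i}.txt")
            with open(p, "wb") as f:
                f.write(f"data {i}".encode())
            paths.append(p)
        print(read_all(paths))
```

The trade-off is the one described above: nothing here is truly asynchronous at the kernel level, but the code is short and should run anywhere Python does.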
Traditional Unix, including Linux, pretends to support poll() etc. for file I/O (i.e. it supports the interfaces, but the reality is synchronous). The reason for this is that an application that expects async I/O can kinda-sorta work with real files, and "kinda sorta" will not be too bad because local disks are fast enough to paper over any problem.
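A small sketch of what "supports the interfaces, but the reality is synchronous" looks like in practice, assuming a Unix-like system (on Windows, `select` only accepts sockets). The helpers `is_readable_now` and `demo` are hypothetical names. A regular file is reported ready immediately, whereas an empty pipe is only reported ready once data has been written to it.

```python
import os
import select
import tempfile


def is_readable_now(fd):
    """Return True if select() reports fd readable without waiting."""
    readable, _, _ = select.select([fd], [], [], 0)
    return fd in readable


def demo():
    # A regular file is always reported ready, even when it is empty.
    with tempfile.TemporaryFile() as f:
        file_ready = is_readable_now(f.fileno())
    # An empty pipe is not ready until something is written to it.
    r, w = os.pipe()
    try:
        pipe_ready_before = is_readable_now(r)
        os.write(w, b"x")
        pipe_ready_after = is_readable_now(r)
    finally:
        os.close(r)
        os.close(w)
    return file_ready, pipe_ready_before, pipe_ready_after


if __name__ == "__main__":
    print(demo())
```

So "ready" for a regular file says nothing about whether the subsequent read() will have to wait for the disk.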
But then if the disks aren't local, why not actually make non-blocking I/O work as advertised?
Actually I can guess the answer: there are 101 corner cases that mean I can't neatly separate out the app's design for non-blocking I/O. And that's why I half buy the argument.
Generally epoll and event io are good enough for millions of simultaneous files or sockets.
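For sockets this is easy to sketch. The example below assumes Linux (Python's `select.epoll` only exists there), and the function names are hypothetical. One caveat worth seeing first-hand: as far as I know, Linux epoll refuses regular files outright, with `epoll_ctl` failing with EPERM, so the second helper checks for that rather than assuming it.

```python
import errno
import select
import socket
import tempfile


def epoll_socket_ready():
    """Register one end of a socket pair with epoll and wait for data."""
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(a.fileno(), select.EPOLLIN)
        b.send(b"ping")
        events = ep.poll(1.0)  # list of (fd, eventmask) pairs
        return any(fd == a.fileno() and mask & select.EPOLLIN
                   for fd, mask in events)
    finally:
        ep.close()
        a.close()
        b.close()


def epoll_accepts_regular_file():
    """Return True if epoll accepts a regular file, False on EPERM."""
    ep = select.epoll()
    try:
        with tempfile.TemporaryFile() as f:
            try:
                ep.register(f.fileno(), select.EPOLLIN)
            except OSError as e:
                if e.errno == errno.EPERM:
                    return False
                raise
            return True
    finally:
        ep.close()


if __name__ == "__main__":
    print(epoll_socket_ready(), epoll_accepts_regular_file())
```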
Nobody tells you to use direct read or write instead of mmap either.
mmap: I can't think of a scenario where it improves on read or write in the context discussed here, namely async access. mmap is synchronous, and you can't do select/poll on it.
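To illustrate the point, here is a rough side-by-side in Python, assuming a Unix-like system (`os.pread` is not available on Windows) and a non-empty file. The function names are hypothetical. Both paths return the same bytes, and both block the calling thread: the mmap version blocks implicitly on a page fault when the slice is copied, and there is no descriptor-level readiness to wait on for the mapping itself.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read bytes through a mapping; any page fault blocks this thread."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


def read_via_pread(path, offset, length):
    """Read the same bytes with an explicit (also blocking) system call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    try:
        print(read_via_mmap(tmp.name, 2, 4), read_via_pread(tmp.name, 2, 4))
    finally:
        os.unlink(tmp.name)
```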
That's some seriously hairy stuff there. I reckon I'd move to building my app as a kernel module more readily than trying to make a robust async arrangement using that mechanism :) Call me demanding, but I think we can reasonably expect easier access to async from our OS than this mechanism offers.
On the topic of which... how's the future of operating systems coming along?