Universal kernel bypass for IO? Does it imply every userspace program should know how to communicate with every kind of IO device? No block layer? What about memory and caching?
That's not a bad idea. You'll need some sort of mechanism to make sure they don't step on each other's toes, but otherwise it's a sound concept, not unlike MS-DOS.
It seems some highest-performant enterprise solutions consider the MS-DOS way as well.
"The Coherent Accelerator Processor Interface (CAPI) is a general term for the infrastructure that provides high throughput and low latency path to the flash storage connected to the IBM POWER 8+ System. CAPI accelerator card is attached coherently as a peer to the Power8+ processor. This removes the overhead and complexity of the IO subsystem and allows the accelerator to operate as part of an application. In this paper, we present the results of experiments on IBM FlashSystem900 (FS900) with CAPI accelerator card using the "CAPI-Flash IBM Data Engine for NoSQL Software" Library. This library provides the application, a direct access to the underlying flash storage through user space APIs, to manage and access the data in flash. This offloads kernel IO driver functionality to dedicated CAPI FPGA accelerator hardware. The results indicate that FS900 & CAPI, together with the metadata cache in RAM, delivers the highest IO/s and OP/s for read operations. This was higher than just using RAM, along with utilizing lesser CPU resources."
OTOH, "As one step toward building high performance NVM systems, we explore the potential dependencies between system call performance and major hardware components (e.g., CPU, memory, storage) under typical user cases (e.g., software compilation, installation, web browser, office suite) in this paper. We find that there is a strong dependency between the system call performance and the CPU architecture. On the other hand, the type of persistent storage plays a less important role in affecting the performance."