With Spectre/Meltdown mitigations increasing the cost of syscalls, I imagine this becomes quite significant.
Somewhat orthogonally, does anyone happen to know if the hardware fixes that eliminate the need for software mitigations still increase syscall cost substantially? I imagine that if you have to inhibit speculation across context boundaries, the cost is still higher than in a pre-Spectre world, even if done in hardware...
The Meltdown workaround (kernel page-table isolation: separate kernel and userspace page tables, with %cr3 switched on kernel entry and exit) is responsible for the bulk of the syscall cost increase. That one is entirely fixable in hardware without a significant cost penalty (after all, AMD wasn't affected by that one).
The various kinds of Spectre are harder (and reach beyond the kernel; things like JavaScript VMs are also affected), but the cost there is more diffuse.
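If you want to check whether a given Linux box is actually paying the page-table-switch cost, the kernel reports its mitigation status under `/sys/devices/system/cpu/vulnerabilities/`. Here is a rough sketch that reads those files; the helper names (`classify`, `uses_page_table_isolation`, `report`) are made up for illustration, and the string matching assumes the usual "Not affected" / "Vulnerable" / "Mitigation: ..." status lines.

```python
from pathlib import Path

# Linux sysfs directory with one status file per known CPU vulnerability.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status):
    """Map a sysfs status line to a coarse label (hypothetical helper)."""
    status = status.strip()
    if status.startswith("Not affected"):
        return "not affected"
    if status.startswith("Mitigation:"):
        return "mitigated: " + status[len("Mitigation:"):].strip()
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown: " + status


def uses_page_table_isolation(meltdown_status):
    """True if the meltdown status line says the PTI workaround is active."""
    return "PTI" in meltdown_status


def report(vuln_dir=VULN_DIR):
    """Return {vulnerability name: label}; empty if the directory is missing."""
    result = {}
    if not vuln_dir.is_dir():
        return result  # not Linux, or a kernel without this sysfs interface
    for entry in sorted(vuln_dir.iterdir()):
        try:
            result[entry.name] = classify(entry.read_text())
        except OSError:
            continue  # skip entries that cannot be read
    return result


if __name__ == "__main__":
    for name, label in report().items():
        print(f"{name}: {label}")
```

On a CPU that is not affected by Meltdown, the `meltdown` entry should read "not affected", which means no separate page tables and no extra %cr3 switch per syscall.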
That's because they effectively ossified the software fix in hardware, right? At some point we might be able to come up with a real way to fix the issue.
I'm not sure if there are even any true hardware mitigations actually shipping yet. I think the closest we have are products that ship with stock microcode implementing the same changes that aftermarket microcode updates applied to existing products.
> but very little cpu time is spent on kernel overhead for any heavy io application
I would really like to see some backing for that claim. In the I/O-heavy applications I've seen (think performance-enhancing proxies), syscalls easily account for over half the CPU cycles.
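For anyone who wants to get a feel for this on their own machine, here is a minimal sketch that reads the same file with small and large buffers and reports how many read(2) calls it took plus the system CPU time charged to the process. The function name `read_all` and the buffer sizes are arbitrary choices of mine, and since this is Python, interpreter overhead inflates user time, so the user/system split will not match what a C proxy would see.

```python
import os
import tempfile


def read_all(path, buf_size):
    """Read path to EOF with os.read.

    Returns (bytes_read, read_calls, system_cpu_seconds).
    """
    before = os.times()
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            chunk = os.read(fd, buf_size)  # one read(2) syscall per call
            calls += 1
            if not chunk:
                break  # empty result means EOF
            total += len(chunk)
    finally:
        os.close(fd)
    after = os.times()
    return total, calls, after.system - before.system


if __name__ == "__main__":
    # Scratch file of 1 MiB; small enough to stay in the page cache.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (1 << 20))
        path = tmp.name
    try:
        for size in (64, 64 * 1024):
            total, calls, sys_s = read_all(path, size)
            print(f"buf={size:6d}  bytes={total}  read calls={calls}  "
                  f"system CPU={sys_s:.3f}s")
    finally:
        os.unlink(path)
```

The data comes out of the page cache in both runs, so whatever difference shows up in system time is per-call overhead rather than disk I/O. A proxy shuffling small packets is in roughly the small-buffer situation.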
It really wasn't all that great during the first couple of years. And it still has a few annoying caveats today. The advertised flexibility (port sharing, etc.) most probably wasn't the only reason for http.sys back in the day :)
EDIT: that is not to say that it is as horrible to use as epoll (at least in multithreaded programs), but my favorite has to be kqueue.
> Axboe claims that it is far more efficient, but no benchmark results have been included yet to back up that claim. Among other things, this interface can do asynchronous buffered I/O without a context switch in cases where the desired data is in the page cache; buffered I/O has always been a bit of a sore spot for Linux AIO.
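To make the "data is already in the page cache" case concrete: this is not the new interface from the quote, but the older preadv2(2) `RWF_NOWAIT` flag gives a similar flavor from plain Python. It asks the kernel for a buffered read that must not block, and you fall back to an ordinary blocking read (or hand it to a worker thread) when that fails. The function names are hypothetical, and this sketch assumes Linux 4.14+ and Python 3.7+; elsewhere it just takes the fallback path.

```python
import errno
import os


def read_if_cached(fd, length, offset=0):
    """Try a buffered read that is not allowed to block.

    Returns the bytes read if the kernel could serve the request right away
    (typically from the page cache), possibly fewer than requested. Returns
    None if the read would have to wait or RWF_NOWAIT is unavailable here.
    """
    nowait = getattr(os, "RWF_NOWAIT", None)
    if nowait is None or not hasattr(os, "preadv"):
        return None  # not Linux, or Python too old
    buf = bytearray(length)
    try:
        # flags is positional-only in os.preadv
        n = os.preadv(fd, [buf], offset, nowait)
    except BlockingIOError:
        return None  # EAGAIN: data not immediately available
    except OSError as exc:
        if exc.errno in (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS):
            return None  # kernel or filesystem does not support the flag
        raise
    return bytes(buf[:n])


def read_with_fallback(fd, length, offset=0):
    """Fast path from the page cache, else an ordinary blocking pread."""
    data = read_if_cached(fd, length, offset)
    if data is None:
        data = os.pread(fd, length, offset)  # may block on disk I/O
    return data
```

In a real event loop the fallback would go to a thread pool rather than block the caller; it is inlined here to keep the sketch short.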