I'm not sure I agree that cases where local system security really doesn't matter but performance does are that plentiful, but I am happy to be convinced otherwise. In particular, just about any personal computing context doesn't count - you'd have to not run mutually-untrusted third-party code. That rules out web browsers with JavaScript, Android/iOS-style independent apps, etc. Sure, if you use the web without dynamic content and stick to local office suites you're fine, but then you don't really care about performance - a 486 will deliver enough performance to read textual content and run a word processor and spreadsheet.
Gaming is a context where you care about performance and you aren't using multiple apps at once, but (and I admit this is a bit of a naive guess) I'd be surprised if it's syscall-bound. It seems like performance is likely to be I/O-bound (getting assets from disk into memory), CPU-bound, and GPU-bound, but are you really making large numbers of syscalls? (Maybe this matters in online gaming?)
So that leaves basically some specific server workloads, and at that point I think some of these techniques start to be realistic. Pinning your work onto a core and using kernel-bypass networking is a pretty straightforward technique these days. It's not quite as easy as using the kernel interfaces, but it's pretty close, and it's definitely worth investing some engineering effort into if you care about performance - you can get much more than 25% speedups.
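For what it's worth, the core-pinning half really is only a few lines. A minimal sketch, assuming Linux and Python 3 (`pin_to_core` is a name made up for illustration, and the kernel-bypass networking half is not shown here):

```python
import os


def pin_to_core(core):
    """Pin the calling process to a single CPU core (Linux-only).

    Returns the affinity set the kernel reports afterwards.
    """
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick the lowest-numbered core this process is allowed to run on.
    first = min(os.sched_getaffinity(0))
    print("now pinned to:", pin_to_core(first))
```

Pinning only stops the scheduler from migrating your process; it does not by itself keep other tasks off that core, so a real deployment would presumably also reserve the core for the workload.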
I agree that writing in kernel mode is generally unrealistic (although if you're writing a kernel module for Linux, you still don't need to care about fiddly hardware issues - you've got the rest of Linux still running). Mostly I'd like to see more work like the paper I linked - there should be a standard build of Linux which has hardware privilege separation turned off for use in the cases where you actually can avoid hardware privilege separation (single-user VMs on cloud hosts, single-user data crunching machines, dedicated single-tenant database servers, game consoles without web browsers, ebook readers without web browsers, etc.), or at least a flag to spawn a process and leave it in ring 0. If the use cases are plentiful, this seems like it would be valuable for lots of people - and it'd also make it clear that this generally isn't an option you want on personal computers. (But I think the reason this hasn't been done in the last several decades is that there aren't actually that many use cases that are both genuinely single-user and syscall-bound.)
If you think a 486 is sufficient for reading textual content and running a word processor and spreadsheet, you haven't been paying attention to software bloat. A 486 would have a hard time just booting a modern OS, never mind the application software.
> So that leaves basically some specific server workloads,
The vast majority of servers don't run any untrusted code. Servers tend to do lots of syscalls for network I/O.
> Gaming is a context where you care about performance and you aren't using multiple apps at once, but (and I admit this is a bit of a naive guess) I'd be surprised if it's syscall-bound.
I would expect that interfacing with the GPU involves a fair number of syscalls -- but admittedly I'm also guessing.
> single-user VMs on cloud hosts, single-user data crunching machines, dedicated single-tenant database servers, game consoles without web browsers, ebook readers without web browsers
This is a lot of cases. I'd love to get 25% perf back on Postgres, or 25% back on my air-gapped DAW, etc.
Benchmark it - your air-gapped DAW is almost certainly spending very little of its time making system calls, and depending on workload, your Postgres probably isn't either. You'll get 25% back on syscall-heavy workloads but your workloads probably aren't syscall-heavy.
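A rough way to get a first number is to compare user and system CPU time around the workload. This is only a sketch, assuming a Unix-like system and Python 3; `sys_time_fraction`, `cpu_bound`, and `syscall_heavy` are illustrative names, and the two workloads are toy stand-ins, not a DAW or Postgres. System time also includes kernel work that isn't syscall entry/exit overhead (page faults, for instance), so treat the result as an upper bound on what cheaper syscalls could win back.

```python
import os
import resource


def sys_time_fraction(workload):
    """Run workload() and return the share of its CPU time spent in the kernel."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    total = user + system
    return system / total if total > 0 else 0.0


def cpu_bound():
    # Pure computation: essentially no syscalls.
    sum(i * i for i in range(2_000_000))


def syscall_heavy():
    # One write(2) per iteration, each doing almost no work in the kernel.
    fd = os.open("/dev/null", os.O_WRONLY)
    try:
        for _ in range(200_000):
            os.write(fd, b"x")
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(f"cpu_bound:     {sys_time_fraction(cpu_bound):.1%} system time")
    print(f"syscall_heavy: {sys_time_fraction(syscall_heavy):.1%} system time")
```

If the fraction for your real workload comes out small, a cheaper user/kernel transition can't buy you much no matter how big the per-syscall speedup is.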