Currently writing code targeting the perf_event_open system call. It's the nastiest thing I've ever had the displeasure of working with. clone() is similarly "interesting".
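For anyone who hasn't hit this: glibc ships no wrapper for perf_event_open, so you end up going through syscall(2) yourself. A minimal sketch of counting user-space instructions around a function call follows. The helper names (`sys_perf_event_open`, `instruction_counter_attr`, `count_instructions`, `busy_work`) are made up for illustration, and the open can fail depending on `perf_event_paranoid` or missing PMU support, e.g. in a VM or container.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* No glibc wrapper exists for this call, so invoke it by number. */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                         int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: describe a user-space-only instruction counter. */
struct perf_event_attr instruction_counter_attr(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);          /* the kernel uses this for ABI versioning */
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;                 /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return attr;
}

/* Hypothetical workload to measure. */
void busy_work(void)
{
    volatile unsigned long x = 0;
    for (int i = 0; i < 100000; i++)
        x += i;
}

/* Hypothetical helper: count instructions retired while fn() runs on the
 * calling thread. Returns 0 on success, -1 if the counter could not be
 * opened or read. */
int count_instructions(void (*fn)(void), uint64_t *out)
{
    struct perf_event_attr attr = instruction_counter_attr();
    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs. */
    int fd = (int)sys_perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t n = read(fd, out, sizeof(*out));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

Calling `count_instructions(busy_work, &n)` is the whole usage. Even this simplest case needs a zeroed struct with a self-reported size, a raw syscall, and three ioctls.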
Glibc really does a lot of good work to hide the mess underneath.
Have you ever had the (dis)pleasure of porting to Windows? It’s a pile of hot garbage that keeps on accumulating because of its oh-so-precious backwards compatibility; every single idiosyncrasy from thirty years ago lives on forever.
Yeah, I wrote Windows code for 10 years, and while it has its warts, I will say the ETW subsystem is much better thought out. The ntdll way of abstracting syscalls is also a lot nicer and something Linux should consider.
The biggest problem with Linux is that it doesn't have a coherent design philosophy. So some subsystems are nice and others are horrendous. Knowledge of one subsystem may lead to wrong assumptions about another part of the kernel.
One example: the kernel supposedly doesn't have threads; they are just processes that share an address space. But of course other parts do in fact need to understand that there is one coherent bundle of threads that composes this abstract idea of a process. So some places differentiate between thread IDs and process IDs and others mix them. Windows has its inconsistencies, but not with something as fundamental as a process.
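To make the ID confusion concrete, here is a small sketch, assuming Linux with glibc and pthreads. From user space, getpid() returns the same value in every thread of a process (the kernel calls it the thread group ID), while gettid() returns the per-thread ID, which the kernel internally calls a pid. The names `struct ids`, `my_tid`, and `ids_from_new_thread` are invented for this example.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;  /* what getpid() reports: shared by all threads */
    pid_t tid;  /* what gettid() reports: unique per thread */
};

/* glibc only gained a gettid() wrapper in 2.30; the raw syscall works
 * on older versions too. */
pid_t my_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Thread body: record both IDs as seen from inside the new thread. */
static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = my_tid();
    return NULL;
}

/* Hypothetical helper: run record_ids on a fresh thread and wait for it.
 * Returns 0 on success, -1 on failure. */
int ids_from_new_thread(struct ids *out)
{
    pthread_t t;
    if (pthread_create(&t, NULL, record_ids, out) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

In the main thread the two IDs coincide; in any other thread they differ. That is why you have to check whether an interface that takes a "pid" means a whole process or a single thread.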
You’ve obviously never tried to write performant I/O logic.
To see what I mean, try using epoll to manage a set of network connections. They apparently didn’t consider the case where you have more than one CPU and also want to handle more than one network connection. Also, if you do get it to work without crashing on stale fds, you’ll find it bottlenecks on a spinlock.
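For reference, the usual workaround when several threads share one epoll instance is EPOLLONESHOT: each fd is reported to one thread at a time and stays disabled until that thread re-arms it. The sketch below shows the pattern only; it does not address the lock-contention complaint. The helper names (`watch_once`, `rearm`, `unwatch_and_close`) are made up for illustration.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for a single "readable" notification. After one thread
 * receives the event, the fd stays in the interest list but is disabled. */
int watch_once(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Re-enable the fd once the owning thread has finished handling it. */
int rearm(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Remove the fd from the interest list before closing it. epoll tracks the
 * underlying open file description, so a bare close() does not deregister
 * it if a duplicate descriptor is still open somewhere. */
int unwatch_and_close(int epfd, int fd)
{
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    return close(fd);
}
```

Each worker thread would loop on epoll_wait(), handle the fd it was handed, then call `rearm()` or `unwatch_and_close()`. Since only the thread holding the event touches the fd, the stale-fd race mostly goes away, at the cost of an extra epoll_ctl() per event.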
If you want to save some time and jump to the current state of the art, use DPDK or some other user-space network driver plus IP stack to bypass the kernel completely. :-(