Doing it on a whole-program basis is also unlikely to give much benefit. Function calls are extremely fast; their overhead only shows up in tight loops.
Yeah, reviewing the repository, I see there's a lot of high-level language code now, which probably removes most of the benefit of assembly. You're also still calling into the OS a lot.
Some of the other low-level systems (like the Mac) had a trap mechanism whose cost in cycles wasn't far off from ordinary user code. But these days, when bridging a system call can take 10,000 cycles, it's best to do whatever you can to avoid calling the OS.
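One common way to avoid calling the OS is to batch work so that each system call does more of it. Here is a minimal Python sketch, assuming a POSIX-like system. It counts `os.write()` calls rather than timing them, because the real per-call cost depends on the CPU, the kernel, and its mitigations. The function names `write_unbuffered` and `write_batched` and the 64 KiB chunk size are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_unbuffered(fd: int, data: bytes) -> int:
    """Issue one os.write() (one system call) per byte: the pattern to avoid."""
    calls = 0
    for i in range(len(data)):
        os.write(fd, data[i:i + 1])
        calls += 1
    return calls


def write_batched(fd: int, data: bytes, chunk: int = 64 * 1024) -> int:
    """Hand the kernel large chunks so each system call moves many bytes."""
    calls = 0
    view = memoryview(data)
    while view:  # an empty memoryview is falsy
        # os.write may write fewer bytes than requested, so advance by the
        # number actually written and loop until everything is out.
        n = os.write(fd, view[:chunk])
        view = view[n:]
        calls += 1
    return calls


if __name__ == "__main__":
    payload = b"x" * 10_000
    with tempfile.TemporaryFile() as f:
        print("unbuffered calls:", write_unbuffered(f.fileno(), payload))
        print("batched calls:", write_batched(f.fileno(), payload))
```

Both functions write the same bytes. The difference is that the unbuffered version crosses into the kernel once per byte, while the batched one usually needs a single call for a small payload on a regular file. In everyday Python code you would normally get the same effect by writing through a buffered file object from `open()`, which accumulates data in user space before issuing the underlying system call.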