Blocking accept(2) makes a decent-to-good dispatcher, depending on your OS, if you can stomach one request per socket. If you can't, you'd need to pass sockets back to a dispatcher between requests so it can wait for them to become ready again. In the good old days you could use accept filters and not see incoming connections until they were ready, but that doesn't really work for TLS or for modern HTTP with persistent connections. You could make it pretty fast by running one dispatcher per core, aligned with the NIC queues, each dispatcher with its own pool of workers.
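
For illustration, here's roughly what the simple case looks like: one dispatcher blocked in accept(), handing each new socket to a worker that serves exactly one request and closes it. This is only a sketch; the names (`handle`, `dispatcher`, `start`), the thread pool, and the toy echo "protocol" are all made up for the example, and it's a single dispatcher, not the per-core, NIC-aligned setup.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Worker: serve one request on this socket, then close it."""
    with conn:
        data = conn.recv(4096)         # blocks until the client sends something
        conn.sendall(b"echo:" + data)  # hypothetical stand-in for a real response


def dispatcher(listener, pool, stop):
    """Dispatcher: block in accept() and hand each new socket to the pool."""
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()  # blocking accept(2)
        except OSError:
            break                            # listener was closed under us
        pool.submit(handle, conn)


def start(host="127.0.0.1", port=0, workers=4):
    """Bind, listen, and run the dispatcher in a background thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))              # port=0 lets the OS pick a free port
    listener.listen(128)
    pool = ThreadPoolExecutor(max_workers=workers)
    stop = threading.Event()
    threading.Thread(
        target=dispatcher, args=(listener, pool, stop), daemon=True
    ).start()
    return listener, pool, stop
```

Running one of these per core would need something extra so that several listeners can share a port (SO_REUSEPORT on Linux is one option); that part isn't shown here.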

If your work is mostly compute, you usually don't want more concurrency than one, maybe a few, workers per core, and then OS scheduling is easy. If your work is mostly waiting on I/O, high concurrency makes more sense, and OS scheduling still isn't too hard, because it costs the OS almost nothing to leave a process blocked on I/O. You do need good timer scalability if you have a lot of processes, though, since they're all going to want to set and clear a timeout on most of their syscalls. io_uring and the like, with a small number of OS processes/threads, might be less work for the kernel, but certainly at the cost of isolation.
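
To make the timer point concrete, here's a sketch of what "set and clear a timeout" looks like from a worker's side. `read_request` and the 5-second default are hypothetical; in C you'd more likely use SO_RCVTIMEO or a poll() timeout. Either way, each timed wait is a timer the kernel has to arm and, in the common case where data shows up first, cancel.

```python
import socket


def read_request(conn, timeout_s=5.0):
    """Blocking read with a deadline; returns None if the peer stays silent."""
    conn.settimeout(timeout_s)   # later blocking calls on conn wait at most timeout_s
    try:
        return conn.recv(4096)
    except socket.timeout:       # same exception as TimeoutError on Python 3.10+
        return None
    finally:
        conn.settimeout(None)    # back to plain blocking mode
```

With tens of thousands of blocked workers each doing this on most calls, the cost of arming and cancelling those timers is what ends up mattering, not the blocking itself.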