I'd be interested to hear why epoll and kqueue are considered so very different. It strikes me that both are quite similar: you attach waitobject-specific events to fds, and then wait on the waitobject until one of the events occurs. They are much like one another, and not much like select/poll!
And seems like both are quite different from IOCP too... kqueue_qos aside, you get the readiness state(s) and then do the operation(s) that look likely to be possible. So from this viewpoint epoll/kqueue/poll/select are actually basically the same - in contrast to the IOCP approach of doing the operation and then getting a notification when it completes. kqueue/epoll vs select/poll then looks like a more efficient way of doing the same stuff (with improvements - e.g., because the OS has more information to hand in the kqueue/epoll case it probably has more opportunity to minimize multiple wakeups, etc.).
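To make the shared shape concrete, here is a minimal sketch of the readiness pattern: register an fd with a wait object, wait, then do the read that now looks possible. It uses Python's `selectors.DefaultSelector`, which is backed by epoll on Linux and kqueue on the BSDs/macOS. The function name `wait_then_read` and the `"peer-a"` label are made up for illustration.

```python
import selectors
import socket


def wait_then_read(timeout=1.0):
    """Register a socket for readiness, wait, then perform the read."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)

    # Step 1: attach an event of interest to the fd on the wait object.
    sel.register(a, selectors.EVENT_READ, data="peer-a")

    # Make the fd readable by writing from the other end.
    b.send(b"ping")

    received = []
    # Step 2: wait on the wait object until an event occurs.
    for key, events in sel.select(timeout):
        if events & selectors.EVENT_READ:
            # Step 3: only now do the operation that looks possible.
            received.append((key.data, key.fileobj.recv(4096)))

    sel.unregister(a)
    sel.close()
    a.close()
    b.close()
    return received
```

The same three steps apply whichever backend the selector picks, which is roughly the point being made above. A completion-style API would instead start the `recv` first and report back when it has finished.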
kqueue allows batch updates to the set of fds being polled for readiness. In addition, it has less hacky support for non-socket event sources such as timers, events, signals, and disk IO. Setting up each of these on epoll requires its own extra syscalls. I'm not sure I've ever even seen someone use epoll for disk IO (via AIO). Overall, kqueue just seems like a more cohesive, unified async solution.
IOCP is the sensible way to do it. Unix, for a system where supposedly everything is a file, has a lot of very specific behaviors for "special" kinds of files. In contrast IOCP lets you treat all async operations from timers to sockets to disk the same way and the thread pool does a decent job of scaling too.
I've used both programming models, and I find "can I read/write this fd" much more intuitive and versatile than "try to do it and let me know when actually done". In particular, you can use many different models to distribute and parallelize work with poll/epoll.
Also, poll/epoll get even more versatile with current Linux systems, which take "everything is a file" much further with signalfd, timerfd, and eventfd.
The problem with IOCP is that it is more memory intensive, because the buffer for every outstanding operation must be allocated up front. With the readiness model, you can draw buffers from a pool only when an fd is actually ready, for dramatically less overall memory usage. There is a hack to use 1-byte reads with IOCP to get around this, but it doesn't feel very clean.
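A small sketch of the memory argument, with assumed numbers: 100 mostly idle connections and a 64 KiB read buffer. Under the readiness model one scratch buffer can serve every connection, because a buffer is only needed once a socket is known to be readable. The completion-style figure is simply the arithmetic for one posted buffer per pending read, not a measurement of IOCP. The names `drain_ready` and `demo` are hypothetical.

```python
import selectors
import socket

BUF_SIZE = 64 * 1024  # assumed per-read buffer size


def drain_ready(sel, scratch, timeout=1.0):
    """Read from every ready socket using one shared scratch buffer."""
    out = {}
    for key, _events in sel.select(timeout):
        n = key.fileobj.recv_into(scratch)
        out[key.data] = bytes(scratch[:n])  # copy out before reusing scratch
    return out


def demo(n_conns=100):
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_conns)]
    for i, (a, _b) in enumerate(pairs):
        a.setblocking(False)
        sel.register(a, selectors.EVENT_READ, data=i)

    scratch = bytearray(BUF_SIZE)  # the whole "pool" in this sketch
    pairs[7][1].send(b"hello")     # only one connection becomes active
    got = drain_ready(sel, scratch)

    readiness_bytes = len(scratch)
    # One pre-posted buffer per outstanding read, as described above.
    completion_bytes = n_conns * BUF_SIZE

    for a, b in pairs:
        sel.unregister(a)
        a.close()
        b.close()
    sel.close()
    return got, readiness_bytes, completion_bytes
```

With these assumed numbers the readiness side holds 64 KiB while the one-buffer-per-read side would hold 6.25 MiB, and the gap grows linearly with the number of idle connections.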