Just from taking a look, it seems like there are two use cases where one would really want to use this:
1. Programs that do a lot of IO to and from regular files, and that don't want to bother with a thread pool. For this reason I would expect to see it land in the golang runtime, and in other event loop implementations like libuv.
2. Legitimately IO bound programs, which can use SQ polling. Other users of aio fall in this category, as well as qemu.
For everything else, it looks like epoll is still mostly an equivalent choice to io_uring? Has anyone got any benchmarks for using io_uring in a typical network daemon, i.e. something that would generally be bound by socket I/O?
Lots of things that don't look like heavy IO operations can still incur high walltime costs, e.g. calling stat() on every file in a directory tree. Since each stat is a syscall, executing all of them sequentially is costly. io_uring can be used for batching these kinds of things even when your code isn't built around asynchronous execution.
And with the 5.7 changes you can do polling + buffer selection + reading for any number of sockets with a single syscall[0].
That does seem useful. But wouldn't you want to put this into a library that falls back to a thread pool approach on older kernels? And in that case, it seems like the application doesn't particularly care what the underlying implementation is? Since it's not built around asynchronous execution it just calls some function that blocks until the CQ is complete. This appears to be what the golang runtime will have to do.
>And with the 5.7 changes you can do polling + buffer selection + reading for any number of sockets with a single syscall
Thank you, this is very close to what I was looking for. (It hasn't made its way into the manpages yet.)
That was mostly meant as an example of the more general case where using io-uring can reduce overhead of any kind of IO operation, even when they're not the primary focus of your application.
The point is that yes, io-uring is great for event-based, asynchronous libraries. But even traditional synchronous code can make significant gains by switching to it. As the article hints with its package name: io-uring is the one ring to bind them all.
> This appears to be what the golang runtime will have to do.
The go standard library can tie io_uring into its goroutine scheduler. When doing IO, it suspends the green thread, submits the work to io_uring, and polls the ring's completion queue when looking for tasks that need to be woken up.
The way I see it, io_uring has the potential to basically become the only way that high-level programming languages and runtimes talk to the kernel for I/O. If you're working with I/O through an abstraction, I don't see any reason why your implementation wouldn't use io_uring, and lots of reasons why it should.
Pretty much everything touches the disk at some point. A lot of those things are asynchronous daemons these days. They currently need a complex thread pool to handle disk I/O - io_uring means they can run disk I/O on the same thread, and all they have to do is some buffer management. It’s a much simpler and cleaner programming model.
Source: recently developed a web server that uses io_uring.
Right now, unfortunately not, but I'll be releasing it open-source within the month. Look out for a Show HN post on the subject of live video streaming.
May not be your typical network daemon, but you can still look at the relative gains.
Generally speaking, io_uring is just a general-purpose programming model to interface with the kernel. Socket I/O is just one part of the story. But when combined with other things, it becomes much more powerful.
Any asynchronous event-loop program that touches files wants something like this. Unix file IO is classically synchronous; the hack around this was to run a userspace threadpool. It's unpleasant. The Unix "AIO" interfaces all kinda suck.
Unfortunately you're still stuck with having to run a thread pool if you want to do something that has no async version, such as reading a directory. I wish it went further!
Yes, io_uring supports network io. Kernel 5.5 added support for accept, but you have to wait till 5.7 before you can string together an accept followed by a read(v). I still haven't seen much in the way of hard numbers, yet, but theoretically it should be darn quick. If you're using preregistered buffers there's no copying needed, and if you ask the kernel nicely it will spin up a thread to monitor the submission queue so you don't even need syscalls after setup.
To add to what couchand said, there are already comparisons of io_uring with SPDK and they are pretty close. So I would expect about the same result for DPDK too.
> Fortunately, there’s an age-old solution to this problem - ring buffers. A ring buffer allows efficient synchronization between producers and consumers with no locking at all.
I don't think this is correct--ring buffers still require a mutex to prevent the producer from updating the write pointer while the reader checks the write pointer.
As long as you have a reasonably modern CPU, you can use one of the various Compare-And-Swap (CAS) instructions to implement properly lock-free FIFO queues. There are a few variations, but a circular array is one of the possibilities. We can probably debate for quite some time whether it's entirely lock-free if the lock is implemented in hardware, but I think we can agree it's not a mutex :)
disclaimer - I used to work at ScyllaDB