The article seems to imply that fsyncs need to happen in a linear order, that is, 1ms / fsync -> 1000 fsyncs/sec. It seems to imply that any batching happens in a linear order as well, that is, we have to completely finish one batch of fsyncs before the next one begins. Is that true? Obviously some systems (including databases) will happily let the beginning of one transaction overlap with another, only simulating linearizable transactions when that is required by the isolation level. But I don't have a great mental model of what the file systems are doing under the hood.
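For what it's worth, here is a minimal sketch of how an application can let commits overlap so that a single fsync covers several of them (often called group commit). It says nothing about what any particular file system does internally; the `GroupCommitLog` class and all of its names are hypothetical, made up for illustration. Records written while an fsync is already in flight are conservatively treated as not yet durable and wait for the next one.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Hypothetical append-only log where one fsync can cover many commits."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._cond = threading.Condition()
        self._written = 0      # records written so far
        self._durable = 0      # records covered by a completed fsync
        self._syncing = False  # True while some thread is inside fsync
        self.fsync_calls = 0

    def commit(self, record):
        """Append record; return once an fsync issued after the write has finished."""
        with self._cond:
            os.write(self._fd, record)
            self._written += 1
            my_seq = self._written
            while self._durable < my_seq:
                if self._syncing:
                    # Another thread is syncing; wait and re-check afterwards.
                    self._cond.wait()
                    continue
                # Become the leader: sync everything written up to now.
                self._syncing = True
                target = self._written
                self._cond.release()  # let other threads write during fsync
                try:
                    os.fsync(self._fd)
                finally:
                    self._cond.acquire()
                    self._syncing = False
                    self._cond.notify_all()
                self._durable = max(self._durable, target)
                self.fsync_calls += 1

    def close(self):
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = GroupCommitLog(os.path.join(d, "wal.log"))
        threads = [
            threading.Thread(target=lambda: [log.commit(b"x" * 64) for _ in range(50)])
            for _ in range(4)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        log.close()
        print(f"200 commits, {log.fsync_calls} fsync calls")
```

How many commits each fsync ends up covering depends on timing and on the storage underneath, so the ratio will vary from run to run.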
So uh... I read a third of that, knowing roughly how the rest will go, based only on the context of the thread.
However, in the back of my mind I know Dan Luu is preserving it for a reason. I don't know if that's explained later or not; what I do know is that Dan Luu has pointed out, in awe, that most file systems don't report errors correctly on write failures, as one example.
As in, that isn't even implemented in some fs drivers.
So the very idea that somehow Postgres can do direct IO and it magically gets better - to me that's the joke Dan Luu sees. Maybe.
That Craig person, the OP in the thread. Imagine doing all that work, having other people say, hey, that's an issue; and then all the people saying "so what" or "nothing can be done"
> That Craig person, the OP in the thread. Imagine doing all that work, having other people say, hey, that's an issue; and then all the people saying "so what" or "nothing can be done"
For context - Craig's opinion won out, and what he was suggesting (crash-restart and perform recovery) is what Postgres has been doing for many years (with an option to revert to retrying, but I haven't seen anybody toggle that).
The author does mention that the OS appears to be performing a sort of batching or aggregation of fsyncs after reviewing the test results and concludes that more than 1000 fsyncs are occurring per second. I’ve also confirmed this by running some benchmarking on EC2 instances with gp2 volumes: https://justincartwright.com/2025/03/13/iops-and-fsync.html
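If anyone wants to try this on their own hardware, here is a rough single-threaded sketch of such a measurement. It is not the benchmark from the linked post; the function name `measure_fsync_rate` and its parameters are made up for illustration, and the result depends heavily on the device, file system, and mount options.

```python
import os
import tempfile
import time


def measure_fsync_rate(directory, count=200, payload=b"x" * 512):
    """Append payload and fsync `count` times; return fsyncs per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # blocks until the OS reports the data as flushed
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    # Point this at a directory on the volume you actually care about.
    with tempfile.TemporaryDirectory() as d:
        print(f"{measure_fsync_rate(d):.0f} fsyncs/sec (single thread)")
```

Note that the default temporary directory may live on tmpfs, where fsync is nearly free, so pass a directory on the real disk to get a meaningful number.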
In the old days of HDDs, the Linux I/O scheduler (some of them, anyway) would also reorder the writes in the queue to minimize disk head seeks.
>A modern disk can do ~1000 fsyncs per second
Sounds low for an SSD. I haven't benchmarked in a while, though. Sounds like something a 5-7 disk HDD array would do, if I remember the numbers correctly.