>MySQL will group multiple writes with each fsync in the old days of HDD the Lin...

tibbar · 2025-03-21T22:09:12

The article seems to imply that fsyncs need to happen in a linear order, that is, 1ms / fsync -> 1000 fsyncs/sec. It seems to imply that any batching happens in a linear order as well, that is, we have to completely finish one batch of fsyncs before the next one begins. Is that true? Obviously some systems (including databases) will happily let the beginning of one transaction overlap with another, only simulating linearizable transactions when that is required by the isolation level. But I don't have a great mental model of what the file systems are doing under the hood.
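On the batching question, at least at the application level batches need not be strictly sequential. A common technique (group commit, which is what the quoted MySQL line refers to) lets one fsync cover every write issued before it started, while new writes keep arriving during that fsync and get picked up by the next one. Below is a minimal Python sketch assuming a single append-only log file. `GroupCommitLog` and its fields are hypothetical names, not MySQL's actual implementation, and the sketch says nothing about what the kernel or filesystem does with concurrent fsync calls.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Hypothetical group-commit log: many writers, one fsync per batch."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.cond = threading.Condition()
        self.written = 0      # sequence number of the last record written
        self.synced = 0       # highest sequence number known to be durable
        self.syncing = False  # is some thread currently inside fsync?
        self.failed = None    # first fsync error; the log is unusable after it
        self.fsync_calls = 0

    def _check(self):
        if self.failed is not None:
            raise RuntimeError("fsync failed; recover, don't retry") from self.failed

    def commit(self, record: bytes) -> None:
        with self.cond:
            self._check()
            os.write(self.fd, record)
            self.written += 1
            my_seq = self.written
            while self.synced < my_seq:
                self._check()
                if self.syncing:
                    self.cond.wait()  # another thread's fsync is in flight
                    continue
                # Become the leader: one fsync covers everything written so far.
                self.syncing = True
                target = self.written
                self.cond.release()  # let other writers append during the fsync
                err = None
                try:
                    os.fsync(self.fd)
                except OSError as e:
                    err = e
                finally:
                    self.cond.acquire()
                self.syncing = False
                if err is None:
                    self.fsync_calls += 1
                    self.synced = max(self.synced, target)
                else:
                    self.failed = err
                self.cond.notify_all()  # wake waiters: done, or next leader needed

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = GroupCommitLog(os.path.join(d, "wal.log"))
        workers = [threading.Thread(target=log.commit, args=(b"rec\n",)) for _ in range(50)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        print(f"{log.written} commits, {log.fsync_calls} fsync calls")
        log.close()
```

With many concurrent committers the number of fsync calls is usually well below the number of commits, because writers that arrive while a sync is running share the next one instead of queuing their own.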

evanelias · 2025-03-22T04:50:33

The error semantics of fsync are very important here and this can get messy, for example see https://danluu.com/fsyncgate/

genewitch · 2025-03-22T17:03:30

So uh... I read a third of that, knowing roughly how the rest will go, based only on the context of the thread.

However, in the back of my mind I know Dan Luu is preserving it for a reason, and I don't know whether that's explained later. What I do know is that Dan Luu has pointed out that most filesystems don't handle errors correctly on write failures, as one example.

As in, that isn't even implemented in some fs drivers.

So the very idea that somehow Postgres can do direct IO and it magically gets better - to me that's the joke Dan Luu sees. Maybe.

That Craig person, the OP in the thread. Imagine doing all that work, having other people say, hey, that's an issue; and then all the people saying "so what" or "nothing can be done"

Amazing. I'll have to read the rest a bit later.

anarazel · 2025-03-23T12:42:28

> That Craig person, the OP in the thread. Imagine doing all that work, having other people say, hey, that's an issue; and then all the people saying "so what" or "nothing can be done"

For context - Craig's opinion won out, and what he was suggesting (crash-restart and perform recovery) is what postgres has been doing for many years (with an option to revert to retrying, but I haven't seen anybody toggle that).
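The crash-and-recover approach amounts to treating a failed fsync as fatal instead of retrying it. A hedged Python illustration, not PostgreSQL code: `durable_flush` and `FsyncFailed` are hypothetical names, and the `fsync` parameter exists only so a failure can be simulated.

```python
import os


class FsyncFailed(Exception):
    """Hypothetical fatal error: the process should exit and run recovery."""


def durable_flush(fd, fsync=os.fsync):
    """Call fsync once and never retry it.

    In the fsyncgate discussion, a failed writeback on Linux could leave the
    dirty pages marked clean and report the error only once, so a retried
    fsync might "succeed" even though the data never reached disk. The safe
    response is to stop and rebuild state from a log known to be durable.
    """
    try:
        fsync(fd)
    except OSError as e:
        raise FsyncFailed(f"fsync failed on fd {fd}: {e}") from e
```

The caller is expected to let `FsyncFailed` propagate, let the process die, and replay its write-ahead log on restart.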

jcartw · 2025-03-22T12:54:17

After reviewing the test results, the author does mention that the OS appears to be batching or aggregating fsyncs, and concludes that more than 1000 fsyncs are occurring per second. I’ve also confirmed this by running some benchmarks on EC2 instances with gp2 volumes: https://justincartwright.com/2025/03/13/iops-and-fsync.html
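For anyone wanting to check this locally, here is a rough Python sketch (not the benchmark from the article or the linked post) that times write+fsync loops with 1, 4, and 16 threads, each thread on its own file. If aggregate throughput grows with thread count beyond what one serial loop achieves, that suggests the device or filesystem is overlapping or batching fsyncs rather than strictly serializing them. `fsync_rate` and its parameters are made up for illustration, and results depend heavily on the filesystem, mount options, and whether the drive has a volatile write cache.

```python
import os
import tempfile
import threading
import time


def fsync_rate(path, threads=1, seconds=1.0, record=b"x" * 512):
    """Rough aggregate fsyncs/sec; each thread appends+fsyncs its own file."""
    counts = [0] * threads
    deadline = time.monotonic() + seconds

    def worker(i):
        fd = os.open(f"{path}.{i}", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            while time.monotonic() < deadline:
                os.write(fd, record)
                os.fsync(fd)  # releases the GIL, so threads can overlap here
                counts[i] += 1
        finally:
            os.close(fd)

    ts = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    start = time.monotonic()
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return sum(counts) / (time.monotonic() - start)


if __name__ == "__main__":
    # Benchmark the filesystem holding the current directory; /tmp may be
    # tmpfs, where fsync is nearly free and the numbers are meaningless.
    with tempfile.TemporaryDirectory(dir=".") as d:
        base = os.path.join(d, "bench")
        for n in (1, 4, 16):
            print(f"{n:>2} threads: {fsync_rate(base, threads=n):8.0f} fsyncs/sec")
```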

Palomides · 2025-03-22T01:10:08

On an SSD with PLP (power-loss protection), or on Optane, I think you can get a 20x higher rate, or more.