> When we look at the fsync() and fdatasync() man pages, we see that those system calls only guarantee to write data linked to the given file descriptor. With ext4, as a side effect of the filesystem structure, all pending data and metadata for all file descriptors will be flushed instead. This creates a lot of I/O traffic that is unneeded to satisfy any given fsync() or fdatasync() call
Does it mean that, under ext4, a call to fsync is essentially the same as a call to sync(2)?
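Not quite — fsync() still takes a file descriptor and only *promises* durability for that one descriptor; the ext4 behavior described is a side effect, not part of the contract. A hedged Python sketch of what a caller actually asks for (os.sync() left commented out, since on ext4 the difference in practice may be small):

```python
import os
import tempfile

# fsync() requests durability for one descriptor, even though ext4
# (as described above) may flush far more as a side effect.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"important record\n")
    os.fsync(fd)       # flush data + metadata for *this* fd, per POSIX
    # os.sync()        # by contrast, explicitly flushes all dirty buffers system-wide
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 64)
finally:
    os.close(fd)
    os.unlink(path)
```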
Yes, this was a source of many problems when Firefox moved its history and bookmarking engine to SQLite (I think), which used fsync for consistency guarantees. That led to situations where creating a bookmark could cause your entire computer to pause while disk writes spun up.
Last year I hit a problem where a function of mine was logging, and a sync() call in another process blocked the logging for more than 10 seconds, causing a timeout that aborted the operation my code was performing. I have since moved all logging into a queue that is drained by its own thread. This was a gotcha I never anticipated.
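For anyone wanting to do the same, the stdlib already has the pieces: a QueueHandler enqueues records and a QueueListener writes them out on its own thread, so a stalled fsync()/sync() elsewhere delays only the listener, not the calling code. A minimal sketch (the "app.log" path is just an example):

```python
import logging
import logging.handlers
import queue

# Decouple callers from disk: records go onto a queue, and a background
# thread owned by QueueListener does the actual (possibly slow) writes.
log_queue = queue.Queue()  # unbounded here; see note below on bounding it

file_handler = logging.FileHandler("app.log")  # example log path
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

logger = logging.getLogger("app")
logger.addHandler(logging.handlers.QueueHandler(log_queue))
logger.setLevel(logging.INFO)
logger.info("this call enqueues and returns immediately")

listener.stop()  # blocks until remaining records are flushed
```

One caveat: QueueHandler uses put_nowait(), so with a bounded queue an overflow raises queue.Full (routed to handleError) rather than blocking — if you want the producer to block instead, you'd need to subclass it.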
Unless you have infinite memory, at some point you want a task to slow down, block, or whatever in the face of resource exhaustion.
It's not a bad idea to maintain a local buffer that gives you a certain amount of cushion; I recently helped a team resolve the exact problem you had, with a similar solution. But excessive, unnecessary use of non-pageable memory is one of the things that induces early I/O contention, causing these stalls in the first place. (Consider an overloaded or errant process generating and buffering a lot of logging noise precisely because the overtaxed system is under heavy I/O contention.)
To reiterate: you want backpressure, which means you want a process that is exhausting limited resources to slow down or block, and you want that to transitively slow down or block upstream requests. Too many developers don't understand this and insert hacks to solve their immediate problem (e.g. closing a ticket complaining about intermittent SLA latency misses) without appreciating the broader issue, which at the end of the day just contributes to these problems.
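A toy sketch of that transitive slow-down, with a bounded stdlib queue standing in for the limited resource:

```python
import queue
import threading
import time

# Backpressure in miniature: a bounded queue makes a fast producer
# block as soon as the slow consumer falls behind, instead of letting
# the backlog (and memory use) grow without limit.
work = queue.Queue(maxsize=4)
processed = []

def consumer():
    while True:
        item = work.get()
        if item is None:
            break
        time.sleep(0.005)      # stand-in for a slow downstream resource
        processed.append(item)

t = threading.Thread(target=consumer)
t.start()

for i in range(20):
    work.put(i)                # blocks when the queue is full: backpressure
work.put(None)                 # sentinel: tell the consumer to exit
t.join()
```

The key design point is that the producer's put() itself blocks — no retry loops or drop policies needed — so the slowdown propagates upstream automatically.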
One of the alternatives people attempt is to insert a gazillion knobs to permit dedicated resource allocation. But now you have two problems, the second being figuring out what the magic values should be — a never-ending and often intractable task. This rarely ends well except for highly specialized workloads, e.g. a dedicated DB administrator who spends all day attending to and tuning a database instance.
That said, in the old days you mounted /var (and if you were super fancy, /var/log) on different disks to minimize unrelated I/O contention.
Yes. I consider it one of my worst bugs in the Linux kernel, and on big servers you pretty much can't have any programs that call fsync() because it will cripple your performance.
That paragraph caused a whiplash of emotions while reading, "Cool!!!... what???? Ug."
Yes, though when you're running a DBMS on your server, I'd guess it will be doing fsync() all the time. I wonder whether other filesystems like XFS or ZFS behave differently here, and how much performance a typical database workload could gain on them compared to ext4.