
Do folks typically turn on journaling at the filesystem layer when running a database?

The database itself contains journaling, so one might choose to run with data=writeback or even directly against the block device if they were concerned about performance.




You definitely need both. These are two completely different kinds of journalling:

- Filesystem journalling is making robust changes to the data structures describing directories, files, and where files live, in units of atomic filesystem operations. For example, the filesystem journal may record "CREATE FILE", which translates to "update directory entry 1234 in directory block 5678, then allocate and initialize extent descriptor 9999, then write an inode at array entry 74234"

- Database journalling is making robust changes to the data structures describing the actual file contents, in units of atomic logical application operations. For example, a DB journal may record "INSERT ROW", which translates to "update block 123 of this index file, and 234 of this data file". Application-specific relationships like that cannot be captured by the filesystem on UNIX.
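To make the database side concrete, here is a minimal sketch of a write-ahead log that ties the two block updates of one "INSERT ROW" into a single record. It is not how any particular database does it: the function names `log_insert` and `replay`, the JSON payload, and the length-plus-CRC framing are all assumptions made up for illustration.

```python
import json
import os
import zlib


def log_insert(wal_fd, row_id, index_block, data_block):
    """Append one logical operation to the log and persist it.

    Hypothetical record layout: 4-byte length, 4-byte CRC32, JSON body.
    The caller would touch the index and data files only after this returns.
    """
    body = json.dumps({
        "op": "INSERT ROW",
        "row": row_id,
        "index_block": index_block,  # block to update in the index file
        "data_block": data_block,    # block to update in the data file
    }).encode()
    frame = len(body).to_bytes(4, "big") + zlib.crc32(body).to_bytes(4, "big") + body
    while frame:
        # os.write may write fewer bytes than requested, so loop until done.
        frame = frame[os.write(wal_fd, frame):]
    os.fsync(wal_fd)


def replay(wal_path):
    """Return every complete record, stopping at the first torn or corrupt one."""
    ops = []
    with open(wal_path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of log, or a header cut short by a crash
            length = int.from_bytes(header[:4], "big")
            crc = int.from_bytes(header[4:], "big")
            body = f.read(length)
            if len(body) < length or zlib.crc32(body) != crc:
                break  # torn or corrupt record: ignore it and everything after
            ops.append(json.loads(body))
    return ops
```

After a crash, `replay` yields only whole operations, so the index and data files can be brought back in step with each other. That pairing is exactly the relationship the filesystem journal knows nothing about.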

(Note: NTFS is transactional on Windows. It's entirely possible to correlate independent writes and make them atomic, so on Windows at least, a DB could in theory exist without a separate journal. I don't know if this is used in practice. Even if it were, it would place severe limits on the kinds of concurrency optimizations a database system could otherwise perform, because all of that stuff moves behind the curtain of the OS interfaces.)


data=writeback does not disable the journal completely. It only removes ordering of the data writes relative to the metadata journaling. The metadata journaling itself remains active.

You can create the file, preallocate space, fsync the inode and the directory to ensure that it will be visible after a crash and then begin using the allocated space as a journal. Then you only have to fdatasync or sync_file_range whatever part of your journal needs to be persisted and those syncs can now be unordered relative to the filesystem's metadata journal without risk of data loss.
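A rough sketch of that syscall sequence, using Python's `os` wrappers. The helper names `create_durable_journal` and `append_record` are hypothetical, and the zero-fill fallback for filesystems where `posix_fallocate` is unavailable is my own assumption, not part of the recipe above.

```python
import os


def create_durable_journal(path, size):
    """Create and preallocate a journal file, then make its existence durable."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    try:
        try:
            os.posix_fallocate(fd, 0, size)
        except (AttributeError, OSError):
            # Assumed fallback: reserve the space by writing zeros instead.
            zeros = b"\0" * 65536
            offset = 0
            while offset < size:
                offset += os.pwrite(fd, zeros[: size - offset], offset)
        os.fsync(fd)  # persist the inode: size and block allocation
        # fsync the parent directory so the new name survives a crash.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except BaseException:
        os.close(fd)
        raise
    return fd


def append_record(fd, offset, payload):
    """Write payload inside the preallocated region and persist only the data."""
    if offset + len(payload) > os.fstat(fd).st_size:
        raise ValueError("record would grow the file; preallocate more space first")
    view = memoryview(payload)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written
    # fdatasync skips metadata that is not needed to read the data back;
    # fall back to fsync on platforms that lack it.
    getattr(os, "fdatasync", os.fsync)(fd)
    return offset
```

Typical use would be `fd = create_durable_journal("journal.log", 1 << 20)` once at startup, then `offset = append_record(fd, offset, record)` per commit. The size check is there because a write that grows the file brings metadata back into play, which is the situation the preallocation is meant to avoid.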

So data=writeback can be used safely, but you have to be very, very careful about getting the syscall sequence right. Most applications are not implemented with sufficient paranoia and so are better served by stricter ordering modes and auto_da_alloc.


I don't think that those who read the manual do: https://www.postgresql.org/docs/13/wal-intro.html (unless they care about quick crash recovery)


Is this why many databases yield better performance with XFS vs. ext4?


After power loss without FS journal, your database may not have files to work with.


It's on by default in ext4.



