
Musing: sometimes I wish file systems & databases were unified. I'm imagining just a single fast DB engine sitting on my storage, where the traditional file system structure would just be tables in there. I kinda treat SQLite like that already, but it's not as transparently optimized as it could be for large files. Why? I don't want to mentally jump around between technologies. I want to query my FS like a DB, and I want to store files in my DB like a FS. The reality, though, is that there isn't a one-size-fits-all DB.
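
A minimal sketch of the "files as table rows" idea, using Python's sqlite3; the table layout and helper names here are invented for illustration, not any real unified-FS API:

```python
import sqlite3

# A toy "filesystem as a table": paths are primary keys, contents are blobs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fs (path TEXT PRIMARY KEY, data BLOB, mtime REAL)")

def write_file(path, data):
    # Upsert: overwriting is like truncating and rewriting a file.
    con.execute(
        "INSERT INTO fs (path, data, mtime) VALUES (?, ?, julianday('now')) "
        "ON CONFLICT(path) DO UPDATE SET data = excluded.data, mtime = excluded.mtime",
        (path, data),
    )

def read_file(path):
    row = con.execute("SELECT data FROM fs WHERE path = ?", (path,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return row[0]

write_file("/etc/motd", b"hello")
write_file("/etc/motd", b"hello again")
print(read_file("/etc/motd"))  # b'hello again'
```

The appeal is that "directory listings" become ordinary SQL queries over the `path` column, though (as noted above) SQLite isn't tuned for very large blobs.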

And more on topic: tokio-uring is really fast [1], and I'm really loving tokio in general.

[1] https://gist.github.com/munro/14219f9a671484a8fe820eb35d26bb...




We've been there. Before we had filesystems as we know them today, there were many different ways of persistent data storage. Roughly these could be grouped into two camps: The files camp and the records camp.

The record based approach had many properties we know from modern databases. It was a first class citizen on the mainframe and IBM was its champion.

In my opinion hierarchical filesystems won as everyday data storage because of their simplicity and not despite it. I think the idea of a file being just a series of bytes and leaving the interpretation to the application is ingenious. That doesn't mean there is no room for standardized OS-level database-like storage. In fact I'd love to see that.


I've always been annoyed how searching a few hundred thousand NTFS records for a filename containing some arbitrary text takes a relatively long time - even using specialized tools like FileLocator Pro which I believe directly scan low-level structures like the MFT - while I can do an equivalent search in a SQL database in milliseconds. I wish filesystems like that one had vastly more performant indexing structures for the metadata (without relying on add-on layers that defer indexing and - at least in my experience - tend to break down or be out of date or obfuscate files they think you don't care about - I'm looking at you, Windows Search).
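
A rough sketch of why the database-side search feels instant, with synthetic paths and counts: a `LIKE '%…%'` substring match still can't use a B-tree index, but scanning a compact in-memory column of a few hundred thousand strings takes milliseconds (a real tool would add an FTS or trigram index on top):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT)")
con.executemany(
    "INSERT INTO files VALUES (?)",
    ((f"/data/project{i}/report_{i}.txt",) for i in range(300_000)),
)

t0 = time.perf_counter()
# Substring search over the whole catalog in one tight scan.
hits = con.execute(
    "SELECT count(*) FROM files WHERE path LIKE '%1234.txt%'"
).fetchone()[0]
elapsed = time.perf_counter() - t0
print(f"{hits} matches in {elapsed * 1000:.1f} ms")
```

The filesystem equivalent walks scattered on-disk metadata structures per directory, which is where the time goes.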


I've been using a free tool by voidtools called "Everything" for years; it provides almost-instant search on Windows NTFS volumes: https://www.voidtools.com/support/everything/


Thanks. I used it years ago but it didn't suit my needs. I haven't found an indexing tool that does.

It's the difference between synchronous indexing that's baked into the system (as in file system metadata structures and database indexes, which update at the same time your data is changed) vs. fragile add-ons that index asynchronously (which in general I find tend to be too slow to update, missing results, and prone to breaking).


The lack of structure is ultimately why writes to filesystems are comparatively fast and reliable, though.


well, there's a good reason they can't do that: they don't know what the files are.

perhaps filesystems should be extensible in a way that supports indexing intelligently.

(I know you mentioned an aversion to add-ons)


Modern filesystems in a way combine both approaches - they store the data unstructured but give the ability to also store metadata (attributes) in a structured way.
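
For example, extended attributes let you hang small structured key/value pairs off an otherwise unstructured file. A sketch in Python, guarded because xattr support varies by platform and filesystem:

```python
import os
import tempfile

# Unstructured contents...
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"opaque bytes")
    path = f.name

# ...plus structured metadata alongside, via extended attributes.
try:
    os.setxattr(path, "user.author", b"alice")
    author = os.getxattr(path, "user.author")
except (AttributeError, OSError):
    # os.setxattr is Linux-only, and some filesystems reject user xattrs.
    author = b"alice"

print(author)
os.unlink(path)
```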


That functionality is mostly an afterthought in ext and ntfs, though. More of a big deal for Apple with hfs but still not something you'd build a database on.


> I think the idea of a file being just a series of bytes and leaving the interpretation to the application is ingenious.

the file as an opaque box for applications to store a real data structure is poisonously anti-file. it's totally what files are, what we think of them as, but imo, systems like 9p, or Linux's procfs or sysfs, are The True Way for files: small discrete pieces of data, organized in a system of directories that express a larger, complicated hierarchical system of data.

Files won, but only the stupidest, wrongest version. Easy to copy and manage but utterly useless on their own, unscriptable, pointless without their complex applications there to use them.

I don't think DBs/records are that interesting either. I think we just need to really try files. Fine-grained files, as opposed to these big ole blobs the OS can't really interact with.


Don't you need mandatory locking of files and directories, or rather powerful transactional semantics for the filesystem then?


A lot of filesystems have snapshots. NTFS & others have transactional capabilities. I don't regard locking as necessary or helpful when the OS can provide these capabilities.


There aren't really any hurdles to implementing an SQL database on top of a plain block device, are there? So I wonder why no one has gone there. This would allow the database server to do caching in a way that makes sense for the database, instead of hoping that the filesystem cache does the right thing.


Commercial databases routinely do exactly that. O_DIRECT basically exists because Oracle needed it.


Are you thinking of something like WinFS?

https://en.wikipedia.org/wiki/WinFS

Or more like BeOS's BFS with its extended attributes, indexing and querying?

https://en.wikipedia.org/wiki/Be_File_System

Also I think a lot of the old mainframe filesystems had the concept of records and indexes built in since they were primarily used for business operations.


Your filesystem is a database. It's just a document-oriented database, rather than relational SQL.


It even has a lot of the same features as a full fledged DB.

For example, most file systems today are journaling, which is exactly how most databases handle atomicity, consistency, and durability in ACID.
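
SQLite makes the parallel easy to see: its write-ahead log is the same crash-safety technique that journaling filesystems use for their metadata. A small sketch (the path is a throwaway temp file):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)

# Switch to write-ahead logging: changes hit the log first, then the
# main database file, so a crash mid-write can always be recovered.
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]

con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()
print(mode)  # wal
```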

About the only thing it's missing is automatic document locking (though most file systems support explicit locks).
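
Explicit locks look roughly like this on Unix-likes (advisory locking via `fcntl.flock`; Windows uses a different mechanism, and nothing forces an uncooperative process to honor the lock):

```python
import fcntl
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "r+b") as f:
    # Exclusive lock: other cooperating lockers block until we release.
    fcntl.flock(f, fcntl.LOCK_EX)
    f.write(b"critical update")
    f.flush()
    fcntl.flock(f, fcntl.LOCK_UN)

with open(path, "rb") as f:
    data = f.read()
os.unlink(path)
print(data)
```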

That said, there are often some pretty hard limits on the number of objects in a table (directory). Depending on the file system you can be looking at anywhere from 10k to 1 billion files per directory.

There are also some unfortunate storage characteristics. Most file systems have a minimum allocation size of around 4 KB, mostly to optimize for disk access, so small files waste space. DBs often pack things together much more tightly.
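
You can see the allocation overhead directly with `os.stat`: `st_blocks` reports allocated space in 512-byte units, so a 10-byte file typically still occupies a whole filesystem block (exact numbers depend on the filesystem):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"ten bytes!")  # 10 bytes of actual data
os.close(fd)

st = os.stat(path)
logical = st.st_size            # bytes of data the file contains
allocated = st.st_blocks * 512  # bytes of storage actually reserved
print(f"logical={logical} bytes, allocated={allocated} bytes")
os.unlink(path)
```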

But hey, if you can swing using the FS as a DB, do it. Particularly for a read-heavy application, the FS is nearly perfect for such operations.


The biggest problem is the lack of good transactional facilities.


you can lock directories, you can atomically swap directories (on Linux), and CoW filesystems make cloning fairly cheap. That could be used to implement transactions and commits. Getting the consistency checks/conflict detection right during the commit would be the most difficult part. Change notifications could be used to do some of that proactively. It's a terrible idea, but it could be done.
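
The single-file version of that commit trick is a classic pattern: write to a temp file in the same directory, fsync, then rename over the old version, so readers see either the old contents or the new, never a torn write. A sketch (the `commit` helper name is made up):

```python
import json
import os
import tempfile

def commit(path, obj):
    """Atomically replace `path` with the JSON serialization of `obj`."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # same dir => same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())     # data is durable before the rename
        os.replace(tmp, path)        # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise

target = os.path.join(tempfile.mkdtemp(), "state.json")
commit(target, {"version": 1})
commit(target, {"version": 2})
print(open(target).read())
```

Multi-file consistency is exactly where this breaks down, which is the conflict-detection problem mentioned above.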


There is a transactional API for NTFS in Windows [1]. It allows transactional operations not just within a file but also across files or across multiple computers (to make sure something is applied to your whole fleet atomically).

1: https://en.wikipedia.org/wiki/Transactional_NTFS


Yup, the I in ACID is a bitch :)


You mean it's always been NoSQL?

Astronaut with gun: always has been.


It doesn't really support transactions very well, though.


True, but that's not at all a requirement for a database, and MVCC can be built on top where needed.


I have the exact same wish. On top of that, I'd wish for application data to be stored in the system database by default, neatly namespaced and permissioned, so that you can allow for greater interoperability if desired and manually query and combine data across different applications.

There was some research being done on the concept of a db as a filesystem: https://youtu.be/wN6IwNriwHc


We actually worked on this a few years ago but did not get enough takers for it. We created a one-size-fits-all database that leverages the full capability of the file system.

Try it here: https://github.com/blobcity/db

PS: I am the chief architect of the DB, and the project is no longer actively maintained by us. But if you make a contribution, we will review and merge the PR.

Bottom line: nothing you do can make your database faster than the filesystem. So why not make a database that uses the filesystem to the fullest, rather than creating a filesystem on top of a filesystem? BlobCity DB does not create a secondary filesystem. It dumps all data directly to the filesystem, thereby getting peak filesystem performance. That's really the best it gets from a performance standpoint, though not necessarily the most efficient from a data-storage/compression standpoint.

This means we gain speed while compromising on data compression. We produce a larger storage footprint, but are insanely fast. Storage is cheap; compute isn't. So that should be okay, I suppose.


Wasn't this what Microsoft was working on with WinFS in Longhorn which later became Vista but without the WinFS part?

And I think ReiserFS was also working towards this but got abandoned for obvious reasons.


I remember watching a talk about that: https://www.youtube.com/watch?v=wN6IwNriwHc

Previous HN discussion: https://news.ycombinator.com/item?id=20394088


Yeah I just learned about tokio-uring and I'm planning to get it into the next major release of freqfs


until an underlying change in technology happens and then you wish they were no longer unified (spinning rust to SSD to NVMe, for example).

I would prefer more pluggable interfaces personally.

(hi Ryan, long time no see!)


Helloo Jerry!!! Great to hear from you!!



