A simple, kernel-space, on-disk filesystem from scratch

deepakprakash · on March 29, 2014

"If you are planning to learn filesystems, start from the scratch. You can look from the first commit in this repository and move the way up."

Slightly off topic - I've always thought this is a very interesting use case for Github and open source in general - especially for understanding such low level concepts and their implementations. Provided the commit history is clear and comprehensive, this might be the best way to learn, other than do the whole thing yourself from scratch. Has somebody actually taken this path? What's your experience?

caipre · on March 29, 2014

I've had this thought as well. I would love to see Github add support for a previous/next button when viewing a commit to make the experience seamless.

The trouble is, for many projects claiming to start from scratch, the "Initial commit" includes hundreds or thousands of lines of code. The practicality of learning this way is very much dependent on the author/committer making digestible, incremental commits and maintaining a linear history.

bryanlarsen · on March 29, 2014

A few years ago I created a tutorial[1] using a similar mechanism. I put the tutorial text in the commit message as markdown, then created a script to create a tutorial from the git history.

Besides the benefits to readers in having git diffs available and being able to "skip steps" by replaying history, it made the tutorial much easier to keep up to date because it was working code.

I used stacked git[2] to make it easy to arbitrarily edit the git history.

compare the tutorial history[3] with the real history[4]

[1]: http://hobocentral.net/tutorials/agility [2]: http://www.procode.org/stgit/ [3]: https://github.com/Hobo/agility-gitorial [4]: https://github.com/Hobo/agility-gitorial-patches

michh · on March 29, 2014

Previous/Next is a bit complicated if one of the key features of the versioning system you're using is that it's a graph.

Of course, that graph could easily be flattened (e.g. simply sort by date) but that'd cause you to jump all over the graph. It would make for a terrible and confusing user experience as the next commit is suddenly based on a state of the code you saw 20 steps ago.

As it would only work properly for projects with a completely linear history (which most projects don't have) I can understand Github not adding this feature. I'd say this is something a third party app using the Github API could do very well for projects specifically set up for learning through their history (like this one).

caipre · on March 29, 2014

git is a digraph though, so it's not as complicated as an arbitrary graph. Every commit has a single parent, making a previous button trivial. The next commit is difficult, since that's where branching happens, but that's not so hard to represent: show all options and let the user continue along one path or another.

The implementation probably just involves walking backwards through history until you reach the initial commit, maintaining a bidirectional linked list as you go.

Github already allows users to walk forward and backward though history via the commits page[0]. It seems reasonable to provide the same "Newer", "Older" buttons when viewing an individual commit.

[0]: https://github.com/torvalds/linux/commits/master?page=1

michh · on March 29, 2014

> Every commit has a single parent, making a previous button trivial.

Until, of course, you hit a merge commit. Which most projects have a lot of regardless of whether that's a good thing. I suppose you could always pick the parent which is on the same branch, but still.

chubot · on March 29, 2014

Yup, I recommend getting https://github.com/git/git and checking out the very first commit.

If I recall it is like 500-ish lines of code, compiles easily, and is runnable. It makes a little .git folder. It's only the "plumbing" of git, back when it was a "content tracker".

Linus used to brag that it was self hosting in 3 days or something like that, and presumably this is what he was talking about.

It helps you understand the design of git for sure, and gave me an appreciation for Linus's coding style. I didn't really think he had a great sense of style because Linux is known to be somewhat of a smorgasbord. Git itself also has a sloppy and confusing interface IMO, but that's a different issue.

I didn't follow it any further than the first commit, but I bet you could learn a lot that way. (You would also presumably see how nobody designed the interface and it just accumulated commands and flags in a haphazard fashion)

tjaerv · on March 29, 2014

And best of all, the implementation is in the public domain (CC0):

https://github.com/psankar/simplefs/blob/master/LICENSE

slynux · on March 29, 2014

Not a filesystem. But a simple experimental kv storage from scratch, https://github.com/t3rm1n4l/lightkv

userbinator · on March 29, 2014

This filesystem seems absurdly simple from all the limitations it has; looking at the amount of code (~1kLOC) I was expecting a bit more functionality than that, maybe closer to FAT level.

I've written a FAT32 driver for an embedded system before (with full read/write support) in ~800 lines of Asm, so to express that in C it could be a lot shorter. I think FAT is one of the simplest filesystems already.

trapnii · on March 29, 2014

800 lines of assembler code basically means just 800 assembler (CPU) instructions - for a complete FAT32 filesystem driver? I really hardly doubt that unless seen. :-)

Any way to opensource that or anything else to comment on your shockinly few 800 assembler instructions? ;-)

stusmall · on March 29, 2014

Not to detract from the overall point but a small nitpick. Lines of assembler don't always translate directly to CPU instructions. Some assemblers provide high level assembler instructions for commonly used series of instructions. It will be one assembler instruction that would translate to a couple CPU instructions.

weland · on March 29, 2014

Depending on the amount of abstraction involved, it doesn't sound impossible to do. Given sane and direct access to the controller of the underlying device, and a reasonably simple underlying device (e.g. a simple Flash device), raw reads and writes can be written in maybe two dozen instructions each. Assuming userbinator was using a reasonable macro assembler, so that the 800 lines didn't also include string comparisons and all that, 800 lines doesn't look so little. I'm not sufficiently familiar with FAT32 but given a sane enough architecture, it doesn't sound that impossible.

userbinator · on March 29, 2014

Depends what you mean by "complete", but being able to create/read/write/delete files and directories is pretty complete to me. It doesn't take very many instructions to traverse a linked list or index an array, the two primary structures in FAT32.

Sorry I can't opensource it (being for a commercial product) but this is the approximate breakdown of that ~800 lines (I didn't count exactly, there are comments and whitespace included here too):

    70  initialisation
    140 cluster chain operations
    300 directory operations
    240 higher-level module interface
    50  data area (buffers, states, etc.)

Given sane and direct access to the controller of the underlying device, and a reasonably simple underlying device (e.g. a simple Flash device), raw reads and writes can be written in maybe two dozen instructions each.

It calls into another driver for accessing the block device, so all block read/write operations take the ~6 instructions needed for a call.

Assuming userbinator was using a reasonable macro assembler, so that the 800 lines didn't also include string comparisons and all that, 800 lines doesn't look so little.

I didn't use a macroassembler. It's for an 8-bit MCU; the final firmware image (including all the other functions of the product) is roughly 20KB.

weland · on March 30, 2014

Aye, it's an MCU I was thinking of in the first place, where read/write operations are done through a reasonably simple Flash controller and can be abstracted in a separate driver. I have written FS drivers of fairly similar complexity, in pretty much similar space constraints. That's why I wasn't incredulous.

jyotiska · on March 29, 2014

Reminds me of a project I had done earlier: https://github.com/jyotiska/vfs

stevekemp · on March 29, 2014

Your project repository contains a lot of backup files that should probably be removed (e.g. src/*.c~)

shawndumas · on March 29, 2014

Submit a PR. Teach, help, and fix all in one action.