Hacker News new | past | comments | ask | show | jobs | submit login
A simple, kernel-space, on-disk filesystem from scratch (github.com/psankar)
101 points by r4um on March 29, 2014 | hide | past | favorite | 18 comments



"If you are planning to learn filesystems, start from the scratch. You can look from the first commit in this repository and move the way up."

Slightly off topic - I've always thought this is a very interesting use case for Github and open source in general - especially for understanding such low level concepts and their implementations. Provided the commit history is clear and comprehensive, this might be the best way to learn, other than do the whole thing yourself from scratch. Has somebody actually taken this path? What's your experience?


I've had this thought as well. I would love to see Github add support for a previous/next button when viewing a commit to make the experience seamless.

The trouble is, for many projects claiming to start from scratch, the "Initial commit" includes hundreds or thousands of lines of code. The practicality of learning this way is very much dependent on the author/committer making digestible, incremental commits and maintaining a linear history.


A few years ago I created a tutorial[1] using a similar mechanism. I put the tutorial text in the commit message as markdown, then created a script to create a tutorial from the git history.

Besides the benefits to readers in having git diffs available and being able to "skip steps" by replaying history, it made the tutorial much easier to keep up to date because it was working code.

I used stacked git[2] to make it easy to arbitrarily edit the git history.

compare the tutorial history[3] with the real history[4]

[1]: http://hobocentral.net/tutorials/agility [2]: http://www.procode.org/stgit/ [3]: https://github.com/Hobo/agility-gitorial [4]: https://github.com/Hobo/agility-gitorial-patches


Previous/Next is a bit complicated if one of the key features of the versioning system you're using is that it's a graph.

Of course, that graph could easily be flattened (e.g. simply sort by date) but that'd cause you to jump all over the graph. It would make for a terrible and confusing user experience as the next commit is suddenly based on a state of the code you saw 20 steps ago.

As it would only work properly for projects with a completely linear history (which most projects don't have) I can understand Github not adding this feature. I'd say this is something a third party app using the Github API could do very well for projects specifically set up for learning through their history (like this one).


git is a digraph though, so it's not as complicated as an arbitrary graph. Every commit has a single parent, making a previous button trivial. The next commit is difficult, since that's where branching happens, but that's not so hard to represent: show all options and let the user continue along one path or another.

The implementation probably just involves walking backwards through history until you reach the initial commit, maintaining a bidirectional linked list as you go.

Github already allows users to walk forward and backward though history via the commits page[0]. It seems reasonable to provide the same "Newer", "Older" buttons when viewing an individual commit.

[0]: https://github.com/torvalds/linux/commits/master?page=1


> Every commit has a single parent, making a previous button trivial.

Until, of course, you hit a merge commit. Which most projects have a lot of regardless of whether that's a good thing. I suppose you could always pick the parent which is on the same branch, but still.


Yup, I recommend getting https://github.com/git/git and checking out the very first commit.

If I recall it is like 500-ish lines of code, compiles easily, and is runnable. It makes a little .git folder. It's only the "plumbing" of git, back when it was a "content tracker".

Linus used to brag that it was self hosting in 3 days or something like that, and presumably this is what he was talking about.

It helps you understand the design of git for sure, and gave me an appreciation for Linus's coding style. I didn't really think he had a great sense of style because Linux is known to be somewhat of a smorgasbord. Git itself also has a sloppy and confusing interface IMO, but that's a different issue.

I didn't follow it any further than the first commit, but I bet you could learn a lot that way. (You would also presumably see how nobody designed the interface and it just accumulated commands and flags in a haphazard fashion)


And best of all, the implementation is in the public domain (CC0):

https://github.com/psankar/simplefs/blob/master/LICENSE


Not a filesystem. But a simple experimental kv storage from scratch, https://github.com/t3rm1n4l/lightkv


This filesystem seems absurdly simple from all the limitations it has; looking at the amount of code (~1kLOC) I was expecting a bit more functionality than that, maybe closer to FAT level.

I've written a FAT32 driver for an embedded system before (with full read/write support) in ~800 lines of Asm, so to express that in C it could be a lot shorter. I think FAT is one of the simplest filesystems already.


800 lines of assembler code basically means just 800 assembler (CPU) instructions - for a complete FAT32 filesystem driver? I really hardly doubt that unless seen. :-)

Any way to opensource that or anything else to comment on your shockinly few 800 assembler instructions? ;-)


Not to detract from the overall point but a small nitpick. Lines of assembler don't always translate directly to CPU instructions. Some assemblers provide high level assembler instructions for commonly used series of instructions. It will be one assembler instruction that would translate to a couple CPU instructions.


Depending on the amount of abstraction involved, it doesn't sound impossible to do. Given sane and direct access to the controller of the underlying device, and a reasonably simple underlying device (e.g. a simple Flash device), raw reads and writes can be written in maybe two dozen instructions each. Assuming userbinator was using a reasonable macro assembler, so that the 800 lines didn't also include string comparisons and all that, 800 lines doesn't look so little. I'm not sufficiently familiar with FAT32 but given a sane enough architecture, it doesn't sound that impossible.


Depends what you mean by "complete", but being able to create/read/write/delete files and directories is pretty complete to me. It doesn't take very many instructions to traverse a linked list or index an array, the two primary structures in FAT32.

Sorry I can't opensource it (being for a commercial product) but this is the approximate breakdown of that ~800 lines (I didn't count exactly, there are comments and whitespace included here too):

    70  initialisation
    140 cluster chain operations
    300 directory operations
    240 higher-level module interface
    50  data area (buffers, states, etc.)
Given sane and direct access to the controller of the underlying device, and a reasonably simple underlying device (e.g. a simple Flash device), raw reads and writes can be written in maybe two dozen instructions each.

It calls into another driver for accessing the block device, so all block read/write operations take the ~6 instructions needed for a call.

Assuming userbinator was using a reasonable macro assembler, so that the 800 lines didn't also include string comparisons and all that, 800 lines doesn't look so little.

I didn't use a macroassembler. It's for an 8-bit MCU; the final firmware image (including all the other functions of the product) is roughly 20KB.


Aye, it's an MCU I was thinking of in the first place, where read/write operations are done through a reasonably simple Flash controller and can be abstracted in a separate driver. I have written FS drivers of fairly similar complexity, in pretty much similar space constraints. That's why I wasn't incredulous.


Reminds me of a project I had done earlier: https://github.com/jyotiska/vfs


Your project repository contains a lot of backup files that should probably be removed (e.g. src/*.c~)


Submit a PR. Teach, help, and fix all in one action.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: