
Perhaps for those familiar with "functional data structures" such an analogy is helpful but I find it easier to simply explain git for what it is without adding more exotic nomenclature to it.

Git lets you do version control via full snapshots as opposed to just tracking diffs (even though it does actually do this too behind the scenes).

You can think of a full snapshot as saving a copy of your project structure every time you do a commit. The key trick is that git doesn't actually create new copies of the content for each commit but simply maintains a tree structure whose nodes are pointers (via hashing) to the content they represent.

The complication with git is not in understanding the core concepts but in knowing how best to apply them. There are all sorts of crazy workflows you could implement by manipulating git pointers and their associated patches. As with anything flexible, the difficulty comes in knowing how to constrain yourself when using it.




> git doesn't actually create new copies of the content for each commit

More precisely, it doesn't create new copies of content that you didn't change. For example, if you have 100 files in your repo and you change one of them and then commit, git creates a new copy of the content of the file you changed--a new blob storing the new file content--and a new tree object that references the new blob instead of the old one, plus the other 99 blobs that store the contents of files you didn't change; the new commit object then references the new tree object (plus the message and metadata). But git never stores diffs between old and new content; it just creates a new blob every time the content of a file changes.
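To make the 100-file example concrete, here is a minimal sketch of a content-addressed store. The `blob <size>\0<content>` header is how git hashes blobs; the tree encoding, the `store` dict, and the `put` / `write_tree` helpers are simplifications made up for illustration, not git's real on-disk format.

```python
import hashlib

store = {}  # object id -> bytes; a stand-in for .git/objects


def put(kind: str, payload: bytes) -> str:
    """Store payload under the SHA-1 of a git-style header plus payload."""
    data = f"{kind} {len(payload)}".encode() + b"\0" + payload
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data  # identical content always lands on the same key
    return oid


def write_tree(files: dict) -> str:
    # Simplified tree encoding (NOT git's binary format): one "oid name" line per file.
    lines = [f"{put('blob', content)} {name}" for name, content in sorted(files.items())]
    return put("tree", "\n".join(lines).encode())


files = {f"file{i}.txt": f"content {i}\n".encode() for i in range(100)}
tree1 = write_tree(files)
count1 = len(store)  # 100 blobs + 1 tree

files["file7.txt"] = b"changed\n"  # edit exactly one file
tree2 = write_tree(files)
new_objects = len(store) - count1  # 1 new blob + 1 new tree
```

The second `write_tree` call re-hashes all 100 files, but 99 of them map to keys that already exist, so only the changed blob and the new tree are added.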


> But git never stores diffs between old and new content; it just creates a new blob every time the content of a file changes.

Git pack files compress objects by storing them as diffs going backwards. That is, git stores the most recent state in full, then uses patches to go backwards, because you're more likely to need a recent version in full than an older one.
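A toy version of that idea, assuming line-based content and using Python's `difflib` in place of git's binary delta encoding: the newest version is kept whole and each older version is a list of copy/insert instructions against the next newer one. `make_delta` and `apply_delta` are hypothetical helpers, and real packfiles pick delta bases heuristically rather than strictly by age.

```python
import difflib


def make_delta(base, target):
    """Instructions that rebuild `target` from `base`: copy a range of base, or insert new lines."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base lines are simply not copied
    return ops


def apply_delta(base, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


v1 = ["a\n", "b\n", "c\n"]
v2 = ["a\n", "b\n", "c\n", "d\n"]
v3 = ["a\n", "B\n", "c\n", "d\n"]

newest = v3                       # stored in full
delta_to_v2 = make_delta(v3, v2)  # v2 expressed relative to v3
delta_to_v1 = make_delta(v2, v1)  # v1 expressed relative to v2

rebuilt_v2 = apply_delta(newest, delta_to_v2)
rebuilt_v1 = apply_delta(rebuilt_v2, delta_to_v1)
```

Reading the newest version costs nothing extra, while older versions cost one delta application per step back.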

https://git-scm.com/book/en/v2/Git-Internals-Packfiles


This is true but packfiles are an implementation detail.

It's still useful and more accurate conceptually to consider every commit as a complete snapshot of the state of the code at that point.


That can be said of every version control system. Restoration of state to any given version is their defining feature. How they achieve that is always an implementation detail, but those details can still be important and interesting.


Git commits are composed of all of the files in the commit, its parent, and the commit message. This is an important guarantee that each checkout is valid without the rest of the repo. It lets a lot of exotic implementations guarantee consistency between them. That means if you're GitHub you can distribute commits across many servers, or if you're Microsoft you can build partial checkouts for GVFS. It’s what allows Git LFS to keep many of git’s core guarantees while making tradeoffs to improve areas where git is traditionally weak.


Sorta true but see what bbatha said.

There are people who distinguish between changeset-oriented and snapshot-oriented systems and will hotly debate which one is better.

But as you say, restoration of state is a necessary and defining feature.


Exactly. Those familiar with functional data structures would thus point you at git and say: here's a structure from the Okasaki book that you use every day.

One more step is to point out that futures / promises, and even lists, are monads that, e.g., a JS programmer uses every day, too. It reminds me of the old literary character who did not know that he'd been speaking prose all his life.


Particularly since a git repo, as a whole, isn't a functional data structure. The commits (and the graph in which they're embedded) are immutable, but the mapping between branch names and commits gets mutated all the time. (To say nothing of slightly deeper esoterica like the index and stashes.)


That's pretty functional still, the "branches" are just (in clojure terms) atoms, aka atomically mutable references to a structure (namely a commit which is some metadata + a reference to a tree).
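Roughly, as a sketch in Python rather than Clojure: the commits are frozen values and the branch table is the only thing that ever gets reassigned. The `Commit` class, the `refs` dict, and the `commit` helper are made-up names for illustration, and a plain dict gives none of the atomicity a real atom would.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class Commit:
    tree: str                    # hypothetical tree id
    parent: Optional["Commit"]   # None for the root commit
    message: str


refs = {}  # branch name -> Commit; the only mutable state in this sketch


def commit(branch: str, tree: str, message: str) -> Commit:
    new = Commit(tree, refs.get(branch), message)
    refs[branch] = new  # the reference is swapped; old commits are untouched
    return new


first = commit("main", "tree-1", "initial")
second = commit("main", "tree-2", "second")

try:
    second.message = "rewritten"  # commits themselves cannot be edited
    mutated = True
except FrozenInstanceError:
    mutated = False
```

After both calls, `refs["main"]` points at `second`, whose `parent` is still the original `first` object.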


> Git lets you do version control via full snapshots as opposed to just tracking diffs.

That's completely orthogonal to whether the version history is immutable with each branch being like a singly linked list where we "cons" new things onto the front.
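For anyone who hasn't met "cons" before, a small sketch of what that looks like; `Node`, `cons`, and `log` are illustrative names, not anything from git. Two branches are just two heads whose parent pointers lead into the same shared tail.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    message: str
    parent: Optional["Node"]


def cons(message, history):
    """Put a new commit on the front without touching the existing history."""
    return Node(message, history)


def log(head):
    """Walk parent pointers from the head back to the root."""
    out = []
    while head is not None:
        out.append(head.message)
        head = head.parent
    return out


base = cons("b", cons("a", None))
main = cons("c", base)      # one branch
feature = cons("x", base)   # another branch sharing the same tail
```

Here `log(main)` walks c, b, a and `log(feature)` walks x, b, a, with both heads pointing at the very same `base` node.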


That’s the point; because sentences like

> the version history is immutable with each branch being like a singly linked list where we “cons” new things onto the front

just contribute to the general confusion around git where people decide it’s too complicated to learn.

(Apologies if my sarcasm detector is just broken today)



