An overview of version control in programming (lemire.me)
90 points by ashvardanian on April 22, 2022 | 60 comments



This article covers the history of VCS and the current standard, git, but I wish it had a section about other approaches and apps, like Perforce, Google's monorepo tooling, Pijul, PlasticSCM + Unity, or Facebook's Eden. All of these alternatives have reasons for existing, and I would have enjoyed reading more about them.


> The basic logical unit of Git is the commit, which is a set of changes to multiple files.

In some circumstances it's helpful to think of commits as sets of changes, but Git's commits are snapshots and not changesets. Converting commits into sets of changes is done on-demand.
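To make that concrete, here is a minimal Python sketch of the idea. It is a toy model, not Git's real object format, and `Commit`, `diff` and `show` are names I made up for illustration. The commit stores only a full snapshot and a parent pointer; the "set of changes" view is computed when you ask for it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    # Assumption for this sketch: a tree is a flat mapping of path -> content.
    snapshot: Dict[str, str]
    parent: Optional["Commit"] = None


def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Derive a changeset on demand: path -> new content, or None if deleted."""
    changes: Dict[str, Optional[str]] = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changes[path] = new.get(path)
    return changes


def show(commit: Commit) -> Dict[str, Optional[str]]:
    """The 'commit as a set of changes' view, computed from two snapshots."""
    base = commit.parent.snapshot if commit.parent else {}
    return diff(base, commit.snapshot)


root = Commit({"a.txt": "hello\n", "b.txt": "world\n"})
child = Commit({"a.txt": "hello!\n", "c.txt": "new\n"}, parent=root)
print(show(child))
```

Nothing in `Commit` records what changed; `show(child)` reports one edit, one deletion and one addition purely by comparing the two stored trees.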


Git's user interface is very confused in this respect. You can tag a commit with a version identifier, and that indicates commits are snapshots or versions.

But then you have cherry-pick, which seems to work by "applying a commit" (you can't "apply" snapshots...I think) with git cherry-pick hash. That's the same thing as git cherry-pick hash^..hash which seems to more clearly indicate that we're applying the diff between two commits. And rebase, which really does present itself as a tool for reordering diffs.

Anyway, everyone knows what you mean when you say "apply this commit" so maybe this is just a stupid complaint about "consistency" with no real point. Especially since git does use both deltas and snapshots under the hood.


It's true that Git mostly stores snapshots rather than deltas internally. But it's still a version control system, and I don't think it would be improved if the abstraction of snapshots was leaked more aggressively into the interface. You can reconstruct a delta from a snapshot and vice versa, so it still makes sense that you can apply a commit (i.e. patch, delta) in Git.

One thing about Subversion that I miss in Git: when you move a file or a directory, Subversion tracks where it came from. That's not true in Git, which just stores snapshots (mostly) and has to guess whether a file was moved. That means that `svn log FILE` behaves like `git log --follow FILE` by default (except always right rather than "right when the heuristics work"), which is much more useful. One thing I dislike about Git is that its history model tends to crumble when you move files and directories around.


It is kind of frustrating that git doesn’t remember when you tell it you moved something. On the other hand, it’s also convenient to be able to mv something and just have git figure that out. Another case git does better in is when you do complex refactors. So moving the bulk of the content of one file to another is recognized as a move whereas SVN wouldn’t help you.

All in all, it’s a hard problem and there’s no perfect answer I think.


There's a solution which works most of the time, and it works equally well in Subversion and Git. Divide up the change into two commits:

1. Move the file verbatim.

2. Change the file as necessary.

Subversion will remember the move because it always does — Subversion stores changesets internally, and those changesets contain history metadata. Git won't track the move, but when you run `git log --follow` later it detects that the file was moved because the content of the file that was added was the same as something which existed before.

It's a little awkward if only a small portion of a file was migrated, though. In such a case, you may just have to fall back to documenting in the commit message that code was moved.

This approach has another benefit: it is easy to review. The first commit can be validated as "verbatim move — check!". Then the logical change can be validated with effort proportional to the amount that was changed. In contrast, if you lump them all together, it can be hard to discern what changed when presented with a huge diff containing mostly verbatim move but also a few subtle changes.
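A rough Python sketch of why the verbatim-move commit helps a snapshot-based tool. This is only an illustration under simplifying assumptions: it matches files by exact content hash, whereas Git also applies similarity heuristics, and `content_id` and `exact_renames` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> file content


def content_id(text: str) -> str:
    """Identify a file purely by its content."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def exact_renames(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Pair paths that vanished with paths that appeared carrying identical content."""
    removed = {content_id(old[p]): p for p in sorted(old.keys() - new.keys())}
    renames: List[Tuple[str, str]] = []
    for path in sorted(new.keys() - old.keys()):
        source = removed.get(content_id(new[path]))
        if source is not None:
            renames.append((source, path))
    return renames


before = {"src/util.py": "def f():\n    return 1\n"}
moved = {"lib/util.py": "def f():\n    return 1\n"}
moved_and_edited = {"lib/util.py": "def f():\n    return 2\n"}

print(exact_renames(before, moved))             # the move is recoverable
print(exact_renames(before, moved_and_edited))  # exact matching finds nothing
```

With the move and the edit in separate commits, the first comparison always succeeds. Lumping them together leaves the tool guessing from similarity, which is where the heuristics can fail.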


That's true until you get back to Git's packfiles, which use deltas.

Git's form of compression is very interesting, but a key concept as you study the history of version control is that snapshots and deltas encode the same information, so the choice of which to use is an implementation detail[1].

[1] I covered this in detail in a preso on Git data structure design for Papers We Love San Diego: https://www.youtube.com/watch?v=fHSZz_Mx-Uo&t=400s


This is sensitive to the design of the delta format though, once you have merges. I had to write some code to migrate version history from Rational Team Concert to git. RTC has a uuid for each version of a file, and deltas are a set of zero or more predecessor versions, and a successor version. This allowed the same delta to be present in different “streams” (branches), if they only differed in files not relevant to the delta; but it becomes impossible to reconstruct a history of snapshots from the change sets.


My previous employer still uses svn. It created so many headaches and so much frustration and lost productivity because e.g. someone pushed a bug without testing and you unknowingly downloaded it. No way to easily reverse of course. It was my first job after university and taught me that CRUD B2B Java enterprise jobs are to be avoided at all costs in the future. For what it's worth though, I had some very funny-sad stories to tell when I interviewed to get out of there after just 6 months. Never thought a versioning system could burn me out. Git is standard for very obvious reasons.


It sounds like you didn't know svn, because what you describe is the most basic feature of any version control system - being able to revert to any of the previously submitted versions is basically the reason VCSs exist. In addition, pretty much every svn GUI frontend should provide a single-click way to check out a local copy of a previous version that you can commit if you want.

Svn has some warts (shelving feels sooo half baked despite having like 3 different ways for that) but at least reverting to an older state isn't one of them.


I prefer Git, I've invested a lot in learning it, and have done presos and training for it over the years. But I cut my teeth on CVS, and Subversion was a significant improvement over CVS.

Not that I expect anybody to maintain historical perspective. Linus Torvalds' famous 2007 rant at Google shitting all over Subversion and its developers set the tone for a long tradition of anti-Subversion anti-history. Never mind "learning from what's gone before", it's like there was only backwards progress until the tool du jour was suddenly invented ex nihilo.


> No way to easily reverse of course.

Revert to the previous version. Push as new version. Fixed? SVN was used widely and for years, prior to git. I remember the biggest problem being expensive (full copy) branching.


Creating branches is not expensive in SVN, because it’s copy-on-write (server-side). Maybe you mean switching the local working copy to a different branch?


> Creating branches is not expensive in SVN, because it’s copy-on-write

Copy on write is an expensive operation for branching. SVN repos that are multiple gigs, in size, are a pain to create branches for.


Not sure what you mean. Copy-on-write means that no actual copy is made initially. Branching is therefore almost instant in SVN and only requires minimal storage space. Only when you modify a file within the branch (or the original) does that specific file get copied (or rather, the diff of the modification gets stored).

What is slow in SVN is switching to the branch client-side, because branches are always created server-side and you still need to sync the whole contents of the branch from server to client when switching (which works similarly to a regular `svn update`).


I'm not sure what you mean either. I don't care what SVN does now, only what it did when companies abandoned it en-masse around SVN v1.6, since that's the context of my comments.


SVN branches were always cheap copies, right from the original version of SVN.

"svn cp http://.../trunk http://.../branches/foo" may not be the most concise command to type, but all it does is create a new item "foo" in a new revision, which references a previous item "trunk" in a previous revision. How many files/directories are under that doesn't affect how long the operation takes.
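Here is a toy Python model of that "cheap copy" idea. It is a sketch under my own assumptions, not Subversion's actual storage format, and `ToyRepo` and its methods are invented names. A branch is just a new name pointing at an already-stored tree, so creating it does not touch any files.

```python
from typing import Dict, List

Tree = Dict[str, str]  # path -> file content


class ToyRepo:
    def __init__(self, trunk: Tree) -> None:
        self.trees: List[Tree] = [dict(trunk)]                  # stored tree objects
        self.revisions: List[Dict[str, int]] = [{"trunk": 0}]   # name -> tree index

    def copy(self, src: str, dst: str) -> int:
        """Like 'svn cp': a new revision where dst references src's existing tree."""
        head = dict(self.revisions[-1])
        head[dst] = head[src]  # share the tree; no file is copied
        self.revisions.append(head)
        return len(self.revisions) - 1

    def write(self, name: str, path: str, content: str) -> int:
        """Copy-on-write: only now does 'name' get a tree of its own."""
        head = dict(self.revisions[-1])
        new_tree = dict(self.trees[head[name]])
        new_tree[path] = content
        self.trees.append(new_tree)
        head[name] = len(self.trees) - 1
        self.revisions.append(head)
        return len(self.revisions) - 1

    def read(self, name: str, path: str) -> str:
        return self.trees[self.revisions[-1][name]][path]


repo = ToyRepo({"main.c": "int main(){}", "big.bin": "x" * 1000})
repo.copy("trunk", "branches/foo")
trees_after_branch = len(repo.trees)  # still one stored tree
repo.write("branches/foo", "main.c", "int main(){return 1;}")
print(trees_after_branch, len(repo.trees))
```

The cost of `copy` depends on the number of top-level names, not on how many files sit under trunk. For brevity, `write` copies the whole flat mapping; a real implementation would also share the unchanged directories.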


What I described is what SVN always did. Cheap copies due to COW was always a selling point for SVN. My point is whatever was slow for you wasn't due to COW.


It's one of my interview questions when applying for a job. I say something along the lines of: "I know this is a ridiculous question, but 1) do you use version control and 2) what version control do you use?" and then explain I have to ask these questions for obvious reasons.


I had a job in the late 90's at a newly formed subsidiary of a multi-billion dollar corporation. The product was cobbled together from various third party software they licensed or acquired outright, and we were customizing. You would think they'd have set up version control to at least keep track of what came from where. Nope. Here I was, a 22 year old kid, teaching people about CVS. SVN wasn't quite out yet.


Same for me in the early 2000’s. :)


I had a coworker hired in 2015 (with previous work experience) who had never used any revision control.


If you've only worked in very small, mom-and-pop style companies, you might run into this. I knew small companies (under 30) that ran without version control for over a decade. These were PHP shops that did most of their development on a dedicated server, with people editing files remotely over FTP (or SFTP if you were lucky.)


There's a blog post from the early 00s called The Joel Test for judging a software team. Using source control is actually the first question. Sounds like you reinvented it :)

https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s...


Don't apologize for your questions! You're interviewing the company just as much as they're interviewing you and poor attitude in responding to such questions is a red flag IMO.

That said, if you're worried about how your questions are perceived, a little rewording can suss this out and give you more context. I like to ask:

What does your software development process look like? Tell me about how you manage, test, and deploy your product.

You can rephrase a lot of questions in a similar manner to get what you want and leave room for the interviewer to expand (or justify!) their responses.


I didn't mention what I was subtly after, which was the facial expression after asking that question.

Usually it's a sigh and a nod of the head (as if they are remembering THAT company). That's a good sign for me.

There is a mutual recognition that we've both worked in some terrible companies and learnt how NOT to do some things. This generally means (after some good follow-up questions) that they have CI/CD, a PR review process, some ticket tracking, agile/kanban etc.

I'm generally not worried about how my questions are perceived as that's a red flag for me. If they have a problem with that, then most likely not a good place (culturally) I want to work.


Also: wear the nicest thing you'd wear for work to the interview, even if that's just a t-shirt (because you are a slob and proud of it). If you don't get the job as a result of your clothes, you didn't want that job.


Do you have a monorepo? Test infrastructure?


Nice. Sort of a reverse FizzBuzz.


Sincere question: what does Git have that makes it easy to revert the change in your example that SVN doesn’t have? I barely know Git, that’s why I ask.


You (or anybody) can use git revert to create a new commit that undoes previous commits: https://git-scm.com/docs/git-revert

Perforce has something similar too: http://ftp.perforce.com/perforce/r16.2/doc/manuals/cmdref/p4...


Svn also has a command for reverting a commit, i.e. committing the inverse of a previous commit.


For anyone looking for it, it's not as intuitive as git. In svn, you reverse-merge the commits from the current branch's history you want to revert:

  svn merge -c-123 . # single commit, the second "-" is "do the merge backwards"

  svn merge -r123:122 . # multiple commits in reverse order, "the change going from 123 to 122"

https://stackoverflow.com/questions/13330011/how-do-i-revert...
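If the "backwards" part is confusing, this small Python sketch may help. It is a toy model with made-up names (`merge_range`, `merge_change`) and file-level deltas; real svn merges at the line level and can report conflicts. The point is only that `-c -N` is shorthand for "the change going from N to N-1", applied to your working copy.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]          # path -> file content
Delta = Dict[str, Optional[str]]   # path -> new content, None means "delete"


def diff(old: Snapshot, new: Snapshot) -> Delta:
    """The change going from old to new."""
    return {p: new.get(p) for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def apply(base: Snapshot, delta: Delta) -> Snapshot:
    result = dict(base)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def merge_range(history: List[Snapshot], working: Snapshot, start: int, end: int) -> Snapshot:
    """Toy 'svn merge -rSTART:END .': apply the change going from START to END."""
    return apply(working, diff(history[start], history[end]))


def merge_change(history: List[Snapshot], working: Snapshot, rev: int) -> Snapshot:
    """Toy 'svn merge -c REV .'; a negative REV applies that change backwards."""
    if rev < 0:
        return merge_range(history, working, -rev, -rev - 1)
    return merge_range(history, working, rev - 1, rev)


history = [
    {"f.txt": "a"},                     # r0
    {"f.txt": "a", "g.txt": "buggy"},   # r1 adds a buggy file
    {"f.txt": "b", "g.txt": "buggy"},   # r2 is an unrelated later edit
]
working = dict(history[2])
reverted = merge_change(history, working, -1)  # like: svn merge -c-1 .
same = merge_range(history, working, 1, 0)     # like: svn merge -r1:0 .
print(reverted == same, reverted)
```

The result keeps the later edit to `f.txt` and drops only what r1 introduced. You would then commit that as a new revision.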


Git makes branching and merging easier, and the whole history is available locally. You can just ignore those buggy commits and base your work on an older version; when someone eventually fixes the bug, you can merge the bug and its fix into your changes, to become up-to-date (with the possibility of conflicts, but nothing is completely free ;) ).


I once took a job at a place that had migrated from git to perforce because it was the "corporate standard." Some departments worked with large binary assets (videos, sound files) where I understand it made more sense. We were developing web apps. This was almost a decade ago now, and working with it felt tedious. I also made it about 6 months.


There is good git <-> svn integration. I was using git even when any of my peers or projects were on svn and they never noticed.


Not sure what you mean, as “pushing bugs without testing” (or not using bugfix/feature branches) has nothing to do whatsoever with SVN.


> "RCS is faster and uses less disk space than SCCS."

That was the promotion.

In fact RCS was never as fast as SCCS. RCS did not, in fact, use less disk space than SCCS. It is hard to identify any particular where RCS was better. The code quality was abysmal. But Tichy got a PhD out of it, so there was that, anyway.


I appreciated the historical context with RCS et al. but I’m left wondering where mercurial sits in all this?

I’m curious about the differences between git and mercurial, are there benefits in choosing one over the other?


Mercurial is good. It just didn't win.

It is coded in Python, which should have made it slow, but it was very fast. All the data motion and analysis bypassed the Python interpreter.

There was another called Monotone, where Git lifted its data model from, wholesale.

And there is Fossil, which is growing in popularity. If it offered a way to squeeze intermediate edits out of the revision history, it might grow faster, but the author is hostile to the concept. Where it really shines is in never corrupting its data store. I have had Git corrupt its data store quite a few times. By luck, I have not seen it happen on a server others relied on, just on my own cloned repositories. But the tricky stuff is mostly done to cloned repositories.


Fossil already provides this using private branches.

https://fossil-scm.org/home/doc/trunk/www/private.wiki


Thanks for this.

Are you saying when a private branch is published, it shows up in the public repository as a single diff?


Yes, that's correct. Private branches are never pushed and remain in the local repository. In the linked article, it also mentions how private branches can be 'pulled' from other repositories.


I used mercurial at one job and my experience was that the day-to-day differences were largely philosophical, especially if you primarily interact with VCS through your IDE, like me. Mercurial’s CLI API is supposedly cleaner, but I hardly interacted with it.

Mercurial has immutable history, so no squashing commits, no deleting branches, at the time I was using it there was no amending commits, in fact reverting a commit doesn’t even come enabled out of the box! Some folks loved it; no changing history, everything documented as it happened. We practiced trunk based development, so no branches except for hotfixes, so there wasn’t a lot of sprawl.

Ecosystems largely don’t support mercurial, so that’s definitely a consideration. Since Merge Requests and feature branches are largely practiced now, I feel like there’d be a lot of noise in a repo if folks used mercurial.

I don’t particularly miss mercurial, personally. I’m less into “pure” workflows and forcing behaviors. I think git is super flexible and generally practical and I’m overall pretty happy with it.


By default, Mercurial doesn't allow editing of public history pushed to or pulled from a remote repository. There are no such restrictions for local revisions, which are considered draft. And if you configure the remote as a draft repository, you can keep some remote revisions as draft before publishing them.

The history editing capability of Mercurial is arguably more advanced than git's, especially in a collaborative setting, because of Changeset Evolution [1] and the Evolve extension [2]. The former keeps track of the metahistory of commits and synchronises it between repositories. The latter provides a set of expressive command line tools to edit history. With them, collaborative history editing and stacked PRs are a pleasant experience.

[1]: https://www.mercurial-scm.org/wiki/ChangesetEvolution [2]: https://www.mercurial-scm.org/doc/evolution/


Basically none of this post is true. History in Mercurial is thoughtfully mutable (much more than Git) with public and draft phases; hg backout is part of the base distribution; it fully supports the PR model and better ones like stacked diffs. Ultimately it lost because of GitHub, but it lives on as Fig at Google and Eden at Facebook (both of which heavily use mutable history, of course).


Talking about flexible mutable history, there are two more official extensions to be mentioned:

- evolve [0], which allows rewriting history losslessly (without ever risking losing data)

- absorb [1], which takes uncommitted working copy changes, and for each hunk finds the last commit that touched those lines, and rewrites it. It's an extension originally from Facebook, in core since 2018. Works like magic: no more "fix" commits, ever.

Plus, all of this is available using mercurial locally and interacting with git (and github) remotely, via hg-git. Admittedly, this requires being a bit of an advanced user, but the gains in ergonomics are tangible.

[0] https://www.mercurial-scm.org/doc/evolution/

[1] https://gregoryszorc.com/blog/2018/11/05/absorbing-commit-ch...
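For anyone wondering what "finds the last commit that touched those lines" looks like, here is a very simplified Python sketch. This is not how the extension is actually implemented: `absorb_plan` is a name I made up, the blame list is a hypothetical input that a real tool would compute from history, and it ignores the cases where a real tool would refuse because the target is ambiguous.

```python
from collections import defaultdict
from typing import Dict, List


def absorb_plan(blame: List[str], edits: Dict[int, str]) -> Dict[str, Dict[int, str]]:
    """Group uncommitted line edits by the commit that last touched each line.

    blame[i] is the id of the commit that last modified line i.
    edits maps a line index to its new text.
    """
    plan: Dict[str, Dict[int, str]] = defaultdict(dict)
    for line_no, new_text in sorted(edits.items()):
        plan[blame[line_no]][line_no] = new_text
    return dict(plan)


blame = ["c1", "c1", "c2", "c3"]  # hypothetical draft commits c1..c3
edits = {0: "import os, sys", 2: "    return x + 1"}
plan = absorb_plan(blame, edits)
print(plan)
```

Each entry in `plan` is the set of fixes that would be folded back into that commit, instead of landing in a separate "fix" commit on top.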


It is kind of true, though. From Mercurial docs:

>The public phase holds changesets that have been exchanged publicly. Changesets in the public phase are expected to remain in your repository history and are said to be _immutable_

Note that in Git all history is mutable, even if published.

Regarding 'backout' (and 'revert'), to the best of my knowledge, it does not revert the commit, it creates a new one (reverting the changes), and I frankly do not know whether it's possible at all to amend a commit in Mercurial (when I worked with it, that was definitely not possible, but that was a long time ago).


In Mercurial all history is mutable too, there's just a UI failsafe that prevents you from mutating public history unless you manually override it. Git is just plain bad in this regard.

hg backout is like git revert. Creating a new commit is correct if you want to e.g. propagate the change through continuous deployment.

And hg commit --amend has existed for a long time.


Git has "reset" in addition to "revert" (the former mutates history but the latter does not). What does Mercurial have for removing a wrong commit?

As for the amend option: well, we switched to Git around the time of Mercurial 2.1 (I said it was a long time ago) and did not notice they added this feature, sorry.


git reset does like 25 different things.


that would be `hg prune` in modern Mercurial.


You can do

> hg commit --amend

to change the topmost commit.

Marking commits as public is mostly a safeguard against accidentally altering history that others may already depend upon. This is just there to provide awareness of the giant footguns hiding when editing history after it has been shared (git contains the same footguns without safeguards). You can revert the status of a commit from public to draft and then change it. Just like in git, it's very dangerous to do so, but hg makes it very obvious. The command is

> hg phase --draft --force .
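The safeguard can be pictured with a few lines of Python. This is a toy model of the idea only, not Mercurial's implementation; `Changeset`, `amend`, `push` and `force_draft` are names I invented. Rewriting is refused once a changeset is public, unless you explicitly move it back to draft first.

```python
from dataclasses import dataclass


class PublicHistoryError(Exception):
    """Raised when a rewrite targets a changeset that was already shared."""


@dataclass
class Changeset:
    message: str
    phase: str = "draft"  # "draft" or "public"


def amend(cs: Changeset, message: str) -> Changeset:
    """Rewrite a changeset, refusing if it has already been shared."""
    if cs.phase == "public":
        raise PublicHistoryError("cannot amend public changeset")
    return Changeset(message, cs.phase)


def push(cs: Changeset) -> Changeset:
    """Sharing a changeset marks it public."""
    return Changeset(cs.message, "public")


def force_draft(cs: Changeset) -> Changeset:
    """Toy counterpart of 'hg phase --draft --force .': an explicit override."""
    return Changeset(cs.message, "draft")


shared = push(Changeset("fix typo"))
try:
    amend(shared, "fix typos")
    blocked = False
except PublicHistoryError:
    blocked = True

fixed = amend(force_draft(shared), "fix typos")
print(blocked, fixed)
```

The override is still available, but you have to ask for it by name, which is the difference from a tool that lets you rewrite shared history without any warning.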


On top of that, there is the notion of change set _obsolescence_: a change may be replaced by a newer one, effectively replacing it.


What do you do with repos that have to version control (a) source code / txt files but also (b) large quantities of binary data /resources (single digit GBs)?


Personally i'm using Subversion. It has a bunch of annoyances and the GUI frontend story is poor on Linux (the best i've found -at least in openSUSE's repos- is kdesvn but it barely has seen any development in recent years and lacks support for any non-basic functionality) but it seems there hasn't been any improvement in CVCSs since Svn, at least in the open source space (which kinda makes sense since the overwhelming vast majority of developers work mainly or only with code and text files), so it is basically the best you can get.


> GUI frontend story is poor on Linux

On the other hand, if you're on Windows, TortoiseSvn is one of the best VCS interfaces I've ever used.

- It isn't yet another tool which shows you all your directories and files (in addition to Explorer and your IDE/editor).

- As Subversion was always designed to be a library (as it was already obvious then that VCSs would be used from other tools like IDEs) the integration is much better than TortoiseCvs or TortoiseGit which compose command lines and then execute them.


Git LFS?


This article picks up the thread at SCCS, but there have to have been version control systems before that, i.e. on System 360.


SCCS was originally developed on System/370. And it was the first true version control system specifically for source code. There were some earlier programs with rudimentary similar functionality, but they're more comparable to diff/patch.



